#BabelOfCode 2024
Week 3
Language: x86_64 assembly [AMD64] (macroassembler: GNU as/gas)
PREV WEEK: https://mastodon.social/@mcc/113783248514095140
NEXT WEEK: https://mastodon.social/@mcc/113906616486081430
RULES: https://mastodon.social/@mcc/113676228091546556
I planned ASM for today and when I saw the challenge *almost* bounced to TCL, because I *don't* wanna write a parser in ASM. But the language here is exceedingly regular, so probs a state machine is enough.
Successfully ran this hello world https://cs.lmu.edu/~ray/notes/gasexamples/ which I think should be all I need to start
My language "confidence level" for this week is high, but down to medium-high for step 2 (because obvs I don't know WHAT they'll throw at me at step 2). I'm kinda unenthused about the gas macro language. The macro language documentation ( https://sourceware.org/binutils/docs/as/Macro.html + https://sourceware.org/binutils/docs/as/Altmacro.html , I think that's literally all they wrote ) is sketchy and unclear. Can macros take a macro name as argument and invoke the passed-in macro? I literally can't tell. I'm going to uncover syntax by trial and error
I've decided to add a new rule to my challenge, which is in addition to doing a different language every week I'm going to try to use exclusively *languages I haven't programmed in before*.
If that's the rule, x86_64 is a stretch as I've *written* x86_64— but I count it as valid, because I've never written a whole AMD64 *program*, only snippets embedded in a C file or OllyDbg-injected into an exe at runtime. Only ASMs I've written whole programs in are MIPS and LLVM in-memory representation.
Finding many things that are just sort of It's Assumed You Know This but may or may not be written anywhere. Like, there's a "movb" instruction which is not in the instruction reference I'm using and is not recognized by my syntax highlighter, but gas accepts it.
Question. I do
mov %al, %esi
It says operand type mismatch. OK. I think I can simulate this with
mov $0, %rax
mov %eax, %esi
BC non-al bits of rax get cleared in instruction 1.
…But what's the "widening"/truncating version of mov?
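On the widening question: in AT&T syntax the widening moves are spelled movz (zero-extend) and movs (sign-extend) followed by two size suffixes, source size then destination size. Truncation has no special instruction; you just name the narrower sub-register. A minimal sketch, with register choices picked only for illustration:

```asm
# AT&T syntax, x86_64, GNU as
    movzbl  %al, %esi      # zero-extend 8 -> 32 bits; writing %esi also clears the upper half of %rsi
    movsbl  %al, %esi      # sign-extend 8 -> 32 bits
    movzbq  %al, %rsi      # zero-extend 8 -> 64 bits
    movslq  %eax, %rsi     # sign-extend 32 -> 64 bits (the Intel manuals call this movsxd)
    mov     %eax, %esi     # 32 -> 64 zero-extension is implicit in any 32-bit register write
    mov     %sil, %al      # truncation: copy only the low byte by naming the 8-bit sub-registers
```

The movb mentioned earlier is the same suffix idea: plain mov with an explicit b (byte) size suffix, which is why it does not show up as its own entry in an Intel-style instruction reference.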
I'm sorry the x86_64 multiply instruction works fucking *how*. What fucking century is it
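For context, the surprising part is presumably the one-operand form: mul takes a single explicit operand, multiplies it by %rax implicitly, and writes the 128-bit result across %rdx:%rax, clobbering %rdx. The signed imul also has two- and three-operand forms that behave like an ordinary instruction. A small sketch with arbitrary example values:

```asm
# AT&T syntax, x86_64, GNU as
    mov     $7, %rax
    mov     $6, %rcx
    mul     %rcx             # unsigned: %rdx:%rax = %rax * %rcx, so %rdx is overwritten (0 here) and %rax = 42

    imul    %rcx, %rax       # two-operand form: %rax = %rax * %rcx, low 64 bits only, %rdx untouched
    imul    $10, %rax, %rax  # three-operand form: %rax = %rax * 10, handy when accumulating decimal digits
```

The low 64 bits of a product are the same for signed and unsigned operands, so the imul forms are usually fine whenever the high half is not needed.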
So it took longer than I'd hoped, but I now have a working first-pass AMD64 ASM program that can decode an ASCII number in the .data segment and print it out again.
Build instructions in adjacent run.txt.
I have some questions.
1. I think I don't like GNU/AT&T assembly format and would like to switch to Intel assembly format. Is Intel format… documented… somewhere? This is the closest I found. https://sourceware.org/binutils/docs/as/i386_002dVariations.html
2. At a certain point in my code, I wanted to load a pointer to the .data segment variable "input" into my %r10. The way to do this turned out to be
lea input(%rip), %r10
rip is… the instruction pointer?? what the devil is the instruction pointer doing there? `input` is at a fixed location, surely it's not loading it from an address relative to the fricking instruction pointer.
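A hedged note on why %rip shows up: `input` sits at a fixed distance from the code, but not necessarily at a fixed absolute address, since a position-independent executable can be loaded anywhere. RIP-relative addressing encodes a 32-bit offset from the address of the next instruction, and the assembler and linker fill that offset in, so writing `input(%rip)` means "the address of input, expressed relative to here" rather than "rip plus the value of input". A sketch, assuming a Linux-style toolchain; the string contents and the use of %r11 are made up for illustration:

```asm
# AT&T syntax, x86_64, GNU as
    .data
input:
    .ascii "12345\n"

    .text
    lea     input(%rip), %r10   # address of input = address of the next instruction + a link-time constant offset
    movb    (%r10), %al         # load the first byte of input through the pointer
    movb    input(%rip), %al    # or load it directly, still RIP-relative
    movabs  $input, %r11        # alternative: absolute 64-bit address baked in as an immediate (fine in a non-PIE link)
```

The RIP-relative forms are shorter than movabs and keep working wherever the program is loaded, which is probably why they are the conventional spelling on x86_64.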
Expanding on my question re: "where is intel assembly format actually documented?"
mov rax, 60
This is pretty simple, right? I want the number 60 in rax. This says: ambiguous operand size for mov. Oh, there was something about that in the gas manual. Okay, I say:
mov rax, dword 60
It says: junk 60 after expression
What the heck do I do now? Do I just come back to mastodon for help every time I want to type a number? All the StackOverflow examples are in AT&T format.
Findings so far:
- If you put ".intel_syntax" at the top of a gas file, it does *not* give you intel syntax *or* AT&T syntax but a secret third thing. The way to get the real intel syntax is ".intel_syntax noprefix"
- It didn't accept the 0(reg) syntax to dereference. By experimentation, I found I could do 0[reg]. That is terrifying. Guessing, I mean.
- No one I have spoken to has learned intel syntax by anything other than oral tradition. Also, no one uses intel with gas (they all use nasm?)
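Putting those findings together, here is a minimal sketch of what gas appears to accept under `.intel_syntax noprefix`. The likely reason plain `.intel_syntax` rejects `mov rax, 60` is that registers still need the % prefix in that mode, so a bare `rax` is read as a symbol, i.e. a memory operand of unknown size. The `dword 60` spelling is NASM-style; gas wants `dword ptr`, and only on memory operands. The label name, data contents, and the `_start` entry point are assumptions for a standalone Linux build:

```asm
    .intel_syntax noprefix          # without "noprefix", registers still need % and a bare rax is read as a symbol
    .data
input:
    .ascii "60\n"

    .text
    .globl _start
_start:
    lea     r10, [rip + input]      # Intel-syntax spelling of: lea input(%rip), %r10
    movzx   eax, byte ptr [r10]     # dereference with brackets; "byte ptr" gives the operand size
    mov     byte ptr [r10 + 1], 48  # an immediate store needs an explicit size, there is no register to infer it from
    mov     rax, 60                 # plain immediate: no size keyword needed, the register fixes it
    mov     rdi, 0                  # exit status
    syscall                         # Linux exit(0); 60 is the exit syscall number on x86_64
```

Assuming the file is saved as intel.s, something like `as intel.s -o intel.o && ld intel.o -o intel` should build it on x86_64 Linux.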
@mcc IIRC a lot of the extended syntax was defined by MASM, sometimes to make life simpler for their extremely powerful macros. I don't know how much of that was also adopted by NASM. So well worth checking with both MASM and NASM. But yes - lots of oral tradition here...
@TomF So what I am looking for is neither nasm nor masm, but rather "Intel syntax". Clang and GCC both have modes in which they purport to follow "Intel syntax". To me, this is like Clang and GCC promising that an "Intel syntax" exists. From my research, unless there's a BNF I haven't found hiding in this 5000 page Intel x86_64 manual ( https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html ) , Intel has never defined such a thing. It was apparently only *implied* by examples in 8086 datasheets. (1/2)
@TomF Based on this, in my opinion, GCC and Clang should for clarity stop referring to "Intel syntax" and, taking a cue from ARC, refer to "Alleged Intel syntax", or perhaps "Intel folk syntax".
However, I'm also perplexed, because if there's no source of truth for "Intel syntax", then how did clang and gcc know what to implement? Or rather, how do clang and gcc know their "Intel syntax"es are compatible with each *other*? (2/2)
@mcc I added a whole bunch of instructions to x86 (what became AVX512), including new syntax for the mask registers. I remember trying to find out who "owned" that, and whether we should use v0(k1), v0[k1] or v0{k1} or some other syntax.
Sadly I don't have my notes from that time, but my vague recollection is that the answer was "nobody cares - pick one". Which was very alarming! I did have some feedback from our internal assembler team, but they stressed that they were NOT a public authority.
@mcc The official tools Intel provides are the C intrinsics - and they are of course C syntax, so have no bearing on the assembly.
So yeah, my recollection is we picked what seemed sensible and went with it. BUT - that was just for the purposes of ISA documentation - there was no hard link to the actual syntax accepted by the assemblers (dramatically so in the case of AT&T syntax).
So it really does seem like a thing nobody owns, except for each specific tool vendor!
@TomF @mcc Intel provided an assembler at one time (ASM86), maybe they still do as part of ICC? And basically “intel syntax” is a descendant of that per oral tradition. It’s Intel syntax because it’s the syntax that Intel’s assembler used, and that the Intel datasheets use; as opposed to AT&T syntax, the syntax that AT&T’s assembler for Unix used.
When Microsoft made MASM it copied the syntax. Borland’s Turbo Assembler (TASM) copied that. Everything else “intel syntax” is a descendant of those two.
In ASM86 and MASM, what mov eax, foo does is not immediately obvious. If “foo” is defined as constant (label EQU 0xf00), it’ll set EAX to 0xf00. If “foo” is defined as a variable, it’ll load the contents of that variable.
TASM added “Ideal Mode”, in which this is always consistent: mov eax, foo always sets EAX to the address of the foo label; mov eax, [foo] loads from that address.
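Most other assemblers implementing Intel syntax (NASM, FASM, YASM) are broadly copying Ideal Mode. GAS w/ .intel_syntax noprefix is the exception: there a bare mov eax, foo is a load, as in MASM, and the address is written offset foo.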
But it’s all kind of vibes.
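For GAS specifically, a bare symbol under `.intel_syntax noprefix` appears to follow the MASM reading rather than the Ideal Mode one, which is easy to check by assembling a few lines and looking at the objdump output. A sketch, where `foo` and its value are placeholders:

```asm
    .intel_syntax noprefix
    .data
foo:
    .long 0xf00

    .text
    mov     eax, foo              # gas treats the bare symbol as a memory operand: loads the 32-bit value stored at foo
    mov     eax, dword ptr [foo]  # the same load, spelled explicitly
    mov     eax, offset foo       # the address of foo as an immediate (absolute, so it needs a non-PIE link)
    lea     rax, [rip + foo]      # the address of foo, position-independent
```

In NASM the first line would instead put the address of foo in eax, so the same text can mean different things depending on which "Intel syntax" assembler reads it.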
@erincandescent Then I still assert we shouldn't be calling it "intel syntax" if it's "vaguely intel inspired syntax"!
@erincandescent I reckon by AT&T syntax we really mean UNIX Syntax (and after all GNU is UNIX)
@mcc @erincandescent I don't care what we call it as long as I don't have to read it.
@dalias @erincandescent for the record the gnu syntax may have an authoritative documentation source but there are significant gaps in that documentation