~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~ ~~ Assembly language for the AMD64 architecture ~~
~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~
~ This is also often called the x86-64 architecture, but Intel didn't
~ invent it (they had their chance) and there's no reason to name it after
~ their product line. We have a bunch of assembler words that, taken as a
~ whole, form a sort of assembly language inside of the Forth-style language.
~
~ It's all backwards and stuff.
~
~ Okay, but seriously, the convention is: target on the top of the stack,
~ source behind it. This is similar to how the Forth "!" and "@" words work.
~
~ These routines use the binary packing routines such as pack64, defined in
~ core.e. They're called in the same way: an output address which we call the
~ "output point", followed by data items specific to what's being output. They
~ also chain together in the same way, returning the updated output point.
~
~ TODO cite the Intel reference manual here and explain the notation used
~ for the section citations below
~
~ TODO define instructions, assembly code, machine code, opcodes. if we ever
~ also want to recommend a children's introduction to binary, this might be
~ the place to do it.


~ Keywords
~ ~~~~~~~~
~
~ We define a bunch of keywords, which evaluate to their own codeword
~ addresses. We use these to refer to registers and condition codes by name.
~
~
~ On registers
~ ~~~~~~~~~~~~
~
~ The x86 architecture has been around a while, and it has been through
~ several transitions from smaller word sizes to larger ones. Therefore it
~ has different names for the "same" registers, depending on how much of
~ them you're using.
~
~ TODO there's more to write here

~ The names of the 64-bit registers. The second half of these are considered
~ "extended" registers because they don't correspond to 32-bit registers in
~ the way the first eight do.
s" :rax" keyword
s" :rcx" keyword
s" :rdx" keyword
s" :rbx" keyword
s" :rsp" keyword
s" :rbp" keyword
s" :rsi" keyword
s" :rdi" keyword
s" :r8" keyword
s" :r9" keyword
s" :r10" keyword
s" :r11" keyword
s" :r12" keyword
s" :r13" keyword
s" :r14" keyword
s" :r15" keyword

~ The names of the 32-bit registers. The processor treats these as being
~ alternate names for the low halves of the 64-bit registers. There is a
~ very finicky distinction about what that means in different settings: Some
~ instructions operate on a 32-bit source or target, while others merely
~ accept a 32-bit value that gets sign-extended to 64 bits. We've taken pains
~ to clarify these cases in the instruction-specific notes, as they come up.
s" :eax" keyword
s" :ecx" keyword
s" :edx" keyword
s" :ebx" keyword
s" :esp" keyword
s" :ebp" keyword
s" :esi" keyword
s" :edi" keyword

~ The names of the 16-bit registers. Similarly, the processor treats these
~ as being alternate names for the low halves of the 32-bit registers.
s" :ax" keyword
s" :cx" keyword
s" :dx" keyword
s" :bx" keyword
s" :sp" keyword
s" :bp" keyword
s" :si" keyword
s" :di" keyword

~ The names of the 8-bit registers. The pattern here is a little bit
~ different; these come in "low" and "high" pairs, where for example :al is
~ the low half of :ax and :ah is the high half. Yes, this architecture grows
~ like a tree, with all the old things being still present, surrounded in the
~ new ones.
s" :al" keyword
s" :cl" keyword
s" :dl" keyword
s" :bl" keyword
s" :ah" keyword
s" :ch" keyword
s" :dh" keyword
s" :bh" keyword

~ The condition codes. Yes, there sure is a lot of duplication in these
~ names. The names are based on Intel's documented mnemonics...
~
~ "Above" and "below" are for unsigned comparisons. "Greater" and "less" are
~ for signed comparisons.
~
~ This is documented on the individual opcode pages, and also in B.1.4.7.
s" :cc-overflow" keyword
s" :cc-no-overflow" keyword
s" :cc-below" keyword
s" :cc-above-equal" keyword
s" :cc-equal" keyword
s" :cc-not-equal" keyword
s" :cc-below-equal" keyword
s" :cc-above" keyword
s" :cc-sign" keyword
s" :cc-not-sign" keyword
s" :cc-even" keyword
s" :cc-odd" keyword
s" :cc-less" keyword
s" :cc-greater-equal" keyword
s" :cc-less-equal" keyword
s" :cc-greater" keyword


~ Bits and pieces
~ ~~~~~~~~~~~~~~~
~
~ Here, we have a bunch of helpers which generate specific encoded
~ representations that are part of many instructions. We start with the
~ trivial ones that handle individual fields, then work up to combinations of
~ fields.
~
~ When we say that a word accepts a register as a parameter, what we mean
~ is it accepts the name keyword for that register. When we say that a word
~ accepts a scale factor, what we mean is that it accepts a byte count for
~ that scale factor. In the cases where we mean the encoded form, we'll say
~ "encoded value" or "value".
~
~ TODO surely we can find a way to have real flow-control words

~ (register -- 3-bit encoded value for register)
: reg64
  ~ In counting the words for the branches, notice that each integer literal
  ~ is two words.
  dup :rax = 0branch [ 5 8 * , ] drop 0 exit
  dup :rcx = 0branch [ 5 8 * , ] drop 1 exit
  dup :rdx = 0branch [ 5 8 * , ] drop 2 exit
  dup :rbx = 0branch [ 5 8 * , ] drop 3 exit
  dup :rsp = 0branch [ 5 8 * , ] drop 4 exit
  dup :rbp = 0branch [ 5 8 * , ] drop 5 exit
  dup :rsi = 0branch [ 5 8 * , ] drop 6 exit
  dup :rdi = 0branch [ 5 8 * , ] drop 7 exit
  ." Parameter to reg64 is not a reg64."
  1 sys-exit ;

~ (register -- 3-bit encoded value for register)
: extrareg64
  dup :r8 = 0branch [ 5 8 * , ] drop 0 exit
  dup :r9 = 0branch [ 5 8 * , ] drop 1 exit
  dup :r10 = 0branch [ 5 8 * , ] drop 2 exit
  dup :r11 = 0branch [ 5 8 * , ] drop 3 exit
  dup :r12 = 0branch [ 5 8 * , ] drop 4 exit
  dup :r13 = 0branch [ 5 8 * , ] drop 5 exit
  dup :r14 = 0branch [ 5 8 * , ] drop 6 exit
  dup :r15 = 0branch [ 5 8 * , ] drop 7 exit
  ." Parameter to extrareg64 is not an extrareg64."
  1 sys-exit ;

~ (register -- 3-bit encoded value for register)
: reg32
  dup :eax = 0branch [ 5 8 * , ] drop 0 exit
  dup :ecx = 0branch [ 5 8 * , ] drop 1 exit
  dup :edx = 0branch [ 5 8 * , ] drop 2 exit
  dup :ebx = 0branch [ 5 8 * , ] drop 3 exit
  dup :esp = 0branch [ 5 8 * , ] drop 4 exit
  dup :ebp = 0branch [ 5 8 * , ] drop 5 exit
  dup :esi = 0branch [ 5 8 * , ] drop 6 exit
  dup :edi = 0branch [ 5 8 * , ] drop 7 exit
  ." Parameter to reg32 is not a reg32."
  1 sys-exit ;

~ (register -- 3-bit encoded value for register)
: reg16
  dup :ax = 0branch [ 5 8 * , ] drop 0 exit
  dup :cx = 0branch [ 5 8 * , ] drop 1 exit
  dup :dx = 0branch [ 5 8 * , ] drop 2 exit
  dup :bx = 0branch [ 5 8 * , ] drop 3 exit
  dup :sp = 0branch [ 5 8 * , ] drop 4 exit
  dup :bp = 0branch [ 5 8 * , ] drop 5 exit
  dup :si = 0branch [ 5 8 * , ] drop 6 exit
  dup :di = 0branch [ 5 8 * , ] drop 7 exit
  ." Parameter to reg16 is not a reg16."
  1 sys-exit ;

~ (register -- 3-bit encoded value for register)
: reg8
  dup :al = 0branch [ 5 8 * , ] drop 0 exit
  dup :cl = 0branch [ 5 8 * , ] drop 1 exit
  dup :dl = 0branch [ 5 8 * , ] drop 2 exit
  dup :bl = 0branch [ 5 8 * , ] drop 3 exit
  dup :ah = 0branch [ 5 8 * , ] drop 4 exit
  dup :ch = 0branch [ 5 8 * , ] drop 5 exit
  dup :dh = 0branch [ 5 8 * , ] drop 6 exit
  dup :bh = 0branch [ 5 8 * , ] drop 7 exit
  ." Parameter to reg8 is not a reg8."
  1 sys-exit ;

~ There's a packed format called the SIB byte, which we'll get to in a
~ second. One of its bitfields is called the scale field.
~ This word produces an encoded value for that field.
~
~ The input value is a byte count; the output value is suitable for use in
~ the SIB byte.
~
~ (scale factor -- 2-bit encoded value)
: scalefield
  dup 1 = 0branch [ 5 8 * , ] drop 0 exit
  dup 2 = 0branch [ 5 8 * , ] drop 1 exit
  dup 4 = 0branch [ 5 8 * , ] drop 2 exit
  dup 8 = 0branch [ 5 8 * , ] drop 3 exit
  ." Parameter to scalefield is not 1, 2, 4, or 8."
  1 sys-exit ;

~ [Intel] volume 2D, appendix B, section B-1.4.7, table B-10. Also see the
~ individual opcode pages.
~
~ Every instruction has an "opcode", a specific byte or sequence of bytes
~ which uniquely identifies the combination of operation, addressing mode,
~ and certain miscellaneous characteristics. This is not just another way of
~ referring to the entire sequence of bytes corresponding to the instruction;
~ the opcode is a specific part within that, as distinct from e.g. the rex
~ byte, the SIB byte, the Mod/RM byte, and various immediate values and other
~ rare tidbits.
~
~ Some of these opcodes have bitfields within them, to specify condition
~ codes. This word produces an encoded value for that condition-code field.
~
~ (condition -- 4-bit encoded value)
: condition-code
  dup :cc-overflow = 0branch [ 5 8 * , ] drop 0 exit
  dup :cc-no-overflow = 0branch [ 5 8 * , ] drop 1 exit
  dup :cc-below = 0branch [ 5 8 * , ] drop 2 exit
  dup :cc-above-equal = 0branch [ 5 8 * , ] drop 3 exit
  dup :cc-equal = 0branch [ 5 8 * , ] drop 4 exit
  dup :cc-not-equal = 0branch [ 5 8 * , ] drop 5 exit
  dup :cc-below-equal = 0branch [ 5 8 * , ] drop 6 exit
  dup :cc-above = 0branch [ 5 8 * , ] drop 7 exit
  dup :cc-sign = 0branch [ 5 8 * , ] drop 8 exit
  dup :cc-not-sign = 0branch [ 5 8 * , ] drop 9 exit
  dup :cc-even = 0branch [ 5 8 * , ] drop 10 exit
  dup :cc-odd = 0branch [ 5 8 * , ] drop 11 exit
  dup :cc-less = 0branch [ 5 8 * , ] drop 12 exit
  dup :cc-greater-equal = 0branch [ 5 8 * , ] drop 13 exit
  dup :cc-less-equal = 0branch [ 5 8 * , ] drop 14 exit
  dup :cc-greater = 0branch [ 5 8 * , ] drop 15 exit
  ." Parameter to condition-code is not a condition code."
  1 sys-exit ;

~ The "rex" byte appears before an opcode to modify its behavior in various
~ ways. It has four distinct bits within it, leading to sixteen variations,
~ as you can see.
~
~ The way these are all spelled out like this is slightly ridiculous; there
~ must be a better way. We only ever use rex-w and rex-wb, so it's tempting to
~ get rid of the rest, but they're worth having so that our future selves
~ don't have to revisit this topic.
~
~ (output point -- output point)
: rex-0 0x40 pack8 ;
: rex-w 0x48 pack8 ;
: rex-r 0x44 pack8 ;
: rex-x 0x42 pack8 ;
: rex-b 0x41 pack8 ;
: rex-wr 0x4C pack8 ;
: rex-wx 0x4A pack8 ;
: rex-wb 0x49 pack8 ;
: rex-rx 0x46 pack8 ;
: rex-rb 0x45 pack8 ;
: rex-xb 0x43 pack8 ;
: rex-wrx 0x4E pack8 ;
: rex-wrb 0x4D pack8 ;
: rex-wxb 0x4B pack8 ;
: rex-rxb 0x47 pack8 ;
: rex-wrxb 0x4F pack8 ;

~ Some opcodes use their low three bits as a field to give a register name.
~ This is usually in addition to a register name given in a Mod/RM byte,
~ serving a different role for the instruction.
~
~ This word accepts an opcode byte with those three bits clear, and combines
~ it with a register value, then outputs the resulting byte. Each opcode
~ accepts some specific kind of register; to allow different kinds, here we
~ expect the step of converting the register name to the encoded bits to have
~ already been done.
~
~ (output point, 3-bit encoded value for register, opcode byte
~  -- output point)
: opcodereg | pack8 ;

~ Some opcodes use their low four bits as a field to give a condition code.
~ This word accepts an opcode byte with those four bits clear, and combines it
~ with a condition code value, then outputs the resulting byte. For
~ consistency with opcodereg, we expect the step of converting the condition
~ code name to the encoded bits to have already been done.
~
~ (output point, 4-bit encoded value for condition code, opcode byte
~  -- output point)
: opcodecc | pack8 ;

~ A Mod/RM byte ("mode / register-or-memory") is part of the encoding of
~ many instructions. It's divided into three fields: "mod" (mode),
~ register/opcode, and register/memory ("RM").
~
~ This word outputs a Mod/RM byte given fully-processed, numeric values for
~ its fields. Most code will want to call one of the higher-level
~ addressing-* words, instead.
~
~ (output point, mod field, register/opcode field, register/memory field
~  -- output point)
: modrm swap 8 * | swap 64 * | pack8 ;

~ An SIB byte ("scale, index, base") is part of the encoding of many
~ instructions. It's divided into three fields, with the names you've already
~ guessed.
~
~ This word outputs an SIB byte given fully-processed, numeric values for
~ its fields.
~
~ (output point, scale field, index field, base field -- output point)
: sib swap 8 * | swap 64 * | pack8 ;


~ Addressing modes
~ ~~~~~~~~~~~~~~~~
~
~ These are higher-level words meant to be easier to use than the bits and
~ pieces above. Each corresponds to some specific addressing mode.
~ When applicable, they accept keywords rather than pre-encoded values.
~
~ That's not all the time, because there are cases, such as the reg/op
~ field, where the meaning is up to the individual instruction. In those
~ cases, these words do accept fully-processed, numeric values.
~
~ The general rule is that the responsibility of these addressing-mode words
~ is for the parts that are common to all instructions using that addressing
~ mode.

~ The simplest of the addressing modes: Direct register addressing. There
~ are no special cases to check.
~
~ It's important to notice that the R/M field may describe either a source,
~ or a target, depending on what the instruction is. So, this helper doesn't
~ get to know that. It also doesn't get to know whether the value in the
~ reg/op field describes a register, or if instead it's an extension of the
~ opcode. The caller is responsible for figuring that all out.
~
~ (output point, reg/op field value, reg/mem field register
~  -- output point)
: addressing-reg64 reg64 3 3unroll modrm ;
: addressing-reg8 reg8 3 3unroll modrm ;

~ This is a helper for assembly instructions that want to do a form of
~ addressing that requires a value of 0 in the modrm byte's mode field, and
~ do not want to do any indexing. That's the indirect mode, which takes a
~ 64-bit register, treats it as an address, and looks up the 64-bit value it
~ points to.
~
~ The helper's main responsibility is to deal with the scenario that
~ requires an SIB byte, which happens when the R/M field has a value of 4,
~ which would otherwise refer to the register rsp. In that situation, it also
~ generates an SIB byte which indicates a scale of 1, no indexing, and rsp as
~ the base register.
~
~ When the register is :rbp, the only modes available also have
~ displacement; we disallow that. For that case, use an instruction that
~ uses a disp8 mode, and set a displacement of 0.
~
~ In understanding this, pay close attention to the Op/En column in the
~ opcode table.
~ The "RM" variant means the ModRM byte's R/M field (the third
~ one) is the source, while its reg field (the middle one) is the target. This
~ is what we want, because the R/M field is the one that gets indirection
~ applied to it. Opcode 0x8B with an REX.W prefix is the all-64-bit RM
~ variant. [Intel] volume 2B, chapter 4, section 4-3, "MOV".
~
~ For the indirection modes, don't be confused by the many similar tables.
~ 64-bit mode is encoded the same as 32-bit mode except for adding a REX.W
~ prefix, as per 2.2.1.1, so you want table 2-2 to understand the ModRM byte.
~ The presence or absence of an SIB byte is determined by where in that table
~ we fall, and we aren't using a mode that has one. [Intel] volume 2A,
~ chapter 2, section 2-1.5, table 2-2.
~
~ (output point, reg/op field value, reg/mem field register
~  -- output point)
: addressing-indirect-reg64
  ~ Exit with an error if the R/M register is :rbp.
  dup :rbp != 0branch [ 23 8 * , ]

  ~ Check whether the R/M register is :rsp. Save the test result for later.
  dup :rsp = 4 unroll
  ~ (equality result, output point, reg/op value, reg/mem name)

  reg64 0 3unroll modrm
  ~ (equality result, output point)

  ~ If the R/M register was rsp, we need an SIB byte; otherwise, skip it.
  swap 0branch [ 8 8 * , ] 0 4 :rsp reg64 sib

  exit

  ." R/M parameter to addressing-indirect-reg64 is :rbp."
  1 sys-exit ;

~ (output point, reg/op field value, reg/mem field register,
~  displacement value -- output point)
: addressing-disp8-reg64
  ~ This mode can do :rbp fine, so no need to check for that.

  ~ Check whether the R/M register is :rsp. Save the test result for later.
  swap dup :rsp = 5 unroll swap

  ~ Stash the displacement value out of the way, too.
  4 unroll

  reg64 1 3unroll modrm

  ~ If the R/M register was rsp, we need an SIB byte; otherwise, skip it.
  3roll 0branch [ 8 8 * , ] 0 4 :rsp reg64 sib

  ~ The displacement byte.
  swap pack8 ;

~ (output point, reg/op field value, reg/mem field register,
~  displacement value -- output point)
: addressing-disp32-reg64
  ~ This mode can do :rbp fine, so no need to check for that.

  ~ Check whether the R/M register is :rsp. Save the test result for later.
  swap dup :rsp = 5 unroll swap

  ~ Stash the displacement value out of the way, too.
  4 unroll

  reg64 2 3unroll modrm

  ~ If the R/M register was rsp, we need an SIB byte; otherwise, skip it.
  3roll 0branch [ 8 8 * , ] 0 4 :rsp reg64 sib

  ~ The displacement value.
  swap pack32 ;

~ (output point, reg/op field value,
~  scale factor, index register, base field register
~  -- output point)
: addressing-indexed-reg64
  ~ Exit with an error if the base register is :rbp.
  dup :rbp != 0branch [ 23 8 * , ]

  ~ Reg/mem value 4 means to use an SIB byte (at least, with this mode).
  5 roll 0 6 roll 4 modrm 4 unroll
  reg64 3unroll reg64 3unroll scalefield 3unroll sib

  exit

  ." Base parameter to addressing-indexed-reg64 is :rbp."
  1 sys-exit ;

~ (output point, reg/op field value,
~  scale factor, index register, base field register,
~  displacement value -- output point)
: addressing-disp8-indexed-reg64
  ~ This mode can do :rbp fine, so no need to check for that.

  ~ Reg/mem value 4 means to use an SIB byte (at least, with this mode).
  6 roll 1 7 roll 4 modrm 5 unroll 5 unroll
  reg64 3unroll reg64 3unroll scalefield 3unroll sib
  swap pack8 ;


~ Easy instructions
~ ~~~~~~~~~~~~~~~~~
~
~ It's not worth pretending there's a coherent category behind this
~ grouping. These are the ones that were easy to deal with.
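~
~ A couple of worked examples, as a sketch: the byte values were derived by
~ hand from the definitions in this file, not captured from a run, and "op"
~ is only a placeholder for whatever output point the caller already holds.
~
~   op :rbx push-reg64
~
~ should emit the single byte 53, which is "push rbx": the base opcode 0x50
~ combined with the encoded value 3 for :rbx.
~
~   op :rsp 8 :rax lea-reg64-disp8-reg64
~
~ should emit 48 8D 44 24 08, which is "lea rax, [rsp+8]": the rex-w prefix,
~ the opcode 0x8D, a Mod/RM byte of 1*64 + 0*8 + 4 = 0x44, the SIB byte
~ 0*64 + 4*8 + 4 = 0x24 that an :rsp base always requires, and the
~ displacement byte.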

~ (output point -- output point)
: cld 0xFC pack8 ;
: std 0xFD pack8 ;
: syscall 0x0F pack8 0x05 pack8 ;
: hlt 0xF4 pack8 ;

~ (output point, source register -- output point)
: push-reg64 reg64 0x50 opcodereg ;

~ (output point, target register -- output point)
: pop-reg64 reg64 0x58 opcodereg ;

~ (output point, immediate value -- output point)
: push-imm32-extended64 swap 0x68 pack8 swap pack32 ;

~ (output point, source register, source displacement value, target register
~  -- output point)
: lea-reg64-disp8-reg64
  4 roll rex-w 0x8D pack8 4 unroll
  reg64 3unroll addressing-disp8-reg64 ;

~ (output point, source register, source displacement value, target register
~  -- output point)
: lea-reg64-disp32-reg64
  4 roll rex-w 0x8D pack8 4 unroll
  reg64 3unroll addressing-disp32-reg64 ;

~ (output point,
~  source base register, source index register, source index scale factor,
~  target register -- output point)
: lea-reg64-indexed-reg64
  5 roll rex-w 0x8D pack8 5 unroll
  reg64 4 unroll 3unroll swap addressing-indexed-reg64 ;

~ (output point,
~  source base register, source index register, source index scale factor,
~  source displacement value,
~  target register -- output point)
: lea-reg64-disp8-indexed-reg64
  6 roll rex-w 0x8D pack8 6 unroll
  reg64 5 unroll 3 roll 4 roll 3 roll addressing-disp8-indexed-reg64 ;


~ Move instructions
~ ~~~~~~~~~~~~~~~~~
~
~ These are, like, MOST of what we care about, so they get their own
~ section. Although it's very much the case that almost every two-operand
~ instruction offers this many distinct modes, we don't care about most of
~ those and don't yet implement them. We do care about all the modes for move
~ instructions.
~
~ Someday perhaps we'll have extra-high-level features which generate all
~ the distinct versions of each instruction in a concise way, but that is not
~ this day.
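~
~ A worked example, as a sketch: the byte values were derived by hand from
~ the definitions in this file, not captured from a run, and "op" is only a
~ placeholder for whatever output point the caller already holds.
~
~   op :rbx :rax mov-reg64-reg64
~
~ should emit 48 89 D8, which is "mov rax, rbx": the rex-w prefix 0x48, the
~ opcode 0x89, and a Mod/RM byte of 3*64 + 3*8 + 0 = 0xD8 (mode 3 for direct
~ register addressing, :rbx encoded as 3 in the reg/op field, :rax encoded as
~ 0 in the R/M field). Likewise,
~
~   op :rbx :rax mov-reg64-indirect-reg64
~
~ should emit 48 8B 03, which is "mov rax, [rbx]": the mode field is 0, the
~ target now sits in the reg/op field and the source in the R/M field, so
~ the Mod/RM byte is 0*64 + 0*8 + 3 = 0x03.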

~ (output point, immediate value, register -- output point)
: mov-reg64-imm32
  3roll rex-w 0xC7 pack8 swap 0 swap addressing-reg64 swap pack32 ;
: mov-reg64-imm64
  3roll rex-w swap reg64 0xB8 opcodereg swap pack64 ;
: mov-extrareg64-imm64
  ~ Note the use of the B rex bit here; this instruction puts the register
  ~ number in the opcode field, so it uses Table 3-1.
  3roll rex-wb swap extrareg64 0xB8 opcodereg swap pack64 ;

~ (output point, source register, target register -- output point)
: mov-reg64-reg64
  3roll rex-w 0x89 pack8 3unroll
  swap reg64 swap addressing-reg64 ;
: mov-indirect-reg64-reg64
  3roll rex-w 0x89 pack8 3unroll
  swap reg64 swap addressing-indirect-reg64 ;

~ (output point, source register, target register, target displacement value
~  -- output point)
: mov-disp8-reg64-reg64
  4 roll rex-w 0x89 pack8 4 unroll
  3roll reg64 3unroll addressing-disp8-reg64 ;

~ (output point, source register, target register -- output point)
: mov-reg64-indirect-reg64
  3roll rex-w 0x8B pack8 3unroll
  reg64 swap addressing-indirect-reg64 ;

~ (output point, source register, source displacement value, target register
~  -- output point)
: mov-reg64-disp8-reg64
  4 roll rex-w 0x8B pack8 4 unroll
  reg64 3unroll addressing-disp8-reg64 ;
: mov-reg64-disp32-reg64
  4 roll rex-w 0x8B pack8 4 unroll
  reg64 3unroll addressing-disp32-reg64 ;

~ (output point,
~  source base register, source index register, source index scale factor,
~  target register -- output point)
: mov-reg64-indexed-reg64
  5 roll rex-w 0x8B pack8 5 unroll
  reg64 4 unroll 3unroll swap addressing-indexed-reg64 ;

~ (output point, source register,
~  target base register, target index register, target index scale factor
~  -- output point)
: mov-indexed-reg64-reg64
  5 roll rex-w 0x89 pack8 5 unroll
  4 roll reg64 4 unroll 3unroll swap addressing-indexed-reg64 ;

~ (output point, source register, target register -- output point)
: mov-indirect-reg64-reg32
  3roll 0x89 pack8 3unroll
  swap reg32 swap addressing-indirect-reg64 ;

~ (output
~ point, source register, target register, target displacement value
~  -- output point)
: mov-disp8-reg64-reg32
  4 roll 0x89 pack8 4 unroll
  3roll reg32 3unroll addressing-disp8-reg64 ;

~ (output point, source register, target register -- output point)
: mov-reg32-indirect-reg64
  3roll 0x8B pack8 3unroll
  reg32 swap addressing-indirect-reg64 ;

~ (output point, source register, source displacement value, target register
~  -- output point)
: mov-reg32-disp8-reg64
  4 roll 0x8B pack8 4 unroll
  reg32 3unroll addressing-disp8-reg64 ;

~ (output point, source register, target register -- output point)
: mov-indirect-reg64-reg16
  3roll 0x66 pack8 0x89 pack8 3unroll
  swap reg16 swap addressing-indirect-reg64 ;

~ (output point, source register, target register, target displacement value
~  -- output point)
: mov-disp8-reg64-reg16
  4 roll 0x66 pack8 0x89 pack8 4 unroll
  3roll reg16 3unroll addressing-disp8-reg64 ;

~ (output point, source register, target register -- output point)
: mov-reg16-indirect-reg64
  3roll 0x66 pack8 0x8B pack8 3unroll
  reg16 swap addressing-indirect-reg64 ;

~ (output point, source register, source displacement value, target register
~  -- output point)
: mov-reg16-disp8-reg64
  4 roll 0x66 pack8 0x8B pack8 4 unroll
  reg16 3unroll addressing-disp8-reg64 ;

~ (output point, source register, target register -- output point)
: mov-indirect-reg64-reg8
  3roll 0x88 pack8 3unroll
  swap reg8 swap addressing-indirect-reg64 ;

~ (output point, source register, target register, target displacement value
~  -- output point)
: mov-disp8-reg64-reg8
  4 roll 0x88 pack8 4 unroll
  3roll reg8 3unroll addressing-disp8-reg64 ;

~ (output point, source register, target register -- output point)
: mov-reg8-indirect-reg64
  3roll 0x8A pack8 3unroll
  reg8 swap addressing-indirect-reg64 ;

~ (output point, source register, source displacement value, target register
~  -- output point)
: mov-reg8-disp8-reg64
  4 roll 0x8A pack8 4 unroll
  reg8 3unroll addressing-disp8-reg64 ;

~ (output point, source
~ register, target register -- output point)
: mov-reg8-reg8
  3roll 0x88 pack8 3unroll
  swap reg8 swap addressing-reg8 ;


~ String instructions
~ ~~~~~~~~~~~~~~~~~~~
~
~ These are in their own section because there's an awful lot of
~ combinations, and fortunately they are very uniform in structure.
~
~ What makes these useful is that they take their parameters from certain
~ fixed registers, which are chosen such that the operations chain into each
~ other well. Thus you can use them to build various block-memory and string
~ operations, and even if you need unusual forms of loop unrolling or
~ alignment tweaking, the code will end up uniform in structure. On modern
~ processors, this is even the high-performance approach, due to highly
~ optimized microcode, though these operations were inefficient when they
~ were first invented.
~
~ We break with the Intel mnemonics, which follow the pattern
~ movsb/movsw/movsd/movsq, because this would otherwise be the only place we
~ use the b/w/d/q thing instead of 8/16/32/64. Tradition and pronounceability
~ are both nice things, but approachability to newcomers is important, too.
~
~ Some of these are repeatable; whether you view the repeatable variants
~ as different instructions is up to you. At any rate the machine code
~ representation of the repeatable variants is the same as for the regular
~ variants with an extra prefix, so we define them together.
~
~ This is a proper superset of the flatassembler implementations of string
~ instructions. The wisdom of that is questionable, but at least it's noted
~ here...
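~
~ As a sketch of how these get used (bytes derived by hand from the
~ definitions in this file, not captured from a run; "op" is only a
~ placeholder for whatever output point the caller already holds):
~
~   op cld rep-movs64
~
~ should emit FC F3 48 A5. At run time, and assuming the generated code has
~ already loaded rsi with the source address, rdi with the destination
~ address, and rcx with a count, that copies rcx quadwords from the one to
~ the other, leaving both pointers just past the copied block.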

~ (output point -- output point)
: movs8 0xA4 pack8 ;
: movs16 0x66 pack8 0xA5 pack8 ;
: movs32 0xA5 pack8 ;
: movs64 rex-w 0xA5 pack8 ;
: rep-movs8 0xF3 pack8 0xA4 pack8 ;
: rep-movs16 0xF3 pack8 0x66 pack8 0xA5 pack8 ;
: rep-movs32 0xF3 pack8 0xA5 pack8 ;
: rep-movs64 0xF3 pack8 rex-w 0xA5 pack8 ;

~ (output point -- output point)
: lods8 0xAC pack8 ;
: lods16 0x66 pack8 0xAD pack8 ;
: lods32 0xAD pack8 ;
: lods64 rex-w 0xAD pack8 ;
: rep-lods8 0xF3 pack8 0xAC pack8 ;
: rep-lods16 0xF3 pack8 0x66 pack8 0xAD pack8 ;
: rep-lods32 0xF3 pack8 0xAD pack8 ;
: rep-lods64 0xF3 pack8 rex-w 0xAD pack8 ;

~ (output point -- output point)
: stos8 0xAA pack8 ;
: stos16 0x66 pack8 0xAB pack8 ;
: stos32 0xAB pack8 ;
: stos64 rex-w 0xAB pack8 ;
: rep-stos8 0xF3 pack8 0xAA pack8 ;
: rep-stos16 0xF3 pack8 0x66 pack8 0xAB pack8 ;
: rep-stos32 0xF3 pack8 0xAB pack8 ;
: rep-stos64 0xF3 pack8 rex-w 0xAB pack8 ;

~ (output point -- output point)
: cmps8 0xA6 pack8 ;
: cmps16 0x66 pack8 0xA7 pack8 ;
: cmps32 0xA7 pack8 ;
: cmps64 rex-w 0xA7 pack8 ;
: repz-cmps8 0xF3 pack8 0xA6 pack8 ;
: repz-cmps16 0xF3 pack8 0x66 pack8 0xA7 pack8 ;
: repz-cmps32 0xF3 pack8 0xA7 pack8 ;
: repz-cmps64 0xF3 pack8 rex-w 0xA7 pack8 ;
: repnz-cmps8 0xF2 pack8 0xA6 pack8 ;
: repnz-cmps16 0xF2 pack8 0x66 pack8 0xA7 pack8 ;
: repnz-cmps32 0xF2 pack8 0xA7 pack8 ;
: repnz-cmps64 0xF2 pack8 rex-w 0xA7 pack8 ;

~ (output point -- output point)
: scas8 0xAE pack8 ;
: scas16 0x66 pack8 0xAF pack8 ;
: scas32 0xAF pack8 ;
: scas64 rex-w 0xAF pack8 ;
: repz-scas8 0xF3 pack8 0xAE pack8 ;
: repz-scas16 0xF3 pack8 0x66 pack8 0xAF pack8 ;
: repz-scas32 0xF3 pack8 0xAF pack8 ;
: repz-scas64 0xF3 pack8 rex-w 0xAF pack8 ;
: repnz-scas8 0xF2 pack8 0xAE pack8 ;
: repnz-scas16 0xF2 pack8 0x66 pack8 0xAF pack8 ;
: repnz-scas32 0xF2 pack8 0xAF pack8 ;
: repnz-scas64 0xF2 pack8 rex-w 0xAF pack8 ;


~ Arithmetic instructions
~ ~~~~~~~~~~~~~~~~~~~~~~~

~ (output point, source register, target register -- output point)
: add-reg64-reg64
  3roll
  rex-w 0x01 pack8 3unroll
  swap reg64 swap addressing-reg64 ;

~ (output point, source register, target register -- output point)
: add-indirect-reg64-reg64
  3roll rex-w 0x01 pack8 3unroll
  swap reg64 swap addressing-indirect-reg64 ;

~ (output point, source register, target register -- output point)
: add-reg64-indirect-reg64
  3roll rex-w 0x03 pack8 3unroll
  reg64 swap addressing-indirect-reg64 ;

~ (output point, source value, target register -- output point)
: add-reg64-imm8
  3roll rex-w 0x83 pack8 swap 0 swap addressing-reg64 swap pack8 ;

~ (output point, source register, target register -- output point)
: sub-reg64-reg64
  3roll rex-w 0x2B pack8 3unroll
  reg64 swap addressing-reg64 ;

~ (output point, source register, target register -- output point)
: sub-indirect-reg64-reg64
  3roll rex-w 0x29 pack8 3unroll
  swap reg64 swap addressing-indirect-reg64 ;

~ (output point, source value, target register -- output point)
: sub-reg64-imm8
  3roll rex-w 0x83 pack8 swap 5 swap addressing-reg64 swap pack8 ;

~ (output point, source value, target register -- output point)
: sbb-reg64-imm8
  3roll rex-w 0x83 pack8 swap 3 swap addressing-reg64 swap pack8 ;

~ The target register is always rax.
~
~ (output point, source register -- output point)
: mul-reg64
  swap rex-w 0xF7 pack8 swap 4 swap addressing-reg64 ;

~ The dividend is 128 bits, and is formed from rdx as the high half and rax
~ as the low half. The divisor is a specified register. The quotient is
~ returned in rax, truncated towards zero. The remainder is in rdx. This
~ entire process is unsigned.
~
~ The official mnemonic for this is "div", but divmod is what it does.
~
~ (output point, divisor register -- output point)
: divmod-reg64
  swap rex-w 0xF7 pack8 swap 6 swap addressing-reg64 ;

~ Same as divmod, but signed.
~
~ (output point, divisor register -- output point)
: idivmod-reg64
  swap rex-w 0xF7 pack8 swap 7 swap addressing-reg64 ;

~ (output point, target register -- output point)
: inc-reg64
  swap rex-w 0xFF pack8 swap 0 swap addressing-reg64 ;

~ (output point, target register -- output point)
: dec-reg64
  swap rex-w 0xFF pack8 swap 1 swap addressing-reg64 ;

~ (output point, source register, target register -- output point)
: and-reg64-reg64
  3roll rex-w 0x23 pack8 3unroll
  reg64 swap addressing-reg64 ;

~ (output point, source value, target register -- output point)
: and-reg64-imm8
  3roll rex-w 0x83 pack8 swap 4 swap addressing-reg64 swap pack8 ;

~ (output point, source register, target register -- output point)
: or-reg64-reg64
  3roll rex-w 0x0B pack8 3unroll
  reg64 swap addressing-reg64 ;

~ (output point, source value, target register -- output point)
: or-reg64-imm8
  3roll rex-w 0x83 pack8 swap 1 swap addressing-reg64 swap pack8 ;

~ (output point, source register, target register -- output point)
: xor-reg64-reg64
  3roll rex-w 0x33 pack8 3unroll
  reg64 swap addressing-reg64 ;

~ (output point, target register -- output point)
: not-reg64
  swap rex-w 0xF7 pack8 swap 2 swap addressing-reg64 ;


~ Control flow instructions
~ ~~~~~~~~~~~~~~~~~~~~~~~~~

~ Pretend to subtract right from left, and set the flags the same way as if
~ we actually had.
~
~ (output point, left register, right register -- output point)
: cmp-reg64-reg64
  3roll rex-w 0x3B pack8 3unroll
  reg64 swap addressing-reg64 ;

~ Pretend to and left with right, and set the flags the same way as if we
~ actually had.
~
~ The names of the condition codes can be a little confusing when using them
~ after "test", because they're really premised on the idea that you did
~ "cmp".
~
~ (output point, left register, right register -- output point)
: test-reg64-reg64
  3roll rex-w 0x85 pack8 3unroll
  swap reg64 swap addressing-reg64 ;

~ (output point, condition code, target register -- output point)
: set-reg8-cc
  3roll 0x0F pack8 3roll condition-code 0x90 opcodecc
  swap reg8 3 0 3roll modrm ;

~ (output point, address offset value, condition code -- output point)
: jmp-cc-rel-imm8
  3roll swap condition-code 0x70 opcodecc swap pack8 ;

~ (output point, address offset value, condition code -- output point)
: jmp-cc-rel-imm32
  3roll 0x0F pack8 swap condition-code 0x80 opcodecc swap pack32 ;

~ (output point, register -- output point)
: jmp-abs-indirect-reg64
  swap 0xFF pack8 swap 4 swap addressing-indirect-reg64 ;

~ (output point, address offset value -- output point)
: jmp-rel-imm8
  swap 0xEB pack8 swap pack8 ;

~ (output point, address offset value -- output point)
: jmp-rel-imm32
  swap 0xE9 pack8 swap pack32 ;
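
~ A worked example of the short jumps, as a sketch: the byte values were
~ derived by hand from the definitions above, not captured from a run, and
~ "op" is only a placeholder for whatever output point the caller already
~ holds.
~
~   op 5 :cc-equal jmp-cc-rel-imm8
~
~ should emit 74 05: the base opcode 0x70 combined with the encoded value 4
~ for :cc-equal, followed by the offset byte. The unconditional form,
~
~   op 5 jmp-rel-imm8
~
~ should emit EB 05. In both cases the processor measures the offset from
~ the end of the jump instruction itself, so these skip the five bytes that
~ follow the jump.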