summary refs log tree commit diff
diff options
context:
space:
mode:
-rw-r--r--evoke.e542
-rw-r--r--execution.e539
-rw-r--r--interpret.e236
3 files changed, 742 insertions, 575 deletions
diff --git a/evoke.e b/evoke.e
index add1db9..529a9a4 100644
--- a/evoke.e
+++ b/evoke.e
@@ -1,544 +1,4 @@
-~ cat labels.e elf.e evoke.e | ./quine > evoke && chmod 755 evoke && ./evoke
-
-~ ~~~~~~~~~~~~~~~~~~~~~
-~ ~~ Execution model ~~
-~ ~~~~~~~~~~~~~~~~~~~~~
-~
-~    We use Forth-style dual stacks, one for values and one for control. We
-~  use rsp for values, just like C does. We use rbp for the control stack,
-~  which is a special Forth-y stack: These are pointers into the bodies of
-~  Forth words, not return addresses.
-~
-~    The choice of rsp and rbp for the stack pointers imitates Jonesforth;
-~  I'm hopeful that it gives us convenient addressing modes, and will report
-~  back about that when I feel that I understand the implications.
-~
-~    In Forth, everything is a "word", including mutable variables.
-~  Conceptually, a word is a unit of execution, which may be implemented
-~  either in machine code or as an array of pointer to other words.
-~
-~    This polymorphism is implemented by having each word's contents begin
-~  with a "codeword", which is a pointer to machine code that "interprets"
-~  the rest of the contents. In the case of words implemented in machine
-~  code, the codeword points directly to that code, which is normally right
-~  next to it.
-~
-~    Variables, to Forth, are simply one more thing that can be executed; the
-~  effect of executing a variable is to push its address onto the value
-~  stack.
-~
-~    We adopt this model of words, codewords, and variables-as-words. It's
-~  really nice how it doesn't force anything else on us, not even a heap,
-~  though we do end up using a heap.
-~
-~    We specifically implement a version of calling and returning that Forth
-~  calls indirect threaded code: The control stack is a stack of pointers
-~  into the middle of interpreted words. The interpreter snippet, called
-~  docol, implements calling.  Each word is responsible for making sure
-~  returning works properly. Interpreted words accomplish this by ending with
-~  the word "exit", while machine-code words accomplish it by ending with a
-~  verbatim snippet called "next".
-~
-~    Conceptually, "next" returns, but more specifically it accomplishes this
-~  by doing the caller's next dispatch for it; thus control never actually
-~  goes back to the caller's interpreter after initial setup. For performance
-~  reasons, "next" is always inlined, so we define it as a macro.
-~
-~    The docol routine is just ordinary code, not a macro. It's defined later
-~  in this file, as a label.
-~
-~    Notionally, we could consider not having a dictionary, and not giving
-~  our words names. However, it feels silly to stop when we're so close to
-~  being a full Forth, and using names for things solves a bootstrapping
-~  problem related to heap management - see the write-up of _start about how
-~  the heap is created, below. So, we add an additional header before the
-~  codeword for this purpose.
-~
-~    The Forth dictionary is usually a linked list of every word that has
-~  ever been defined, with the newest at the head; the names of words are
-~  stored in string fields, often right next to the link pointer. We adopt
-~  this model, with the field sizes and order shown in the quick reference
-~  below. We break with Forth tradition in one way: Rather than having a
-~  length field, we use a null-terminated string. Thus, there's no length
-~  limit on names. This necessitates breaking out the flags (to be explained
-~  later) into their own byte, rather than taking bits from the length field
-~  for them.
-~
-~    There's an important performance consideration: Executable words
-~  reference each other by pointers to their respective codewords. However,
-~  dictionary entries reference each other by pointers to their respective
-~  link fields. Traversing from the link field to the codeword is easy,
-~  though it's a non-constant-time operation: Just walk the string. In order
-~  to make Forth words easy to "decompile", it would be nice to also have a
-~  way to traverse backwards. We solve this by making the name field be
-~  null-terminated at both ends. Fun, yeah?
-~
-~
-~
-~
-~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-~   Quick Reference
-~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-~
-~  The layout of an interpreted word:
-~
-~  (overall start)
-~             0x00 - 0x07                     Link (to next-oldest word)
-~             0x08 - 0x08  HM00000I           Flags
-~                                                 H - hidden
-~                                                 M - metadata
-~                                                 I - immediate
-~                                                 all other bits reserved
-~  (name start)
-~             0x09 - 0x09                     Null byte (terminates name)
-~             0x0a - name-end - 1             Name, as UTF-8
-~         name-end - name-end                 Null byte (terminates name)
-~  (padding start)
-~     name-end + 1 - codeword-start - 1       Zero-pad to 8-byte boundary
-~     (it's possible this will be zero bytes long)
-~  (codeword start)
-~       ... + 0x00 - ... + 0x08               Codeword (ie. address of docol)
-~           (8-byte chunks)                   Addresses of other words
-~                  - ... (end)                Address of "exit" word
-~
-~  The layout of a machine-code word is different only from the codeword on:
-~
-~       ... + 0x00 - ... + 0x08               Addresss of next byte
-~       ... + 0x08 - ????                     Arbitrary machine code
-~                  - ... (end)                Inlined implementation of next
-~
-~  Also, words always start at 8-byte boundaries.
-~
-~
-~  REGISTER usage conventions:
-~
-~  * rsi is the "instruction pointer" for the "interpreter".
-~      That is, it points to some word-pointer inside an array of
-~    word-pointers inside the content of the word they're part of. It always
-~    points to the next word that should be executed, whose execution hasn't
-~    begun yet.
-~
-~  * rbp points to the top of the control stack
-~      These are former values of rsi, to eventually be returned to, from
-~    successively older callers as you look further up the stack. The stack
-~    grows downwards in memory. Since values are kept separately, the only
-~    thing on the control stack is return addresses, one per layer of call.
-~
-~  * rsp points to the top of the value stack
-~      The value stack has no specific format, but it grows downwards in
-~    memory. In particular there's no concept of stack frames, because items
-~    on the stack don't belong to any particular word; the value stack in
-~    Forth is in part a mechanism for passing values between words.
-~
-~  Additionally, immediately after beginning execution of a word:
-~
-~  * rax points to the address of the codeword being executed
-~      The value of rax is purely for the callee's benefit, and does not need
-~    to be preserved.
-~
-~    Other registers are purely discretionary, and are not preserved across
-~  calls.
-~
-~
-~  FLAG usage:
-~
-~  * DF should be 0
-~    We use lodsq extensively and that makes it increment rsi after using it.
-~
-~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-
-~ Macro next
-~ ~~~~~~~~~~
-~
-~   Include this inline at the end of a word implemented in machine-code.
-~ Conceptually, it returns. What it actually does is do the next thing the
-~ caller would do, which is call the next word from the caller's array of
-~ word pointers.
-~
-~   This is a widespread technique in Forth implementation, referred to as
-~ indirect threaded code. It's "threaded" in the sense that each word takes
-~ responsibility for finishing up by following the notional thread through the
-~ metaphorical labyrinth to figure out the next word that its caller wants to
-~ run after it. In other words, control never directly returns to the parent,
-~ it proceeds directly to the sibling.
-~
-~ Registers in:
-~
-~ * rsi points to the address of the word to execute
-~
-~ Registers out:
-~
-~ * rax points to the codeword in the contents of the word that was executed
-~ * rsi points to the next word-address after this one
-~
-~ Flags
-~ * DF = 0 is required
-~
-~ (base address -- new base address)
-: next
-  ~ Copy the next word's address from *rsi into rax. Increment rsi (as per the
-  ~ DF flag).
-  lods64
-
-  ~ Load the codeword from the word's contents, and jump to the interpreter it
-  ~ points to.
-  :rax jmp-abs-indirect-reg64 ;
-
-
-~ Macro beforenext
-~ ~~~~~~~~~~~~~~~~
-~
-~   Sometimes we want to transfer control from a word implemented in
-~ machine-code to another word, without coming back after, as if we were
-~ simply jumping to it. This is an innovation of ours; Jonesforth doesn't do
-~ it. It is similar to the tail-call optimization that many Lisp dialects
-~ have.
-~
-~   This implementation will work regardless of how the receiving word is
-~ implemented. It impersonates the "next" snippet, setting up rax to point
-~ to the codeword then jumping to the interpreter. Since it doesn't change
-~ the control stack or rsi, when the receiving word eventually invokes
-~ "next"; it will pick up in the same place as if this sending word had done
-~ it.
-~
-~   Thus, notionally we are doing just this one transfer of control before
-~ eventually getting around to inlining "next". Hence the name.
-~
-~ (target address, base address -- new base address)
-: beforenext
-  ~ Do a permanent transfer of control by setting rax and invoking the
-  ~ codeword. Of course, we could jump to docol ourselves but this will work
-  ~ regardless of what the receiving codeword is.
-  :rax mov-reg64-imm64
-  :rax jmp-abs-indirect-reg64 ;
-
-
-~ Macros pushcontrol
-~        popcontrol
-~ ~~~~~~~~~~~~~~~~~~
-~
-~   Include these inline to push an address onto the control stack, or pop
-~ one off of it. You will recall the control stack is kept in rbp. The
-~ parameter is given in a user-specified register.
-~
-~   Jonesforth's analogous macros are called PUSHRSP and POPRSP but I think
-~ that's super confusing, since rsp is also the name of a register, but a
-~ different one. I guess it was less confusing in 32-bit, since esp doesn't
-~ start with an "r". Anyway, this has to be named something that
-~ distinguishes it from Intel's PUSH and POP opcodes, so...
-~
-~   "Load effective address" is just a cute way to do arithmetic on a
-~ register, here. To push or pop we decrement or increment rbp by 8. To
-~ actually interact with the space in the stack, we indirect through rbp.
-~
-~ Registers in and out:
-~
-~ * rbp points to the top of the control stack.
-~
-~ (source register, base address -- new base address)
-: pushcontrol
-  swap :rbp -8 :rbp lea-reg64-disp8-reg64
-  swap :rbp 0 mov-disp8-reg64-reg64 ;
-
-~ (target register, base address -- new base address)
-: popcontrol
-  :rbp 0 3roll mov-reg64-disp8-reg64
-  :rbp 8 :rbp lea-reg64-disp8-reg64 ;
-
-~ Constants
-~ ~~~~~~~~~
-~
-~ These are used in cold-start, just below.
-
-: heap-requested-address 0x0000001000000000 ; ~ (very arbitrary)
-: heap-size              0x0000000001000000 ; ~ 16 MiB
-: control-stack-size                0x10000 ; ~ 64 KiB
-
-
-~ Routine cold-start
-~ ~~~~~~~~~~~~~~~~~~
-~
-~   This is the entry point of the whole program, the very first code we
-~ actually execute. Linkers traditionally call this _start, but let's live a
-~ little. Anyway, the ELF header points to it and exec() jumps to it; from the
-~ kernel's perspective, there's no such thing as names. Also, though this code
-~ could be anywhere in the code part of the output, in order to make the
-~ hexdump pretty we put it at the start.
-~
-~   The kernel gives us most registers zeroed, and rsp pointing to the
-~ command-line stuff (argc, argv, envp), which is at an ASLR'd address with
-~ some stack space allocated for us, despite the fact we didn't request any.
-~ It also gives us all the flags clear except IF, but we don't rely on that.
-~ Lastly, of course, it loads our code segment and sets the instruction
-~ pointer where we asked; we don't need to check what those addresses are,
-~ because they're not randomized.
-~
-~   This routine is really only responsible for one-time initialization.
-~
-~ Registers in:
-~
-~ * rsp points to the logical top  of the value stack
-~     The kernel sets this up for us, and we need to save it somewhere so
-~   Forth can use it.
-~
-~ Registers out:
-~
-~ * rsi points within "quit"
-~     Quit is the word that's Forth's closest equivalent to main().
-~ * rsp points to the top of the value stack
-~
-~   Notably, rbp is still uninitialialized after _start.
-~
-~ Stack in:
-~
-~ * argc, argv, envp in the usual Unix way
-~     We ignore them, though.
-~
-~ Stack out:
-~
-~ * The value of "heap", as a pointer
-~     The meaning of this will be explained below.
-~
-~ Registers within:
-~
-~ * rdi points to the base the heap was allocated at, once it exists
-~     This is the same value that "heap" will hold, once we reach a point
-~   where we have variables. Of course, variables are stored on the heap,
-~   hence this temporary measure.
-~
-~   We also take this opportunity to define soeme memory layout parameters
-~ that this routine will be responsible for doing something with:
-~
-~ (output memory start, current output point
-~  -- output memory start, current output point)
-: cold-start
-  current-offset L' cold-start set-label
-  cld                                      ~ clear the DF flag
-
-  ~   Prepare the heap.
-  ~
-  ~   We could ask for a data segment in the program header, but where's the
-  ~ fun in that? Instead, we call mmap().
-  ~
-  ~   If we wanted the kernel to do ASLR for us, passing address zero would
-  ~ cause it to pick somewhere at random, but instead we choose our own
-  ~ location. It's still not guaranteed to be where we ask for, so we still
-  ~ do the work to record where it wound up. We could pass the "fixed" flag
-  ~ and the kernel would trust us, but this gives us more options for
-  ~ interoperating with other runtimes.
-  ~
-  9 :rax mov-reg64-imm64                      ~ mmap()
-  heap-requested-address :rdi mov-reg64-imm64 ~ address (very arbitrary)
-  heap-size :rsi mov-reg64-imm64              ~ size (one meg)
-  0x07 :rdx mov-reg64-imm64                   ~ protection (read+write+exec)
-  0x22 :r10 mov-extrareg64-imm64              ~ flags (private+anonymous)
-  0 :r8 mov-extrareg64-imm64                  ~ file descriptor (ignored)
-  0 :r9 mov-extrareg64-imm64                  ~ offset (ignored)
-  syscall
-
-  ~   The return value of the system call is in rax, we'll use it in a sec.
-  ~ We need to save this somewhere in case we ever want to munmap() it;
-  ~ there's no widely-used name for it so we have to make one up. S0 and r0
-  ~ are widely-used names for the physical tops (logical bottoms) of the
-  ~ value and control stacks, respectively, and we will eventually set those
-  ~ up as well, so we should keep those names in mind. The control stack
-  ~ lives within the heap, while the value stack is its own segment. This
-  ~ value, though, is the physical bottom of the segment, meaning that it
-  ~ stays the same even as we allocate and deallocate things within it. This
-  ~ is unlike the two stack pointers, so we give it a name that doesn't
-  ~ suggest similarity: "heap".
-  ~
-  ~   Once Forth is fully set up, its internal variables will be accessed
-  ~ through variable-words like any other Forth data, including "heap". To
-  ~ get to that point, though, we need to be able to hold onto variable data
-  ~ between now and then. In fact, if we don't have at least one of "heap"
-  ~ and "here" (its counterpart which points to the logical top end), all
-  ~ our efforts to hold onto anything seem a bit doomed.
-  ~
-  ~   So, we temporarily dedicate rdi to "heap" - only within this routine -
-  ~ and store everything else in ways that let us find things by reference
-  ~ to it. We choose rdi because it works with the indexing modes we care
-  ~ about, and its name suggests its function.
-  ~
-  ~   The strategy Jonesforth uses is not applicable to us; Jonesforth
-  ~ takes advantage of the linker to let its code segment refer to specific,
-  ~ pre-allocated objects in the data segment. We are our own linker, and we
-  ~ don't care to have a data segment. Hence, this approach.
-  ~
-  ~   Keying things off "heap" is the fundamental decision, but to make sure
-  ~ our variables are accessible both during early bootstrapping, and later,
-  ~ we also have to be thoughtful about data structures. More on that in a
-  ~ moment.
-  ~
-  :rax :rdi mov-reg64-reg64
-
-  ~   We also initialize rbp. We could hold off and let "quit" do this, but
-  ~ doing it now is the easiest way to initialize the r0 variable, since
-  ~ there's no instruction that moves a 64-bit immediate to memory.
-  ~
-  ~   This is the moment at which we decide where the control stack starts!
-  ~ Fun, right? "Allocation" is just a fancy word for picking where we want
-  ~ something, then being consistent about it - like placing furniture in
-  ~ your home. See below for a little more thought about why here in
-  ~ particular.
-  ~
-  :rdi control-stack-size :rbp lea-reg64-disp32-reg64
-
-  ~   Now we save some stuff onto the heap. These are the locations that
-  ~ will eventually be the backing stores of the Forth variables, but we
-  ~ don't create the word headers yet, since there's no requirement that
-  ~ they be next to the backing stores. We'll do that later, once we have
-  ~ word-writing infrastructure in place. For now, we just use their offsets
-  ~ relative to the physical bottom of the heap, which are fixed.
-  ~
-  ~   These will be the permanent homes of these values, though we have
-  ~ copies of them elsewhere while we're still in this routine.
-  ~
-  :rdi control-stack-size 0x00 + :rdi mov-reg64-disp32-reg64  ~ heap
-  :rsp control-stack-size 0x08 + :rdi mov-reg64-disp32-reg64  ~ s0
-  :rbp control-stack-size 0x10 + :rdi mov-reg64-disp32-reg64  ~ r0
-  L@' final-word-name :rax mov-reg64-imm64
-  :rax control-stack-size 0x18 + :rdi mov-reg64-disp32-reg64  ~ latest
-  :rdi control-stack-size 0x28 + :rax lea-reg64-disp32-reg64
-  :rax control-stack-size 0x20 + :rdi mov-reg64-disp32-reg64  ~ here
-  ~
-  ~ * "heap" is the physical bottom of the heap
-  ~     The heap grows upwards in memory, so this is also the logical
-  ~   bottom. This comes from the address mmap() just returned to us.
-  ~ * "s0" is the logical bottom of the value stack
-  ~     The value stack grows downwards in memory, so this is the physical
-  ~   top of it. This comes from the stack pointer the kernel initialized us
-  ~   with.
-  ~ * "r0" is the logical bottom of the control stack
-  ~     The control stack also grows downwards, so this is its pysical top
-  ~   as well. We allocate this dedicated space within the heap right here,
-  ~   in this routine, through our choice of where to put things.
-  ~ * "here" is the physical start of the unallocated space in the heap
-  ~     We allocate heap space from bottom to top, by incrementing this
-  ~   value. So, it would also be accurate to say that it points immediately
-  ~   after the physical top of the allocated space. At any rate, the
-  ~   address it points to is the first one that hasn't been used yet.
-  ~ * "latest" is the address of the most-recently-defined word's header
-  ~     Defining new words changes this value.
-  ~
-  ~   s0 and r0 are mostly used when we want to initialize or reinitialize
-  ~ their respective stacks - that is, discard all their contents at once.
-  ~
-  ~   The value of r0 is the same address these variables start at, so
-  ~ you'll want to do a close read of the implementation of pushcontrol
-  ~ and convince yourself that it only ever writes things just below the rbp
-  ~ address it receives, never right on top of it.
-  ~
-  ~   As an aside, by the way, please notice that strictly speaking, r0
-  ~ could be a constant... but it isn't known until runtime, so we might as
-  ~ well make it a variable. That will play nicely with any astonishing
-  ~ memory shenanigans someone might wish to do in the future.
-  ~
-  ~   Notice also that "here" points immediately after itself. This is just
-  ~ a convenience, making it the last one like that so that the concern is
-  ~ dealt with in a single place and is easy to keep up-to-date with code
-  ~ changes.
-  ~
-  ~   A little more detail about why we offset everything by
-  ~ control_stack_size: We're carving out some space at the bottom of the
-  ~ heap - which grows low-to-high - to be the control stack - which grows
-  ~ high-to-low. So the control stack is allocated out of the heap as a
-  ~ fixed-size, one-time thing, and then the variables come immediately
-  ~ after that. We do need to use 32-bit displacement indexing to access
-  ~ them this way, but that's no big deal.
-  ~
-  ~   This is perhaps questionable, they should maybe be separate segments
-  ~ created with separate calls to mmap(), but for now we're not worried
-  ~ about overflow so we use the same allocation for both.
-  ~
-  ~   We'll come back to these variables a bit later and generate the word
-  ~ headers that point at them, but now we're almost ready to switch to
-  ~ proper threaded-execution, so we finish that setup first...
-
-  ~   Push the value of "heap" onto the value stack so that it can be the
-  ~ breadcrumb the threaded code needs to find... the backing store of
-  ~ "heap". Yes, self-reference can be weird like that sometimes. There's
-  ~ nothing stopping "quit" from reading rdi, it just violates the
-  ~ abstraction...
-  :rdi push-reg64
-
-  ~   We are about to set up rsi, we did rbp already, and rsp came to us
-  ~ already set up. That's all that "next" needs, so take it away!
-  L@' warm-start :rsi mov-reg64-imm64
-  next ;
-
-~ Routine warm-start
-~ ~~~~~~~~~~~~~~~~~~
-~
-~   This routine runs as Forth code. That is, indirect threaded execution has
-~ been initialized, and the structure of this routine is an array of codeword
-~ pointers. It was directly jumped to from cold-start. There's nowhere to
-~ return to, so it needs to never return.
-: warm-start
-  ~   While it's not actually a requirement that codeword pointers be
-  ~ word-aligned, it's highly likely that it helps performance. (Whether it
-  ~ does is up to Intel's microcode.)
-  8 packalign
-  current-offset L' warm-start set-label
-
-  ~   Before handing off to us, cold-start pushed a single value onto the
-  ~ stack, a pointer to the beginning of the heap. Now, we load our entire
-  ~ Forth implementation onto that heap, beginning with the minimal set of
-  ~ words needed to define more words. We do this because we need variables as
-  ~ infrastructure so we can eventually have dynamic definitions.
-  ~
-  ~   There's something non-obvious here: words implemented statically as
-  ~ part of the executable image can't contain things that vary at runtime.
-  ~ That means that even if these words tried to implement some sort of
-  ~ dynamic lookup, they would have no way to find the root of whatever
-  ~ dynamic data structure they use. Dynamism needs to be bootstrapped.
-  ~
-  ~   In a more traditional C-style program, static code could look up
-  ~ variables based on fixed addresses that are the same on every run.
-  ~ Failing that, we could dedicate a register to it, though that's a
-  ~ considerable expense. We chose not to do either of those things, because
-  ~ we want the versatility that comes with not being picky about our
-  ~ address space: It allows us to contemplate future improvements such as
-  ~ ASLR, or embedding into other processes that impose their own addressing
-  ~ constraints, or even coexisting with multiple versions of ourselves.
-  ~ That choice does mean we have the hard version of this bootstrapping
-  ~ problem, and copying ourselves to the heap is how we solve it.
-  ~
-  ~   We do have the heap address right now, though that won't last. In case
-  ~ it's unclear why not: keeping it on the stack would require all future
-  ~ references to walk the stack, and somehow know when they've reached the
-  ~ bottom. The stack is a good place to keep things with clearly delimited
-  ~ lifetimes and visibility, but when we want something to live for our
-  ~ entire program and be easy to find from any code within it, we need to
-  ~ do something else. Anyway, since we have the address, we can use it for
-  ~ the next little bit of setup.
-  ~
-  ~   The first few words we define are our variables, which hardcode the
-  ~ addresses they will return - but since we're doing this at runtime,
-  ~ "hardcoding" can reflect where our heap is. This is the fundamental
-  ~ trick that makes the heap usable.
-  ~
-  ~   One more thing to notice: We already allocated the backing stores of
-  ~ these variables, and populated their initial values, in _start. The
-  ~ words we're defining return those same addresses for the same backing
-  ~ stores. So, we have continuity: Stuff defined in terms of the
-  ~ variable-words we're defining now will interoperate with the stuff that
-  ~ we define in the "early" way, which includes those very words. Both the
-  ~ early code and the later code are dealing with the same data structures,
-  ~ they're just using a different technique to find them.
-  ~
-  ~   This is the only hardcoding we need to do; by building on top of it,
-  ~ we will soon reach a point where the rest of the system can be defined
-  ~ within itself.
-  ~ TODO These need to, like, exist first. Also they need to be referenced
-  ~ as labels.
-  ~ dq early_heap, litstring, "heap", early_variable
-  ~ dq early_s0, litstring, "s0", early_variable
-  ~ dq early_r0, litstring, "r0", early_variable
-  ~ dq early_latest, litstring, "latest", early_variable
-  ~ dq early_here, litstring, "here", early_variable
-  ;
+~ cat labels.e elf.e execution.e evoke.e | ./quine > evoke && chmod 755 evoke && ./evoke
 
 ~ (output memory start, current output point
 ~  -- output memory start, current output point)
diff --git a/execution.e b/execution.e
new file mode 100644
index 0000000..f6e5978
--- /dev/null
+++ b/execution.e
@@ -0,0 +1,539 @@
+~ ~~~~~~~~~~~~~~~~~~~~~
+~ ~~ Execution model ~~
+~ ~~~~~~~~~~~~~~~~~~~~~
+~
+~    We use Forth-style dual stacks, one for values and one for control. We
+~  use rsp for values, just like C does. We use rbp for the control stack,
+~  which is a special Forth-y stack: These are pointers into the bodies of
+~  Forth words, not return addresses.
+~
+~    The choice of rsp and rbp for the stack pointers imitates Jonesforth;
+~  I'm hopeful that it gives us convenient addressing modes, and will report
+~  back about that when I feel that I understand the implications.
+~
+~    In Forth, everything is a "word", including mutable variables.
+~  Conceptually, a word is a unit of execution, which may be implemented
+~  either in machine code or as an array of pointer to other words.
+~
+~    This polymorphism is implemented by having each word's contents begin
+~  with a "codeword", which is a pointer to machine code that "interprets"
+~  the rest of the contents. In the case of words implemented in machine
+~  code, the codeword points directly to that code, which is normally right
+~  next to it.
+~
+~    Variables, to Forth, are simply one more thing that can be executed; the
+~  effect of executing a variable is to push its address onto the value
+~  stack.
+~
+~    We adopt this model of words, codewords, and variables-as-words. It's
+~  really nice how it doesn't force anything else on us, not even a heap,
+~  though we do end up using a heap.
+~
+~    We specifically implement a version of calling and returning that Forth
+~  calls indirect threaded code: The control stack is a stack of pointers
+~  into the middle of interpreted words. The interpreter snippet, called
+~  docol, implements calling.  Each word is responsible for making sure
+~  returning works properly. Interpreted words accomplish this by ending with
+~  the word "exit", while machine-code words accomplish it by ending with a
+~  verbatim snippet called "next".
+~
+~    Conceptually, "next" returns, but more specifically it accomplishes this
+~  by doing the caller's next dispatch for it; thus control never actually
+~  goes back to the caller's interpreter after initial setup. For performance
+~  reasons, "next" is always inlined, so we define it as a macro.
+~
+~    The docol routine is just ordinary code, not a macro. It's defined later
+~  in this file, as a label.
+~
+~    Notionally, we could consider not having a dictionary, and not giving
+~  our words names. However, it feels silly to stop when we're so close to
+~  being a full Forth, and using names for things solves a bootstrapping
+~  problem related to heap management - see the write-up of _start about how
+~  the heap is created, below. So, we add an additional header before the
+~  codeword for this purpose.
+~
+~    The Forth dictionary is usually a linked list of every word that has
+~  ever been defined, with the newest at the head; the names of words are
+~  stored in string fields, often right next to the link pointer. We adopt
+~  this model, with the field sizes and order shown in the quick reference
+~  below. We break with Forth tradition in one way: Rather than having a
+~  length field, we use a null-terminated string. Thus, there's no length
+~  limit on names. This necessitates breaking out the flags (to be explained
+~  later) into their own byte, rather than taking bits from the length field
+~  for them.
+~
+~    There's an important performance consideration: Executable words
+~  reference each other by pointers to their respective codewords. However,
+~  dictionary entries reference each other by pointers to their respective
+~  link fields. Traversing from the link field to the codeword is easy,
+~  though it's a non-constant-time operation: Just walk the string. In order
+~  to make Forth words easy to "decompile", it would be nice to also have a
+~  way to traverse backwards. We solve this by making the name field be
+~  null-terminated at both ends. Fun, yeah?
+~
+~
+~
+~
+~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+~   Quick Reference
+~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+~
+~  The layout of an interpreted word:
+~
+~  (overall start)
+~             0x00 - 0x07                     Link (to next-oldest word)
+~             0x08 - 0x08  HM00000I           Flags
+~                                                 H - hidden
+~                                                 M - metadata
+~                                                 I - immediate
+~                                                 all other bits reserved
+~  (name start)
+~             0x09 - 0x09                     Null byte (terminates name)
+~             0x0a - name-end - 1             Name, as UTF-8
+~         name-end - name-end                 Null byte (terminates name)
+~  (padding start)
+~     name-end + 1 - codeword-start - 1       Zero-pad to 8-byte boundary
+~     (it's possible this will be zero bytes long)
+~  (codeword start)
+~       ... + 0x00 - ... + 0x08               Codeword (ie. address of docol)
+~           (8-byte chunks)                   Addresses of other words
+~                  - ... (end)                Address of "exit" word
+~
+~  The layout of a machine-code word is different only from the codeword on:
+~
+~       ... + 0x00 - ... + 0x08               Addresss of next byte
+~       ... + 0x08 - ????                     Arbitrary machine code
+~                  - ... (end)                Inlined implementation of next
+~
+~  Also, words always start at 8-byte boundaries.
+~
+~
+~  REGISTER usage conventions:
+~
+~  * rsi is the "instruction pointer" for the "interpreter".
+~      That is, it points to some word-pointer inside an array of
+~    word-pointers inside the content of the word they're part of. It always
+~    points to the next word that should be executed, whose execution hasn't
+~    begun yet.
+~
+~  * rbp points to the top of the control stack
+~      These are former values of rsi, to eventually be returned to, from
+~    successively older callers as you look further up the stack. The stack
+~    grows downwards in memory. Since values are kept separately, the only
+~    thing on the control stack is return addresses, one per layer of call.
+~
+~  * rsp points to the top of the value stack
+~      The value stack has no specific format, but it grows downwards in
+~    memory. In particular there's no concept of stack frames, because items
+~    on the stack don't belong to any particular word; the value stack in
+~    Forth is in part a mechanism for passing values between words.
+~
+~  Additionally, immediately after beginning execution of a word:
+~
+~  * rax points to the address of the codeword being executed
+~      The value of rax is purely for the callee's benefit, and does not need
+~    to be preserved.
+~
+~    Other registers are purely discretionary, and are not preserved across
+~  calls.
+~
+~
+~  FLAG usage:
+~
+~  * DF should be 0
+~    We use lodsq extensively and that makes it increment rsi after using it.
+~
+~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+
+~ Macro next
+~ ~~~~~~~~~~
+~
+~   Include this inline at the end of a word implemented in machine-code.
+~ Conceptually, it returns. What it actually does is do the next thing the
+~ caller would do, which is call the next word from the caller's array of
+~ word pointers.
+~
+~   This is a widespread technique in Forth implementation, referred to as
+~ indirect threaded code. It's "threaded" in the sense that each word takes
+~ responsibility for finishing up by following the notional thread through the
+~ metaphorical labyrinth to figure out the next word that its caller wants to
+~ run after it. In other words, control never directly returns to the parent,
+~ it proceeds directly to the sibling.
+~
+~ Registers in:
+~
+~ * rsi points to the address of the word to execute
+~
+~ Registers out:
+~
+~ * rax points to the codeword in the contents of the word that was executed
+~ * rsi points to the next word-address after this one
+~
+~ Flags
+~ * DF = 0 is required
+~
+~ (base address -- new base address)
+: next
+  ~ Copy the next word's address from *rsi into rax. Increment rsi (as per the
+  ~ DF flag).
+  lods64
+
+  ~ Load the codeword from the word's contents, and jump to the interpreter it
+  ~ points to.
+  :rax jmp-abs-indirect-reg64 ;
+
+
+~ Macro beforenext
+~ ~~~~~~~~~~~~~~~~
+~
+~   Sometimes we want to transfer control from a word implemented in
+~ machine-code to another word, without coming back after, as if we were
+~ simply jumping to it. This is an innovation of ours; Jonesforth doesn't do
+~ it. It is similar to the tail-call optimization that many Lisp dialects
+~ have.
+~
+~   This implementation will work regardless of how the receiving word is
+~ implemented. It impersonates the "next" snippet, setting up rax to point
+~ to the codeword then jumping to the interpreter. Since it doesn't change
+~ the control stack or rsi, when the receiving word eventually invokes
+~ "next"; it will pick up in the same place as if this sending word had done
+~ it.
+~
+~   Thus, notionally we are doing just this one transfer of control before
+~ eventually getting around to inlining "next". Hence the name.
+~
+~ (target address, base address -- new base address)
+: beforenext
+  ~ Do a permanent transfer of control by setting rax and invoking the
+  ~ codeword. Of course, we could jump to docol ourselves but this will work
+  ~ regardless of what the receiving codeword is.
+  :rax mov-reg64-imm64
+  :rax jmp-abs-indirect-reg64 ;
+
+
+~ Macros pushcontrol
+~        popcontrol
+~ ~~~~~~~~~~~~~~~~~~
+~
+~   Include these inline to push an address onto the control stack, or pop
+~ one off of it. You will recall the control stack is kept in rbp. The
+~ parameter is given in a user-specified register.
+~
+~   Jonesforth's analogous macros are called PUSHRSP and POPRSP but I think
+~ that's super confusing, since rsp is also the name of a register, but a
+~ different one. I guess it was less confusing in 32-bit, since esp doesn't
+~ start with an "r". Anyway, this has to be named something that
+~ distinguishes it from Intel's PUSH and POP opcodes, so...
+~
+~   "Load effective address" is just a cute way to do arithmetic on a
+~ register, here. To push or pop we decrement or increment rbp by 8. To
+~ actually interact with the space in the stack, we indirect through rbp.
+~
+~ Registers in and out:
+~
+~ * rbp points to the top of the control stack.
+~
+~ (source register, base address -- new base address)
+: pushcontrol
+  swap :rbp -8 :rbp lea-reg64-disp8-reg64
+  swap :rbp 0 mov-disp8-reg64-reg64 ;
+
+~ (target register, base address -- new base address)
+: popcontrol
+  :rbp 0 3roll mov-reg64-disp8-reg64
+  :rbp 8 :rbp lea-reg64-disp8-reg64 ;
+
+~ Constants
+~ ~~~~~~~~~
+~
+~ These are used in cold-start, just below.
+
+: heap-requested-address 0x0000001000000000 ; ~ (very arbitrary)
+: heap-size              0x0000000001000000 ; ~ 16 MiB
+: control-stack-size                0x10000 ; ~ 64 KiB
+
+
+~ Routine cold-start
+~ ~~~~~~~~~~~~~~~~~~
+~
+~   This is the entry point of the whole program, the very first code we
+~ actually execute. Linkers traditionally call this _start, but let's live a
+~ little. Anyway, the ELF header points to it and exec() jumps to it; from the
+~ kernel's perspective, there's no such thing as names. Also, though this code
+~ could be anywhere in the code part of the output, in order to make the
+~ hexdump pretty we put it at the start.
+~
+~   The kernel gives us most registers zeroed, and rsp pointing to the
+~ command-line stuff (argc, argv, envp), which is at an ASLR'd address with
+~ some stack space allocated for us, despite the fact we didn't request any.
+~ It also gives us all the flags clear except IF, but we don't rely on that.
+~ Lastly, of course, it loads our code segment and sets the instruction
+~ pointer where we asked; we don't need to check what those addresses are,
+~ because they're not randomized.
+~
+~   This routine is really only responsible for one-time initialization.
+~
+~ Registers in:
+~
+~ * rsp points to the logical top  of the value stack
+~     The kernel sets this up for us, and we need to save it somewhere so
+~   Forth can use it.
+~
+~ Registers out:
+~
+~ * rsi points within "quit"
+~     Quit is the word that's Forth's closest equivalent to main().
+~ * rsp points to the top of the value stack
+~
+~   Notably, rbp is still uninitialialized after _start.
+~
+~ Stack in:
+~
+~ * argc, argv, envp in the usual Unix way
+~     We ignore them, though.
+~
+~ Stack out:
+~
+~ * The value of "heap", as a pointer
+~     The meaning of this will be explained below.
+~
+~ Registers within:
+~
+~ * rdi points to the base the heap was allocated at, once it exists
+~     This is the same value that "heap" will hold, once we reach a point
+~   where we have variables. Of course, variables are stored on the heap,
+~   hence this temporary measure.
+~
+~   We also take this opportunity to define soeme memory layout parameters
+~ that this routine will be responsible for doing something with:
+~
+~ (output memory start, current output point
+~  -- output memory start, current output point)
+: cold-start
+  current-offset L' cold-start set-label
+  cld                                      ~ clear the DF flag
+
+  ~   Prepare the heap.
+  ~
+  ~   We could ask for a data segment in the program header, but where's the
+  ~ fun in that? Instead, we call mmap().
+  ~
+  ~   If we wanted the kernel to do ASLR for us, passing address zero would
+  ~ cause it to pick somewhere at random, but instead we choose our own
+  ~ location. It's still not guaranteed to be where we ask for, so we still
+  ~ do the work to record where it wound up. We could pass the "fixed" flag
+  ~ and the kernel would trust us, but this gives us more options for
+  ~ interoperating with other runtimes.
+  ~
+  9 :rax mov-reg64-imm64                      ~ mmap()
+  heap-requested-address :rdi mov-reg64-imm64 ~ address (very arbitrary)
+  heap-size :rsi mov-reg64-imm64              ~ size (one meg)
+  0x07 :rdx mov-reg64-imm64                   ~ protection (read+write+exec)
+  0x22 :r10 mov-extrareg64-imm64              ~ flags (private+anonymous)
+  0 :r8 mov-extrareg64-imm64                  ~ file descriptor (ignored)
+  0 :r9 mov-extrareg64-imm64                  ~ offset (ignored)
+  syscall
+
+  ~   The return value of the system call is in rax, we'll use it in a sec.
+  ~ We need to save this somewhere in case we ever want to munmap() it;
+  ~ there's no widely-used name for it so we have to make one up. S0 and r0
+  ~ are widely-used names for the physical tops (logical bottoms) of the
+  ~ value and control stacks, respectively, and we will eventually set those
+  ~ up as well, so we should keep those names in mind. The control stack
+  ~ lives within the heap, while the value stack is its own segment. This
+  ~ value, though, is the physical bottom of the segment, meaning that it
+  ~ stays the same even as we allocate and deallocate things within it. This
+  ~ is unlike the two stack pointers, so we give it a name that doesn't
+  ~ suggest similarity: "heap".
+  ~
+  ~   Once Forth is fully set up, its internal variables will be accessed
+  ~ through variable-words like any other Forth data, including "heap". To
+  ~ get to that point, though, we need to be able to hold onto variable data
+  ~ between now and then. In fact, if we don't have at least one of "heap"
+  ~ and "here" (its counterpart which points to the logical top end), all
+  ~ our efforts to hold onto anything seem a bit doomed.
+  ~
+  ~   So, we temporarily dedicate rdi to "heap" - only within this routine -
+  ~ and store everything else in ways that let us find things by reference
+  ~ to it. We choose rdi because it works with the indexing modes we care
+  ~ about, and its name suggests its function.
+  ~
+  ~   The strategy Jonesforth uses is not applicable to us; Jonesforth
+  ~ takes advantage of the linker to let its code segment refer to specific,
+  ~ pre-allocated objects in the data segment. We are our own linker, and we
+  ~ don't care to have a data segment. Hence, this approach.
+  ~
+  ~   Keying things off "heap" is the fundamental decision, but to make sure
+  ~ our variables are accessible both during early bootstrapping, and later,
+  ~ we also have to be thoughtful about data structures. More on that in a
+  ~ moment.
+  ~
+  :rax :rdi mov-reg64-reg64
+
+  ~   We also initialize rbp. We could hold off and let "quit" do this, but
+  ~ doing it now is the easiest way to initialize the r0 variable, since
+  ~ there's no instruction that moves a 64-bit immediate to memory.
+  ~
+  ~   This is the moment at which we decide where the control stack starts!
+  ~ Fun, right? "Allocation" is just a fancy word for picking where we want
+  ~ something, then being consistent about it - like placing furniture in
+  ~ your home. See below for a little more thought about why here in
+  ~ particular.
+  ~
+  :rdi control-stack-size :rbp lea-reg64-disp32-reg64
+
+  ~   Now we save some stuff onto the heap. These are the locations that
+  ~ will eventually be the backing stores of the Forth variables, but we
+  ~ don't create the word headers yet, since there's no requirement that
+  ~ they be next to the backing stores. We'll do that later, once we have
+  ~ word-writing infrastructure in place. For now, we just use their offsets
+  ~ relative to the physical bottom of the heap, which are fixed.
+  ~
+  ~   These will be the permanent homes of these values, though we have
+  ~ copies of them elsewhere while we're still in this routine.
+  ~
+  :rdi control-stack-size 0x00 + :rdi mov-reg64-disp32-reg64  ~ heap
+  :rsp control-stack-size 0x08 + :rdi mov-reg64-disp32-reg64  ~ s0
+  :rbp control-stack-size 0x10 + :rdi mov-reg64-disp32-reg64  ~ r0
+  L@' final-word-name :rax mov-reg64-imm64
+  :rax control-stack-size 0x18 + :rdi mov-reg64-disp32-reg64  ~ latest
+  :rdi control-stack-size 0x28 + :rax lea-reg64-disp32-reg64
+  :rax control-stack-size 0x20 + :rdi mov-reg64-disp32-reg64  ~ here
+  ~
+  ~ * "heap" is the physical bottom of the heap
+  ~     The heap grows upwards in memory, so this is also the logical
+  ~   bottom. This comes from the address mmap() just returned to us.
+  ~ * "s0" is the logical bottom of the value stack
+  ~     The value stack grows downwards in memory, so this is the physical
+  ~   top of it. This comes from the stack pointer the kernel initialized us
+  ~   with.
+  ~ * "r0" is the logical bottom of the control stack
+  ~     The control stack also grows downwards, so this is its pysical top
+  ~   as well. We allocate this dedicated space within the heap right here,
+  ~   in this routine, through our choice of where to put things.
+  ~ * "here" is the physical start of the unallocated space in the heap
+  ~     We allocate heap space from bottom to top, by incrementing this
+  ~   value. So, it would also be accurate to say that it points immediately
+  ~   after the physical top of the allocated space. At any rate, the
+  ~   address it points to is the first one that hasn't been used yet.
+  ~ * "latest" is the address of the most-recently-defined word's header
+  ~     Defining new words changes this value.
+  ~
+  ~   s0 and r0 are mostly used when we want to initialize or reinitialize
+  ~ their respective stacks - that is, discard all their contents at once.
+  ~
+  ~   The value of r0 is the same address these variables start at, so
+  ~ you'll want to do a close read of the implementation of pushcontrol
+  ~ and convince yourself that it only ever writes things just below the rbp
+  ~ address it receives, never right on top of it.
+  ~
+  ~   As an aside, by the way, please notice that strictly speaking, r0
+  ~ could be a constant... but it isn't known until runtime, so we might as
+  ~ well make it a variable. That will play nicely with any astonishing
+  ~ memory shenanigans someone might wish to do in the future.
+  ~
+  ~   Notice also that "here" points immediately after itself. This is just
+  ~ a convenience, making it the last one like that so that the concern is
+  ~ dealt with in a single place and is easy to keep up-to-date with code
+  ~ changes.
+  ~
+  ~   A little more detail about why we offset everything by
+  ~ control_stack_size: We're carving out some space at the bottom of the
+  ~ heap - which grows low-to-high - to be the control stack - which grows
+  ~ high-to-low. So the control stack is allocated out of the heap as a
+  ~ fixed-size, one-time thing, and then the variables come immediately
+  ~ after that. We do need to use 32-bit displacement indexing to access
+  ~ them this way, but that's no big deal.
+  ~
+  ~   This is perhaps questionable, they should maybe be separate segments
+  ~ created with separate calls to mmap(), but for now we're not worried
+  ~ about overflow so we use the same allocation for both.
+  ~
+  ~   We'll come back to these variables a bit later and generate the word
+  ~ headers that point at them, but now we're almost ready to switch to
+  ~ proper threaded-execution, so we finish that setup first...
+
+  ~   Push the value of "heap" onto the value stack so that it can be the
+  ~ breadcrumb the threaded code needs to find... the backing store of
+  ~ "heap". Yes, self-reference can be weird like that sometimes. There's
+  ~ nothing stopping "quit" from reading rdi, it just violates the
+  ~ abstraction...
+  :rdi push-reg64
+
+  ~   We are about to set up rsi, we did rbp already, and rsp came to us
+  ~ already set up. That's all that "next" needs, so take it away!
+  L@' warm-start :rsi mov-reg64-imm64
+  next ;
+
+~ Routine warm-start
+~ ~~~~~~~~~~~~~~~~~~
+~
+~   This routine runs as Forth code. That is, indirect threaded execution has
+~ been initialized, and the structure of this routine is an array of codeword
+~ pointers. It was directly jumped to from cold-start. There's nowhere to
+~ return to, so it needs to never return.
+: warm-start
+  ~   While it's not actually a requirement that codeword pointers be
+  ~ word-aligned, it's highly likely that it helps performance. (Whether it
+  ~ does is up to Intel's microcode.)
+  8 packalign
+  current-offset L' warm-start set-label
+
+  ~   Before handing off to us, cold-start pushed a single value onto the
+  ~ stack, a pointer to the beginning of the heap. Now, we load our entire
+  ~ Forth implementation onto that heap, beginning with the minimal set of
+  ~ words needed to define more words. We do this because we need variables as
+  ~ infrastructure so we can eventually have dynamic definitions.
+  ~
+  ~   There's something non-obvious here: words implemented statically as
+  ~ part of the executable image can't contain things that vary at runtime.
+  ~ That means that even if these words tried to implement some sort of
+  ~ dynamic lookup, they would have no way to find the root of whatever
+  ~ dynamic data structure they use. Dynamism needs to be bootstrapped.
+  ~
+  ~   In a more traditional C-style program, static code could look up
+  ~ variables based on fixed addresses that are the same on every run.
+  ~ Failing that, we could dedicate a register to it, though that's a
+  ~ considerable expense. We chose not to do either of those things, because
+  ~ we want the versatility that comes with not being picky about our
+  ~ address space: It allows us to contemplate future improvements such as
+  ~ ASLR, or embedding into other processes that impose their own addressing
+  ~ constraints, or even coexisting with multiple versions of ourselves.
+  ~ That choice does mean we have the hard version of this bootstrapping
+  ~ problem, and copying ourselves to the heap is how we solve it.
+  ~
+  ~   We do have the heap address right now, though that won't last. In case
+  ~ it's unclear why not: keeping it on the stack would require all future
+  ~ references to walk the stack, and somehow know when they've reached the
+  ~ bottom. The stack is a good place to keep things with clearly delimited
+  ~ lifetimes and visibility, but when we want something to live for our
+  ~ entire program and be easy to find from any code within it, we need to
+  ~ do something else. Anyway, since we have the address, we can use it for
+  ~ the next little bit of setup.
+  ~
+  ~   The first few words we define are our variables, which hardcode the
+  ~ addresses they will return - but since we're doing this at runtime,
+  ~ "hardcoding" can reflect where our heap is. This is the fundamental
+  ~ trick that makes the heap usable.
+  ~
+  ~   One more thing to notice: We already allocated the backing stores of
+  ~ these variables, and populated their initial values, in _start. The
+  ~ words we're defining return those same addresses for the same backing
+  ~ stores. So, we have continuity: Stuff defined in terms of the
+  ~ variable-words we're defining now will interoperate with the stuff that
+  ~ we define in the "early" way, which includes those very words. Both the
+  ~ early code and the later code are dealing with the same data structures,
+  ~ they're just using a different technique to find them.
+  ~
+  ~   This is the only hardcoding we need to do; by building on top of it,
+  ~ we will soon reach a point where the rest of the system can be defined
+  ~ within itself.
+  ~ TODO These need to, like, exist first. Also they need to be referenced
+  ~ as labels.
+  ~ dq early_heap, litstring, "heap", early_variable
+  ~ dq early_s0, litstring, "s0", early_variable
+  ~ dq early_r0, litstring, "r0", early_variable
+  ~ dq early_latest, litstring, "latest", early_variable
+  ~ dq early_here, litstring, "here", early_variable
+  ;
diff --git a/interpret.e b/interpret.e
index 235c98b..4517b25 100644
--- a/interpret.e
+++ b/interpret.e
@@ -4,8 +4,8 @@
 ~
 ~   The code in this file defines the basic syntax and semantics of Forth as
 ~ a text-based language. It's written in terms of the underlying executor,
-~ which is implemented and explained in evoke.e. The execution model gives us
-~ the concept of "words"; the control and value stacks; and the ability to
+~ which is implemented and explained in execution.e. The execution model gives
+~ us the concept of "words"; the control and value stacks; and the ability to
 ~ call things. It has nothing to say about text, only about the binary form of
 ~ the language.
 ~
@@ -59,8 +59,8 @@
 ~ all set up to start appending contents to the new word by calling ",".
 ~
 ~   There's a handy diagram of the entry header format under "quick
-~ reference", in the description of the exeuction model in evoke.e. Create is
-~ responsible for everything up to the codeword, not including it.
+~ reference", in the description of the exeuction model in exeuction.e. Create
+~ is responsible for everything up to the codeword, not including it.
 ~
 ~   When a word is created in interpret mode using s" to provide a string
 ~ literal, the temporary space that s" uses is in the same place as the
@@ -127,10 +127,34 @@
   ~   Finally, we write our updated value of "here" back into the variable.
   here ! ;
 
-~ ,                                                     0000001000018080
-~ self-codeword                                         00000010000180d0
-~ variable                                              0000001000018128
-~ allocate                                              00000010000181c8
+~ (value to append to current word-in-progress --)
+: , here @ swap pack64 here ! ;
+
+: self-codeword here @ 8 + , ;
+
+~ (address for new variable word to create, name string --)
+: variable
+  create
+  self-codeword
+  here @
+  swap :rax mov-reg64-imm64
+  :rax push-reg64
+  next
+  8 packalign
+  here ! ;
+
+~   Allocates bytes on the heap by incrementing the global "here" pointer. The
+~ "here" pointer is kept aligned to an 8-byte boundary, regardless of the size
+~ requested.
+~
+~   This does not create dictionary entries, it's just a raw memory interface.
+~ It's suitable for allocating data or scratch space.
+: allocate
+  here @ dup
+  ~ (size, here value, here value)
+  3roll + 8 packalign here ! ;
+
+~ TODO
 ~ buffer-physical-start                                 0000001000018240
 ~ buffer-physical-length                                0000001000018270
 ~ buffer-logical-start                                  00000010000182c0
@@ -157,31 +181,174 @@
 ~ accumulate-string                                     0000001000018fc8
 ~ word                                                  00000010000194a0
 ~ find                                                  00000010000195f0
-~ is-alphanumeric                                       0000001000019628
-~ generalized-digit-value                               0000001000019850
-~ decode-generalized-digit                              0000001000019970
-~ read-base-unsigned                                    0000001000019a58
-~ read-integer-unsigned                                 0000001000019cb8
-~ read-integer                                          0000001000019eb0
+
+~ (character -- 1 for true or 0 for false)
+: is-alphanumeric
+  ~ We don't have a character-literal syntax; the hex constants here are
+  ~ ASCII codes.
+  dup 0x30 stackhex > { drop 0 exit } if ~ Less than "0".
+  dup 0x39 >= { drop 1 exit } if ~ Less than or equal to "9".
+  dup 0x41 > { drop 0 exit } if ~ Less than "A".
+  dup 0x5a >= { drop 1 exit } if ~ Less than or equal to "Z".
+  dup 0x61 > { drop 0 exit } if ~ Less than "a".
+  dup 0x7a >= { drop 1 exit } if ~ Less than or equal to "z".
+  drop 0 ; ~ Greater than "z".
+
+
+~ (character -- value)
+: generalized-digit-value
+  ~ We don't have a character-literal syntax; the hex constants here are
+  ~ ASCII codes.
+  dup 0x61 <= { 0x61 - 10 + exit } if ~ lowercase "a"
+  dup 0x41 <= { 0x41 - 10 + exit } if ~ uppercase "a"
+  0x30 - ; ~ digit "0"
+
+
+~ (character, base
+~  -- value (if successful),
+~     error indicator (zero equals success))
+: decode-generalized-digit
+  swap dup is-alphanumeric {
+    ~ It's alphanumeric.
+    ~ (base, character)
+    generalized-digit-value
+    ~ (base, value)
+    dup 3roll
+    ~ (value, value, base)
+    > {
+      ~ It's in range.
+      ~ (value)
+      0 exit } if
+    ~ It's out of range.
+    ~ (value)
+    drop 1 exit } if
+  ~ It's not alphanumeric.
+  drop drop 1 ;
+
+
+~ (string pointer, base
+~  -- result (if successful),
+~     error indicator (zero equals success))
+: read-base-unsigned
+  swap
+
+  ~ If the first byte is null, this is an error
+  unpack8
+  ~ (numeric base, current point in string, character)
+  dup 0 = { drop drop drop 1 exit } if
+
+  ~ Decode the first byte as a generalized digit in the base.
+  ~ (numeric base, current point in string, character)
+  ~ If the first byte is less than "0", this is an error.
+  3roll dup 4 unroll
+  ~ (numeric base, current point in string, character, numeric base)
+  decode-generalized-digit {
+    ~ (numeric base, current point in string)
+    drop drop 1 exit } if
+
+  ~ The first byte is a valid generalized digit in the appropriate base, so
+  ~ let's get started.
+  ~ (numeric base, current point in string, initial value)
+  swap
+
+  {
+    ~ (numeric base, result so far, current point in string)
+    unpack8 dup 0 = {
+      ~ A null after the first character is valid, and indicates we're done.
+      drop drop swap drop 0 exit } if
+
+    ~ Decode the latest byte as a generalized digit in the base.
+    ~ (numeric base, result so far, current point in string, latest byte)
+    4 roll dup 5 unroll
+    ~ (numeric base, result so far, current point in string, character
+    ~  numeric base)
+    decode-generalized-digit {
+      ~ If the latest character is not a valid digit, that's an error.
+      ~ (numeric base, result so far, current point in string)
+      drop drop drop 1 exit } if
+
+    ~ The latest character is valid, so incorporate it and loop.
+    ~ (numeric base, result so far, current point in string, latest value)
+    3roll 4 roll dup 5 unroll * + swap
+  } forever ;
+
 
 ~ (string pointer
 ~  -- result (if successful),
 ~     error indicator (zero equals success))
-: read-decimal
-  dup unpack8 lit 0 != 0branch [ 6 8 * , ] ~ TODO character literal minus
-  ~ This is the case where it's non-negative.
+: read-integer-unsigned
+  ~ We don't have a character-literal syntax; the hex constants here are
+  ~ ASCII codes.
+  dup unpack8 0x30 != { ~ digit "0"
+    ~ This is the case where the leading digit is not a zero.
+    ~ (original string pointer, advanced string pointer)
+    drop 10 read-base-unsigned exit } if
+
+  ~ This is the case where the leading digit is a zero.
   ~ (original string pointer, advanced string pointer)
-  drop 10 read-base-unsigned exit
+  unpack8 dup 0x78 = { ~ lowercase "x"
+    ~ (original string pointer, doubly advanced string pointer, character)
+    drop swap drop 16 read-base-unsigned exit } if
+
+  dup 0x6f = { ~ lowercase "o"
+    ~ (original string pointer, doubly advanced string pointer, character)
+    drop swap drop 8 read-base-unsigned exit } if
+
+  dup 0x62 = { ~ lowercase "b"
+    ~ (original string pointer, doubly advanced string pointer, character)
+    swap drop swap 2 read-base-unsigned exit } if
+
+  ~ This is the case where the second character is something else.
+  ~ (original string pointer, doubly advanced string pointer, character)
+  drop drop 10 read-base-unsigned ;
+
+
+~ (string pointer
+~  -- result (if successful),
+~     error indicator (zero equals success))
+: read-integer
+  ~ We don't have a character-literal syntax; this is ASCII for a hyphen.
+  dup unpack8 0x2d != {
+    ~ This is the case where it's non-negative.
+    ~ (original string pointer, advanced string pointer)
+    drop read-integer-unsigned exit
+  } if
 
   ~ This is the case where it's negative.
   ~ (original string pointer, advanced string pointer)
-  swap drop 10 read-base-unsigned
+  swap drop read-integer-unsigned
   ~ (result maybe, exit code)
-  dup 0branch [ 2 8 * , ]
+  dup {
+    ~ Failure
+    ~ (non-zero exit code)
+    exit
+  } if
+
+  ~ Success
+  ~ (result, zero exit code)
+  swap -1 * swap ;
+
+
+~ (string pointer
+~  -- result (if successful),
+~     error indicator (zero equals success))
+: read-decimal
+  ~ We don't have a character-literal syntax; this is ASCII for a hyphen.
+  dup unpack8 0x2d != {
+    ~ This is the case where it's non-negative.
+    ~ (original string pointer, advanced string pointer)
+    drop 10 read-base-unsigned exit
+  } if
 
-  ~ Failure
-  ~ (non-zero exit code)
-  exit
+  ~ This is the case where it's negative.
+  ~ (original string pointer, advanced string pointer)
+  swap drop 10 read-base-unsigned
+  ~ (result maybe, exit code)
+  dup {
+    ~ Failure
+    ~ (non-zero exit code)
+    exit
+  } if
 
   ~ Success
   ~ (result, zero exit code)
@@ -198,10 +365,10 @@
 ~ we give it a name.
 
 ~ TODO this is the "create" / "here" conflict thing
-~ describe-compilation
-~ ' interpreter-flags-storage describe
-~ ' interpreter-flags describe
-~ newline
+describe-compilation
+' interpreter-flags-storage describe
+' interpreter-flags describe
+newline
 ~ here @ hexdump
 ~ s" interpreter-flags-storage" stackhex create stackhex ~ make-immediate 0 ,
 ~ ~ latest @ dup unhide-entry s" interpreter-flags" variable
@@ -337,12 +504,13 @@
 : quit { interpret } forever ;
 
 ~ quit
-~ 4 5 + . : za 13 12 - . ; za
-~ : ' word value@ find dropstring-with-result
-~   interpreter-flags @ 1 & { literal } if ; make-immediate
-~ ' za . newline
-~ : piz ' za . newline ; piz
-~ ~ ' interpret forget quit 2 3 * .
-~ ' ' describe ' za describe ' piz describe
+-0x10 newline . newline
+4 5 + . : za 13 12 - . ; za
+: ' word value@ find dropstring-with-result
+  interpreter-flags @ 1 & { literal } if ; make-immediate
+' za . newline
+: piz ' za . newline ; piz
+~ ' interpret forget quit 2 3 * .
+' ' describe ' za describe ' piz describe
 bye