~ ~~~~~~~~~~~~~~~~~~~~~~~~~
~ ~~ Dynamic definitions ~~
~ ~~~~~~~~~~~~~~~~~~~~~~~~~
~
~ This file provides additional facilities which are fundamental parts of
~ Evocation as a language, but which it's not possible to define until global
~ variables are available. Therefore it is incompatible with the label
~ transform, but compatible with the log-load transform and written to obey
~ its constraints; see transform.e for more details on that.
~
~ The code here relies on the words "log", "s0", "r0", "latest", and "here".
~ These five global variables are the root of all our other data structures.
~ They are defined specially by warm-start in execution.e, since there is no
~ way to create regular definitions for them. Thus, they come to us already
~ set up.
~
~ It may not be obvious, but when a regular docol-based Evocation word is
~ compiled, it hardcodes pointers to all the words it references, which will
~ be part of it forever after. Thus, these words can reference "here" and so
~ on, and they'll just know where to find it; no runtime mechanism for looking
~ it up is needed. That's important, because there are no good ways to build
~ such a mechanism! It would have to dedicate a register or something of that
~ nature, and registers are far too precious for such a use.
~
~ In this file, we define a bunch of more sophisticated ways to work with
~ the log, then we use them to define a high-level flow control facility which
~ saves us from having to compute branch offsets by hand. Pleasantly, we get
~ to use this facility before it's actually defined, since the log-load
~ transform also provides it. That being the case, we might as well start with
~ whatever's most urgent - which is, of course, the debugging tools.


~ Debugging tools for real
~ ~~~~~~~~~~~~~~~~~~~~~~~~

: stack
  s0 @ 8 -
  { dup value@ 8 + != }
  { dup s0 @ 8 - != { space } if dup @ .
    8 - } while
  drop newline ;

: stackhex
  s0 @ 8 -
  { dup value@ 8 + != }
  { dup s0 @ 8 - != { space } if dup @ .hex64 8 - } while
  drop newline ;


~ (pointer -- boolean)
: is-in-log dup log @ <= swap here @ > && ;

~ (-- entry pointer or 0)
: oldest-entry latest @ { dup @ } { @ } while ;

~ (entry pointer -- entry pointer or 0)
: next-newer-entry
  latest @ 2dup = { 2drop 0 exit } if
  { dup { 2dup @ != } if } { @ } while
  swap drop ;

~ (entry pointer -- pointer)
: guess-entry-end
  dup entry-flags@ 64 & 64 = { exit } if
  dup next-newer-entry
  dup { drop dup is-in-log { drop here @ } { drop } if-else exit } unless
  swap drop ;

~ (pointer -- entry pointer or 0)
: containing-entry
  dup is-in-log { drop 0 exit } unless
  latest @ { dup { 2dup > } { 0 } if-else } { @ } while
  swap drop ;

~ (entry pointer -- boolean)
: is-assembly-word entry-to-execution-token dup 8 + swap @ = ;

~ (entry pointer -- boolean)
: is-docol-itself entry-to-name s" docol" stringcmp 0 = ;


~ The word named "docol" has the job of returning the value that gets used
~ as the actual codeword. We make the assumption that the codeword will
~ point somewhere near the entry header; we allow for the possibility that
~ it might be before or after.
~
~ Generally, it's possible for there to be several copies of docol due to
~ alternate logs and things like that, so the goal is to recognize any of
~ them.
~
~ (entry pointer -- boolean)
: is-docol-codeword
  dup is-in-log { drop 0 exit } unless
  containing-entry dup {
    dup is-docol-itself
    { drop 1 } { next-newer-entry dup { is-docol-itself } if } if-else
  } if ;

~ TODO this only works on log words
~
~ (entry pointer -- boolean)
: is-docol-interpreted-word
  dup is-assembly-word { drop 0 exit } if
  entry-to-execution-token @ is-docol-codeword ;

~ (pointer -- boolean)
: is-codeword-pointer
  dup is-in-log { drop 0 exit } unless
  dup containing-entry dup { 2drop 0 exit } unless
  entry-to-execution-token = ;

~ (width --)
: indent { dup } { space 1- } while drop ;

~ (entry pointer --)
: word-heading
  dup entry-to-name dup emitstring space
  stringlen 1+ 54 swap - 0 max indent
  dup .hex64
  dup entry-flags@ dup {
    space
    dup 128 & { s" H" emitstring } if
    dup 64 & { s" M" emitstring } if
    dup 1 & { s" I" emitstring } if
  } if drop
  dup is-assembly-word { s" asm" emitstring } {
    dup is-docol-interpreted-word { s" raw" emitstring } unless
  } if-else
  drop newline ;

: list-dictionary
  oldest-entry { dup } { dup word-heading next-newer-entry } while drop ;

~ (content end, content start, label start --)
: hexdump-row
  2 indent dup .hex32
  dup 4 unroll
  0 { dup 16 > } {
    dup 7 & 0 = { space } if space
    2dup + dup 4 pick <= swap 5 pick > &&
    { 2dup + 8@ .hex8 } { space space } if-else
    1+
  } while
  newline 5 ndrop ;

~ (end, start --)
: hexdump-between
  dup 16 1- invert &
  { dup 3 pick > } { 3dup hexdump-row 16 + } while
  3 ndrop ;

~ (start, length --)
: hexdump-from swap dup 3unroll + swap hexdump-between ;

~ (start --)
: hexdump 64 hexdump-from ;

~ (entry pointer --)
: describe-hex
  dup word-heading
  dup guess-entry-end swap entry-to-execution-token hexdump-between ;

~ (entry pointer --)
: describe-docol
  dup word-heading
  dup guess-entry-end swap entry-to-execution-token 8 +
  { 2dup < } {
    space dup @ dup is-codeword-pointer
    { execution-token-to-entry entry-to-name emitstring } { .
    } if-else
    8 +
  } while
  newline ;

~ (entry pointer --)
: describe
  dup is-docol-interpreted-word { describe-docol } { describe-hex } if-else ;

: describe-all
  oldest-entry { dup } { dup describe next-newer-entry } while drop ;

: describe-compilation
  ~ It's always in progress ;) We just need a header like this so it doesn't
  ~ get confused with other kinds of debug output.
  ." compilation in progress" newline
  latest @ hexdump newline
  ." here " here @ .hex64 newline
  ." latest " latest @ .hex64 newline
  ." name of latest: " latest @ entry-to-name emitstring newline
  newline ;

: bye 0 sys-exit ;


~ Log manipulation
~ ~~~~~~~~~~~~~~~~

~ In general, we're going to want to be able to go on little excursions
~ where we define utility words that are only useful for one task, then
~ deallocate that stuff after we're done with it. We implement "forget",
~ which removes both dictionary entries and log allocations for the entry
~ pointer it's given and everything that came after.
~
~ The implementation strategy is the same as Jonesforth's version, but
~ Jonesforth runs in immediate mode and reads a word to operate on, whereas
~ ours takes an entry pointer and runs in either compiled or immediate
~ modes.
~
~ (entry pointer --)
: forget dup @ latest ! here ! ;

~ (value --)
: , here @ swap pack64 here ! ;

~ We'll be defining a lot of immediate words, so we should set up a terse
~ way to do that.
: make-immediate latest @ dup entry-flags@ 0x01 | entry-flags! ;
: make-hidden latest @ dup entry-flags@ 0x80 | entry-flags! ;
: make-visible latest @ dup entry-flags@ 0x80 invert & entry-flags! ;

~ Sooner or later we'll want to define recursive words; this one lets us
~ do that. It compiles into a call to the word that's currently being
~ defined (strictly speaking, the one whose definition was most recently
~ begun).
: recurse latest @ entry-to-execution-token , ; make-immediate


~ The implementation of find-in is in log-load.e, for now.
~
~ (string pointer -- entry pointer or 0)
: find latest swap find-in ;


~ Allocates bytes on the log by incrementing the global "here" pointer. The
~ "here" pointer is kept aligned to an 8-byte boundary, regardless of the size
~ requested.
~
~ This does not create dictionary entries, it's just a raw memory interface.
~ It's suitable for allocating data or scratch space.
~
~ (size -- pointer)
: allocate
  here @ dup
  ~ (size, here value, here value)
  3roll + 8 packalign here ! ;


~ Allocate space by incrementing "here", and output a word entry header in
~ it. Also add it to the "latest" linked list. Use zero as the flag values;
~ accept a string pointer on the stack and use its contents as the name.
~
~ This is the first step of creating a new word. Its responsibility includes
~ everything up to the codeword, not including the codeword; it leaves things
~ all set up to start appending contents to the new word by calling ",".
~
~ There's a handy diagram of the entry header format under "quick
~ reference", in the description of the execution model in execution.e. Create
~ is responsible for everything up to the codeword, not including it.
~
~ When a word is created in interpret mode using s" to provide a string
~ literal, the temporary space that s" uses is in the same place as the
~ entry header we're going to write out. It really is very useful to have
~ that work. Fortunately, it does! We're able to avoid needing a special case
~ by doing things in a very careful way, as described below.
~
~ (string pointer --)
: create
  ~ We add one to the string length in order to include the trailing null
  ~ terminator. This will be the length of our name field; we save an extra
  ~ copy of it to help with packing later.
  dup stringlen 1 + dup 3unroll
  ~ (name field length, string pointer, name field length)

  ~ We use memmove to put the string in its final position, because it works
  ~ correctly when the destination overlaps with the source.
  ~ Notice that we do this before writing anything else in the entry header,
  ~ to avoid stepping on it. The name string always starts ten bytes into the
  ~ header, so we can use a fixed offset.
  here @ 10 + 3unroll memmove
  ~ (name field length)

  ~ Now we can get back to the fields that belong at the start of the entry
  ~ header. We take the value of "here" and keep a working copy of it on the
  ~ stack, which we'll advance every time we write more bytes.
  here @
  ~ (name field length, updated "here" pointer)

  ~ Pack the old value of "latest" as the first field of the header, linking
  ~ from the newly-defined word to the next-newest word.
  ~
  ~ All the entries form a linked list, from newest to oldest. Since the
  ~ link is the first field in the entry header, you can get from each entry
  ~ to the one before it just by dereferencing the entry pointer.
  latest @ pack64

  ~ This is the flags byte. It starts at zero; our caller can change it if
  ~ desired.
  0 pack8

  ~ This is the "other" null terminator, used when traversing the name
  ~ string backwards for execution-token-to-entry. Yes, the name is
  ~ null-terminated at both ends.
  0 pack8

  +
  ~ The name field is already populated, so just skip past it.
  ~ (updated "here" pointer)

  ~ The codeword is aligned to a machine-word boundary, and the padding for
  ~ it is create's responsibility.
  ~
  ~ By adding the null terminator before adding alignment padding, we've
  ~ made sure there's always at least one null byte. Otherwise we'd be missing
  ~ the terminator if by chance the name were exactly the wrong length.
  8 packalign
  ~ (updated "here" pointer)

  ~ Retrieve the value of "here", which still doesn't reflect our additions,
  ~ and store it at the address of "latest". It's the start of our
  ~ newly-defined word, which makes it the latest word.
  here @ latest !

  ~ Finally, we write our updated value of "here" back into the variable.
  here ! ;


~ Notionally, it might make sense to define "create" in terms of
~ "create-in".
~ Any change like that is being postponed until after the removal of
~ flatassembler, when refactorings will be easier.
~
~ The dictionary handle points to a pointer to the first item.
~
~ (string pointer, dictionary handle --)
: create-in
  dup @ here @ swap pack64
  ~ (name string pointer, dictionary handle, output point)
  0 pack8 0 pack8 3roll packstring 8 packalign
  ~ (dictionary handle, output point)
  swap here @ swap !
  ~ (output point)
  here ! ;


: self-codeword here @ 8 + , ;


~ A variable is simply a word that returns a specific address, always the
~ same one, at which a value can be stored. This word "variable" takes an
~ address and a word name, and defines the word. Allocating space is its
~ caller's responsibility.
~
~ TODO the address is constant but the contents vary, confusing, write it up
~
~ (address for new variable word to point to, string pointer --)
: variable
  create
  self-codeword
  here @ swap :rax mov-reg64-imm64
  :rax push-reg64
  pack-next
  8 packalign
  here ! ;


~ A keyword is a word that evaluates to its own address, which makes it
~ suitable for use as a constant. By convention, all our keywords have names
~ starting with a colon, which imitates the way they work in Common Lisp.
~
~ Specifically, it returns its own execution token. Thus, executing its
~ result repeatedly will keep giving the same value. We aren't in the habit of
~ doing quote-exec kinds of things in Evocation, but it seems as good as any
~ other unique value, so we might as well.
~
~ Unlike CL, we don't currently have the lexer automatically create keywords
~ for us; we create them explicitly. That's likely to be added at some point,
~ but at the moment the feature is lying fallow to see whether it winds up
~ seeing a lot of use.
~
~ (string pointer --)
: keyword
  create

  ~ Before outputting our codeword, save a copy of the address where it's
  ~ going to be. That will be the execution token we return.
  here @ dup
  ~ (self execution token, output point)

  ~ Now add a codeword.
  ~ This is an assembly word, so it's a self-codeword, meaning it points to
  ~ the word right after itself.
  dup 8 + pack64
  ~ (self execution token, output point)

  ~ Now we consume the execution token, using it as part of this instruction.
  ~ The output point has to come first, as it does in "variable".
  swap :rax mov-reg64-imm64
  ~ (output point)

  ~ To return it, we push it to the stack.
  :rax push-reg64

  ~ Now just the normal stuff every assembly word ends with.
  pack-next
  8 packalign
  here ! ;


~ Although we will eventually define the word "'" to give us the symbol of
~ a word, it will rely on being able to compile a literal. Rather than do lots
~ of string processing later, we choose to define this word now to avoid
~ having to look up the word "lit" as part of that.
~
~ It may be slightly surprising that the construction "lit lit" works as
~ expected, given that e.g. "lit 5" will break, as will "lit [", so it's worth
~ explaining why it does.
~
~ In most respects "lit" is just an ordinary word, which compilation turns
~ into a pointer to its codeword. That's what happens to most words, if
~ they're not a special syntax nor flagged as immediate. It just happens to be
~ a word that it rarely makes sense to use directly, since its purpose is to
~ be generated as part of the output when compiling number literals. The
~ special behavior around number literals is that when "interpret" sees e.g.
~ "5", it first compiles "lit", then appends the numeric value 5 as the
~ following item in the compiled word body.
~
~ The job of "lit" when it's later executed is to push the appropriate value
~ onto the stack and ensure that it doesn't get executed as code. So, whatever
~ you put immediately after it gets treated as a value, even if it's a
~ pointer.
~
~ The reason that writing "lit 5" in Evocation syntax crashes is that it
~ gets turned into "lit lit 5" when compiled, which treats the second "lit" as
~ a value then tries to use "5" as a codeword pointer.
~ So you can use "lit" to quote whatever you want; it's just that if it's
~ already a special syntax, you might need to go behind "interpret"'s back to
~ get it into the compiled output. In practice, this is likely the only place
~ that needs to happen, but the mechanism is documented for the sake of
~ whatever comes up in the future.
~
~ (value -- )
: literal lit lit , , ;
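
~ As a sketch of what "literal" does in practice (inferred from the
~ definitions in this file rather than captured from a running system):
~ calling it with 5 on the stack appends two cells to the log at "here".
~
~   cell 0: pointer to the codeword of "lit"
~   cell 1: 5
~
~ Each "," packs one 64-bit value, so "here" advances by 16 bytes in total.
~ That is the same pair of cells "interpret" emits for the number literal 5.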