~ ~~~~~~~~~~~~~~~~~
~ ~~ Interpreter ~~
~ ~~~~~~~~~~~~~~~~~
~
~ The code in this file defines the basic syntax and semantics of Forth as
~ a text-based language. It's written in terms of the underlying executor,
~ which is implemented and explained in evoke.e. The execution model gives us
~ the concept of "words"; the control and value stacks; and the ability to
~ call things. It has nothing to say about text, only about the binary form of
~ the language.
~
~ It's traditional in Forth to refer to an act of "compiling" code, which
~ in this context means turning it from text into its binary representation.
~ That binary representation most commonly takes the form of a word entry
~ header followed by an array of codeword pointers.
~
~ It would be legitimate to critique the terminology by saying that codeword
~ pointers are still, in some sense, interpreted: They are not machine code to
~ be directly executed by the CPU; they rely on "docol" and "next" at runtime.
~ However, in language design circles, the term "compilation" takes on a
~ broader meaning, referring to any process which requires some or all of the
~ types of infrastructure we regard as being compiler internals: A successive
~ translation of code from one form into another, discarding some types of
~ information while computing others, in a careful order that results in
~ logically consistent output which in some sense has the same meaning as the
~ input. Sometimes this output may be machine code, but often it is another
~ language meant for human consumption, or an intermediate layer meant to be
~ fed into another process.
~
~ Forth compilation is compilation in this sense, so there is no conflict
~ and we run with the established terminology. In addition, it must be noted
~ that Evocation, like many Forths, makes extensive use of words which are
~ implemented directly in machine code; the Forth execution model allows these
~ words to co-exist with words that are interpreted by "docol".
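~
~ As a rough illustration of that binary form (a sketch, not a dump: it
~ assumes 8-byte cells, an entry that starts on an 8-byte boundary, and the
~ header layout that "create" documents later in this file; the word "za" is
~ only an example), the definition
~
~   : za 13 12 - . ;
~
~ would occupy memory roughly like this, with offsets relative to the start
~ of the entry:
~
~   offset  size  contents
~   0       8     link to the previous entry (the old value of "latest")
~   8       1     flags byte
~   9       1     null byte in front of the name
~   10      3     "za" and its null terminator
~   13      3     padding up to the next 8-byte boundary
~   16      8     codeword: docol
~   24      8     lit
~   32      8     13
~   40      8     lit
~   48      8     12
~   56      8     -
~   64      8     .
~   72      8     exit
~
~ Everything from offset 24 onward is the array of codeword pointers (plus
~ the two literal values) that "docol" and "next" walk at runtime.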
~
~ At any rate, the code in this file is responsible for that compilation.
~
~ It is primarily concerned with managing the contents of an area of memory
~ we call the "log". Traditional Forth TODO


~ TODO find a better place for this
: describe-compilation
  ~ It's always in progress ;) We just need a header like this so it doesn't
  ~ get confused with other kinds of debug output.
  ." compilation in progress" newline
  latest @ hexdump newline
  ." here " here @ .hex64 newline
  ." latest " latest @ .hex64 newline
  ." name of latest: " latest @ entry-to-name emitstring newline
  newline ;


~ Allocate space by incrementing "here", and output a word entry header in
~ it. Also add it to the "latest" linked list. Use zero as the flag values;
~ accept a string pointer on the stack and use its contents as the name.
~
~ This is the first step of creating a new word. Its responsibility includes
~ everything up to the codeword, not including the codeword; it leaves things
~ all set up to start appending contents to the new word by calling ",".
~
~ There's a handy diagram of the entry header format under "quick
~ reference", in the description of the execution model in evoke.e. Create is
~ responsible for everything up to the codeword, not including it.
~
~ When a word is created in interpret mode using s" to provide a string
~ literal, the temporary space that s" uses is in the same place as the
~ entry header we're going to write out. It really is very useful to have
~ that work. Fortunately, it does! We're able to avoid needing a special case
~ by doing things in a very careful way, as described below.
~
~ (string pointer --)
: create
  ~ We add one to the string length in order to include the trailing null
  ~ terminator. This will be the length of our name field; we save an extra
  ~ copy of it to help with packing later.
  dup stringlen 1 + dup 3unroll
  ~ (name field length, string pointer, name field length)

  ~ We use memmove to put the string in its final position, because it works
  ~ correctly when the destination overlaps with the source. Notice that we
  ~ do this before writing anything else in the entry header, to avoid
  ~ stepping on it. The name string always starts ten bytes into the header,
  ~ so we can use a fixed offset.
  here @ 10 + 3unroll memmove
  ~ (name field length)

  ~ Now we can get back to the fields that belong at the start of the entry
  ~ header. We take the value of "here" and keep a working copy of it on the
  ~ stack, which we'll advance every time we write more bytes.
  here @
  ~ (name field length, updated "here" pointer)

  ~ Pack the old value of "latest" as the first field of the header, linking
  ~ from the newly-defined word to the next-newest word.
  ~
  ~ All the entries form a linked list, from newest to oldest. Since the
  ~ link is the first field in the entry header, you can get from each entry
  ~ to the one before it just by dereferencing the entry pointer.
  latest @ pack64

  ~ This is the flags byte. It starts at zero; our caller can change it if
  ~ desired.
  0 pack8

  ~ This is the "other" null terminator, used when traversing the name
  ~ string backwards for execution-token-to-entry. Yes, the name is
  ~ null-terminated at both ends.
  0 pack8

  +
  ~ The name field is already populated, so just skip past it.
  ~ (updated "here" pointer)

  ~ The codeword is aligned to a machine-word boundary, and the padding for
  ~ it is create's responsibility.
  ~
  ~ By adding the null terminator before adding alignment padding, we've
  ~ made sure there's always at least one null byte. Otherwise we'd be missing
  ~ the terminator if by chance the name were exactly the wrong length.
  8 packalign
  ~ (updated "here" pointer)

  ~ Retrieve the value of "here", which still doesn't reflect our additions,
  ~ and store it at the address of "latest".
  ~ It's the start of our newly-defined word, which makes it the latest word.
  here @ latest !

  ~ Finally, we write our updated value of "here" back into the variable.
  here ! ;


~ ,                               0000001000018080
~ self-codeword                   00000010000180d0
~ variable                        0000001000018128
~ allocate                        00000010000181c8
~ buffer-physical-start           0000001000018240
~ buffer-physical-length          0000001000018270
~ buffer-logical-start            00000010000182c0
~ buffer-logical-length           0000001000018308
~ input-buffer-refill             0000001000018350
~ clear-buffer                    0000001000018398
~ zero-input-buffer-metadata      0000001000018428
~ allocate-input-buffer-metadata  0000001000018548
~ allocate-input-buffer           00000010000185b0
~ attach-string-to-input-buffer   0000001000018688
~ main-input-buffer-metadata      0000001000018738 I raw
~ main-input-buffer               0000001000018788 asm
~ consume-from                    00000010000187c0
~ peek-from                       0000001000018960
~ key-from                        0000001000018ab8
~ is-space                        0000001000018b00
~ peek                            0000001000018d20
~ consume                         0000001000018d50
~ key                             0000001000018d88
~ unroll-past-string              0000001000018db8
~ swap-past-string                0000001000018ea0
~ dropstring                      0000001000018ee8
~ dropstring-with-result          0000001000018f80
~ accumulate-string               0000001000018fc8
~ word                            00000010000194a0
~ find                            00000010000195f0
~ is-alphanumeric                 0000001000019628
~ generalized-digit-value         0000001000019850
~ decode-generalized-digit        0000001000019970
~ read-base-unsigned              0000001000019a58
~ read-integer-unsigned           0000001000019cb8
~ read-integer                    0000001000019eb0


~ (string pointer
~  -- result (if successful),
~     error indicator (zero equals success))
: read-decimal
  dup unpack8 lit 0 != 0branch [ 6 8 * , ]
  ~ TODO character literal minus

  ~ This is the case where it's non-negative.
  ~ (original string pointer, advanced string pointer)
  drop 10 read-base-unsigned exit

  ~ This is the case where it's negative.
  ~ (original string pointer, advanced string pointer)
  swap drop 10 read-base-unsigned
  ~ (result maybe, exit code)
  dup 0branch [ 2 8 * , ]

  ~ Failure
  ~ (non-zero exit code)
  exit

  ~ Success
  ~ (result, zero exit code)
  swap -1 * swap ;


~ Here, we allocate a single machine word's worth of space to use as the
~ backing store of a mutable variable, initialized to zero. Then we define the
~ variable which points to that address.
~
~ We don't actually need a word header for interpreter-flags-storage, we
~ could just append a zero and point to it directly, but that would make life
~ harder for words that attempt to work with the contents of other words. So
~ we give it a name.
~ TODO this is the "create" / "here" conflict thing
~ describe-compilation
~ ' interpreter-flags-storage describe
~ ' interpreter-flags describe
~ newline
~ here @ hexdump
~ s" interpreter-flags-storage" stackhex create stackhex
~ make-immediate 0 ,
~
~ latest @ dup unhide-entry s" interpreter-flags" variable
~ describe-compilation
~
~ here @ hexdump


: hide-entry dup entry-flags@ 0x80 | entry-flags! ;
: unhide-entry dup entry-flags@ 0x80 invert & entry-flags! ;

~ TODO the definition of set-word-immediate would come here; is it needed?

: [ interpreter-flags @ 0x01 invert & interpreter-flags ! ; make-immediate
: ] interpreter-flags @ 0x01 | interpreter-flags ! ;


~ It may seem nonsensical to use : to define :, but the bootstrapping stuff
~ overrides what it does, so it works. The same, of course, goes for all these
~ other word-defining words.
~
~ If the ] at the end feels backwards, imagine to yourself that everything
~ that ISN'T defining a word body is part of an implicit [ ... ] sequence.
~ Doing so doesn't really change anything, but may make you happier.
: : word value@ create dropstring docol , latest @ hide-entry ] ;


~ The counterpart of : is ;.
: ;
  ~ See commentary on "literal", below, regarding "lit exit".
  lit exit ,
  latest @ unhide-entry
  ~ See above regarding [.
  ~ Since it's an immediate word, we have to go to extra trouble to compile it
  ~ as part of ;.
  [ ' [ entry-to-execution-token , ] ; make-immediate


~ Although we will eventually define the word "'" to give us the symbol of
~ a word, it will rely on being able to compile a literal. Rather than do lots
~ of string processing later, we choose to define this word now to avoid
~ having to look up the word "lit" as part of that.
~
~ It may be slightly surprising that the construction "lit lit" works as
~ expected, given that e.g. "lit 5" will break, as will "lit [", so it's worth
~ explaining why it does.
~
~ In most respects "lit" is just an ordinary word, which compilation turns
~ into a pointer to its codeword. That's what happens to most words, if
~ they're neither a special syntax nor flagged as immediate. It just happens
~ to be a word that it rarely makes sense to use directly, since its purpose
~ is to be generated as part of the output when compiling number literals. The
~ special behavior around number literals is that when "interpret" sees e.g.
~ "5", it first compiles "lit", then appends the numeric value 5 as the
~ following item in the compiled word body.
~
~ The job of "lit" when it's later executed is to push the appropriate value
~ onto the stack and ensure that it doesn't get executed as code. So, whatever
~ you put immediately after it gets treated as a value, even if it's a
~ pointer.
~
~ The reason that writing "lit 5" in Evocation syntax crashes is that it
~ gets turned into "lit lit 5" when compiled, which treats the second "lit" as
~ a value then tries to use "5" as a codeword pointer. So you can use "lit"
~ to quote whatever you want; it's just that if it's already a special syntax,
~ you might need to go behind "interpret"'s back to get it into the compiled
~ output. In practice, this is likely the only place that needs to happen, but
~ the mechanism is documented for the sake of whatever comes up in the future.
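~
~ As a worked example of the above (a sketch that follows the reasoning in
~ this comment rather than a trace of real output), here is what each piece
~ of source text appends to a word body while compiling, and what those cells
~ do when the word later runs:
~
~   source text   compiled cells   effect at runtime
~   5             lit 5            pushes 5
~   lit lit       lit lit          pushes the codeword pointer of "lit"
~   lit 5         lit lit 5        pushes "lit", then tries to run 5 as code
~
~ The middle row is why "literal" can be written as "lit lit , ,": it pushes
~ the pointer to "lit" and appends it, then appends the value it was given.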
~
~ (value -- )
: literal lit lit , , ;


~ Now the single most important word...
: interpret
  word

  ~ If no word was returned, exit.
  dup 0 = { drop exit } if

  ~ The string is on the top of the stack, so to get a pointer to it we get
  ~ the stack address.
  ~ (string)
  value@ find

  ~ Check whether the word was found in the dictionary.
  dup 0 != {
    ~ If the word is in the dictionary, check what mode we're in, then...
    dropstring-with-result
    ~ (entry pointer)
    interpreter-flags @ 0x01 & {
      ~ ... if we're in compile mode, there's still a chance it's an immediate
      ~ word, in which case we fall through to interpret mode...
      dup entry-flags@ 1 & 0 =
      ~ ... but it's a regular word, so append it to the heap.
      { entry-to-execution-token , exit } if
    } if

    ~ ... if we're in interpret mode, or the word is immediate, run it.
    entry-to-execution-token execute exit
  } if

  ~ If it's not in the dictionary, check whether it's a decimal number.
  drop
  ~ As before, we get the stack address and use it as a string pointer.
  ~ (string)
  value@ read-integer 0 = {
    ~ It's a number.
    interpreter-flags @ 0x01 & {
      ~ We're in compile mode; append first "lit", then the number, to the
      ~ heap. The version of "lit" we use is the one that's current when we
      ~ ourselves are compiled, hardcoded; doing a dynamic lookup would
      ~ require dealing with what happens if it's not found.
      dropstring-with-result
      [ ' lit entry-to-execution-token literal ] , ,
      exit
    } if

    ~ We're in interpret mode; push the number to the stack. Or at least, that's
    ~ what the code we're interpreting will see. Really it's already on the
    ~ stack, just clean everything else up and leave it there.
    dropstring-with-result
    exit
  } if

  ~ If it's neither in the dictionary nor a number, just print an error.
  s" No such word: " emitstring value@ emitstring dropstring ;


~ TODO for ease of debugging, this isn't the full implementation, which lets
~ us exit it to the outer "quit"
: quit { interpret } forever ;

~ quit
~ 4 5 + . : za 13 12 - .
; za ~ : ' word value@ find dropstring-with-result ~ interpreter-flags @ 1 & { literal } if ; make-immediate ~ ' za . newline ~ : piz ' za . newline ; piz ~ ~ ' interpret forget quit 2 3 * . ~ ' ' describe ' za describe ' piz describe bye