~ ~~~~~~~~~~~~~~~~~ ~ ~~ Interpreter ~~ ~ ~~~~~~~~~~~~~~~~~ ~ ~ The code in this file defines the basic syntax and semantics of Forth as ~ a text-based language. It's written in terms of the underlying executor, ~ which is implemented and explained in execution.e. The execution model gives ~ us the concept of "words"; the control and value stacks; and the ability to ~ call things. It has nothing to say about text, only about the binary form of ~ the language. ~ ~ It's traditional in Forth to refer to an act of "compiling" code, which ~ in this context means turning it from text into its binary representation. ~ That binary representation most commonly takes the form of a word entry ~ header followed by an array of codeword pointers. ~ ~ It would be legitimate to critique the terminology by saying that codeword ~ pointers are still, in some sense, interpreted: They are not machine code to ~ be directly executed by the CPU; they rely on "docol" and "next" at runtime. ~ However, in language design circles, the term "compilation" takes on a ~ broader meaning, referring to any process which requires some or all of the ~ types of infrastructure we regard as being compiler internals: A successive ~ translation of code from one form into another, discarding some types of ~ information while computing others, in a careful order that results in ~ logically consistent output which in some sense has the same meaning as the ~ input. Sometimes this output may be machine code, but often it is another ~ language meant for human consumption, or an intermediate layer meant to be ~ fed into another process. ~ ~ Forth compilation is compilation in this sense, so there is no conflict ~ and we run with the established terminology. In addition, it must be noted ~ that Evocation, like many Forths, makes extensive use of words which are ~ implemented directly in machine code; the Forth execution model allows these ~ words to co-exist with words that are interpreted by "docol". ~ ~ At any rate, the code in this file is responsible for that compilation. ~ ~ It is primarily concerned with managing the contents of an area of memory ~ we call the "log"; see log-load.e for more detail on terminology. : hide-entry dup entry-flags@ 0x80 | entry-flags! ; : unhide-entry dup entry-flags@ 0x80 invert & entry-flags! ; ~ TODO ~ unroll-past-string 0000001000018db8 ~ swap-past-string 0000001000018ea0 ~ dropstring 0000001000018ee8 ~ dropstring-with-result 0000001000018f80 ~ accumulate-string 0000001000018fc8 ~ is-space 0000001000018b00 ~ word 00000010000194a0 ~ The word "'", often pronounced "tick", quotes the following word, looking ~ it up and treating it as a constant. In immediate mode, the constant winds ~ up on the stack; in compile mode it gets compiled. ~ ~ There are a few possible implementation strategies here. Running as an ~ immediate word means there's a clear and unambiguous concept of "the ~ following word", so that's what we do; otherwise we'd have to get clever ~ about somehow finding out where we were called from. That means we take on ~ what would otherwise be the interpreter's responsibility, of checking what ~ mode we're in. Happily, that's easy to do. ~ ~ There's a cyclic dependency where "if" relies on "'", and "'" relies on ~ "if". Fortunately both of them are treated as alternates by the log-load ~ transform, so we don't have to worry about it. : ' word value@ find dropstring-with-result interpreter-flags @ 1 & { literal } if ; make-immediate ~ (character -- 1 for true or 0 for false) : is-alphanumeric ~ We don't have a character-literal syntax; the hex constants here are ~ ASCII codes. dup 0x30 > { drop 0 exit } if ~ Less than "0". dup 0x39 >= { drop 1 exit } if ~ Less than or equal to "9". dup 0x41 > { drop 0 exit } if ~ Less than "A". dup 0x5a >= { drop 1 exit } if ~ Less than or equal to "Z". dup 0x61 > { drop 0 exit } if ~ Less than "a". dup 0x7a >= { drop 1 exit } if ~ Less than or equal to "z". drop 0 ; ~ Greater than "z". ~ (character -- value) : generalized-digit-value ~ We don't have a character-literal syntax; the hex constants here are ~ ASCII codes. dup 0x61 <= { 0x61 - 10 + exit } if ~ lowercase "a" dup 0x41 <= { 0x41 - 10 + exit } if ~ uppercase "a" 0x30 - ; ~ digit "0" ~ (character, base ~ -- value (if successful), ~ error indicator (zero equals success)) : decode-generalized-digit swap dup is-alphanumeric { ~ It's alphanumeric. ~ (base, character) generalized-digit-value ~ (base, value) dup 3roll ~ (value, value, base) > { ~ It's in range. ~ (value) 0 exit } if ~ It's out of range. ~ (value) drop 1 exit } if ~ It's not alphanumeric. drop drop 1 ; ~ (string pointer, base ~ -- result (if successful), ~ error indicator (zero equals success)) : read-base-unsigned swap ~ If the first byte is null, this is an error unpack8 ~ (numeric base, current point in string, character) dup 0 = { drop drop drop 1 exit } if ~ Decode the first byte as a generalized digit in the base. ~ (numeric base, current point in string, character) ~ If the first byte is less than "0", this is an error. 3roll dup 4 unroll ~ (numeric base, current point in string, character, numeric base) decode-generalized-digit { ~ (numeric base, current point in string) drop drop 1 exit } if ~ The first byte is a valid generalized digit in the appropriate base, so ~ let's get started. ~ (numeric base, current point in string, initial value) swap { ~ (numeric base, result so far, current point in string) unpack8 dup 0 = { ~ A null after the first character is valid, and indicates we're done. drop drop swap drop 0 exit } if ~ Decode the latest byte as a generalized digit in the base. ~ (numeric base, result so far, current point in string, latest byte) 4 roll dup 5 unroll ~ (numeric base, result so far, current point in string, character ~ numeric base) decode-generalized-digit { ~ If the latest character is not a valid digit, that's an error. ~ (numeric base, result so far, current point in string) drop drop drop 1 exit } if ~ The latest character is valid, so incorporate it and loop. ~ (numeric base, result so far, current point in string, latest value) 3roll 4 roll dup 5 unroll * + swap } forever ; ~ (string pointer ~ -- result (if successful), ~ error indicator (zero equals success)) : read-integer-unsigned ~ We don't have a character-literal syntax; the hex constants here are ~ ASCII codes. dup unpack8 0x30 != { ~ digit "0" ~ This is the case where the leading digit is not a zero. ~ (original string pointer, advanced string pointer) drop 10 read-base-unsigned exit } if ~ This is the case where the leading digit is a zero. ~ (original string pointer, advanced string pointer) unpack8 dup 0x78 = { ~ lowercase "x" ~ (original string pointer, doubly advanced string pointer, character) drop swap drop 16 read-base-unsigned exit } if dup 0x6f = { ~ lowercase "o" ~ (original string pointer, doubly advanced string pointer, character) drop swap drop 8 read-base-unsigned exit } if dup 0x62 = { ~ lowercase "b" ~ (original string pointer, doubly advanced string pointer, character) swap drop swap 2 read-base-unsigned exit } if ~ This is the case where the second character is something else. ~ (original string pointer, doubly advanced string pointer, character) drop drop 10 read-base-unsigned ; ~ (string pointer ~ -- result (if successful), ~ error indicator (zero equals success)) : read-integer ~ We don't have a character-literal syntax; this is ASCII for a hyphen. dup unpack8 0x2d != { ~ This is the case where it's non-negative. ~ (original string pointer, advanced string pointer) drop read-integer-unsigned exit } if ~ This is the case where it's negative. ~ (original string pointer, advanced string pointer) swap drop read-integer-unsigned ~ (result maybe, exit code) dup { ~ Failure ~ (non-zero exit code) exit } if ~ Success ~ (result, zero exit code) swap -1 * swap ; ~ (string pointer ~ -- result (if successful), ~ error indicator (zero equals success)) : read-decimal ~ We don't have a character-literal syntax; this is ASCII for a hyphen. dup unpack8 0x2d != { ~ This is the case where it's non-negative. ~ (original string pointer, advanced string pointer) drop 10 read-base-unsigned exit } if ~ This is the case where it's negative. ~ (original string pointer, advanced string pointer) swap drop 10 read-base-unsigned ~ (result maybe, exit code) dup { ~ Failure ~ (non-zero exit code) exit } if ~ Success ~ (result, zero exit code) swap -1 * swap ; ~ Here, we allocate a single machine word's worth of space to use as the ~ backing store of a mutable variable, initialized to zero. Then we define the ~ variable which points to that address. ~ ~ We don't actually need a word header for interpreter-flags-storage, we ~ could just append a zero and point to it directly, but that would make life ~ harder for words that attempt to work with the contents of other words. So ~ we give it a name. s" interpreter-flags-storage" create make-immediate make-visible here @ 0 , s" interpreter-flags" variable ~ There's an important bootstrapping concern: If you're loading this ~ interpreter into a running Evocation, it's important to not use the wrong ~ interpreter state value. TODO longer explanation ~ TODO the definition of set-word-immediate would come here; is it needed? : [ interpreter-flags @ 0x01 invert & interpreter-flags ! ; make-immediate latest @ dup hide-entry : ] interpreter-flags @ 0x01 | interpreter-flags ! ; latest @ dup hide-entry ~ (pointer to [ entry, pointer to ] entry) ~ It may seem nonsensical to use : to define :, but the bootstrapping stuff ~ overrides what it does, so it works. The same, of course, goes for all these ~ other word-defining words. ~ ~ If the ] at the end feels backwards, imagine to yourself that everything ~ that ISN'T defining a word body is part of an implicit [ ... ] sequence. ~ Doing so doesn't really change anything, but may make you happier. : : word value@ create dropstring docol , latest @ hide-entry ] ; ~ The counterpart of : is ;. : ; ~ See commentary on "literal", below, regarding "lit exit". lit exit , latest @ unhide-entry ~ See above regarding [. Since it's an immediate word, we have to go to ~ extra trouble to compile it as part of ;. Since it's also hidden, we have ~ to go behind the interpreter's back to even get its entry pointer. [ over entry-to-execution-token , ] ; make-immediate latest @ dup hide-entry ~ (pointer to [ entry, pointer to ] entry, pointer to ; entry) : ;asm here @ pack-next 8 packalign here ! latest @ dup unhide-entry entry-to-execution-token dup 8 + swap ! ~ See above. [ 2 pick entry-to-execution-token , ] ; make-immediate latest @ dup hide-entry ~ (pointer to [ entry, pointer to ], pointer to ;, pointer to ;asm) ~ Although we will eventually define the word "'" to give us the symbol of ~ a word, it will rely on being able to compile a literal. Rather than do lots ~ of string processing later, we choose to define this word now to avoid ~ having to look up the word "lit" as part of that. ~ ~ It may be slightly surprising that the construction "lit lit" works as ~ expected, given that ie. "lit 5" will break, as will "lit [", so it's worth ~ explaining why it does. ~ ~ In most respects "lit" is just an ordinary word, which compilation turns ~ into a pointer to its codeword. That's what happens to most words, if ~ they're not a special syntax nor flagged as immediate. It just happens to be ~ a word that it rarely makes sense to use directly, since its purpose is to ~ be generated as part of the output when compiling number literals. The ~ special behavior around number literals is that when "interpret" sees ie. ~ "5", it first compiles "lit", then appends the numeric value 5 as the ~ following item in the compiled word body. ~ ~ The job of "lit" when it's later executed is to push the appropriate value ~ onto the stack and ensure that it doesn't get executed as code. So, whatever ~ you put immediately after it gets treated as a value, even if it's a ~ pointer. ~ ~ The reason that writing "lit 5" in Evocation syntax crashes is that it ~ gets turned into "lit lit 5" when compiled, which treats the second "lit" as ~ a value then tries to use "5" as a codeword pointer. So you can use "lit" ~ to quote whatever you want, it's just if it's already a special syntax you ~ might need to go behind "interpret"'s back to get it into the compiled ~ output. In practice, this is likely the only place that needs to happen, but ~ the mechanism is documented for the sake of whatever comes up in the future. ~ ~ (value -- ) : literal lit lit , , ; ~ Now the single most important word... : interpret word ~ If no word was returned, exit. dup 0 = { drop exit } if ~ The string is on the top of the stack, so to get a pointer to it we get ~ the stack address. ~ (string) value@ find ~ Check whether the word was found in the dictionary. dup 0 != { ~ If the word is in the dictionary, check what mode we're in, then... dropstring-with-result ~ (entry pointer) interpreter-flags @ 0x01 & { ~ ... if we're in compile mode, there's still a chance it's an immediate ~ word, in which case we fall through to interpret mode... dup entry-flags@ 1 & 0 = ~ ... but it's a regular word, so append it to the heap. { entry-to-execution-token , exit } if } if ~ ... if we're in interpret mode, or the word is immediate, run it. entry-to-execution-token execute exit } if ~ If it's not in the dictionary, check whether it's an integer literal. drop ~ As before, we get the stack address and use it as a string pointer. ~ (string) value@ read-integer 0 = { ~ It's a number. interpreter-flags @ 0x01 & { ~ We're in compile mode; append first "lit", then the number, to the ~ heap. The version of "lit" we use is the one that's current when we ~ ourselves are compiled, hardcoded; doing a dynamic lookup would ~ require dealing with what happens if it's not found. dropstring-with-result [ ' lit entry-to-execution-token literal ] , , exit } if ~ We're in interpret mode; push the number to the stack. Or at least, that's ~ what the code we're interpreting will see. Really it's already on the ~ stack, just clean everything else up and leave it there. dropstring-with-result exit } if ~ If it's neither in the dictionary nor a number, just print an error. s" No such word: " emitstring value@ emitstring dropstring ; ~ TODO for ease of debugging, this isn't the full implementation, which lets ~ us exit it to the outer "quit" : quit { interpret } forever ; ~ Now we switch into the new interpreter, enabling the three words we'd been ~ keeping hidden and then calling "quit". unhide-entry unhide-entry unhide-entry quit -0x10 newline . newline 4 5 + . : za 13 12 - . ; za ~ : ' word value@ find dropstring-with-result ~ interpreter-flags @ 1 & { literal } if ; make-immediate ' za . newline : piz ' za . newline ; piz ~ ' interpret forget quit 2 3 * . ' ' describe ' za describe ' piz describe bye