~ ~~~~~~~~~~~~~~~~~
~ ~~ Interpreter ~~
~ ~~~~~~~~~~~~~~~~~
~
~ The code in this file defines the basic syntax and semantics of Forth as
~ a text-based language. It's written in terms of the underlying executor,
~ which is implemented and explained in evoke.e. The execution model gives us
~ the concept of "words"; the control and value stacks; and the ability to
~ call things. It has nothing to say about text, only about the binary form of
~ the language.
~
~ It's traditional in Forth to speak of "compiling" code, which in this
~ context means turning it from text into its binary representation. That
~ binary representation most commonly takes the form of a word entry header
~ followed by an array of codeword pointers.
~
~ It would be legitimate to critique the terminology by saying that codeword
~ pointers are still, in some sense, interpreted: They are not machine code to
~ be directly executed by the CPU; they rely on "docol" and "next" at runtime.
~ However, in language design circles, the term "compilation" takes on a
~ broader meaning, referring to any process which requires some or all of the
~ types of infrastructure we regard as being compiler internals: A successive
~ translation of code from one form into another, discarding some types of
~ information while computing others, in a careful order that results in
~ logically consistent output which in some sense has the same meaning as the
~ input. Sometimes this output may be machine code, but often it is another
~ language meant for human consumption, or an intermediate layer meant to be
~ fed into another process.
~
~ Forth compilation is compilation in this sense, so there is no conflict
~ and we run with the established terminology. In addition, it must be noted
~ that Evocation, like many Forths, makes extensive use of words which are
~ implemented directly in machine code; the Forth execution model allows these
~ words to co-exist with words that are interpreted by "docol".
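~
~ As a rough sketch of that binary form: a hypothetical definition such as
~
~   : square dup * ;
~
~ should end up, approximately, as a word entry header followed by the cells
~
~   docol  xt(dup)  xt(*)  xt(exit)
~
~ where xt(w) is shorthand (not a word in this file) for the codeword pointer
~ that compiling w appends. This is inferred from what : and ; append, as
~ defined further down; the exact header layout is left out.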
~
~ At any rate, the code in this file is responsible for that compilation.
~
~ It is primarily concerned with managing the contents of an area of memory
~ we call the "log". Traditional Forth TODO

~ TODO find a better place for this
: describe-compilation
  ~ It's always in progress ;) We just need a header like this so it doesn't
  ~ get confused with other kinds of debug output.
  ." compilation in progress" newline
  latest @ hexdump newline
  ." here " here @ .hex64 newline
  ." latest " latest @ .hex64 newline
  ." name of latest: " latest @ entry-to-name emitstring newline
  newline ;

~ TODO this is identical to the flatassembler version, but it needs to fix the
~ conflict with s"
: create
  here @ latest @ pack64 0 pack8 0 pack8 swap packstring 8 packalign
  here @ latest ! here ! ;

latest @ describe
s" foo" create
latest @ describe

~ create 0000001000017fa8
~ , 0000001000018080
~ self-codeword 00000010000180d0
~ variable 0000001000018128
~ allocate 00000010000181c8
~ buffer-physical-start 0000001000018240
~ buffer-physical-length 0000001000018270
~ buffer-logical-start 00000010000182c0
~ buffer-logical-length 0000001000018308
~ input-buffer-refill 0000001000018350
~ clear-buffer 0000001000018398
~ zero-input-buffer-metadata 0000001000018428
~ allocate-input-buffer-metadata 0000001000018548
~ allocate-input-buffer 00000010000185b0
~ attach-string-to-input-buffer 0000001000018688
~ main-input-buffer-metadata 0000001000018738 I raw
~ main-input-buffer 0000001000018788 asm
~ consume-from 00000010000187c0
~ peek-from 0000001000018960
~ key-from 0000001000018ab8
~ is-space 0000001000018b00
~ peek 0000001000018d20
~ consume 0000001000018d50
~ key 0000001000018d88
~ unroll-past-string 0000001000018db8
~ swap-past-string 0000001000018ea0
~ dropstring 0000001000018ee8
~ dropstring-with-result 0000001000018f80
~ accumulate-string 0000001000018fc8
~ word 00000010000194a0
~ find 00000010000195f0
~ is-alphanumeric 0000001000019628
~ generalized-digit-value 0000001000019850
~ decode-generalized-digit 0000001000019970
~ read-base-unsigned 0000001000019a58
~ read-integer-unsigned 0000001000019cb8
~ read-integer 0000001000019eb0

~ (string pointer
~   -- result (if successful),
~      error indicator (zero equals success))
: read-decimal
  dup unpack8 lit 0 != 0branch [ 6 8 * , ]
  ~ TODO character literal minus

  ~ This is the case where it's non-negative.
  ~ (original string pointer, advanced string pointer)
  drop 10 read-base-unsigned exit

  ~ This is the case where it's negative.
  ~ (original string pointer, advanced string pointer)
  swap drop 10 read-base-unsigned
  ~ (result maybe, exit code)
  dup 0branch [ 2 8 * , ]
  ~ Failure
  ~ (non-zero exit code)
  exit
  ~ Success
  ~ (result, zero exit code)
  swap -1 * swap ;

~ Here, we allocate a single machine word's worth of space to use as the
~ backing store of a mutable variable, initialized to zero. Then we define the
~ variable which points to that address.
~
~ We don't actually need a word header for interpreter-flags-storage; we
~ could just append a zero and point to it directly, but that would make life
~ harder for words that attempt to work with the contents of other words. So
~ we give it a name.
~ TODO this is the "create" / "here" conflict thing
~ describe-compilation
~ ' interpreter-flags-storage describe
~ ' interpreter-flags describe
~ newline
~ here @ hexdump
~ s" interpreter-flags-storage" stackhex create stackhex
~ make-immediate 0 ,
~
~ latest @ dup unhide-entry s" interpreter-flags" variable
~ describe-compilation
~
~ here @ hexdump

: hide-entry dup entry-flags@ 0x80 | entry-flags! ;
: unhide-entry dup entry-flags@ 0x80 invert & entry-flags! ;

~ TODO the definition of set-word-immediate would come here; is it needed?

: [ interpreter-flags @ 0x01 invert & interpreter-flags ! ; make-immediate
: ] interpreter-flags @ 0x01 | interpreter-flags ! ;

~ It may seem nonsensical to use : to define :, but the bootstrapping stuff
~ overrides what it does, so it works. The same, of course, goes for all these
~ other word-defining words.
~
~ If the ] at the end feels backwards, imagine to yourself that everything
~ that ISN'T defining a word body is part of an implicit [ ... ] sequence.
~ Doing so doesn't really change anything, but may make you happier.
: : word value@ create dropstring docol , latest @ hide-entry ] ;

~ The counterpart of : is ;.
: ;
  ~ See commentary on "literal", below, regarding "lit exit".
  lit exit ,
  latest @ unhide-entry
  ~ See above regarding [. Since it's an immediate word, we have to go to
  ~ extra trouble to compile it as part of ;.
  [ ' [ entry-to-execution-token , ] ; make-immediate

~ Although we will eventually define the word "'" to give us the symbol of
~ a word, it will rely on being able to compile a literal. Rather than do lots
~ of string processing later, we choose to define "literal" now, to avoid
~ having to look up the word "lit" as part of that.
~
~ It may be slightly surprising that the construction "lit lit" works as
~ expected, given that, e.g., "lit 5" will break, as will "lit [", so it's
~ worth explaining why it does.
~
~ In most respects "lit" is just an ordinary word, which compilation turns
~ into a pointer to its codeword. That's what happens to most words, if
~ they're neither special syntax nor flagged as immediate. It just happens to
~ be a word that it rarely makes sense to use directly, since its purpose is
~ to be generated as part of the output when compiling number literals. The
~ special behavior around number literals is that when "interpret" sees, e.g.,
~ "5", it first compiles "lit", then appends the numeric value 5 as the
~ following item in the compiled word body.
~
~ The job of "lit" when it's later executed is to push the appropriate value
~ onto the stack and ensure that it doesn't get executed as code. So, whatever
~ you put immediately after it gets treated as a value, even if it's a
~ pointer.
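~
~ A sketch of the cells that a few of these constructions would leave in a
~ compiled word body, writing xt(w) as shorthand (not a word in this file)
~ for a pointer to the codeword of w:
~
~   5          ->  xt(lit) 5
~   lit exit   ->  xt(lit) xt(exit)
~   lit lit    ->  xt(lit) xt(lit)
~
~ When executed, each pair pushes its second cell onto the stack and skips
~ over it, so that second cell is never run as code.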
~
~ The reason that writing "lit 5" in Evocation syntax crashes is that it
~ gets turned into "lit lit 5" when compiled. When that runs, the first "lit"
~ pushes the second "lit" as a value, and then execution tries to use "5" as a
~ codeword pointer. So you can use "lit" to quote whatever you want; it's just
~ that if the thing being quoted is already special syntax, you might need to
~ go behind "interpret"'s back to get it into the compiled output. In
~ practice, this is likely the only place that needs to happen, but the
~ mechanism is documented for the sake of whatever comes up in the future.
~
~ (value -- )
: literal lit lit , , ;

~ Now the single most important word...
: interpret
  word

  ~ If no word was returned, exit.
  dup 0 = { drop exit } if

  ~ The string is on the top of the stack, so to get a pointer to it we get
  ~ the stack address.
  ~ (string)
  value@ find

  ~ Check whether the word was found in the dictionary.
  dup 0 != {
    ~ If the word is in the dictionary, check what mode we're in, then...
    dropstring-with-result
    ~ (entry pointer)
    interpreter-flags @ 0x01 & {
      ~ ... if we're in compile mode, there's still a chance it's an immediate
      ~ word, in which case we fall through to interpret mode...
      dup entry-flags@ 1 & 0 =
      ~ ... but if it's a regular word, append it to the heap.
      { entry-to-execution-token , exit } if
    } if
    ~ ... if we're in interpret mode, or the word is immediate, run it.
    entry-to-execution-token execute exit
  } if

  ~ If it's not in the dictionary, check whether it's a decimal number.
  drop
  ~ As before, we get the stack address and use it as a string pointer.
  ~ (string)
  value@ read-integer 0 = {
    ~ It's a number.
    interpreter-flags @ 0x01 & {
      ~ We're in compile mode; append first "lit", then the number, to the
      ~ heap. The version of "lit" we use is the one that's current when we
      ~ ourselves are compiled, hardcoded; doing a dynamic lookup would
      ~ require dealing with what happens if it's not found.
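      ~
      ~ A sketch of the stack (top items only) during the line below,
      ~ assuming the number read was 5. The bracketed part runs once, while
      ~ interpret itself is being compiled; it leaves behind cells that push
      ~ lit's execution token, written xt(lit) here, whenever interpret runs.
      ~
      ~   dropstring-with-result   ( 5 )
      ~   [ ... literal ]          ( 5 xt(lit) )
      ~   ,                        ( 5 )   appends xt(lit)
      ~   ,                        ( )     appends 5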
      dropstring-with-result [ ' lit entry-to-execution-token literal ] , ,
      exit
    } if
    ~ We're in interpret mode; push the number to the stack. Or at least, that's
    ~ what the code we're interpreting will see. Really it's already on the
    ~ stack, so we just clean everything else up and leave it there.
    dropstring-with-result exit
  } if

  ~ If it's neither in the dictionary nor a number, just print an error.
  s" No such word: " emitstring value@ emitstring dropstring ;

~ TODO for ease of debugging, this isn't the full implementation, which lets
~ us exit it to the outer "quit"
: quit { interpret } forever ;

~ quit
~ 4 5 + .
: za 13 12 - . ; za
~ : ' word value@ find dropstring-with-result
~   interpreter-flags @ 1 & { literal } if ; make-immediate
~ ' za . newline
~ : piz ' za . newline ; piz
~
~ ' interpret forget
quit 2 3 * .
~ ' ' describe ' za describe ' piz describe
bye