| author | Irene Knapp <ireneista@irenes.space> | 2026-05-09 18:32:58 -0700 |
|---|---|---|
| committer | Irene Knapp <ireneista@irenes.space> | 2026-05-09 18:32:58 -0700 |
| commit | 6aded817e2ed13143db15040d88e86d0649f4e85 (patch) | |
| tree | 54b92736c048f30a61f731b5e93d4869dcfc16db | |
| parent | 5481ec020eabce663b5e7423c5e217005df6ad49 (diff) | |
implement more of the interpreter, and add some documentation
Force-Push: yes
Change-Id: I33ad8783283643ca4977ab19c378156436707687
| -rw-r--r-- | interpret.e | 239 |
1 files changed, 234 insertions, 5 deletions
diff --git a/interpret.e b/interpret.e
index 1270429..2bb9e43 100644
--- a/interpret.e
+++ b/interpret.e
@@ -1,5 +1,223 @@
+~ ~~~~~~~~~~~~~~~~~
+~ ~~ Interpreter ~~
+~ ~~~~~~~~~~~~~~~~~
+~
+~ The code in this file defines the basic syntax and semantics of Forth as
+~ a text-based language. It's written in terms of the underlying executor,
+~ which is implemented and explained in evoke.e. The execution model gives us
+~ the concept of "words"; the control and value stacks; and the ability to
+~ call things. It has nothing to say about text, only about the binary form of
+~ the language.
+~
+~ It's traditional in Forth to refer to an act of "compiling" code, which
+~ in this context means turning it from text into its binary representation.
+~ That binary representation most commonly takes the form of a word entry
+~ header followed by an array of codeword pointers.
+~
+~ It would be legitimate to critique the terminology by saying that codeword
+~ pointers are still, in some sense, interpreted: They are not machine code to
+~ be directly executed by the CPU; they rely on "docol" and "next" at runtime.
+~ However, in language design circles, the term "compilation" takes on a
+~ broader meaning, referring to any process which requires some or all of the
+~ types of infrastructure we regard as being compiler internals: A successive
+~ translation of code from one form into another, discarding some types of
+~ information while computing others, in a careful order that results in
+~ logically consistent output which in some sense has the same meaning as the
+~ input. Sometimes this output may be machine code, but often it is another
+~ language meant for human consumption, or an intermediate layer meant to be
+~ fed into another process.
+~
+~ Forth compilation is compilation in this sense, so there is no conflict
+~ and we run with the established terminology. In addition, it must be noted
+~ that Evocation, like many Forths, makes extensive use of words which are
+~ implemented directly in machine code; the Forth execution model allows these
+~ words to co-exist with words that are interpreted by "docol".
+~
+~ At any rate, the code in this file is responsible for that compilation.
+~
+~ It is primarily concerned with managing the contents of an area of memory
+~ we call the "log". Traditional Forth TODO
+
+~ TODO find a better place for this
+: describe-compilation
+  ~ It's always in progress ;) We just need a header like this so it doesn't
+  ~ get confused with other kinds of debug output.
+  ." compilation in progress" newline
+  latest @ hexdump
+  newline
+  ." here " here @ .hex64 newline
+  ." latest " latest @ .hex64 newline
+  ." name of latest: " latest @ entry-to-name emitstring newline
+  newline ;
+
+~ TODO this is identical to the flatassembler version, but it needs to fix the
+~ conflict with s"
+: create
+  here @
+  latest @ pack64
+  0 pack8
+  0 pack8
+  swap packstring
+  8 packalign
+  here @ latest !
+  here ! ;
+
+latest @ describe
+s" foo" create
+latest @ describe
+
+~ create 0000001000017fa8
+~ , 0000001000018080
+~ self-codeword 00000010000180d0
+~ variable 0000001000018128
+~ allocate 00000010000181c8
+~ buffer-physical-start 0000001000018240
+~ buffer-physical-length 0000001000018270
+~ buffer-logical-start 00000010000182c0
+~ buffer-logical-length 0000001000018308
+~ input-buffer-refill 0000001000018350
+~ clear-buffer 0000001000018398
+~ zero-input-buffer-metadata 0000001000018428
+~ allocate-input-buffer-metadata 0000001000018548
+~ allocate-input-buffer 00000010000185b0
+~ attach-string-to-input-buffer 0000001000018688
+~ main-input-buffer-metadata 0000001000018738 I raw
+~ main-input-buffer 0000001000018788 asm
+~ consume-from 00000010000187c0
+~ peek-from 0000001000018960
+~ key-from 0000001000018ab8
+~ is-space 0000001000018b00
+~ peek 0000001000018d20
+~ consume 0000001000018d50
+~ key 0000001000018d88
+~ unroll-past-string 0000001000018db8
+~ swap-past-string 0000001000018ea0
+~ dropstring 0000001000018ee8
+~ dropstring-with-result 0000001000018f80
+~ accumulate-string 0000001000018fc8
+~ word 00000010000194a0
+~ find 00000010000195f0
+~ is-alphanumeric 0000001000019628
+~ generalized-digit-value 0000001000019850
+~ decode-generalized-digit 0000001000019970
+~ read-base-unsigned 0000001000019a58
+~ read-integer-unsigned 0000001000019cb8
+~ read-integer 0000001000019eb0
+
+~ (string pointer
+~ -- result (if successful),
+~ error indicator (zero equals success))
+: read-decimal
+  dup unpack8 lit 0 != 0branch [ 6 8 * , ] ~ TODO character literal minus
+  ~ This is the case where it's non-negative.
+  ~ (original string pointer, advanced string pointer)
+  drop 10 read-base-unsigned exit
+
+  ~ This is the case where it's negative.
+  ~ (original string pointer, advanced string pointer)
+  swap drop 10 read-base-unsigned
+  ~ (result maybe, exit code)
+  dup 0branch [ 2 8 * , ]
+
+  ~ Failure
+  ~ (non-zero exit code)
+  exit
+
+  ~ Success
+  ~ (result, zero exit code)
+  swap -1 * swap ;
+
+
+~ Here, we allocate a single machine word's worth of space to use as the
+~ backing store of a mutable variable, initialized to zero. Then we define the
+~ variable which points to that address.
+~
+~ We don't actually need a word header for interpreter-flags-storage, we
+~ could just append a zero and point to it directly, but that would make life
+~ harder for words that attempt to work with the contents of other words. So
+~ we give it a name.
+
+~ TODO this is the "create" / "here" conflict thing
+~ describe-compilation
+~ ' interpreter-flags-storage describe
+~ ' interpreter-flags describe
+~ newline
+~ here @ hexdump
+~ s" interpreter-flags-storage" stackhex create stackhex ~ make-immediate 0 ,
+~ ~ latest @ dup unhide-entry s" interpreter-flags" variable
+~ describe-compilation
+~ ~ here @ hexdump
+
+
+: hide-entry dup entry-flags@ 0x80 | entry-flags! ;
+
+: unhide-entry dup entry-flags@ 0x80 invert & entry-flags! ;
+
+
+~ TODO the definition of set-word-immediate would come here; is it needed?
+
+: [ interpreter-flags @ 0x01 invert & interpreter-flags ! ; make-immediate
+
+: ] interpreter-flags @ 0x01 | interpreter-flags ! ;
+
+
+~ It may seem nonsensical to use : to define :, but the bootstrapping stuff
+~ overrides what it does, so it works. The same, of course, goes for all these
+~ other word-defining words.
+~
+~ If the ] at the end feels backwards, imagine to yourself that everything
+~ that ISN'T defining a word body is part of an implicit [ ... ] sequence.
+~ Doing so doesn't really change anything, but may make you happier.
+: : word value@ create dropstring docol , latest @ hide-entry ] ;
+
+~ The counterpart of : is ;.
+: ;
+  ~ See commentary on "literal", below, regarding "lit exit".
+  lit exit ,
+  latest @ unhide-entry
+  ~ See above regarding [. Since it's an immediate word, we have to go to
+  ~ extra trouble to compile it as part of ;.
+  [ ' [ entry-to-execution-token , ]
+  ; make-immediate
+
+
+~ Although we will eventually define the word "'" to give us the symbol of
+~ a word, it will rely on being able to compile a literal. Rather than do lots
+~ of string processing later, we choose to define this word now to avoid
+~ having to look up the word "lit" as part of that.
+~
+~ It may be slightly surprising that the construction "lit lit" works as
+~ expected, given that ie. "lit 5" will break, as will "lit [", so it's worth
+~ explaining why it does.
+~
+~ In most respects "lit" is just an ordinary word, which compilation turns
+~ into a pointer to its codeword. That's what happens to most words, if
+~ they're not a special syntax nor flagged as immediate. It just happens to be
+~ a word that it rarely makes sense to use directly, since its purpose is to
+~ be generated as part of the output when compiling number literals. The
+~ special behavior around number literals is that when "interpret" sees ie.
+~ "5", it first compiles "lit", then appends the numeric value 5 as the
+~ following item in the compiled word body.
+~
+~ The job of "lit" when it's later executed is to push the appropriate value
+~ onto the stack and ensure that it doesn't get executed as code. So, whatever
+~ you put immediately after it gets treated as a value, even if it's a
+~ pointer.
+~
+~ The reason that writing "lit 5" in Evocation syntax crashes is that it
+~ gets turned into "lit lit 5" when compiled, which treats the second "lit" as
+~ a value then tries to use "5" as a codeword pointer. So you can use "lit"
+~ to quote whatever you want, it's just if it's already a special syntax you
+~ might need to go behind "interpret"'s back to get it into the compiled
+~ output. In practice, this is likely the only place that needs to happen, but
+~ the mechanism is documented for the sake of whatever comes up in the future.
+~
+~ (value -- )
+: literal lit lit , , ;
+
+
 ~ Now the single most important word...
-: funterpret
+: interpret
   word
 
   ~ If no word was returned, exit.
@@ -8,7 +226,7 @@
   ~ The string is on the top of the stack, so to get a pointer to it we get
   ~ the stack address.
   ~ (string)
-  value@ dup emitstring newline find
+  value@ find
 
   ~ Check whether the word was found in the dictionary.
   dup 0 != {
@@ -52,8 +270,19 @@
   } if
 
   ~ If it's neither in the dictionary nor a number, just print an error.
-  s" No such word: " emitstring value@ emitstring dropstring exit ;
+  s" No such word: " emitstring value@ emitstring dropstring ;
+
+~ TODO for ease of debugging, this isn't the full implementation, which lets
+~ us exit it to the outer "quit"
+: quit { interpret } forever ;
 
-: funquit { funterpret } forever ;
+~ quit
+~ 4 5 + . : za 13 12 - . ; za
+~ : ' word value@ find dropstring-with-result
+~ interpreter-flags @ 1 & { literal } if ; make-immediate
+~ ' za . newline
+~ : piz ' za . newline ; piz
+~ ~ ' interpret forget quit 2 3 * .
+~ ' ' describe ' za describe ' piz describe
+bye
 
-funquit 4 5 + . bye