implement more of the interpreter, and add some documentation

Force-Push: yes
Change-Id: I33ad8783283643ca4977ab19c378156436707687
author: Irene Knapp <ireneista@irenes.space> 2026-05-09 18:32:58 -0700
committer: Irene Knapp <ireneista@irenes.space> 2026-05-09 18:32:58 -0700
commit: 6aded817e2ed13143db15040d88e86d0649f4e85 (patch)
tree: 54b92736c048f30a61f731b5e93d4869dcfc16db /interpret.e
parent: 5481ec020eabce663b5e7423c5e217005df6ad49 (diff)
1 files changed, 234 insertions, 5 deletions
diff --git a/interpret.e b/interpret.e
index 1270429..2bb9e43 100644
--- a/interpret.e
+++ b/interpret.e
@@ -1,5 +1,223 @@
+~ ~~~~~~~~~~~~~~~~~
+~ ~~ Interpreter ~~
+~ ~~~~~~~~~~~~~~~~~
+~
+~   The code in this file defines the basic syntax and semantics of Forth as
+~ a text-based language. It's written in terms of the underlying executor,
+~ which is implemented and explained in evoke.e. The execution model gives us
+~ the concept of "words"; the control and value stacks; and the ability to
+~ call things. It has nothing to say about text, only about the binary form of
+~ the language.
+~
+~   It's traditional in Forth to refer to an act of "compiling" code, which
+~ in this context means turning it from text into its binary representation.
+~ That binary representation most commonly takes the form of a word entry
+~ header followed by an array of codeword pointers.
+~
+~   It would be legitimate to critique the terminology by saying that codeword
+~ pointers are still, in some sense, interpreted: They are not machine code to
+~ be directly executed by the CPU; they rely on "docol" and "next" at runtime.
+~ However, in language design circles, the term "compilation" takes on a
+~ broader meaning, referring to any process which requires some or all of the
+~ types of infrastructure we regard as being compiler internals: A successive
+~ translation of code from one form into another, discarding some types of
+~ information while computing others, in a careful order that results in
+~ logically consistent output which in some sense has the same meaning as the
+~ input. Sometimes this output may be machine code, but often it is another
+~ language meant for human consumption, or an intermediate layer meant to be
+~ fed into another process.
+~
+~   Forth compilation is compilation in this sense, so there is no conflict
+~ and we run with the established terminology. In addition, it must be noted
+~ that Evocation, like many Forths, makes extensive use of words which are
+~ implemented directly in machine code; the Forth execution model allows these
+~ words to co-exist with words that are interpreted by "docol".
+~
+~   At any rate, the code in this file is responsible for that compilation.
+~
+~   It is primarily concerned with managing the contents of an area of memory
+~ we call the "log". Traditional Forth TODO
+
+~ TODO find a better place for this
+: describe-compilation
+  ~ It's always in progress ;) We just need a header like this so it doesn't
+  ~ get confused with other kinds of debug output.
+  ." compilation in progress" newline
+  latest @ hexdump
+  newline
+  ."   here " here @ .hex64 newline
+  ."   latest " latest @ .hex64 newline
+  ."   name of latest: " latest @ entry-to-name emitstring newline
+  newline ;
+
+~ TODO this is identical to the flatassembler version, but it needs to fix the
+~ conflict with s"
+: create
+  here @
+  latest @ pack64
+  0 pack8
+  0 pack8
+  swap packstring
+  8 packalign
+  here @ latest !
+  here ! ;
+
+latest @ describe
+s" foo" create
+latest @ describe
+
+~ create                                                0000001000017fa8
+~ ,                                                     0000001000018080
+~ self-codeword                                         00000010000180d0
+~ variable                                              0000001000018128
+~ allocate                                              00000010000181c8
+~ buffer-physical-start                                 0000001000018240
+~ buffer-physical-length                                0000001000018270
+~ buffer-logical-start                                  00000010000182c0
+~ buffer-logical-length                                 0000001000018308
+~ input-buffer-refill                                   0000001000018350
+~ clear-buffer                                          0000001000018398
+~ zero-input-buffer-metadata                            0000001000018428
+~ allocate-input-buffer-metadata                        0000001000018548
+~ allocate-input-buffer                                 00000010000185b0
+~ attach-string-to-input-buffer                         0000001000018688
+~ main-input-buffer-metadata                            0000001000018738 I raw
+~ main-input-buffer                                     0000001000018788 asm
+~ consume-from                                          00000010000187c0
+~ peek-from                                             0000001000018960
+~ key-from                                              0000001000018ab8
+~ is-space                                              0000001000018b00
+~ peek                                                  0000001000018d20
+~ consume                                               0000001000018d50
+~ key                                                   0000001000018d88
+~ unroll-past-string                                    0000001000018db8
+~ swap-past-string                                      0000001000018ea0
+~ dropstring                                            0000001000018ee8
+~ dropstring-with-result                                0000001000018f80
+~ accumulate-string                                     0000001000018fc8
+~ word                                                  00000010000194a0
+~ find                                                  00000010000195f0
+~ is-alphanumeric                                       0000001000019628
+~ generalized-digit-value                               0000001000019850
+~ decode-generalized-digit                              0000001000019970
+~ read-base-unsigned                                    0000001000019a58
+~ read-integer-unsigned                                 0000001000019cb8
+~ read-integer                                          0000001000019eb0
+
+~ (string pointer
+~  -- result (if successful),
+~     error indicator (zero equals success))
+: read-decimal
+  dup unpack8 lit 0 != 0branch [ 6 8 * , ] ~ TODO character literal minus
+  ~ This is the case where it's non-negative.
+  ~ (original string pointer, advanced string pointer)
+  drop 10 read-base-unsigned exit
+
+  ~ This is the case where it's negative.
+  ~ (original string pointer, advanced string pointer)
+  swap drop 10 read-base-unsigned
+  ~ (result maybe, exit code)
+  dup 0branch [ 2 8 * , ]
+
+  ~ Failure
+  ~ (non-zero exit code)
+  exit
+
+  ~ Success
+  ~ (result, zero exit code)
+  swap -1 * swap ;
+
+
+~   Here, we allocate a single machine word's worth of space to use as the
+~ backing store of a mutable variable, initialized to zero. Then we define the
+~ variable which points to that address.
+~
+~   We don't actually need a word header for interpreter-flags-storage; we
+~ could just append a zero and point to it directly, but that would make life
+~ harder for words that attempt to work with the contents of other words. So
+~ we give it a name.
+
+~ TODO this is the "create" / "here" conflict thing
+~ describe-compilation
+~ ' interpreter-flags-storage describe
+~ ' interpreter-flags describe
+~ newline
+~ here @ hexdump
+~ s" interpreter-flags-storage" stackhex create stackhex ~ make-immediate 0 ,
+~ ~ latest @ dup unhide-entry s" interpreter-flags" variable
+~ describe-compilation
+~ ~ here @ hexdump
+
+
+: hide-entry dup entry-flags@ 0x80 | entry-flags! ;
+
+: unhide-entry dup entry-flags@ 0x80 invert & entry-flags! ;
+
+
+~ TODO the definition of set-word-immediate would come here; is it needed?
+
+: [ interpreter-flags @ 0x01 invert & interpreter-flags ! ; make-immediate
+
+: ] interpreter-flags @ 0x01 | interpreter-flags ! ;
+
+
+~   It may seem nonsensical to use : to define :, but the bootstrapping stuff
+~ overrides what it does, so it works. The same, of course, goes for all these
+~ other word-defining words.
+~
+~   If the ] at the end feels backwards, imagine to yourself that everything
+~ that ISN'T defining a word body is part of an implicit [ ... ] sequence.
+~ Doing so doesn't really change anything, but may make you happier.
+: : word value@ create dropstring docol , latest @ hide-entry ] ;
+
+~   The counterpart of : is ;.
+: ;
+  ~ See commentary on "literal", below, regarding "lit exit".
+  lit exit ,
+  latest @ unhide-entry
+  ~ See above regarding [. Since it's an immediate word, we have to go to
+  ~ extra trouble to compile it as part of ;.
+  [ ' [ entry-to-execution-token , ]
+  ; make-immediate
+
+
+~   Although we will eventually define the word "'" to give us the symbol of
+~ a word, it will rely on being able to compile a literal. Rather than do lots
+~ of string processing later, we choose to define this word now to avoid
+~ having to look up the word "lit" as part of that.
+~
+~   It may be slightly surprising that the construction "lit lit" works as
+~ expected, given that e.g. "lit 5" will break, as will "lit [", so it's worth
+~ explaining why it does.
+~
+~   In most respects "lit" is just an ordinary word, which compilation turns
+~ into a pointer to its codeword. That's what happens to most words, if
+~ they're not a special syntax nor flagged as immediate. It just happens to be
+~ a word that it rarely makes sense to use directly, since its purpose is to
+~ be generated as part of the output when compiling number literals. The
+~ special behavior around number literals is that when "interpret" sees e.g.
+~ "5", it first compiles "lit", then appends the numeric value 5 as the
+~ following item in the compiled word body.
+~
+~   The job of "lit" when it's later executed is to push the appropriate value
+~ onto the stack and ensure that it doesn't get executed as code. So, whatever
+~ you put immediately after it gets treated as a value, even if it's a
+~ pointer.
+~
+~   The reason that writing "lit 5" in Evocation syntax crashes is that it
+~ gets turned into "lit lit 5" when compiled, which treats the second "lit" as
+~ a value then tries to use "5" as a codeword pointer. So you can use "lit"
+~ to quote whatever you want, but if it's already a special syntax you
+~ might need to go behind "interpret"'s back to get it into the compiled
+~ output. In practice, this is likely the only place that needs to happen, but
+~ the mechanism is documented for the sake of whatever comes up in the future.
+~
+~ (value -- )
+: literal lit lit , , ;
+
+
 ~ Now the single most important word...
-: funterpret
+: interpret
   word
 
   ~ If no word was returned, exit.
@@ -8,7 +226,7 @@
   ~ The string is on the top of the stack, so to get a pointer to it we get
   ~ the stack address.
   ~ (string)
-  value@ dup emitstring newline find
+  value@ find
 
   ~ Check whether the word was found in the dictionary.
   dup 0 != {
@@ -52,8 +270,19 @@
   } if
 
   ~ If it's neither in the dictionary nor a number, just print an error.
-  s" No such word: " emitstring value@ emitstring dropstring exit ;
+  s" No such word: " emitstring value@ emitstring dropstring ;
+
+~ TODO for ease of debugging, this isn't the full implementation, which lets
+~ us exit it to the outer "quit"
+: quit { interpret } forever ;
 
-: funquit { funterpret } forever ;
+~ quit
+~ 4 5 + . : za 13 12 - . ; za
+~ : ' word value@ find dropstring-with-result
+~   interpreter-flags @ 1 & { literal } if ; make-immediate
+~ ' za . newline
+~ : piz ' za . newline ; piz
+~ ~ ' interpret forget quit 2 3 * .
+~ ' ' describe ' za describe ' piz describe
+bye
 
-funquit 4 5 + . bye
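
The commentary on "lit" in the diff above can be illustrated with a small model. This is a sketch in Python, not Evocation code, and it makes several assumptions: a compiled word body is modeled as a plain list of cells, primitives are Python functions, and the names VM, add, and compile_number are hypothetical and do not appear in interpret.e. It only aims to show why a number literal compiles to "lit" followed by the value, why "lit lit" quotes "lit" itself, and why writing "lit 5" in source fails.

```python
# Hypothetical model: a compiled word body is a list of cells. A cell is either
# a callable primitive or a raw value that a preceding "lit" will consume.

class VM:
    def __init__(self):
        self.stack = []   # value stack
        self.ip = 0       # index of the next cell to execute
        self.body = []    # the word body currently running

    def run(self, body):
        self.body, self.ip = body, 0
        while self.ip < len(body):
            cell = body[self.ip]
            self.ip += 1
            if not callable(cell):
                # Stands in for the crash caused by using a value as a codeword.
                raise TypeError(f"tried to execute non-word cell {cell!r}")
            cell(self)
        return self.stack


def lit(vm):
    # Push the following cell as data, then step past it so it never executes.
    vm.stack.append(vm.body[vm.ip])
    vm.ip += 1


def add(vm):
    b, a = vm.stack.pop(), vm.stack.pop()
    vm.stack.append(a + b)


def compile_number(n):
    # Assumed behavior on seeing a number in compile mode: emit lit, then n.
    return [lit, n]


# "4 5 +" compiles to: lit 4 lit 5 add
good_body = compile_number(4) + compile_number(5) + [add]

# Source text "lit 5" compiles to: lit lit 5. The second lit is consumed as a
# value, and then 5 is reached as if it were a word.
bad_body = [lit] + compile_number(5)

# "lit lit" pushes lit itself as a value, which is what "literal" relies on.
quote_body = [lit, lit]
```

Running good_body on a fresh VM leaves 9 on the stack, running quote_body leaves the lit primitive itself on the stack, and running bad_body raises the TypeError that stands in for the crash described in the commentary.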