~ ~~~~~~~~~~~~~~~~~
~ ~~ Interpreter ~~
~ ~~~~~~~~~~~~~~~~~
~
~   The code in this file defines the basic syntax and semantics of Forth as
~ a text-based language. It's written in terms of the underlying executor,
~ which is implemented and explained in evoke.e. The execution model gives us
~ the concept of "words"; the control and value stacks; and the ability to
~ call things. It has nothing to say about text, only about the binary form of
~ the language.
~
~   It's traditional in Forth to refer to an act of "compiling" code, which
~ in this context means turning it from text into its binary representation.
~ That binary representation most commonly takes the form of a word entry
~ header followed by an array of codeword pointers.
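~
~   As an illustrative sketch rather than a dump from a real session, a
~ definition such as
~
~     : za 13 12 - . ;
~
~ would be laid out after its entry header as the following sequence of
~ machine-word-sized cells:
~
~     docol   lit   13   lit   12   -   .   exit
~
~ Here "docol" is the codeword itself, "13" and "12" are raw values, and every
~ other name stands for a pointer to that word's codeword. This assumes that
~ number literals compile as "lit" followed by the value, which is what
~ "interpret" does further down in this file.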
~
~   It would be legitimate to critique the terminology by saying that codeword
~ pointers are still, in some sense, interpreted: They are not machine code to
~ be directly executed by the CPU; they rely on "docol" and "next" at runtime.
~ However, in language design circles, the term "compilation" takes on a
~ broader meaning, referring to any process which requires some or all of the
~ types of infrastructure we regard as being compiler internals: A successive
~ translation of code from one form into another, discarding some types of
~ information while computing others, in a careful order that results in
~ logically consistent output which in some sense has the same meaning as the
~ input. Sometimes this output may be machine code, but often it is another
~ language meant for human consumption, or an intermediate layer meant to be
~ fed into another process.
~
~   Forth compilation is compilation in this sense, so there is no conflict
~ and we run with the established terminology. In addition, it must be noted
~ that Evocation, like many Forths, makes extensive use of words which are
~ implemented directly in machine code; the Forth execution model allows these
~ words to co-exist with words that are interpreted by "docol".
~
~   At any rate, the code in this file is responsible for that compilation.
~
~   It is primarily concerned with managing the contents of an area of memory
~ we call the "log". Traditional Forth TODO

~ TODO find a better place for this
: describe-compilation
  ~ It's always in progress ;) We just need a header like this so it doesn't
  ~ get confused with other kinds of debug output.
  ." compilation in progress" newline
  latest @ hexdump
  newline
  ."   here " here @ .hex64 newline
  ."   latest " latest @ .hex64 newline
  ."   name of latest: " latest @ entry-to-name emitstring newline
  newline ;

~   Allocate space by incrementing "here", and write a word entry header into
~ it. Also add it to the "latest" linked list. Use zero for the flags byte;
~ accept a string pointer on the stack and use its contents as the name.
~
~   This is the first step of creating a new word. Its responsibility includes
~ everything up to the codeword, not including the codeword; it leaves things
~ all set up to start appending contents to the new word by calling ",".
~
~   There's a handy diagram of the entry header format under "quick
~ reference", in the description of the execution model in evoke.e.
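~
~   As a worked example under one assumption, namely that "here" is already
~ aligned to eight bytes, calling create with the name "za" would fill in
~ these offsets relative to the old value of "here":
~
~      0 ..  7   link to the previous entry (the old value of "latest")
~      8         flags byte, zero
~      9         leading null terminator
~     10 .. 12   'z', 'a', and the trailing null terminator
~     13 .. 15   alignment padding
~     16         where the codeword will go; the new value of "here"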
~
~   When a word is created in interpret mode using s" to provide a string
~ literal, the temporary space that s" uses is in the same place as the
~ entry header we're going to write out. It really is very useful to have
~ that work. Fortunately, it does! We're able to avoid needing a special case
~ by doing things in a very careful way, as described below.
~
~ (string pointer --)
: create
  ~   We add one to the string length in order to include the trailing null
  ~ terminator. This will be the length of our name field; we save an extra
  ~ copy of it to help with packing later.
  dup stringlen 1 + dup 3unroll
  ~ (name field length, string pointer, name field length)

  ~   We use memmove to put the string in its final position, because it works
  ~ correctly when the destination overlaps with the source. Notice that we
  ~ do this before writing anything else in the entry header, to avoid
  ~ stepping on it. The name string always starts ten bytes into the header,
  ~ so we can use a fixed offset.
  here @ 10 + 3unroll memmove
  ~ (name field length)

  ~   Now we can get back to the fields that belong at the start of the entry
  ~ header. We take the value of "here" and keep a working copy of it on the
  ~ stack, which we'll advance every time we write more bytes.
  here @
  ~ (name field length, working copy of "here")

  ~   Pack the old value of "latest" as the first field of the header, linking
  ~ from the newly-defined word to the next-newest word.
  ~
  ~   All the entries form a linked list, from newest to oldest. Since the
  ~ link is the first field in the entry header, you can get from each entry
  ~ to the one before it just by dereferencing the entry pointer.
  latest @ pack64

  ~   This is the flags byte. It starts at zero; our caller can change it if
  ~ desired.
  0 pack8

  ~   This is the "other" null terminator, used when traversing the name
  ~ string backwards for execution-token-to-entry. Yes, the name is
  ~ null-terminated at both ends.
  0 pack8

  + ~ The name field is already populated, so just skip past it.
  ~ (updated "here" pointer)

  ~   The codeword is aligned to a machine-word boundary, and the padding for
  ~ it is create's responsibility.
  ~
  ~   By adding the null terminator before adding alignment padding, we've
  ~ made sure there's always at least one null byte. Otherwise we'd be missing
  ~ the terminator if by chance the name were exactly the wrong length.
  8 packalign
  ~ (updated "here" pointer)

  ~   Retrieve the value of "here", which still doesn't reflect our additions,
  ~ and store it at the address of "latest". It's the start of our
  ~ newly-defined word, which makes it the latest word.
  here @ latest !

  ~   Finally, we write our updated value of "here" back into the variable.
  here ! ;

~ ,                                                     0000001000018080
~ self-codeword                                         00000010000180d0
~ variable                                              0000001000018128
~ allocate                                              00000010000181c8
~ buffer-physical-start                                 0000001000018240
~ buffer-physical-length                                0000001000018270
~ buffer-logical-start                                  00000010000182c0
~ buffer-logical-length                                 0000001000018308
~ input-buffer-refill                                   0000001000018350
~ clear-buffer                                          0000001000018398
~ zero-input-buffer-metadata                            0000001000018428
~ allocate-input-buffer-metadata                        0000001000018548
~ allocate-input-buffer                                 00000010000185b0
~ attach-string-to-input-buffer                         0000001000018688
~ main-input-buffer-metadata                            0000001000018738 I raw
~ main-input-buffer                                     0000001000018788 asm
~ consume-from                                          00000010000187c0
~ peek-from                                             0000001000018960
~ key-from                                              0000001000018ab8
~ is-space                                              0000001000018b00
~ peek                                                  0000001000018d20
~ consume                                               0000001000018d50
~ key                                                   0000001000018d88
~ unroll-past-string                                    0000001000018db8
~ swap-past-string                                      0000001000018ea0
~ dropstring                                            0000001000018ee8
~ dropstring-with-result                                0000001000018f80
~ accumulate-string                                     0000001000018fc8
~ word                                                  00000010000194a0
~ find                                                  00000010000195f0
~ is-alphanumeric                                       0000001000019628
~ generalized-digit-value                               0000001000019850
~ decode-generalized-digit                              0000001000019970
~ read-base-unsigned                                    0000001000019a58
~ read-integer-unsigned                                 0000001000019cb8
~ read-integer                                          0000001000019eb0

~ (string pointer
~  -- result (if successful),
~     error indicator (zero equals success))
: read-decimal
  dup unpack8 lit 0 != 0branch [ 6 8 * , ] ~ TODO character literal minus
  ~ This is the case where it's non-negative.
  ~ (original string pointer, advanced string pointer)
  drop 10 read-base-unsigned exit

  ~ This is the case where it's negative.
  ~ (original string pointer, advanced string pointer)
  swap drop 10 read-base-unsigned
  ~ (result maybe, exit code)
  dup 0branch [ 2 8 * , ]

  ~ Failure
  ~ (non-zero exit code)
  exit

  ~ Success
  ~ (result, zero exit code)
  swap -1 * swap ;


~   Here, we allocate a single machine word's worth of space to use as the
~ backing store of a mutable variable, initialized to zero. Then we define the
~ variable which points to that address.
~
~   We don't actually need a word header for interpreter-flags-storage; we
~ could just append a zero and point to it directly. But that would make life
~ harder for words that attempt to work with the contents of other words, so
~ we give it a name.

~ TODO this is the "create" / "here" conflict thing
~ describe-compilation
~ ' interpreter-flags-storage describe
~ ' interpreter-flags describe
~ newline
~ here @ hexdump
~ s" interpreter-flags-storage" stackhex create stackhex ~ make-immediate 0 ,
~ ~ latest @ dup unhide-entry s" interpreter-flags" variable
~ describe-compilation
~ ~ here @ hexdump


: hide-entry dup entry-flags@ 0x80 | entry-flags! ;

: unhide-entry dup entry-flags@ 0x80 invert & entry-flags! ;


~ TODO the definition of set-word-immediate would come here; is it needed?

: [ interpreter-flags @ 0x01 invert & interpreter-flags ! ; make-immediate

: ] interpreter-flags @ 0x01 | interpreter-flags ! ;
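
~   A small illustration of the pair in use, taken from the branch offsets in
~ read-decimal above. Within a definition,
~
~     0branch [ 6 8 * , ]
~
~ compiles "0branch"; then "[" drops to interpret mode, so that "6 8 *" runs
~ immediately and leaves 48 on the stack; "," appends that value as a raw
~ cell; and "]" returns to compile mode. Without the brackets, the numbers
~ would be compiled as literals and "*" and "," as ordinary calls, as we read
~ "interpret" below.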


~   It may seem nonsensical to use : to define :, but the bootstrapping stuff
~ overrides what it does, so it works. The same, of course, goes for all these
~ other word-defining words.
~
~   If the ] at the end feels backwards, imagine to yourself that everything
~ that ISN'T defining a word body is part of an implicit [ ... ] sequence.
~ Doing so doesn't really change anything, but may make you happier.
: : word value@ create dropstring docol , latest @ hide-entry ] ;

~   The counterpart of : is ;.
: ;
  ~ See commentary on "literal", below, regarding "lit exit".
  lit exit ,
  latest @ unhide-entry
  ~ See above regarding [. Since it's an immediate word, we have to go to
  ~ extra trouble to compile it as part of ;.
  [ ' [ entry-to-execution-token , ]
  ; make-immediate


~   Although we will eventually define the word "'" to give us the symbol of
~ a word, it will rely on being able to compile a literal. Rather than do lots
~ of string processing later, we choose to define this word now to avoid
~ having to look up the word "lit" as part of that.
~
~   It may be slightly surprising that the construction "lit lit" works as
~ expected, given that e.g. "lit 5" will break, as will "lit [", so it's worth
~ explaining why it does.
~
~   In most respects "lit" is just an ordinary word, which compilation turns
~ into a pointer to its codeword. That's what happens to most words, if
~ they're neither special syntax nor flagged as immediate. It just happens to
~ be a word that it rarely makes sense to use directly, since its purpose is
~ to be generated as part of the output when compiling number literals. The
~ special behavior around number literals is that when "interpret" sees e.g.
~ "5", it first compiles "lit", then appends the numeric value 5 as the
~ following item in the compiled word body.
~
~   The job of "lit" when it's later executed is to push the appropriate value
~ onto the stack and ensure that it doesn't get executed as code. So, whatever
~ you put immediately after it gets treated as a value, even if it's a
~ pointer.
~
~   The reason that writing "lit 5" in Evocation syntax crashes is that it
~ gets turned into "lit lit 5" when compiled, which treats the second "lit" as
~ a value then tries to use "5" as a codeword pointer. So you can use "lit"
~ to quote whatever you want; it's just that if the thing you're quoting
~ already has special syntax, you might need to go behind "interpret"'s back
~ to get it into the compiled output. In practice, this is likely the only
~ place that needs to happen, but the mechanism is documented for the sake of
~ whatever comes up in the future.
~
~ (value -- )
: literal lit lit , , ;
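
~   A minimal usage sketch, with "answer" as a hypothetical name that isn't
~ defined anywhere in this file:
~
~     : answer [ 6 7 * literal ] ;
~
~ Inside the brackets we're in interpret mode, so "6 7 *" leaves 42 on the
~ stack and "literal" runs immediately, appending first a pointer to "lit" and
~ then the value. The resulting body should be docol, lit, 42, exit, which is
~ the same thing that writing "42" directly would have compiled to.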


~ Now the single most important word...
: interpret
  word

  ~ If no word was returned, exit.
  dup 0 = { drop exit } if

  ~ The string is on the top of the stack, so to get a pointer to it we get
  ~ the stack address.
  ~ (string)
  value@ find

  ~ Check whether the word was found in the dictionary.
  dup 0 != {
    ~ If the word is in the dictionary, check what mode we're in, then...
    dropstring-with-result
    ~ (entry pointer)
    interpreter-flags @ 0x01 & {
      ~ ... if we're in compile mode, there's still a chance it's an immediate
      ~ word, in which case we fall through to interpret mode...
      dup entry-flags@ 1 & 0 =

      ~ ... but it's a regular word, so append it to the heap.
      { entry-to-execution-token , exit } if
    } if

    ~ ... if we're in interpret mode, or the word is immediate, run it.
    entry-to-execution-token execute exit
  } if

  ~ If it's not in the dictionary, check whether it's a number.
  drop
  ~ As before, we get the stack address and use it as a string pointer.
  ~ (string)
  value@ read-integer 0 = {
    ~ It's a number.
    interpreter-flags @ 0x01 & {
      ~ We're in compile mode; append first "lit", then the number, to the
      ~ heap. The version of "lit" we use is the one that's current when we
      ~ ourselves are compiled, hardcoded; doing a dynamic lookup would
      ~ require dealing with what happens if it's not found.
      dropstring-with-result
      [ ' lit entry-to-execution-token literal ]
      , ,
      exit
    } if

    ~ We're in interpret mode; push the number to the stack. Or at least,
    ~ that's what the code we're interpreting will see. Really it's already on
    ~ the stack, so we just clean everything else up and leave it there.
    dropstring-with-result exit
  } if

  ~ If it's neither in the dictionary nor a number, just print an error.
  s" No such word: " emitstring value@ emitstring dropstring ;
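
~   To summarize the dispatch above, as we read the code, here is what
~ "interpret" does with a single token in each situation:
~
~     token                    interpret mode         compile mode
~     found, regular word      execute it             append execution token
~     found, immediate word    execute it             execute it
~     number                   leave value on stack   append "lit", then value
~     anything else            print "No such word: " followed by the token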

~ TODO for ease of debugging, this isn't the full implementation, which lets
~ us exit it to the outer "quit"
: quit { interpret } forever ;

~ quit
~ 4 5 + . : za 13 12 - . ; za
~ : ' word value@ find dropstring-with-result
~   interpreter-flags @ 1 & { literal } if ; make-immediate
~ ' za . newline
~ : piz ' za . newline ; piz
~ ~ ' interpret forget quit 2 3 * .
~ ' ' describe ' za describe ' piz describe
bye