~ ~~~~~~~~~~~~~~~~~
~ ~~ Interpreter ~~
~ ~~~~~~~~~~~~~~~~~
~
~   The code in this file defines the basic syntax and semantics of Forth as
~ a text-based language. It's written in terms of the underlying executor,
~ which is implemented and explained in execution.e. The execution model gives
~ us the concept of "words"; the control and value stacks; and the ability to
~ call things. It has nothing to say about text, only about the binary form of
~ the language.
~
~   It's traditional in Forth to refer to an act of "compiling" code, which
~ in this context means turning it from text into its binary representation.
~ That binary representation most commonly takes the form of a word entry
~ header followed by an array of codeword pointers.
~
~   It would be legitimate to critique the terminology by saying that codeword
~ pointers are still, in some sense, interpreted: They are not machine code to
~ be directly executed by the CPU; they rely on "docol" and "next" at runtime.
~ However, in language design circles, the term "compilation" takes on a
~ broader meaning, referring to any process which requires some or all of the
~ types of infrastructure we regard as being compiler internals: A successive
~ translation of code from one form into another, discarding some types of
~ information while computing others, in a careful order that results in
~ logically consistent output which in some sense has the same meaning as the
~ input. Sometimes this output may be machine code, but often it is another
~ language meant for human consumption, or an intermediate layer meant to be
~ fed into another process.
~
~   Forth compilation is compilation in this sense, so there is no conflict
~ and we run with the established terminology. In addition, it must be noted
~ that Evocation, like many Forths, makes extensive use of words which are
~ implemented directly in machine code; the Forth execution model allows these
~ words to co-exist with words that are interpreted by "docol".
~
~   At any rate, the code in this file is responsible for that compilation.
~
~   It is primarily concerned with managing the contents of an area of memory
~ we call the "log"; see log-load.e for more detail on terminology.

: hide-entry dup entry-flags@ 0x80 | entry-flags! ;

: unhide-entry dup entry-flags@ 0x80 invert & entry-flags! ;


~ TODO
~ unroll-past-string                                    0000001000018db8
~ swap-past-string                                      0000001000018ea0
~ dropstring                                            0000001000018ee8
~ dropstring-with-result                                0000001000018f80
~ accumulate-string                                     0000001000018fc8
~ is-space                                              0000001000018b00
~ word                                                  00000010000194a0

~   The word "'", often pronounced "tick", quotes the following word, looking
~ it up and treating it as a constant. In immediate mode, the constant winds
~ up on the stack; in compile mode it gets compiled.
~
~   There are a few possible implementation strategies here. Running as an
~ immediate word means there's a clear and unambiguous concept of "the
~ following word", so that's what we do; otherwise we'd have to get clever
~ about somehow finding out where we were called from. That means we take on
~ what would otherwise be the interpreter's responsibility, of checking what
~ mode we're in. Happily, that's easy to do.
~
~   There's a cyclic dependency where "if" relies on "'", and "'" relies on
~ "if". Fortunately both of them are treated as alternates by the log-load
~ transform, so we don't have to worry about it.
: ' word value@ find dropstring-with-result
  interpreter-flags @ 1 & { literal } if
  ; make-immediate


~ (character -- 1 for true or 0 for false)
: is-alphanumeric
  ~ We don't have a character-literal syntax; the hex constants here are
  ~ ASCII codes.
  dup 0x30 > { drop 0 exit } if ~ Less than "0".
  dup 0x39 >= { drop 1 exit } if ~ Less than or equal to "9".
  dup 0x41 > { drop 0 exit } if ~ Less than "A".
  dup 0x5a >= { drop 1 exit } if ~ Less than or equal to "Z".
  dup 0x61 > { drop 0 exit } if ~ Less than "a".
  dup 0x7a >= { drop 1 exit } if ~ Less than or equal to "z".
  drop 0 ; ~ Greater than "z".


~ (character -- value)
: generalized-digit-value
  ~ We don't have a character-literal syntax; the hex constants here are
  ~ ASCII codes.
  dup 0x61 <= { 0x61 - 10 + exit } if ~ lowercase "a"
  dup 0x41 <= { 0x41 - 10 + exit } if ~ uppercase "a"
  0x30 - ; ~ digit "0"


~ (character, base
~  -- value (if successful),
~     error indicator (zero equals success))
: decode-generalized-digit
  swap dup is-alphanumeric {
    ~ It's alphanumeric.
    ~ (base, character)
    generalized-digit-value
    ~ (base, value)
    dup 3roll
    ~ (value, value, base)
    > {
      ~ It's in range.
      ~ (value)
      0 exit } if
    ~ It's out of range.
    ~ (value)
    drop 1 exit } if
  ~ It's not alphanumeric.
  drop drop 1 ;


~ (string pointer, base
~  -- result (if successful),
~     error indicator (zero equals success))
: read-base-unsigned
  swap

  ~ If the first byte is null, this is an error
  unpack8
  ~ (numeric base, current point in string, character)
  dup 0 = { drop drop drop 1 exit } if

  ~ Decode the first byte as a generalized digit in the base.
  ~ (numeric base, current point in string, character)
  ~ If the first byte is less than "0", this is an error.
  3roll dup 4 unroll
  ~ (numeric base, current point in string, character, numeric base)
  decode-generalized-digit {
    ~ (numeric base, current point in string)
    drop drop 1 exit } if

  ~ The first byte is a valid generalized digit in the appropriate base, so
  ~ let's get started.
  ~ (numeric base, current point in string, initial value)
  swap

  {
    ~ (numeric base, result so far, current point in string)
    unpack8 dup 0 = {
      ~ A null after the first character is valid, and indicates we're done.
      drop drop swap drop 0 exit } if

    ~ Decode the latest byte as a generalized digit in the base.
    ~ (numeric base, result so far, current point in string, latest byte)
    4 roll dup 5 unroll
    ~ (numeric base, result so far, current point in string, character
    ~  numeric base)
    decode-generalized-digit {
      ~ If the latest character is not a valid digit, that's an error.
      ~ (numeric base, result so far, current point in string)
      drop drop drop 1 exit } if

    ~ The latest character is valid, so incorporate it and loop.
    ~ (numeric base, result so far, current point in string, latest value)
    3roll 4 roll dup 5 unroll * + swap
  } forever ;


~ (string pointer
~  -- result (if successful),
~     error indicator (zero equals success))
: read-integer-unsigned
  ~ We don't have a character-literal syntax; the hex constants here are
  ~ ASCII codes.
  dup unpack8 0x30 != { ~ digit "0"
    ~ This is the case where the leading digit is not a zero.
    ~ (original string pointer, advanced string pointer)
    drop 10 read-base-unsigned exit } if

  ~ This is the case where the leading digit is a zero.
  ~ (original string pointer, advanced string pointer)
  unpack8 dup 0x78 = { ~ lowercase "x"
    ~ (original string pointer, doubly advanced string pointer, character)
    drop swap drop 16 read-base-unsigned exit } if

  dup 0x6f = { ~ lowercase "o"
    ~ (original string pointer, doubly advanced string pointer, character)
    drop swap drop 8 read-base-unsigned exit } if

  dup 0x62 = { ~ lowercase "b"
    ~ (original string pointer, doubly advanced string pointer, character)
    swap drop swap 2 read-base-unsigned exit } if

  ~ This is the case where the second character is something else.
  ~ (original string pointer, doubly advanced string pointer, character)
  drop drop 10 read-base-unsigned ;


~ (string pointer
~  -- result (if successful),
~     error indicator (zero equals success))
: read-integer
  ~ We don't have a character-literal syntax; this is ASCII for a hyphen.
  dup unpack8 0x2d != {
    ~ This is the case where it's non-negative.
    ~ (original string pointer, advanced string pointer)
    drop read-integer-unsigned exit
  } if

  ~ This is the case where it's negative.
  ~ (original string pointer, advanced string pointer)
  swap drop read-integer-unsigned
  ~ (result maybe, exit code)
  dup {
    ~ Failure
    ~ (non-zero exit code)
    exit
  } if

  ~ Success
  ~ (result, zero exit code)
  swap -1 * swap ;


~ (string pointer
~  -- result (if successful),
~     error indicator (zero equals success))
: read-decimal
  ~ We don't have a character-literal syntax; this is ASCII for a hyphen.
  dup unpack8 0x2d != {
    ~ This is the case where it's non-negative.
    ~ (original string pointer, advanced string pointer)
    drop 10 read-base-unsigned exit
  } if

  ~ This is the case where it's negative.
  ~ (original string pointer, advanced string pointer)
  swap drop 10 read-base-unsigned
  ~ (result maybe, exit code)
  dup {
    ~ Failure
    ~ (non-zero exit code)
    exit
  } if

  ~ Success
  ~ (result, zero exit code)
  swap -1 * swap ;

~   Here, we allocate a single machine word's worth of space to use as the
~ backing store of a mutable variable, initialized to zero. Then we define the
~ variable which points to that address.
~
~   We don't actually need a word header for interpreter-flags-storage, we
~ could just append a zero and point to it directly, but that would make life
~ harder for words that attempt to work with the contents of other words. So
~ we give it a name.

s" interpreter-flags-storage" create make-immediate make-visible
here @ 0 , s" interpreter-flags" variable

~   There's an important bootstrapping concern: If you're loading this
~ interpreter into a running Evocation, it's important to not use the wrong
~ interpreter state value. TODO longer explanation

~ TODO the definition of set-word-immediate would come here; is it needed?

: [ interpreter-flags @ 0x01 invert & interpreter-flags ! ; make-immediate
latest @ dup hide-entry

: ] interpreter-flags @ 0x01 | interpreter-flags ! ;
latest @ dup hide-entry
~ (pointer to [ entry, pointer to ] entry)

~   It may seem nonsensical to use : to define :, but the bootstrapping stuff
~ overrides what it does, so it works. The same, of course, goes for all these
~ other word-defining words.
~
~   If the ] at the end feels backwards, imagine to yourself that everything
~ that ISN'T defining a word body is part of an implicit [ ... ] sequence.
~ Doing so doesn't really change anything, but may make you happier.
: : word value@ create dropstring docol , latest @ hide-entry ] ;

~   The counterpart of : is ;.
: ;
  ~ See commentary on "literal", below, regarding "lit exit".
  lit exit ,
  latest @ unhide-entry
  ~ See above regarding [. Since it's an immediate word, we have to go to
  ~ extra trouble to compile it as part of ;. Since it's also hidden, we have
  ~ to go behind the interpreter's back to even get its entry pointer.
  [ over entry-to-execution-token , ]
  ; make-immediate
latest @ dup hide-entry
~ (pointer to [ entry, pointer to ] entry, pointer to ; entry)


: ;asm
  here @ pack-next 8 packalign here !
  latest @ dup unhide-entry entry-to-execution-token dup 8 + swap !
  ~ See above.
  [ 2 pick entry-to-execution-token , ]
  ; make-immediate
latest @ dup hide-entry
~ (pointer to [ entry, pointer to ], pointer to ;, pointer to ;asm)


~   Although we will eventually define the word "'" to give us the symbol of
~ a word, it will rely on being able to compile a literal. Rather than do lots
~ of string processing later, we choose to define this word now to avoid
~ having to look up the word "lit" as part of that.
~
~   It may be slightly surprising that the construction "lit lit" works as
~ expected, given that ie. "lit 5" will break, as will "lit [", so it's worth
~ explaining why it does.
~
~   In most respects "lit" is just an ordinary word, which compilation turns
~ into a pointer to its codeword. That's what happens to most words, if
~ they're not a special syntax nor flagged as immediate. It just happens to be
~ a word that it rarely makes sense to use directly, since its purpose is to
~ be generated as part of the output when compiling number literals. The
~ special behavior around number literals is that when "interpret" sees ie.
~ "5", it first compiles "lit", then appends the numeric value 5 as the
~ following item in the compiled word body.
~
~   The job of "lit" when it's later executed is to push the appropriate value
~ onto the stack and ensure that it doesn't get executed as code. So, whatever
~ you put immediately after it gets treated as a value, even if it's a
~ pointer.
~
~   The reason that writing "lit 5" in Evocation syntax crashes is that it
~ gets turned into "lit lit 5" when compiled, which treats the second "lit" as
~ a value then tries to use "5" as a codeword pointer. So you can use "lit"
~ to quote whatever you want, it's just if it's already a special syntax you
~ might need to go behind "interpret"'s back to get it into the compiled
~ output. In practice, this is likely the only place that needs to happen, but
~ the mechanism is documented for the sake of whatever comes up in the future.
~
~ (value -- )
: literal lit lit , , ;


~ Now the single most important word...
: interpret
  word

  ~ If no word was returned, exit.
  dup 0 = { drop exit } if

  ~ The string is on the top of the stack, so to get a pointer to it we get
  ~ the stack address.
  ~ (string)
  value@ find

  ~ Check whether the word was found in the dictionary.
  dup 0 != {
    ~ If the word is in the dictionary, check what mode we're in, then...
    dropstring-with-result
    ~ (entry pointer)
    interpreter-flags @ 0x01 & {
      ~ ... if we're in compile mode, there's still a chance it's an immediate
      ~ word, in which case we fall through to interpret mode...
      dup entry-flags@ 1 & 0 =

      ~ ... but it's a regular word, so append it to the heap.
      { entry-to-execution-token , exit } if
    } if

    ~ ... if we're in interpret mode, or the word is immediate, run it.
    entry-to-execution-token execute exit
  } if

  ~ If it's not in the dictionary, check whether it's an integer literal.
  drop
  ~ As before, we get the stack address and use it as a string pointer.
  ~ (string)
  value@ read-integer 0 = {
    ~ It's a number.
    interpreter-flags @ 0x01 & {
      ~ We're in compile mode; append first "lit", then the number, to the
      ~ heap. The version of "lit" we use is the one that's current when we
      ~ ourselves are compiled, hardcoded; doing a dynamic lookup would
      ~ require dealing with what happens if it's not found.
      dropstring-with-result
      [ ' lit entry-to-execution-token literal ]
      , ,
      exit
    } if

    ~ We're in interpret mode; push the number to the stack. Or at least, that's
    ~ what the code we're interpreting will see. Really it's already on the
    ~ stack, just clean everything else up and leave it there.
    dropstring-with-result exit
  } if

  ~ If it's neither in the dictionary nor a number, just print an error.
  s" No such word: " emitstring value@ emitstring dropstring ;

~ TODO for ease of debugging, this isn't the full implementation, which lets
~ us exit it to the outer "quit"
: quit { interpret } forever ;

~   Now we switch into the new interpreter, enabling the three words we'd been
~ keeping hidden and then calling "quit".
unhide-entry unhide-entry unhide-entry quit

-0x10 newline . newline
4 5 + . : za 13 12 - . ; za
~ : ' word value@ find dropstring-with-result
~  interpreter-flags @ 1 & { literal } if ; make-immediate
' za . newline
: piz ' za . newline ; piz
~ ' interpret forget quit 2 3 * .
' ' describe ' za describe ' piz describe
bye