~ ~~~~~~~~~~~~~~~~~~~~~~~~~
~ ~~ Dynamic definitions ~~
~ ~~~~~~~~~~~~~~~~~~~~~~~~~
~
~   This file provides additional facilities which are fundamental parts of
~ Evocation as a language, but which it's not possible to define until global
~ variables are available. Therefore it is incompatible with the label
~ transform, but compatible with the log-load transform and written to obey
~ its constraints; see transform.e for more details on that.
~
~   The code here relies on the words "log", "s0", "r0", "latest", and "here".
~ These five global variables are the root of all our other data structures.
~ They are defined specially by warm-start in execution.e, since there is no
~ way to create regular definitions for them. Thus, they come to us already
~ set up.
~
~   It may not be obvious, but when a regular docol-based Evocation word is
~ compiled, it hardcodes pointers to all the words it references, which will
~ be part of it forever after. Thus, these words can reference "here" and so
~ on, and they'll just know where to find it; no runtime mechanism for looking
~ it up is needed. That's important, because there are no good ways to build
~ such a mechanism! It would have to dedicate a register or something of that
~ nature, and registers are far too precious for such a use.
~
~   In this file, we define a bunch of more sophisticated ways to work with
~ the log, then we use them to define a high-level flow control facility which
~ saves us from having to compute branch offsets by hand. Pleasantly, we get
~ to use this facility before it's actually defined, since the log-load
~ transform also provides it. That being the case, we might as well start with
~ whatever's most urgent - which is, of course, the debugging tools.


~ Debugging tools for real
~ ~~~~~~~~~~~~~~~~~~~~~~~~

: stack
  s0 @ 8 - { dup value@ 8 + != }
  { dup s0 @ 8 - != { space } if dup @ . 8 - }
  while drop newline ;

: stackhex
  s0 @ 8 - { dup value@ 8 + != }
  { dup s0 @ 8 - != { space } if dup @ .hex64 8 - }
  while drop newline ;
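
~   As a usage sketch: on an otherwise empty stack, executing "1 2 3 stack"
~ should print the three values, bottom of the stack first, separated by
~ spaces, and leave them on the stack, since the word only drops its own
~ scratch pointer. "stackhex" walks the stack the same way but prints each
~ item with .hex64.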

~ (pointer -- boolean)
: is-in-log dup log @ <= swap here @ > && ;

~ (-- entry pointer or 0)
: oldest-entry
  latest @ { dup @ } { @ } while ;

~ (entry pointer -- entry pointer or 0)
: next-newer-entry
  latest @
  2dup = { 2drop 0 exit } if
  { dup { 2dup @ != } if }
  { @ } while swap drop ;

~ (entry pointer -- pointer)
: guess-entry-end
  dup entry-flags@ 64 & 64 = { exit } if
  dup next-newer-entry dup
    { drop dup is-in-log { drop here @ } { drop } if-else
      exit } unless
  swap drop ;

~ (pointer -- entry pointer or 0)
: containing-entry
  dup is-in-log { drop 0 exit } unless
  latest @ { dup { 2dup > } { 0 } if-else }
    { @ } while swap drop ;

~ (entry pointer -- boolean)
: is-assembly-word
  entry-to-execution-token dup 8 + swap @ = ;

~ (entry pointer -- boolean)
: is-docol-itself
  entry-to-name s" docol" stringcmp 0 = ;

~   The word named "docol" has the job of returning the value that gets used
~ as the actual codeword. We make the assumption that the codeword will
~ point somewhere near the entry header; we allow for the possibility that
~ it might be before or after.
~
~   In general, there may be several copies of docol, due to alternate logs
~ and things like that, so the goal is to recognize any of them.
~
~ (pointer -- boolean)
: is-docol-codeword
  dup is-in-log { drop 0 exit } unless
  containing-entry dup
    { dup is-docol-itself
       { drop 1 }
       { next-newer-entry dup { is-docol-itself } if } if-else
    } if ;


~ TODO this only works on log words
~
~ (entry pointer -- boolean)
: is-docol-interpreted-word
  dup is-assembly-word { drop 0 exit } if
  entry-to-execution-token @ is-docol-codeword ;


~ (pointer -- boolean)
: is-codeword-pointer
  dup is-in-log { drop 0 exit } unless
  dup containing-entry dup { 2drop 0 exit } unless
  entry-to-execution-token = ;


~ (width --)
: indent { dup } { space 1- } while drop ;

~ (entry pointer --)
: word-heading
  dup entry-to-name dup emitstring space
  stringlen 1+ 54 swap - 0 max indent dup .hex64
  dup entry-flags@ dup
    { space
      dup 128 & { s" H" emitstring } if
      dup 64 & { s" M" emitstring } if
      dup 1 & { s" I" emitstring } if
    } if drop
  dup is-assembly-word { s"  asm" emitstring }
    { dup is-docol-interpreted-word { s"  raw" emitstring } unless
    } if-else drop
  newline ;


: list-dictionary
  oldest-entry { dup }
  { dup word-heading next-newer-entry } while drop ;


~ (content end, content start, label start --)
: hexdump-row
  2 indent dup .hex32 dup 4 unroll
  0 { dup 16 > }
  { dup 7 & 0 = { space } if space
    2dup + dup 4 pick <= swap 5 pick > &&
      { 2dup + 8@ .hex8 } { space space } if-else
    1+ } while
  newline 5 ndrop ;


~ (end, start --)
: hexdump-between
  dup 16 1- invert &
  { dup 3 pick > }
  { 3dup hexdump-row 16 + } while 3 ndrop ;


~ (start, length --)
: hexdump-from swap dup 3unroll + swap hexdump-between ;


~ (start --)
: hexdump 64 hexdump-from ;
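
~   A worked example of the row layout, using a made-up address purely for
~ the arithmetic: "0x1005 32 hexdump-from" covers the range from 0x1005 up
~ to, but not including, 0x1025. Because hexdump-between rounds the start down
~ to a 16-byte boundary, this prints three rows, labeled with the addresses
~ 0x1000, 0x1010, and 0x1020. Bytes outside the range are shown as blanks,
~ so the first row shows only its last eleven bytes and the third row only
~ its first five.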


~ (entry pointer --)
: describe-hex
  dup word-heading
  dup guess-entry-end swap entry-to-execution-token
  hexdump-between ;


~ (entry pointer --)
: describe-docol
  dup word-heading
  dup guess-entry-end swap entry-to-execution-token 8 +
  { 2dup < }
  { space dup @ dup is-codeword-pointer
    { execution-token-to-entry entry-to-name emitstring }
    { . } if-else
    8 + } while newline ;


~ (entry pointer --)
: describe
  dup is-docol-interpreted-word
  { describe-docol } { describe-hex } if-else ;


: describe-all
  oldest-entry { dup }
  { dup describe next-newer-entry } while drop ;


: describe-compilation
  ~ It's always in progress ;) We just need a header like this so it doesn't
  ~ get confused with other kinds of debug output.
  ." compilation in progress" newline
  latest @ hexdump
  newline
  ."   here " here @ .hex64 newline
  ."   latest " latest @ .hex64 newline
  ."   name of latest: " latest @ entry-to-name emitstring newline
  newline ;


: bye 0 sys-exit ;


~ Log manipulation
~ ~~~~~~~~~~~~~~~~

~   In general, we're going to want to be able to go on little excursions
~ where we define utility words that are only useful for one task, then
~ deallocate that stuff after we're done with it. We implement "forget",
~ which removes both dictionary entries and log allocations for the entry
~ pointer it's given and everything that came after.
~
~   The implementation strategy is the same as Jonesforth's version, but
~ Jonesforth's FORGET runs in immediate mode and reads the name of the word to
~ operate on, whereas ours takes an entry pointer and works in either compiled
~ or immediate mode.
~
~ (entry pointer --)
: forget dup @ latest ! here ! ;
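
~   A sketch of such an excursion, where "scratch-helper" is a hypothetical
~ word needed for only one task ("find" is defined further down this file):
~
~     : scratch-helper ... ;
~     ... do the one-off work using scratch-helper ...
~     s" scratch-helper" find forget
~
~ Afterwards "latest" and "here" are back where they were before
~ scratch-helper was defined, so its entry and everything allocated after it
~ are released.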

~ (value --)
: , here @ swap pack64 here ! ;

~   We'll be defining a lot of immediate words, so we should set up a terse
~ way to do that.
: make-immediate latest @ dup entry-flags@ 0x01 | entry-flags! ;
: make-hidden latest @ dup entry-flags@ 0x80 | entry-flags! ;
: make-visible latest @ dup entry-flags@ 0x80 invert & entry-flags! ;
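
~   A worked example of the flag arithmetic: the flags live in a single byte,
~ where 0x01 marks a word as immediate and 0x80 marks it as hidden
~ (word-heading above also reports 0x40, as "M"). A word that has been made
~ both hidden and immediate has flags 0x81, and make-visible computes
~ 0x81 & ~0x80 = 0x01, clearing only the hidden bit.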

~   Sooner or later we'll want to define recursive words; this one lets us
~ do that. It compiles into a call to the word that's currently being
~ defined (strictly speaking, the one whose definition was most recently
~ begun).
: recurse latest @ entry-to-execution-token , ; make-immediate
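
~   For example, a hypothetical "countdown" word that prints n, n-1, ..., 1
~ by calling itself. This sketch assumes the braces compile into branches
~ inside the word being defined, so that "latest" still refers to it when
~ recurse runs:
~
~     ~ (n --)
~     : countdown dup { dup . space 1- recurse } { drop } if-else ;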


~   The implementation of find-in is in log-load.e, for now.
~
~ (string pointer -- entry pointer or 0)
: find latest swap find-in ;
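
~   Combined with the debugging tools above, this makes it easy to inspect a
~ single word by name. For example, this should print the heading and body
~ of "forget":
~
~     s" forget" find describe
~
~ Since find returns 0 for an unknown name and describe doesn't check for
~ that, it's worth checking the result first when the name might be
~ misspelled.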


~   Allocates bytes on the log by incrementing the global "here" pointer. The
~ "here" pointer is kept aligned to an 8-byte boundary, regardless of the size
~ requested.
~
~   This does not create dictionary entries, it's just a raw memory interface.
~ It's suitable for allocating data or scratch space.
~
~ (size -- pointer)
: allocate
  here @ dup
  ~ (size, here value, here value)
  3roll + 8 packalign here ! ;
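
~   A worked example: if "here" holds 0x1000, then "13 allocate" returns
~ 0x1000 and leaves "here" at 0x1010, because 0x1000 + 13 = 0x100D is rounded
~ up to the next multiple of 8. A subsequent "8 allocate" would then return
~ 0x1010 and advance "here" to 0x1018.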


~   Allocate space by incrementing "here", and output a word entry header in
~ it. Also add it to the "latest" linked list. Use zero as the flag values;
~ accept a string pointer on the stack and use its contents as the name.
~
~   This is the first step of creating a new word. Its responsibility includes
~ everything up to the codeword, not including the codeword; it leaves things
~ all set up to start appending contents to the new word by calling ",".
~
~   There's a handy diagram of the entry header format under "quick
~ reference", in the description of the execution model in execution.e.
~
~   When a word is created in interpret mode using s" to provide a string
~ literal, the temporary space that s" uses is in the same place as the
~ entry header we're going to write out. It's very useful for that case to
~ work, and fortunately it does, without any special-case code, because we
~ do things in a very careful order, as described below.
~
~ (string pointer --)
: create
  ~   We add one to the string length in order to include the trailing null
  ~ terminator. This will be the length of our name field; we save an extra
  ~ copy of it to help with packing later.
  dup stringlen 1 + dup 3unroll
  ~ (name field length, string pointer, name field length)

  ~   We use memmove to put the string in its final position, because it works
  ~ correctly when the destination overlaps with the source. Notice that we
  ~ do this before writing anything else in the entry header, to avoid
  ~ stepping on it. The name string always starts ten bytes into the header,
  ~ so we can use a fixed offset.
  here @ 10 + 3unroll memmove
  ~ (name field length)

  ~   Now we can get back to the fields that belong at the start of the entry
  ~ header. We take the value of "here" and keep a working copy of it on the
  ~ stack, which we'll advance every time we write more bytes.
  here @
  ~ (name field length, updated "here" pointer)

  ~   Pack the old value of "latest" as the first field of the header, linking
  ~ from the newly-defined word to the next-newest word.
  ~
  ~   All the entries form a linked list, from newest to oldest. Since the
  ~ link is the first field in the entry header, you can get from each entry
  ~ to the one before it just by dereferencing the entry pointer.
  latest @ pack64

  ~   This is the flags byte. It starts at zero; our caller can change it if
  ~ desired.
  0 pack8

  ~   This is the "other" null terminator, used when traversing the name
  ~ string backwards for execution-token-to-entry. Yes, the name is
  ~ null-terminated at both ends.
  0 pack8

  + ~ The name field is already populated, so just skip past it.
  ~ (updated "here" pointer)

  ~   The codeword is aligned to a machine-word boundary, and the padding for
  ~ it is create's responsibility.
  ~
  ~   By adding the null terminator before adding alignment padding, we've
  ~ made sure there's always at least one null byte. Otherwise we'd be missing
  ~ the terminator if by chance the name were exactly the wrong length.
  8 packalign
  ~ (updated "here" pointer)

  ~   Retrieve the value of "here", which still doesn't reflect our additions,
~ and store it at the address of "latest". It's the start of our
  ~ newly-defined word, which makes it the latest word.
  here @ latest !

  ~   Finally, we write our updated value of "here" back into the variable.
  here ! ;
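
~   As a worked example of the resulting layout, here is where each field
~ lands, as offsets from the entry pointer, for a few name lengths. Since
~ "here" is always 8-byte aligned, these offsets also determine alignment.
~ The link occupies 0..7, the flags byte is 8, the leading null is 9, the
~ name field (name plus trailing null) starts at 10, and the codeword goes at
~ the next multiple of 8:
~
~     name        name field    padding    codeword offset
~     "foo"       10..13        14..15     16
~     "abcde"     10..15        none       16
~     "abcdefg"   10..17        18..23     24
~
~ The "abcde" case is the one the padding comment above guards against: its
~ trailing null lands on offset 15, so the terminator is there even though no
~ padding is added.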


~   Notionally, it might make sense to define "create" in terms of
~ "create-in". Any change like that is being postponed to after the removal
~ of flatassembler, when refactorings will be easier.
~
~   The dictionary handle is the address of a cell that points to the
~ dictionary's newest entry; "latest" is the handle for the main dictionary.
~
~ (string pointer, dictionary handle --)
: create-in
  dup @ here @ swap pack64
~ (name string pointer, dictionary handle, output point)
  0 pack8 0 pack8
  3roll packstring
  8 packalign
~ (dictionary handle, output point)
  swap here @ swap !
~ (output point)
  here !
  ;


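~   Appends a codeword that points to the cell immediately after it, which is
~ where an assembly word's machine code begins.
~
~ (--)
: self-codeword here @ 8 + , ;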


~   A variable is simply a word that returns a specific address, always the
~ same one, at which a value can be stored. This word "variable" takes an
~ address and a word name, and defines the word. Allocating space is its
~ caller's responsibility.
~
~ TODO the address is constant but the contents vary, confusing, write it up
~
~ (address for new variable word to point to, string pointer --)
: variable
  create
  self-codeword
  here @
  swap :rax mov-reg64-imm64
  :rax push-reg64
  pack-next
  8 packalign
  here ! ;
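
~   A usage sketch, with "counter" as a hypothetical name: reserve one 8-byte
~ cell with allocate, then define a word that pushes its address. The cell
~ isn't necessarily zeroed, so we initialize it explicitly.
~
~     8 allocate s" counter" variable
~     0 counter !
~     counter @ 1+ counter !
~
~ After this, "counter @" gives 1. Executing "counter" always pushes the same
~ address; only the value stored there changes.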


~   A keyword is a word that evaluates to its own address, which makes it
~ suitable for use as a constant. By convention, all our keywords have names
~ starting with a colon, which imitates the way they work in Common Lisp.
~
~   Specifically, it returns its own execution token. Thus, executing its
~ result repeatedly will keep giving the same value. We aren't in the habit of
~ doing quote-exec kinds of things in Evocation, but it seems as good as any
~ other unique value, so we might as well.
~
~   Unlike CL, we don't currently have the lexer automatically create keywords
~ for us; we create them explicitly. That's likely to be added at some point,
~ but for now the feature is lying fallow until we see whether keywords get
~ much use.
~
~ (string pointer --)
: keyword
  create

  ~   Before outputting our codeword, save a copy of the address where it's
  ~ going to be. That will be the execution token we return.
  here @ dup
  ~ (self execution token, output point)

~   Now add a codeword. This is an assembly word, so it gets a self-codeword,
~ meaning one that points to the address right after itself.
  dup 8 + pack64
  ~ (self execution token, output point)

  ~ Now we consume the execution token, using it as part of this instruction.
  :rax mov-reg64-imm64
  ~ (output point)

  ~ To return it, we push it to the stack.
  :rax push-reg64

  ~ Now just the normal stuff every assembly word ends with.
  pack-next
  8 packalign

  here ! ;
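
~   A usage sketch, with ":example" as a hypothetical keyword name:
~
~     s" :example" keyword
~     :example :example =
~
~ The comparison should leave a true flag, since both evaluations push the
~ same value, the execution token of :example itself.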


~   Although we will eventually define the word "'" to give us the execution
~ token of a word, it will rely on being able to compile a literal. Rather than
~ do lots of string processing later, we define this word now, so that "'"
~ won't have to look up the word "lit" by name.
~
~   It may be slightly surprising that the construction "lit lit" works as
~ expected, given that, e.g., "lit 5" will break, as will "lit [", so it's
~ worth explaining why it does.
~
~   In most respects "lit" is just an ordinary word, which compilation turns
~ into a pointer to its codeword. That's what happens to most words, if
~ they're neither special syntax nor flagged as immediate. It just happens to
~ be a word that it rarely makes sense to use directly, since its purpose is
~ to be generated as part of the output when compiling number literals. The
~ special behavior around number literals is that when "interpret" sees, e.g.,
~ "5", it first compiles "lit", then appends the numeric value 5 as the
~ following item in the compiled word body.
~
~   The job of "lit" when it's later executed is to push the appropriate value
~ onto the stack and ensure that it doesn't get executed as code. So, whatever
~ you put immediately after it gets treated as a value, even if it's a
~ pointer.
~
~   The reason that writing "lit 5" in Evocation syntax crashes is that it
~ gets turned into "lit lit 5" when compiled, which treats the second "lit" as
~ a value and then tries to use "5" as a codeword pointer. So you can use "lit"
~ to quote whatever you want; it's just that if the thing being quoted is
~ already special syntax, you might need to go behind "interpret"'s back to get
~ it into the compiled output. In practice, this is likely the only place that
~ needs to happen, but the mechanism is documented for the sake of whatever
~ comes up in the future.
~
~ (value -- )
: literal lit lit , , ;
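
~   Because "literal" is not itself immediate, it's meant to be called while
~ some other word is being compiled, typically from inside an immediate word.
~ A sketch, with hypothetical names:
~
~     : forty-two 42 literal ; make-immediate
~     : answer forty-two ;
~
~ While "answer" is being compiled, forty-two runs immediately and appends
~ "lit" followed by 42 to answer's body, so executing "answer" pushes 42. That
~ is the same pair of items "interpret" emits when it sees "42" directly.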