~ (buffer size -- buffer address)
: read-to-buffer
  dup allocate dup dup
  ~ (buffer size, buffer address, word start, output point)
  { key
    ~ Exit if it's a zero byte.
    dup not {
      ~ Make sure to pack the zero to serve as a null terminator.
      pack8
      drop drop swap drop exit } if
    dup is-space
      { ~ (buffer size, buffer address, word start, output point, key)
        ~ Tuck the key out of the way until we've done some stuff.
        3unroll
        ~ If it's a space character, first check if we just consumed the magic
        ~ word...
        2dup swap - 8 = dup {
          drop
          ~ Add a null terminator so we can use stringcmp
          dup 0 swap !
          ~ Check for the magic word
          over s" pyrzqxgl" stringcmp 0 =
          } if
        { ~ It's magic, so exit.
          ~ Make sure to pack a zero to serve as a null terminator.
          0 pack8
          drop drop drop swap drop exit }
        { ~ It's not magic, so reset the word start. Of course whitespace is
          ~ not a word but this will help us keep track of things.
          3roll pack8
          swap drop dup } if-else }
      { ~ (buffer size, buffer address, word start, output point, key)
        ~ Tuck the key out of the way again.
        3unroll
        ~ Check if the word just started and the previous character is space.
        2dup = dup { drop dup @ is-space } if
          { ~ If so, this is the actual first character of the word.
            drop swap pack8 dup }
          { ~ If not, leave the word start alone.
            3roll pack8 } if-else } if-else } forever ;

~   In logical terms, this modifies an input buffer metadata structure
~ in-place to push a new, zeroed one into the start of the linked list formed
~ through the next-source field.
~
~   In physical terms, it works by allocating a new structure, copying the
~ fields of the existing one into it, and zeroing the existing one. That's
~ necessary because otherwise we'd need a mutable handle (a pointer to a
~ pointer) to update the start of the list, and there's no way to do that with
~ the main-input-buffer variable working the way it presently does.
~
~ (input buffer metadata pointer --)
: push-input-buffer
  allocate-input-buffer-metadata
  ~ (original metadata pointer, new metadata pointer)
  2dup swap 6 8 * memcopy
  ~ (original metadata pointer, new metadata pointer)
  swap dup zero-input-buffer-metadata
  input-buffer-next-source !
  ;

~   This does the inverse of push-input-buffer. In the event that the
~ next-source field is null, it zeroes the buffer.
~
~   Note, however, that it doesn't deallocate the memory, because that's not
~ how memory allocation on the log works. If necessary, it can be deallocated
~ with "forget", though as usual that requires careful planning.
~
~ (input buffer metadata pointer --)
: pop-input-buffer
  dup input-buffer-next-source @
  ~ (original metadata pointer, next source metadata pointer)
  dup { 6 8 * memcopy }
      { drop zero-input-buffer-metadata } if-else ;

: L:
  ' L' entry-to-execution-token execute
  { ' set-label entry-to-execution-token , }
  { set-label } if-else
  ; make-immediate
  ~ TODO probably needs to do more

: transform-state-saved-here ;
: transform-state-saved-latest 8 + ;
: allocate-transform-state
  2 8 * allocate
  dup transform-state-saved-here 0 swap !
  dup transform-state-saved-latest 0 swap ! ;
allocate-transform-state s" transform-state" variable

~   The label transform operates on code that compiles itself, and ensures
~ that the result of the compilation is suitable to be included in an
~ executable binary. To achieve this, it makes several changes to the
~ semantics of that code. The transform relies on the label facility, and
~ expects to run from within label-loop.
~
~   The most fundamental change is that the label transform separates words
~ that run in compile mode from words that run immediately.  There is no
~ distinction made between words running in immediate mode, and words declared
~ as immediate. Immediate words are looked up and executed based on their
~ "real", currently-executing definitions. Compiled words, including
~ literals, are looked up via the label facility.
~
~   Since the label facility is able to resolve forward references, there is
~ no hard requirement that words be topologically sorted, but forward
~ references should still be kept to a minimum, since that's a significant
~ difference from un-transformed code that could easily become confusing.
~
~   Compilation words do make extensive reference to the global variables
~ "here" and "latest". In particular, flow-control words such as if-else
~ expect the log to have recent compilation outputs on it, and to be able to
~ mutate them in-place. In order to make this work, we provide temporary
~ values of these two variables which point to the location of the output
~ buffer. This allows pointer resolution to work correctly without additional
~ effort, but notice that the buffer's address will differ from the address
~ the resulting program loads itself at. There's no simple way to avoid this
~ concern, since the variables must point to one of those addresses or the
~ other, not both.
~
~   We resolve the issue by running our own, alternate versions of the words
~ "create", ":", ";", and ";asm" which use the label facility to compute the
~ addresses that will be needed at runtime. These alternates run instead of
~ the normal versions of these words. The code being compiled is responsible
~ for not doing anything else that would rely on "here" and "latest" matching
~ their runtime addresses, though it is otherwise allowed to modify and rely
~ on them in all the usual ways. The alternate versions are defined in this
~ file as their own words, "Lcreate", "L:", "L;", and "L;asm".
~
~ (-- done)
: transform-one
  word

  ~ If no word was returned, exit.
  dup 0 = { drop 0 exit } if

  ~ The string is on the top of the stack, so to get a pointer to it we get
  ~ the stack address.
  ~ (string)
  value@

  ~ If it's the magic word, end the transformation.
  dup s" pyrzqxgl" stringcmp 0 = { drop dropstring 1 exit } if

  transform-state transform-state-saved-latest @ swap find-in

  ~ Check whether the word was found in the dictionary.
  dup 0 != {
    ~ If the word is in the dictionary, check what mode we're in, then...
    dropstring-with-result
    ~ (entry pointer)
    interpreter-flags @ 0x01 & {
      ~ ... if we're in compile mode, there's still a chance it's an immediate
      ~ word, in which case we fall through to interpret mode...
      dup entry-flags@ 1 & 0 =

      ~ ... but it's a regular word, so append it to the heap.
      ~ TODO why is ; being treated as a regular word
      { entry-to-execution-token , 0 exit } if
    } if

    ~ ... if we're in interpret mode, or the word is immediate, run it.
    entry-to-execution-token execute 0 exit
  } if

  ~ If it's not in the dictionary, check whether it's a decimal number.
  drop
  ~ As before, we get the stack address and use it as a string pointer.
  ~ (string)
  value@ read-integer 0 = {
    ~ It's a number.
    interpreter-flags @ 0x01 & {
      ~ We're in compile mode; append first "lit", then the number, to the
      ~ heap. The version of "lit" we use is the one that's current when we
      ~ ourselves are compiled, hardcoded; doing a dynamic lookup would
      ~ require dealing with what happens if it's not found.
      ~ TODO this is wrong
      dropstring-with-result
      [ ' lit entry-to-execution-token literal ]
      , ,
      0 exit
    } if

    ~ We're in interpret mode; push the number to the stack. Or at least, that's
    ~ what the code we're interpreting will see. Really it's already on the
    ~ stack, just clean everything else up and leave it there.
    dropstring-with-result
    0 exit
  } if

  ~ If it's neither in the dictionary nor a number, just print an error.
  s" No such word: " emitstring value@ emitstring dropstring 0 ;


~ ." input " main-input-buffer dup .hex64 newline dup hexdump @ dup .hex64 newline bye
~ 1024 read-to-buffer
~ foo bar baz biff
~ pyrzqxgl
~ stackhex dup hexdump emitstring bye
~
~ : breakza
~   ." original" newline
~   main-input-buffer dup 6 8 * hexdump-from
~   dup push-input-buffer
~   ." updated original" newline
~   dup 6 8 * hexdump-from
~   ." copy" newline
~   dup input-buffer-next-source @ 6 8 * hexdump-from
~   newline
~   stackhex
~   dup pop-input-buffer
~   ." copied back" newline
~   6 8 * hexdump-from
~   stackhex
~   bye ;

~ (output point, input string pointer -- output point)
: transform
  main-input-buffer dup push-input-buffer
  ~ TODO the arguments for this seem to be backwards from the documentation
  swap attach-string-to-input-buffer

  ~   Save the old values of "here" and "latest", and set the initial values
  ~ of the internal ones. These values need to persist across iterations,
  ~ since client code will make its own updates to them and then rely on those
  ~ updates having taken effect. So we do the swap just once, here outside the
  ~ loop, and set it back when the loop ends.
  here @ transform-state transform-state-saved-here !
  latest @ transform-state transform-state-saved-latest !
  here !
  0 latest !
  ~ Now the stack has nothing of ours on it, so client code can do its thing.

  ~   It's important that the stack has nothing of ours on it that persists
  ~ across iterations, so that client code can add and remove stuff there as
  ~ it sees fit.
  { transform-one
    ~ (done)

    ~  When the loop is done, get the real values of "here" and "latest"
    ~ back. The internal "here" is also the output point, and will become our
    ~ return value. The internal "latest" is discarded.
    { here @
      transform-state transform-state-saved-here @ here !
      transform-state transform-state-saved-latest @ latest !
      ~ (output point)

      ~   Though we don't actually use transform-state outside of this
      ~ invocation, for tidiness we zero it out.
      0 transform-state transform-state-saved-here !
      0 transform-state transform-state-saved-latest !

      ~  Also put the input source back how it was.
      main-input-buffer pop-input-buffer

      exit } if } forever ;

~ 1024 allocate dup
~ ." compilation output buffer" newline dup hexdump
~ transform
~ : za ." ZA" 12 13 - . ;
~ : ' word value@ find dropstring-with-result
~   interpreter-flags @ 1 & { literal } if ; make-immediate
~ ~ ' za . newline
~ pyrzqxgl
~ ." back back back " here @ .hex64 newline
~ ~ ." stack after " stackhex
~ ~ 2dup swap hexdump-between
~ ~ : piz ." PIZ" ' za . newline ; piz
~ bye