~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~
~ ~~ Bootstrapping the log ~~
~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~
~
~   The log is the main region of memory within which most dynamic allocation
~ happens. It's a single contiguous segment of virtual memory, which is
~ requested from the kernel when Evocation starts up. Almost all of
~ Evocation's dynamic data is kept in the log, including the main dictionary;
~ several important global variables which make it possible to find and
~ allocate other data structures; and the control stack.
~
~   This file has the task of providing words which are useful for working
~ with the log, and more specifically which are useful for helping to bring
~ the log into existence. Once the log exists, it can be used to manage
~ itself, but there's a bootstrapping challenge in getting there. That
~ challenge is solved by the warm-start routine in execution.e, which relies
~ on the words in this file and should load after it.
~
~   Some modern Forths, including Jonesforth, refer to the log as the heap.
~ This is a misnomer; a heap is a data structure that allows non-contiguous
~ allocation. Although there are Forths that have true heaps, Evocation is not
~ one of them. Space in the log is allocated by incrementing the "here"
~ variable (one of those important globals), which necessarily can only
~ allocate contiguous blocks; there is no way to compact allocations to
~ reclaim fragmented, unused space in between them. Evocation does allow
~ deallocation using "forget", but this is done by resetting "here" and
~ "latest" to older values, unwinding every allocation that's been done since
~ the point in time they return to.
~
~   It would be a mistake to confuse this allocation strategy with the
~ more-general facilities for allocation, reallocation, and deallocation of
~ individual memory blocks that many other languages have. To avoid confusion,
~ we stay away from the name "heap", though it may still occasionally be used
~ colloquially because it's familiar from other Forths, and because most
~ programming languages have a heap as the main memory segment they request
~ from the kernel.
~
~   In the strictest technical sense, the log is a stack: Things are added
~ to the end of it, and removed from that same end. But Evocation already has
~ two other stacks, the control and value stacks. Adding to the potential
~ confusion, the control stack is actually stored inside the log, as a
~ fixed-size chunk at the bottom. More importantly, the log isn't much like a
~ stack when you look at how it's actually used. Unlike Evocation's control
~ and value stacks, data structures on the log tend to be rich and complex,
~ interlinked in various ways through the use of pointers. They also tend to
~ be long-lived, so the log tends to grow over time, whereas the control and
~ value stacks stay roughly the same size through repeated cycles of growing
~ and shrinking. In order to be able to speak precisely about what
~ we're doing, we introduce the name "log" to refer to the entire memory
~ segment and everything stored within it.
~
~   Another linguistic choice we make is to be clear about dictionaries. A
~ dictionary is a linked list of word entries. Each dictionary has a specific
~ handle, a pointer to a pointer, which holds the address of the root of the
~ list, that is, its newest entry. Each word entry begins with a specific data
~ structure, which among other things includes a next-entry pointer, a flags
~ byte, and a string that serves as the entry's name. Older entries in a
~ dictionary seldom change; newer entries are added at the beginning of it,
~ with their next-entry pointers leading to the older entries. It is possible
~ for several dictionaries to exist at once, each with its own dictionary
~ handle.
~
~   Since dictionaries are managed using pointers to individual entries, there
~ is no specific requirement about the order in which those entries occur in
~ memory or where they are allocated, but usually a new entry is allocated at
~ the end of the log, by incrementing the variable "here", in the same manner
~ as any other allocation. There is one particular dictionary, the main
~ dictionary, whose handle is the variable "latest". The main dictionary holds
~ every executable word that can be used normally via Evocation's interpreter.
~
~   Since the main dictionary is by far the most important thing in the log,
~ it can be tempting to conflate the log with the main dictionary. This is
~ accurate enough for some purposes, but note that other dictionaries are
~ often interleaved with it, their allocations entwining like grape vines even
~ while each remains separate, reachable only via its own root. See the
~ machine label facility, in labels.e, for an example of how a secondary
~ dictionary can be useful.
~
~   This may feel tangential, but it's important background and there's no
~ better place to explain it: A handle is a pointer to a pointer. The variable
~ "latest" returns a handle, a fixed address which always holds the pointer to
~ the root entry of the main dictionary. Dereferencing that handle gives you
~ the dictionary pointer, the address of the root entry, which is suitable to
~ pass to find-in and similar words that read the dictionary's contents. When
~ you want to add a new entry to a dictionary, you need the dictionary's
~ handle, so that the root pointer can be changed. When you only want to read
~ it, the regular single pointer is all you need.
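~
~   For example, suppose "name" stands for some string pointer. It is only a
~ placeholder here, not a real word. The two levels then look like this:
~
~     latest                 (-- handle, the address of the variable)
~     latest @               (-- dictionary pointer, the newest entry)
~     latest @ name find-in  (-- entry pointer or 0)
~
~ Linking in a new entry means setting its next-entry pointer to the old root
~ and then storing the new entry's address through the handle, roughly
~ "new-entry latest !" ("new-entry" is likewise a placeholder). With only the
~ dictionary pointer, there would be nowhere to store it.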
~
~   When reading the documentation of words that work with dictionaries, pay
~ close attention to whether their parameters include a dictionary handle, or
~ a dictionary pointer.
~
~   The term "handle" was widely known in the early days of microcomputing,
~ when memory-safe languages without direct pointer access were less common.
~ Today it is usually considered specific to systems programming, the type of
~ programming which lies beneath other software and deals with topics such as
~ memory management and processes. Evocation is a systems-programming
~ language, in the sense that it takes pains not to introduce mandatory
~ abstractions which would make it difficult or inefficient to work directly
~ with these topics. So, in understanding Evocation, it's important to know
~ about handles.
~
~   Some of these bootstrap words rely on being able to invoke assembler words
~ that output machine code. Therefore, the assembler words themselves must be
~ available at runtime. Since nothing can be dynamically available at runtime
~ until after we've already run the log-load routine, which relies on the
~ stuff in this file, the assembler words must be statically available via
~ the label transform. That means their definitions in arm64.e must be loaded
~ before this file.


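~   The global variables begin at this offset from the base of the log,
~ immediately after the control stack at the bottom. So, this must have the
~ same value as the constant control-stack-size, which is defined in
~ execution.e. Everything will break if the two differ.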
~
~ TODO: remove one of them. Probably the other one.
: log-offset                        0x10000 ; ~ 64 KiB

~ (log address -- log address, "log" pointer)
: log-load-log
  dup log-offset + ;
~ (log address -- log address, "s0" pointer)
: log-load-s0
  dup log-offset + 8 + ;
~ (log address -- log address, "r0" pointer)
: log-load-r0
  dup log-offset + 2 8 * + ;
~ (log address -- log address, "latest" pointer)
: log-load-latest
  dup log-offset + 3 8 * + ;
~ (log address -- log address, "here" pointer)
: log-load-here
  dup log-offset + 4 8 * + ;
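~   Worked out, given a log-offset of 0x10000, these words locate the globals
~ at the following offsets from the log's base address:
~
~     0x10000   "log"       log-offset + 0
~     0x10008   "s0"        log-offset + 8
~     0x10010   "r0"        log-offset + 2 8 * +
~     0x10018   "latest"    log-offset + 3 8 * +
~     0x10020   "here"      log-offset + 4 8 * +
~
~ Take a log mapped at 0x200000, an arbitrary address chosen only for
~ illustration. There, "log-load-here" would leave 0x210020 on top of the
~ stack, above the log address itself.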


~   This is a helper, used by warm-start, that invokes find-in on the main
~ dictionary. It relies on being passed the base address of the log, which it
~ uses to find the global variable "latest". It's inconvenient to keep a log
~ pointer around all the time, which is why we stop doing it as soon as
~ possible, but during Evocation's startup there's no alternative. This word
~ is used extensively by code that's been compiled via the log-load
~ transform; see transform.e for details.
~
~   It would be possible to unload this word after the log is created, but
~ there are rare situations in which it's still useful, such as injecting
~ Evocation into another process's address space. Plus, it's small. So, we
~ keep it around.
~
~ (log address, string pointer -- log address, entry pointer or 0)
: log-load-find
  swap log-load-latest @ swap 3unroll swap find-in ;
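~   For reference, here is how the stack evolves through log-load-find. The
~ trace reads 3unroll as moving the top item down to third position, which is
~ how it is used throughout this file:
~
~     swap             (string pointer, log address)
~     log-load-latest  (string pointer, log address, "latest" pointer)
~     @                (string pointer, log address, root entry pointer)
~     swap             (string pointer, root entry pointer, log address)
~     3unroll          (log address, string pointer, root entry pointer)
~     swap             (log address, root entry pointer, string pointer)
~     find-in          (log address, entry pointer or 0)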

~   In the code generated by the log-load transform, it's convenient to have
~ only a single step needed to look up a word's execution token. This helper
~ does log-load-find, then gets the execution token if an entry is found. If
~ no entry is found, it prints an error message naming the word and returns 0.
~
~ (log address, string pointer -- log address, execution token or 0)
: log-load-find-execution-token
  dup 3unroll log-load-find dup
  {
    3roll drop
    entry-to-execution-token
  } {
    drop swap
    ." No such word: " emitstring newline
    0
  } if-else ;


~   This is the same as "create", from dynamic.e, except that it takes the
~ log's address as a parameter rather than hardcoding it, so that it can be
~ used in situations where the normal compilation process isn't yet available.
~
~   The requisite stack juggling is kind of finicky; sorry if it's hard to
~ read. It performs the same steps, in the same order, as the regular
~ "create".
~
~ (log address, string pointer -- log address)
: log-load-create
  dup stringlen 1 + dup 3unroll
  ~ (log address, name field length, string pointer, name field length)

  3 pick log-load-here swap drop @ 10 + 3unroll memmove
  ~ (log address, name field length)

  over log-load-here swap drop @
  ~ (log address, name field length, output point)

  2 pick log-load-latest swap drop @ pack64
  ~ (log address, name field length, output point)
  0 pack8
  0 pack8
  +
  ~ (log address, output point)
  8 packalign
  ~ (log address, output point)

  over log-load-here swap drop @
  ~ (log address, output point, old here value)
  2 pick log-load-latest swap drop !
  ~ (log address, output point)
  over log-load-here swap drop ! ;
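~   As a worked example, suppose "here" holds H, an 8-byte-aligned address,
~ and the name is "dup". The name field length is then 4, assuming stringlen
~ excludes the null terminator, which the "1 +" suggests. log-load-create
~ then lays out:
~
~     H + 0    old value of "latest"     (next-entry pointer)
~     H + 8    0                         (flags byte)
~     H + 9    0                         (a second byte, also zeroed)
~     H + 10   "dup" and its terminator  (4 bytes)
~     H + 14   padding
~
~ Afterward "latest" holds H and "here" holds H + 16, assuming packalign
~ rounds up to the next multiple of 8. Whatever is compiled next, starting
~ with the codeword, goes at H + 16, as in log-load-keyword below.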


~   This is the same as ",", from dynamic.e, except that it takes the log's
~ address as a parameter rather than hardcoding it, so that it can be used in
~ situations where the normal compilation process isn't yet available.
~
~   Again, the stack juggling is kind of a lot; sorry about that.
~
~ (log address, value -- log address)
: log-load-comma
  swap log-load-here swap 3unroll
  ~ (log address, value, here)
  @ swap pack64
  ~ (log address, updated here value)
  swap log-load-here swap 3unroll
  ~ (log address, updated here value, here)
  ! ;
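~   For example, with "here" holding H, calling log-load-comma on the stack
~ (log address, 42) stores 42 as a 64-bit cell at H, sets "here" to H + 8,
~ and leaves just the log address.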


~   This is the same as `;asm`, from dynamic.e, except that it takes the
~ log's address as a parameter rather than hardcoding it, so that it can be
~ used in situations where the normal compilation process isn't yet available.
~
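~   Its two main responsibilities are to call `pack-next`, from
~ execution-support.e, and to overwrite the codeword of the newest entry in
~ the main dictionary so that it points to the machine code immediately after
~ it. It also deals with alignment.
~
~ (log address -- log address)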
: log-load-semicolon-assembly
  log-load-here @
  ~ (log address, output point)

  pack-next
  8 packalign
  ~ (log address, output point)

  swap log-load-here swap 3unroll !
  ~ (log address)

  log-load-latest @
  ~ (log address, entry pointer)
  entry-to-execution-token
  dup 8 + swap ! ;
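~   The resulting entry looks like this, where xt is the entry's execution
~ token:
~
~     xt        codeword, overwritten to hold xt + 8
~     xt + 8    machine code assembled since the entry was created
~     ...       the sequence appended by pack-next
~     ...       padding up to an 8-byte boundary; "here" ends up just past it
~
~ Because the codeword points at the very next cell, executing the word jumps
~ straight into its own machine code. This assumes the usual indirect-threaded
~ "next" that jumps through the codeword.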


~   This is the same as "variable", from dynamic.e, except that it takes the
~ log's address as a parameter rather than hardcoding it, so that it can be
~ used in situations where the normal compilation process isn't yet available.
~
~ (log address, address for new variable word, string pointer -- log address)
: log-load-variable
  3roll swap log-load-create
  ~ (address for new variable word, log address)

  log-load-here swap 3unroll
  ~ (log address, address for new variable word, here)

  dup @
  ~ (log address, address for new variable word, here, output point)
  dup 8 + pack64

  3roll :rax mov-reg64-imm64
  ~ (log address, here, output point)

  :rax push-reg64
  pack-next
  8 packalign

  swap ! ;
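~   The generated word is laid out like so. Here xt is its execution token
~ and A is the address that was passed in:
~
~     xt        codeword, holding xt + 8
~     xt + 8    mov-reg64-imm64, loading A into :rax
~               push-reg64, pushing :rax
~               the sequence appended by pack-next
~
~ So executing the new word leaves A on the stack. As a usage sketch, with the
~ log address already on the stack, a log-load-time caller might write:
~
~     buffer s" my-variable" log-load-variable
~
~ Here "buffer" and "my-variable" are hypothetical placeholder names.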


~   A keyword is a word that evaluates to its own address, which makes it
~ suitable for use as a constant. See more detail on that in dynamic.e,
~ where "keyword" is defined.
~
~   Unlike in Common Lisp, the lexer doesn't create keywords for us; we have
~ to do it explicitly. Even if that were to change someday, the log-load
~ routine would still need a way to do it, which is this.
~
~   It's kind of a pain to look up the appropriate "docol" from here, so we
~ do it in assembler instead.
~
~ (log address, string pointer -- log address)
: log-load-keyword
  log-load-create
  ~ (log address)

  log-load-here @ dup
  ~ (log address, self execution token, output point)

  dup 8 + pack64
  ~ (log address, self execution token, output point)

  swap :rax mov-reg64-imm64
  ~ (log address, output point)

  :rax push-reg64
  pack-next
  8 packalign
  ~ (log address, output point)

  swap log-load-here
  ~ (output point, log address, here)
  swap 3unroll
  ~ (log address, output point, here)
  ! ;
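~   The result has the same shape as a word made by log-load-variable. The only
~ difference is the immediate loaded into :rax: it is the keyword's own
~ execution token, the value "here" had just after log-load-create. So
~ executing the keyword leaves its own address on the stack.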


~   This is a helper used by log-load-string-alternate. It does the usual
~ string packing, but through one more layer of indirection than usual: it
~ reads the output point from "here" and writes the new one back. Unlike
~ packstring, it is also responsible for alignment.
~
~ (log address, string pointer -- log address)
: log-load-comma-string
  swap log-load-here @ 3roll
  ~ (log address, output point, string pointer)

  packstring
  8 packalign
  ~ (log address, output point)

  swap log-load-here 3roll swap
  ~ (log address, output point, here)
  ! ;


~   Now we have a bunch of words that are the back-ends for the log-load
~ transform's high-level flow control alternates. These implementations
~ closely parallel the non-transformed versions in flow-control.e, which are
~ worth reading alongside them.
~
~   These variants are a bit unusual in their interfaces: They end with the
~ log address at the top of the stack, even when they have values to return.
~ That's because they're really just "talking" to each other; they don't need
~ to interact with anything else, and doing it this way saves the alternates
~ the work of swapping things around afterward.
~
~   Notice also that, because these run entirely at log-load time, they are
~ always dealing with target pointers and don't have to convert address
~ spaces.
~
~ (log address -- start pointer, log address)
: log-load-left-curly-brace
  log-load-here @ swap ;


~ (start pointer, log address -- start pointer, length, log address)
: log-load-right-curly-brace
  swap dup 3roll
  ~ (start pointer, start pointer, log address)
  log-load-here @ swap
  ~ (start pointer, start pointer, end pointer, log address)
  3unroll swap - swap ;
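~   For example, suppose "here" held 0x1000 when log-load-left-curly-brace ran
~ and 0x1030 when log-load-right-curly-brace ran. The pair would then leave
~ (0x1000, 0x30, log address): six cells of compiled code, described by their
~ start and length. The addresses are arbitrary, for illustration only.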


~ (start, length, log address -- log address)
: log-load-if
  3unroll
  ~ (log address, start, length)
  2dup swap dup
  ~ (log address, start, length, length, start, start)
  5 8 * +
  ~ (log address, start, length, length, start, adjusted start)
  3unroll swap
  ~ (log address, start, length, adjusted start, start, length)
  memmove
  ~ (log address, start, length)
  swap 3roll log-load-here dup @
  ~ (length, start, log address, here pointer, old here)
  swap 4 roll swap
  ~ (length, log address, old here, start, here pointer)
  !
  ~ (length, log address, old here)
  3unroll
  ~ (old here, length, log address)
  s" lit" log-load-find entry-to-execution-token log-load-comma
  0 log-load-comma
  s" !=" log-load-find entry-to-execution-token log-load-comma
  s" 0branch" log-load-find entry-to-execution-token log-load-comma
  ~ (old here, length, log address)
  swap dup 3unroll
  ~ (old here, length, log address, length)
  8 + log-load-comma
  ~ (old here, length, log address)
  3unroll
  ~ (log address, old here, length)
  drop 5 8 * +
  ~ (log address, new here)
  swap log-load-here
  ~ (new here, log address, here pointer)
  3roll swap ! ;
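~   Worked out, for a body compiled at start with the given length, this word
~ leaves the following at these byte offsets:
~
~     start + 0     xt of lit
~     start + 8     0
~     start + 16    xt of !=
~     start + 24    xt of 0branch
~     start + 32    length + 8
~     start + 40    the body, moved up by five cells
~
~ "here" ends up at the old here plus 40. Judging from the arithmetic here and
~ in the words below, branch offsets are measured from the offset cell itself,
~ so length + 8 skips that cell and the body. log-load-unless is identical
~ except that it compiles = in place of !=.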


~ (start, length, log address -- log address)
: log-load-unless
  3unroll
  ~ (log address, start, length)
  2dup swap dup
  ~ (log address, start, length, length, start, start)
  5 8 * +
  ~ (log address, start, length, length, start, adjusted start)
  3unroll swap
  ~ (log address, start, length, adjusted start, start, length)
  memmove
  ~ (log address, start, length)
  swap 3roll log-load-here dup @
  ~ (length, start, log address, here pointer, old here)
  swap 4 roll swap
  ~ (length, log address, old here, start, here pointer)
  !
  ~ (length, log address, old here)
  3unroll
  ~ (old here, length, log address)
  s" lit" log-load-find entry-to-execution-token log-load-comma
  0 log-load-comma
  s" =" log-load-find entry-to-execution-token log-load-comma
  s" 0branch" log-load-find entry-to-execution-token log-load-comma
  ~ (old here, length, log address)
  swap dup 3unroll
  ~ (old here, length, log address, length)
  8 + log-load-comma
  ~ (old here, length, log address)
  3unroll
  ~ (log address, old here, length)
  drop 5 8 * +
  ~ (log address, new here)
  swap log-load-here
  ~ (new here, log address, here pointer)
  3roll swap ! ;


~ (true start, true length, false start, false length, log address
~  -- log address)
: log-load-if-else
  5 unroll 2dup
  ~ (log address, true start, true length, false start, false length,
  ~  false start, false length)
  swap dup 7 8 * + swap 3roll
  ~ (log address, true start, true length, false start, false length,
  ~  adjusted false start, false start, false length)
  memmove
  ~ (log address, true start, true length, false start, false length)
  4 roll dup 5 unroll
  ~ (log address, true start, true length, false start, false length,
  ~  true start)
  4 roll dup 5 unroll
  ~ (log address, true start, true length, false start, false length,
  ~  true start, true length)
  swap dup 5 8 * +
  ~ (log address, true start, true length, false start, false length,
  ~  true length, true start, adjusted true start)
  swap 3roll
  ~ (log address, true start, true length, false start, false length,
  ~  adjusted true start, true start, true length)
  memmove
  ~ (log address, true start, true length, false start, false length)

  4 roll dup 5 unroll
  ~ (log address, true start, true length, false start, false length,
  ~  true start)
  6 roll log-load-here @ 7 unroll
  ~ (old here, true start, true length, false start, false length, true start,
  ~  log address)
  log-load-here 3roll swap !
  ~ (old here, true start, true length, false start, false length,
  ~  log address)
  s" lit" log-load-find entry-to-execution-token log-load-comma
  0 log-load-comma
  s" !=" log-load-find entry-to-execution-token log-load-comma
  s" 0branch" log-load-find entry-to-execution-token log-load-comma
  ~ (old here, true start, true length, false start, false length,
  ~  log address)
  4 roll dup 5 unroll
  ~ (old here, true start, true length, false start, false length,
  ~  log address, true length)
  3 8 * + log-load-comma
  ~ (old here, true start, true length, false start, false length,
  ~  log address)
  3unroll
  ~ (old here, true start, true length, log address,
  ~  false start, false length)
  swap dup 3unroll
  ~ (old here, true start, true length, log address,
  ~  false start, false length, false start)
  5 8 * +
  ~ (old here, true start, true length, log address,
  ~  false start, false length, adjusted false start)
  4 roll log-load-here 3roll swap !
  ~ (old here, true start, true length,
  ~  false start, false length, log address)
  s" branch" log-load-find entry-to-execution-token log-load-comma
  swap 8 + log-load-comma
  ~ (old here, true start, true length, false start, log address)
  4 unroll
  ~ (old here, log address, true start, true length, false start)
  drop drop drop
  ~ (old here, log address)
  log-load-here
  3roll 7 8 * + swap ! ;
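~   Worked out, the result is as follows. This assumes the false branch was
~ compiled immediately after the true branch, so that false start is true
~ start plus true length:
~
~     true start + 0      xt of lit
~     true start + 8      0
~     true start + 16     xt of !=
~     true start + 24     xt of 0branch
~     true start + 32     true length + 24
~     true start + 40     the true branch
~     false start + 40    xt of branch
~     false start + 48    false length + 8
~     false start + 56    the false branch
~
~ "here" ends up at the old here plus 56. The 0branch offset skips its own
~ cell, the true branch, and the two-cell unconditional branch. That branch's
~ offset then skips its own cell and the false branch.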


~ (start, length, log address -- log address)
: log-load-forever
  s" branch" log-load-find entry-to-execution-token log-load-comma
  swap 8 + -1 * log-load-comma
  swap drop ;
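~   Worked out, with the body at start, the result is:
~
~     start                 the body, left in place
~     start + length        xt of branch
~     start + length + 8    -(length + 8)
~
~ Measured from the offset cell, -(length + 8) lands back on start, so the
~ loop repeats forever.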


~ (test start, test length, body start, body length, log address
~  -- log address)
: log-load-while
  5 unroll 2dup
  ~ (log address, test start, test length, body start, body length,
  ~  body start, body length)
  swap dup 5 8 * + swap 3roll
  ~ (log address, test start, test length, body start, body length,
  ~  adjusted body start, body start, body length)
  memmove

  ~ (log address, test start, test length, body start, body length)
  5 roll log-load-here @ 6 unroll
  ~ (old here, test start, test length, body start, body length, log address)
  log-load-here 4 roll dup 5 unroll swap !
  ~ (old here, test start, test length, body start, body length, log address)
  s" lit" log-load-find entry-to-execution-token log-load-comma
  0 log-load-comma
  s" !=" log-load-find entry-to-execution-token log-load-comma
  s" 0branch" log-load-find entry-to-execution-token log-load-comma
  swap dup 3unroll 3 8 * + log-load-comma
  ~ (old here, test start, test length, body start, body length, log address)
  log-load-here 5 8 * 8 roll + swap !
  ~ (test start, test length, body start, body length, log address)
  s" branch" log-load-find entry-to-execution-token log-load-comma
  5 unroll
  ~ (log address, test start, test length, body start, body length)
  6 8 * + swap drop + swap drop -1 * log-load-comma ;
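~   Worked out, the result is as follows. This assumes the body was compiled
~ immediately after the test, so that body start is test start plus test
~ length:
~
~     test start          the test, left in place
~     body start + 0      xt of lit
~     body start + 8      0
~     body start + 16     xt of !=
~     body start + 24     xt of 0branch
~     body start + 32     body length + 24
~     body start + 40     the body
~     old here + 40       xt of branch
~     old here + 48       -(test length + body length + 48)
~
~ "here" ends up at the old here plus 56. When the test fails, the 0branch
~ exits the loop past the final branch. Otherwise the backward branch,
~ measured from its offset cell at old here + 48, lands on test start.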