~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~
~ ~~ Bootstrapping the log ~~
~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~
~
~   The log is the main region of memory within which most dynamic allocation
~ happens. It's a single contiguous segment of virtual memory, which is
~ requested from the kernel when Evocation starts up. Almost all of
~ Evocation's dynamic data is kept in the log, including the main dictionary;
~ several important global variables which make it possible to find and
~ allocate other data structures; and the control stack.
~
~   This file has the task of providing words which are useful for working
~ with the log, and more specifically which are useful for helping to bring
~ the log into existence. Once the log exists, it can be used to manage
~ itself, but there's a bootstrapping challenge in getting there. That
~ challenge is solved by the warm-start routine in execution.e, which relies
~ on the words in this file and should load after it.
~
~   Some modern Forths, including Jonesforth, refer to the log as the heap.
~ This is a misnomer; a heap is a data structure that allows non-contiguous
~ allocation. Although there are Forths that have true heaps, Evocation is not
~ one of them. Space in the log is allocated by incrementing the "here"
~ variable (one of those important globals), which necessarily can only
~ allocate contiguous blocks; there is no way to compact allocations to
~ reclaim fragmented, unused space in between them. Evocation does allow
~ deallocation using "forget", but this is done by resetting "here" and
~ "latest" to older values, unwinding every allocation that's been done since
~ the point in time they return to.
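~
~   As a rough illustration, with made-up addresses: suppose "here" holds
~ 0x1010100 when a 0x40-byte allocation A is made, followed by a 0x20-byte
~ allocation B.
~
~     before A:   here = 0x1010100
~     after A:    here = 0x1010140   (A occupies 0x1010100 to 0x101013f)
~     after B:    here = 0x1010160   (B occupies 0x1010140 to 0x101015f)
~
~   There's no way to free A while keeping B. Resetting "here" to 0x1010100
~ (along with "latest"), as "forget" would, releases both of them at once.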
~
~   It would be a mistake to confuse this allocation strategy with the
~ more-general facilities for allocation, reallocation, and deallocation of
~ individual memory blocks that many other languages have. To avoid confusion,
~ we stay away from the name "heap", though it may still occasionally be used
~ colloquially because it's familiar from other Forths, and because most
~ programming languages have a heap as the main memory segment they request
~ from the kernel.
~
~   In the strictest technical sense, the log is a stack: Things are added
~ to the end of it, and removed from that same end. However, Evocation already
~ has two other stacks, the control and value stacks. Adding to the potential
~ confusion, the control stack is actually stored inside the log (as a
~ fixed-size chunk at the bottom). Still, the log isn't really that much like
~ a stack when you look at how it's actually used. Unlike the contents of the
~ control and value stacks, data structures on the log are usually rich and
~ complex, interlinked in various ways through the use of pointers. They are
~ also long-lived: the log grows over time, whereas the control and value
~ stacks stay roughly the same size through cycles of growth and shrinking.
~ In order to be able to speak precisely about what we're doing, we introduce
~ the name "log" to refer to the entire memory segment and everything stored
~ within it.
~
~   Another linguistic choice we make is to be clear about dictionaries. A
~ dictionary is a linked list of word entries. Each dictionary has a specific
~ handle, a pointer to a pointer, which leads to the root of the list. Each
~ word entry begins with a specific data structure, which among other things
~ includes a next-entry pointer, a flags byte, and a string that serves as
~ the entry's name. Older entries in a dictionary seldom change; newer entries
~ are added at the beginning of it, with their next-entry pointers leading to
~ the older entries. It is possible for several dictionaries to exist at once,
~ each with its own dictionary handle.
~
~   Since dictionaries are managed using pointers to individual entries, there
~ is no specific requirement about the order in which those entries occur in
~ memory or where they are allocated, but usually a new entry is allocated at
~ the end of the log, by incrementing the variable "here", in the same manner
~ as any other allocation. There is one particular dictionary, the main
~ dictionary, whose handle is the variable "latest". The main dictionary holds
~ every executable word that can be used normally via Evocation's interpreter.
~
~   Since the main dictionary is by far the most important thing in the log,
~ it can be tempting to conflate the log with the main dictionary. This is
~ accurate enough for some purposes, but note that other dictionaries are
~ often interleaved with it, their allocations entwining like grape vines even
~ while each remains separate, reachable only via its own root. See the
~ machine label facility, in labels.e, for an example of how a secondary
~ dictionary can be useful.
~
~   This may feel tangential, but it's important background and there's no
~ better place to explain it: A handle is a pointer to a pointer. The variable
~ "latest" returns a handle, a fixed address which always holds the pointer to
~ the root entry of the main dictionary. Dereferencing that handle gives you
~ the dictionary pointer, the address of the root entry, which is suitable to
~ pass to find-in and similar words that read the dictionary's contents. When
~ you want to add a new entry to a dictionary, you need the dictionary's
~ handle, so that the root pointer can be changed. When you only want to read
~ it, you only need the regular single pointer.
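~
~   As a minimal sketch of the difference, assuming a string pointer is
~ already on the value stack (this is only an illustration, and not
~ necessarily how "find" itself is written):
~
~     latest          ~ (string pointer, dictionary handle)
~     @               ~ (string pointer, dictionary pointer)
~     swap find-in    ~ (entry pointer or 0)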
~
~   When reading the documentation of words that work with dictionaries, pay
~ close attention to whether their parameters include a dictionary handle, or
~ a dictionary pointer.
~
~   The term "handle" was widely known in the early days of microcomputing,
~ when memory-safe languages without direct pointer access were less common.
~ Today it is usually considered specific to systems programming, the type of
~ programming which lies beneath other software and deals with topics such as
~ memory management and processes. Evocation is a systems-programming
~ language, in the sense that it takes pains to not introduce mandatory
~ abstractions which would make it difficult or inefficient to work directly
~ with these topics. So, in understanding Evocation, it's important to know
~ about handles.


~   Find-in is the main word that provides the capability to look up words by
~ name, though it's usually used via "find" rather than being called directly.
~
~   Find-in traverses the linked list formed by a particular dictionary's
~ next-entry pointers, looking for an entry that matches a given name. The
~ dictionary pointer is the pointer (not handle) to the root of the list,
~ which runs from newest to oldest. For example, dereferencing the value of
~ "latest" gives the pointer to the main dictionary, which can be passed to
~ find-in.
~
~   Having find-in separated out is convenient when working with alternate
~ dictionaries, but the main reason for having it is not convenience but
~ necessity: During Evocation's startup, there is a period before global
~ variables are easily accessible, so there would be no way to implement
~ "find". The warm-start routine (see execution.e and transform.e) has the
~ job of fixing that, and it makes extensive use of find-in to do so.
~
~ (dictionary pointer, string pointer -- entry pointer or 0)
: find-in
  ~ It will be more convenient to have the entry pointer on top.
  swap

  {
    ~ If the entry pointer is null, exit.
    ~ (name pointer to find, current entry pointer)
    dup 0 = { swap drop exit } if

    ~ Check this entry's "hidden" flag.
    ~ (name pointer to find, current entry pointer)
    dup entry-flags@ 0x80 & 0x80 != {
      ~ Test whether this entry is a match.
      ~ (name pointer to find, current entry pointer)
      2dup 10 + stringcmp 0 = {
        ~ If we're here, it's a match. Clean up our working state and exit.
        ~ (name pointer to find, current entry pointer)
        swap drop exit
      } if
    } if

    ~ If we're here, it's not a match; traverse the pointer and repeat.
    ~ (name pointer to find, current entry pointer)
    @
  } forever ;


~   This has the same value as the constant control-stack-size, which is
~ defined in execution.e. Everything will break if it doesn't.
~
~ TODO: remove one of them. Probably the other one.
: log-offset                        0x10000 ; ~ 64 KiB

~ (log address -- log address, "log" pointer)
: log-load-log
  dup log-offset + ;
~ (log address -- log address, "s0" pointer)
: log-load-s0
  dup log-offset + 8 + ;
~ (log address -- log address, "r0" pointer)
: log-load-r0
  dup log-offset + 2 8 * + ;
~ (log address -- log address, "latest" pointer)
: log-load-latest
  dup log-offset + 3 8 * + ;
~ (log address -- log address, "here" pointer)
: log-load-here
  dup log-offset + 4 8 * + ;
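~
~   To make the arithmetic concrete, here is what each of these leaves on top
~ of the stack for a made-up log address of 0x1000000, given the log-offset
~ value above. Each word also leaves the log address itself underneath.
~
~     log-load-log       0x1010000   (log-offset + 0)
~     log-load-s0        0x1010008   (log-offset + 8)
~     log-load-r0        0x1010010   (log-offset + 16)
~     log-load-latest    0x1010018   (log-offset + 24)
~     log-load-here      0x1010020   (log-offset + 32)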


~   This is a helper used by warm-start, which invokes find-in using "latest".
~ It relies on being passed the root address of the log, which is used to find
~ the global variable "latest". It's inconvenient to keep a log pointer around
~ all the time, which is why we stop doing it as soon as possible, but during
~ Evocation's startup there's no alternative. This word is used extensively
~ by code that's been compiled via the log-load transform; see transform.e for
~ details.
~
~   It would be possible to unload this word after the log is created, but
~ there are rare situations in which it's still useful, such as injecting
~ Evocation into another process's address space. Plus, it's small. So, we
~ keep it around.
~
~ (log address, string pointer -- log address, entry pointer or 0)
: log-load-find
  swap log-load-latest @ swap 3unroll swap find-in ;

~   In the code generated by the log-load transform, it's convenient to have
~ only a single step needed to look up a word's execution token. This helper
~ does log-load-find, then gets the execution token if an entry is found.
~
~ (log address, string pointer -- log address, execution token or 0)
: log-load-find-execution-token
  log-load-find dup { entry-to-execution-token } if ;


~   This is the same as "create", from interpret.e, except that it takes the
~ log's address as a parameter rather than hardcoding it, so that it can be
~ used in situations where the normal compilation process isn't yet available.
~
~   The requisite stack juggling is kind of finicky, sorry if it's hard to
~ read, but it's doing the same steps in the same order as the regular
~ "create".
~
~ (log address, string pointer -- log address)
: log-load-create
  dup stringlen 1 + dup 3unroll
  ~ (log address, name field length, string pointer, name field length)

  3 pick log-load-here swap drop @ 10 + 3unroll memmove
  ~ (log address, name field length)

  over log-load-here swap drop @
  ~ (log address, name field length, output point)

  2 pick log-load-latest swap drop @ pack64
  ~ (log address, name field length, output point)
  0 pack8
  0 pack8
  +
  ~ (log address, output point)
  8 packalign
  ~ (log address, output point)

  over log-load-here swap drop @
  ~ (log address, output point, old here value)
  2 pick log-load-latest swap drop !
  ~ (log address, output point)
  over log-load-here swap drop ! ;
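~
~   For reference, this is the entry layout that the steps above appear to
~ write out, as read off from the pack calls. It assumes that pack64 and pack8
~ write 8 bytes and 1 byte respectively and return the advanced output point,
~ and that "8 packalign" rounds the output point up to a multiple of 8.
~ Offsets are relative to the old value of "here".
~
~     offset 0      8 bytes    next-entry pointer (the old value of "latest")
~     offset 8      1 byte     0
~     offset 9      1 byte     0
~     offset 10     n bytes    name, where n is stringlen plus 1
~     offset 10+n              padding, if any, up to a multiple of 8
~
~   One of the two single bytes is presumably the flags byte that find-in
~ checks; this file doesn't say which. The name offset of 10 matches the
~ "10 +" that find-in uses before calling stringcmp.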


~   This is the same as ",", from interpret.e, except that it takes the log's
~ address as a parameter rather than hardcoding it, so that it can be used in
~ situations where the normal compilation process isn't yet available.
~
~   Again, the stack juggling is kind of a lot, sorry about that.
~
~ (log address, value -- log address)
: log-load-comma
  swap log-load-here swap 3unroll
  ~ (log address, value, here)
  @ swap pack64
  ~ (log address, updated here value)
  swap log-load-here swap 3unroll
  ~ (log address, updated here value, here)
  ! ;
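~
~   A hedged usage sketch: with the log address on the stack, something like
~
~     0x2a log-load-comma    ~ (log address)
~
~ should write the 64-bit value 0x2a at the old value of "here" and leave
~ "here" advanced past it, assuming pack64 returns the output point moved
~ forward by 8 bytes.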


~   This is the same as "variable", from interpret.e, except that it takes the
~ log's address as a parameter rather than hardcoding it, so that it can be
~ used in situations where the normal compilation process isn't yet available.
~
~ (log address, address for new variable word, string pointer -- log address)
: log-load-variable
  3roll swap log-load-create
  ~ (address for new variable word, log address)

  log-load-here swap 3unroll
  ~ (log address, address for new variable word, here)

  dup @
  ~ (log address, address for new variable word, here, output point)
  dup 8 + pack64

  3roll
  :rax
  mov-reg64-imm64
  ~ (log address, here, output point)

~   :rax push-reg64
  pack-next
  8 packalign

  swap ! ;