implement log-load-create; add all the pack/unpack stuff to core.e

Force-Push: yes
Change-Id: I04dd65a9eec71f9b50c8875bdcbe5d4be59888d5
author: Irene Knapp <ireneista@irenes.space> 2026-05-17 17:29:09 -0700
committer: Irene Knapp <ireneista@irenes.space> 2026-05-17 17:29:09 -0700
commit: 3b41dbfa2338c11dd8398026c00922f20f32dc81 (patch)
tree: 1761f2b8f2da972b830cf617d194c7c5be5b1535 /core.e
parent: 4f1a07da9c87a1560da34b8a96a9de4cdc90f1fc (diff)
1 files changed, 141 insertions, 0 deletions
diff --git a/core.e b/core.e
index 9393a04..b2062d1 100644
--- a/core.e
+++ b/core.e
@@ -1,3 +1,15 @@
+~ ~~~~~~~~~~~~~~~~~~~~~~~~~
+~ ~~ Core Forth features ~~
+~ ~~~~~~~~~~~~~~~~~~~~~~~~~
+~
+~   This file provides extremely fundamental functionality which is a
+~ necessary component of any Forth dialect, including Evocation. It is
+~ included statically as part of any generated executable, and a second copy
+~ of it is later added to the log when that executable runs. Therefore, it
+~ is written to obey the constraints of both the label transform and the
+~ log-load transform; see transform.e for more details on both.
+~
+~
 ~ Stack manipulation routines
 ~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~
 ~
@@ -955,3 +967,132 @@
     :rax jmp-abs-indirect-reg64
     here ! ] ;asm
 
+
+~ Dictionary entries
+~ ~~~~~~~~~~~~~~~~~~
+~
+~   Now, we have a bunch of words that are used for traversing the Forth
+~ core data structures that describe words. First, we have a couple that
+~ relate to individual words and their pieces...
+~
+~   The log-load transform produces code that requires
+~ entry-to-execution-token, which means it's needed statically. So this stuff
+~ to deal with word entry headers might as well go in core, since it has no
+~ dependencies to speak of.
+~
+~   These are the first words in core that are implemented in Forth rather
+~ than assembler. That's not as big a deal as it may seem; the Forth execution
+~ model has been ready to go ever since we implemented docol and exit, and
+~ at this point we have enough basics to do useful things with it.
+
+~   Jonesforth calls this "TCFA" and ">CFA"; its author speculates that the
+~ original meaning is "code field address".
+~
+~ (entry pointer -- execution token)
+: entry-to-execution-token
+  ~ Skip next-entry pointer, flag byte, and start terminator.
+  10 +
+  ~ Skip string contents.
+  dup stringlen +
+  ~ Skip one for the null terminator, seven more for alignment.
+  8 +
+  ~ Zero the low three bits, and now it's aligned to 8 bytes.
+  7 invert & ;
+
+~   Jonesforth calls this "CFA>". Jonesforth's implementation searches the
+~ entire dictionary, since its word header format isn't designed to be
+~ traversed in reverse, but ours is, so it should be fast.
+~
+~ (execution token -- entry pointer)
+: execution-token-to-entry
+  1 -
+  dup reverse-padding-len -
+  dup reverse-stringlen -
+  9 - ;
+
+~ (entry pointer -- flags byte)
+: entry-flags@
+  8 + @ 0xFF & ;
+
+~ TODO these parameters are in a counterintuitive order, swap them
+~ (entry pointer, new flags byte --)
+: entry-flags!
+  swap
+  8 +
+  dup @ 3roll
+  0xFF &
+  swap 0xFFFFFFFFFFFFFF00 & |
+  swap !
+  ;
+
+~ (entry pointer -- name string pointer)
+: entry-to-name 10 + ;
+
+
+~ Binary packing
+~ ~~~~~~~~~~~~~~
+~
+~   These routines are for building up data structures in memory. Sometimes
+~ they're used for structures that are meant to stay in memory; other times
+~ it's a buffer that will become output.
+~
+~   The general pattern is that each routine takes an output address and
+~ some specific datum, and returns the output address adjusted to point
+~ after the new datum. That makes them easy to chain together. We call this
+~ address the "output point", to capture the idea that it's a running total
+~ which gets updated by each new datum as it's packed.
+
+~ (output point, value -- output point)
+: pack64 swap dup 3unroll ! 8 + ;
+: pack32 swap dup 3unroll 32! 4 + ;
+: pack16 swap dup 3unroll 16! 2 + ;
+: pack8 swap dup 3unroll 8! 1 + ;
+
+~   This works on C-style strings, which are characters followed by a null
+~ terminator. The packed data includes the null terminator.
+~
+~ (output point, string pointer -- output point)
+: packstring
+  dup stringlen 1 + dup
+  ~ (output point, source, length, length)
+  4 roll dup 5 unroll
+  ~ (destination, source, length, length, output point)
+  + 4 unroll
+  ~ (output point, destination, source, length)
+  memcopy ;
+
+~ (output point, alignment byte count -- output point)
+: packalign
+  { 2dup /% drop { drop exit } unless
+    swap 0 pack8 swap } forever ;
+
+
+~ Binary unpacking
+~ ~~~~~~~~~~~~~~~~
+~
+~   These routines are for examining data structures in memory.
+~
+~   As with the output routines, each routine takes an input address,
+~ which it updates to point after the data item being read. We call this the
+~ "input point". Since this is input, the routines return data items rather
+~ than accepting them.
+
+~ (input point -- input point, value)
+: unpack64 dup @ swap 8 + swap ;
+: unpack32 dup 32@ swap 4 + swap ;
+: unpack16 dup 16@ swap 2 + swap ;
+: unpack8 dup 8@ swap 1 + swap ;
+
+~ TODO does this need to have a separate name?
+~ (proposed size, alignment byte count -- adjusted size)
+: align-size
+  dup 3unroll dup 3unroll
+  ~ (alignment, alignment, proposed size, alignment)
+  1 - + swap /% swap drop * ;
+
+~   You might think this would be identical to packalign, but packalign writes
+~ zero bytes into the padding, whereas this only advances the input point.
+~
+~ (input point, alignment byte count -- input point)
+: unpackalign align-size ;
+
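The header walk done by entry-to-execution-token, execution-token-to-entry, and the flag accessors can be modeled outside Evocation. The sketch below is a hypothetical Python model, not part of core.e: memory is a bytearray and addresses are offsets into it. The layout is inferred from the comments in this patch (an 8-byte next-entry pointer, a flag byte, a zero start terminator at offset 9, the name, its null terminator, then padding up to an 8-byte boundary). It further assumes that the padding bytes are zero, that names are non-empty, and that reverse-padding-len and reverse-stringlen behave like the two backward loops below; those words are not shown in the patch. build_entry is a hypothetical helper for setting up test data.

```python
import struct

# Hypothetical model of a dictionary entry header, inferred from core.e comments.
# Memory is a bytearray and "addresses" are offsets into it. Assumed layout:
#   +0   8-byte next-entry pointer (little-endian)
#   +8   flag byte
#   +9   zero "start terminator"
#   +10  name bytes, then a null terminator
#   ...  zero padding up to the next 8-byte boundary, where the execution
#        token (code field) begins


def stringlen(mem, addr):
    """Count bytes from addr up to, but not including, the next zero byte."""
    n = 0
    while mem[addr + n] != 0:
        n += 1
    return n


def build_entry(mem, entry, next_entry, flags, name):
    """Hypothetical helper: write a header at entry; return its execution token."""
    raw = name.encode("ascii")
    struct.pack_into("<Q", mem, entry, next_entry)   # next-entry pointer
    mem[entry + 8] = flags                            # flag byte
    mem[entry + 9] = 0                                # start terminator
    mem[entry + 10:entry + 10 + len(raw)] = raw       # name contents
    end = entry + 10 + len(raw)
    mem[end] = 0                                      # null terminator
    xt = (end + 1 + 7) & ~7                           # round up to 8 bytes
    mem[end + 1:xt] = bytes(xt - end - 1)             # zero padding
    return xt


def entry_to_execution_token(mem, entry):
    """Mirror of entry-to-execution-token."""
    addr = entry + 10                 # skip next pointer, flag byte, start terminator
    addr += stringlen(mem, addr)      # skip the name contents
    addr += 8                         # 1 for the null terminator, 7 for rounding up
    return addr & ~7                  # clear the low three bits


def execution_token_to_entry(mem, xt):
    """Mirror of execution-token-to-entry (assumes zero padding, non-empty name)."""
    addr = xt - 1
    while mem[addr] == 0:             # back over the padding and null terminator
        addr -= 1
    while mem[addr] != 0:             # back over the name to the start terminator
        addr -= 1
    return addr - 9                   # the start terminator sits at entry + 9


def entry_flags_get(mem, entry):
    """Mirror of entry-flags@: load the cell at entry + 8, keep the low byte."""
    (cell,) = struct.unpack_from("<Q", mem, entry + 8)
    return cell & 0xFF


def entry_flags_set(mem, entry, flags):
    """Mirror of entry-flags!: replace only the low byte of the cell at entry + 8."""
    (cell,) = struct.unpack_from("<Q", mem, entry + 8)
    cell = (cell & 0xFFFFFFFFFFFFFF00) | (flags & 0xFF)
    struct.pack_into("<Q", mem, entry + 8, cell)


mem = bytearray(64)
xt = build_entry(mem, 8, 0, 0x80, "dup")
print(xt, entry_to_execution_token(mem, 8), execution_token_to_entry(mem, xt))
# prints: 24 24 8
```

The zero start terminator is what makes the reverse walk possible: the backward scan over the name stops on it even when the flag byte before it is nonzero.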
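To make the output-point and input-point convention from the packing and unpacking sections concrete, here is a hypothetical Python model of those words. It is a sketch, not the Evocation implementation: the buffer is a bytearray, points are offsets into it, values are stored little-endian as on x86-64, and packstring takes a Python bytes object rather than a pointer to a null-terminated string. The generic pack and unpack functions, with a size in bytes, are illustrative stand-ins for the fixed-width words pack8 through pack64 and unpack8 through unpack64.

```python
import struct

# Hypothetical model of the core.e packing/unpacking words. The output point
# or input point is an offset into a bytearray; values are little-endian.
# `size` is in bytes (1, 2, 4, 8), standing in for pack8/16/32/64 and unpack8/16/32/64.
_FORMATS = {1: "<B", 2: "<H", 4: "<I", 8: "<Q"}


def pack(buf, point, value, size):
    """Like pack8..pack64: store the low `size` bytes; return the new output point."""
    mask = (1 << (8 * size)) - 1
    struct.pack_into(_FORMATS[size], buf, point, value & mask)
    return point + size


def unpack(buf, point, size):
    """Like unpack8..unpack64: return (new input point, value)."""
    (value,) = struct.unpack_from(_FORMATS[size], buf, point)
    return point + size, value


def packstring(buf, point, s):
    """Like packstring, but takes bytes: copy s plus a null terminator."""
    data = s + b"\0"
    buf[point:point + len(data)] = data   # assumes buf is large enough
    return point + len(data)


def align_size(size, alignment):
    """Like align-size: round size up to a multiple of alignment."""
    return (size + alignment - 1) // alignment * alignment


def packalign(buf, point, alignment):
    """Like packalign: write zero bytes until point is a multiple of alignment."""
    while point % alignment != 0:
        point = pack(buf, point, 0, 1)
    return point


def unpackalign(point, alignment):
    """Like unpackalign: skip the padding without touching the buffer."""
    return align_size(point, alignment)


# Chain the calls, threading the output point through each one.
buf = bytearray(32)
p = pack(buf, 0, 0x1234, 2)     # p == 2
p = packstring(buf, p, b"hi")   # p == 5
p = packalign(buf, p, 8)        # p == 8
p = pack(buf, p, 7, 8)          # p == 16

# Read it back in the same order with an input point.
q, v16 = unpack(buf, 0, 2)      # (2, 0x1234)
q = unpackalign(q + 3, 8)       # skip "hi\0", then the padding: q == 8
q, v64 = unpack(buf, q, 8)      # (16, 7)
```

As in core.e, packalign writes zero bytes into the buffer, while unpackalign only computes a new input point with align-size.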