Diffstat (limited to 'interpret.e')
| -rw-r--r-- | interpret.e | 297 |
1 files changed, 20 insertions, 277 deletions
diff --git a/interpret.e b/interpret.e
index 7db86b7..d2bf10f 100644
--- a/interpret.e
+++ b/interpret.e
@@ -38,291 +38,37 @@
 ~ It is primarily concerned with managing the contents of an area of memory
 ~ we call the "log"; see log-load.e for more detail on terminology.
 
-~ TODO find a better place for this
-: describe-compilation
-  ~ It's always in progress ;) We just need a header like this so it doesn't
-  ~ get confused with other kinds of debug output.
-  ." compilation in progress" newline
-  latest @ hexdump
-  newline
-  ." here " here @ .hex64 newline
-  ." latest " latest @ .hex64 newline
-  ." name of latest: " latest @ entry-to-name emitstring newline
-  newline ;
-
-
-~ Allocate space by incrementing "here", and output a word entry header in
-~ it. Also add it to the "latest" linked list. Use zero as the flag values;
-~ accept a string pointer on the stack and use its contents as the name.
-~
-~ This is the first step of creating a new word. Its responsibility includes
-~ everything up to the codeword, not including the codeword; it leaves things
-~ all set up to start appending contents to the new word by calling ",".
-~
-~ There's a handy diagram of the entry header format under "quick
-~ reference", in the description of the exeuction model in exeuction.e. Create
-~ is responsible for everything up to the codeword, not including it.
-~
-~ When a word is created in interpret mode using s" to provide a string
-~ literal, the temporary space that s" uses is in the same place as the
-~ entry header we're going to write out. It really is very useful to have
-~ that work. Fortunately, it does! We're able to avoid needing a special case
-~ by doing things in a very careful way, as described below.
-~
-~ (string pointer --)
-: create
-  ~ We add one to the string length in order to include the trailing null
-  ~ terminator. This will be the length of our name field; we save an extra
-  ~ copy of it to help with packing later.
-  dup stringlen 1 + dup 3unroll
-  ~ (name field length, string pointer, name field length)
-
-  ~ We use memmove to put the string in its final position, because it works
-  ~ correctly when the destination overlaps with the source. Notice that we
-  ~ do this before writing anything else in the entry header, to avoid
-  ~ stepping on it. The name string always starts ten bytes into the header,
-  ~ so we can use a fixed offset.
-  here @ 10 + 3unroll memmove
-  ~ (name field length)
-
-  ~ Now we can get back to the fields that belong at the start of the entry
-  ~ header. We take the value of "here" and keep a working copy of it on the
-  ~ stack, which we'll advance every time we write more bytes.
-  here @
-  ~ (name field length, updated "here" pointer)
-
-  ~ Pack the old value of "latest" as the first field of the header, linking
-  ~ from the newly-defined word to the next-newest word.
-  ~
-  ~ All the entries form a linked list, from newest to oldest. Since the
-  ~ link is the first field in the entry header, you can get from each entry
-  ~ to the one before it just by dereferencing the entry pointer.
-  latest @ pack64
-
-  ~ This is the flags byte. It starts at zero; our caller can change it if
-  ~ desired.
-  0 pack8
-
-  ~ This is the "other" null terminator, used when traversing the name
-  ~ string backwards for execution-token-to-entry. Yes, the name is
-  ~ null-terminated at both ends.
-  0 pack8
-
-  + ~ The name field is already populated, so just skip past it.
-  ~ (updated "here" pointer)
-
-  ~ The codeword is aligned to a machine-word boundary, and the padding for
-  ~ it is create's responsibility.
-  ~
-  ~ By adding the null terminator before adding alignment padding, we've
-  ~ made sure there's always at least one null byte. Otherwise we'd be missing
-  ~ the terminator if by chance the name were exactly the wrong length.
-  8 packalign
-  ~ (updated "here" pointer)
-
-  ~ Retrieve the value of "here", which still doesn't reflect our additions,
-  ~ and store it at the adddress of "latest". It's the start of our
-  ~ newly-defined word, which makes it the latest word.
-  here @ latest !
-
-  ~ Finally, we write our updated value of "here" back into the variable.
-  here ! ;
-
-~ (value to append to current word-in-progress --)
-: , here @ swap pack64 here ! ;
-
-
-: self-codeword here @ 8 + , ;
-
-
-~ A variable is simply a word that returns a specific address, always the
-~ same one, at which a value can be stored. This word "variable" takes and
-~ address and a word name, and defines the word. Allocating space is its
-~ caller's responsibility.
-~
-~ TODO the address is constant but the contents vary, confusing, write it up
-~
-~ (address for new variable word to point to, string pointer --)
-: variable
-  create
-  self-codeword
-  here @
-  swap :rax mov-reg64-imm64
-  :rax push-reg64
-  pack-next
-  8 packalign
-  here ! ;
-
-
-~ A keyword is a word that evaluates to its own address, which makes it
-~ suitable for use as a constant. By convention, all our keywords have names
-~ starting with a colon, which imitates the way they work in Common Lisp.
-~
-~ Specifically, it returns its own execution token. Thus, executing its
-~ result repeatedly will keep giving the same value. We aren't in the habit of
-~ doing quote-exec kinds of things in Evocation, but it seems as good as any
-~ other unique value, so we might as well.
-~
-~ Unlike CL, we don't currently have the lexer automatically create keywords
-~ for us; we create them explicitly. That's likely to be added at some point,
-~ but at the moment the feature is lying fallow to see whether it winds up
-~ seeing a lot of use.
-~
-~ (string pointer --)
-: keyword
-  create
-
-  ~ Before outputting our codeword, save a copy of the address where it's
-  ~ going to be. That will be the execution token we return.
-  here @ dup
-  ~ (self execution token, output point)
-
-  ~ Now add a codeword. This is an assembly word, so it's a self-codeword,
-  ~ meaning it points to the word right after itself.
-  dup 8 + pack64
-  ~ (self execution token, output point)
-
-  ~ Now we consume the execution token, using it as part of this instruction.
-  :rax mov-reg64-imm64
-  ~ (output point)
-
-  ~ To return it, we push it to the stack.
-  :rax push-reg64
-
-  ~ Now just the normal stuff every assembly word ends with.
-  pack-next
-  8 packalign
-
-  here ! ;
-
-
-~ Allocates bytes on the heap by incrementing the global "here" pointer. The
-~ "here" pointer is kept aligned to an 8-byte boundary, regardless of the size
-~ requested.
-~
-~ This does not create dictionary entries, it's just a raw memory interface.
-~ It's suitable for allocating data or scratch space.
-: allocate
-  here @ dup
-  ~ (size, here value, here value)
-  3roll + 8 packalign here ! ;
-
 
 : hide-entry dup entry-flags@ 0x80 | entry-flags! ;
 : unhide-entry dup entry-flags@ 0x80 invert & entry-flags! ;
 
-~ (pointer to buffer metadata -- pointer to buffer "physical-start" field)
-: buffer-physical-start ;
-  ~ The physical-start field happens to be the first thing in the metadata, so
-  ~ this is an nop, but it still exists as a word because having it reduces
-  ~ confusion.
-~ (pointer to buffer metadata -- pointer to buffer "physical-length" field)
-: buffer-physical-length 8 + ;
-~ (pointer to buffer metadata -- pointer to buffer "logical-start" field)
-: buffer-logical-start 2 8 * + ;
-~ (pointer to buffer metadata -- pointer to buffer "logical-length" field)
-: buffer-logical-length 3 8 * + ;
-~ (pointer to input buffer metadata -- pointer to input buffer "refill" field)
-: input-buffer-refill 4 8 * + ;
-~ (pointer to input buffer metadata
-~  -- pointer to input buffer "next-source" field)
-: input-buffer-next-source 5 8 * + ;
-
-~ Given an initialized buffer (input or otherwise), sets its logical-start
-~ and logical-length fields to indicate the buffer is empty. This relies on
-~ the buffer having a backing store attached, but does not alter the backing
-~ store or its contents.
-~
-~ (pointer to buffer metadata --)
-: clear-buffer
-  dup buffer-physical-start @ swap
-  ~ (address of backing store, metadata pointer)
-  dup 3unroll
-  ~ (metadata pointer, address of backing store, metadata pointer)
-  buffer-logical-start !
-  buffer-logical-length 0 swap ! ;
-
-
-~ Sets all fields in an input buffer metadata structure to zero,
-~ effectively detaching and leaking any backing store that had been attached
-~ to it. Suitable for use during initialization.
-~
-~ (pointer to input buffer metadata --)
-: zero-input-buffer-metadata
-  dup buffer-physical-start 0 swap !
-  dup buffer-physical-length 0 swap !
-  dup buffer-logical-start 0 swap !
-  dup buffer-logical-length 0 swap !
-  dup input-buffer-refill 0 swap !
-  ~ Notice the absence of a dup this time.
-  input-buffer-next-source 0 swap ! ;
-
-
-~ Allocates input-buffer metadata, with no backing store attached.
-~ Initializes the metadata to all zeroes.
-~
-~ (-- pointer to input buffer metadata)
-: allocate-input-buffer-metadata
-  6 8 * allocate
-  dup zero-input-buffer-metadata ;
-
-
-~ Allocates input buffer metadata and a backing store, in one operation.
-~ Points the metadata to the backing store.
-~
-~ (buffer capacity in bytes -- pointer to input buffer metadata)
-: allocate-input-buffer
-  dup 6 8 * + allocate
-  dup zero-input-buffer-metadata
-  ~ (capacity in bytes, metadata pointer)
-  dup dup 6 8 * +
-  ~ (capacity in bytes, metadata pointer, metadata pointer, physical start)
-  swap buffer-physical-start !
-  ~ (capacity in bytes, metadata pointer)
-  dup 3unroll buffer-physical-length !
-  ~ (metadata pointer)
-  dup clear-buffer ;
-
-
-~ Sets the backing store of an input buffer to point at a null-teriminated
-~ string and read from it.
-~
-~ (buffer metadata pointer, string pointer --)
-: attach-string-to-input-buffer
-  swap
-  ~ (string pointer, metadata pointer)
-  2dup buffer-physical-start !
-  ~ (string pointer, metadata pointer)
-  2dup buffer-logical-start !
-  ~ (string pointer, metadata pointer)
-  swap stringlen swap
-  ~ (string length, metadata pointer)
-  2dup buffer-physical-length !
-  ~ (string length, metadata pointer)
-  buffer-logical-length ! ;
-
-
 
 ~ TODO
-~ main-input-buffer-metadata 0000001000018738 I raw
-~ main-input-buffer 0000001000018788 asm
-~ consume-from 00000010000187c0
-~ peek-from 0000001000018960
-~ key-from 0000001000018ab8
-~ is-space 0000001000018b00
-~ peek 0000001000018d20
-~ consume 0000001000018d50
-~ key 0000001000018d88
 ~ unroll-past-string 0000001000018db8
 ~ swap-past-string 0000001000018ea0
 ~ dropstring 0000001000018ee8
 ~ dropstring-with-result 0000001000018f80
 ~ accumulate-string 0000001000018fc8
+~ is-space 0000001000018b00
 ~ word 00000010000194a0
 
-
-~ (string pointer -- entry pointer or 0)
-: find latest swap find-in ;
+~ The word "'", often pronounced "tick", quotes the following word, looking
+~ it up and treating it as a constant. In immediate mode, the constant winds
+~ up on the stack; in compile mode it gets compiled.
+~
+~ There are a few possible implementation strategies here. Running as an
+~ immediate word means there's a clear and unambiguous concept of "the
+~ following word", so that's what we do; otherwise we'd have to get clever
+~ about somehow finding out where we were called from. That means we take on
+~ what would otherwise be the interpreter's responsibility, of checking what
+~ mode we're in. Happily, that's easy to do.
+~
+~ There's a cyclic dependency where "if" relies on "'", and "'" relies on
+~ "if". Fortunately both of them are treated as alternates by the log-load
+~ transform, so we don't have to worry about it.
+: ' word value@ find dropstring-with-result
+  interpreter-flags @ 1 & { literal } if
+  ; make-immediate
 
 
 ~ (character -- 1 for true or 0 for false)
@@ -506,11 +252,8 @@
 ~ harder for words that attempt to work with the contents of other words. So
 ~ we give it a name.
 
-s" interpreter-flags-storage" create make-immediate
-latest @ unhide-entry
-here @
-0 ,
-s" interpreter-flags" variable
+s" interpreter-flags-storage" create make-immediate make-visible
+here @ 0 , s" interpreter-flags" variable
 
 ~ There's an important bootstrapping concern: If you're loading this
 ~ interpreter into a running Evocation, it's important to not use the wrong
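
Note on the entry header that the removed "create" word writes: its comments describe an 8-byte link to the previous entry, a flags byte, a leading null byte, the null-terminated name starting ten bytes in, and padding up to an 8-byte boundary before the codeword. The Python sketch below only models that byte layout for illustration. The helper name pack_entry_header is hypothetical, and little-endian packing and zero-filled padding are assumptions that the diff does not confirm.

```python
import struct


def pack_entry_header(here: int, latest: int, name: bytes) -> tuple[bytes, int]:
    """Model the bytes written at address `here` for a new dictionary entry.

    Hypothetical helper, not part of the project. Returns the header bytes
    and the advanced value of "here" (where the codeword would go).
    """
    header = struct.pack("<Q", latest)  # link to the previous entry (little-endian assumed)
    header += b"\x00"                   # flags byte, initially zero
    header += b"\x00"                   # leading null, for scanning the name backwards
    header += name + b"\x00"            # name field, including its trailing null
    # Pad so that the next write lands on an 8-byte boundary.
    # Zero fill is an assumption; the source only says the pointer is aligned.
    padding = -(here + len(header)) % 8
    header += b"\x00" * padding
    return header, here + len(header)


if __name__ == "__main__":
    header, new_here = pack_entry_header(0x1000, 0x0F00, b"dup")
    print(header.hex(), hex(new_here))
```

Because the trailing null is part of the name field before alignment is applied, a name whose field already ends on a boundary (for example a five-character name, which ends at offset 16) still keeps its terminator.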