| -rw-r--r-- | dynamic.e | 289 |
| -rw-r--r-- | evoke.e | 11 |
| -rw-r--r-- | flow-control.e | 129 |
| -rw-r--r-- | input.e | 127 |
| -rw-r--r-- | interpret.e | 297 |
| -rw-r--r-- | log-load.e | 12 |
| -rw-r--r-- | transform.e | 71 |
| -rw-r--r-- | vim/syntax/evocation.vim | 2 |
8 files changed, 503 insertions, 435 deletions
diff --git a/dynamic.e b/dynamic.e index 7c7ff13..0adc5c6 100644 --- a/dynamic.e +++ b/dynamic.e @@ -192,13 +192,26 @@ { dup describe next-newer-entry } while drop ; +: describe-compilation + ~ It's always in progress ;) We just need a header like this so it doesn't + ~ get confused with other kinds of debug output. + ." compilation in progress" newline + latest @ hexdump + newline + ." here " here @ .hex64 newline + ." latest " latest @ .hex64 newline + ." name of latest: " latest @ entry-to-name emitstring newline + newline ; + + + ~ Log manipulation ~ ~~~~~~~~~~~~~~~~ ~ In general, we're going to want to be able to go on little excursions ~ where we define utility words that are only useful for one task, then ~ deallocate that stuff after we're done with it. We implement "forget", -~ which removes both dictionary entries and heap allocations for the entry +~ which removes both dictionary entries and log allocations for the entry ~ pointer it's given and everything that came after. ~ ~ The implementation strategy is the same as Jonesforth's version, but @@ -224,147 +237,165 @@ ~ begun). : recurse latest @ entry-to-execution-token , ; make-immediate -~ The word "'", often pronounced "tick", quotes the following word, looking -~ it up and treating it as a constant. In immediate mode, the constant winds -~ up on the stack; in compile mode it gets compiled. -~ -~ There are a few possible implementation strategies here. Running as an -~ immediate word means there's a clear and unambiguous concept of "the -~ following word", so that's what we do; otherwise we'd have to get clever -~ about somehow finding out where we were called from. That means we take on -~ what would otherwise be the interpreter's responsibility, of checking what -~ mode we're in. Happily, that's easy to do. + +~ The implementation of find-in is in log-load.e, for now. 
~ -~ Though it might be nice to have high-level flow control for this, our -~ implementation of "if" below relies on "'" several times, whereas "'" only -~ branches once. So we bootstrap "'" first. -~ : ' word value@ find dropstring-with-result -~ interpreter-flags @ 1 & 0branch [ 2 8 * , ] literal -~ ; make-immediate +~ (string pointer -- entry pointer or 0) +: find latest swap find-in ; -~ High-level flow-control -~ ~~~~~~~~~~~~~~~~~~~~~~~ -~ -~ We use a novel suffix-based approach to flow control. We define words -~ { and } which describe the boundaries of blocks of code, leaving a -~ description on the value stack, while still compiling the contents -~ normally. -~ -~ Then follow-up words such as "if" can use that information to slide -~ the blocks around and insert any needed branches and other logic. +~ Allocates bytes on the log by incrementing the global "here" pointer. The +~ "here" pointer is kept aligned to an 8-byte boundary, regardless of the size +~ requested. ~ -~ These words get their own file because they of course have very high -~ importance to bootstrapping, and it's useful to be able to see where they -~ fall in the list of files. +~ This does not create dictionary entries, it's just a raw memory interface. +~ It's suitable for allocating data or scratch space. ~ -~ Both the label transform and the log-load transform go out of their way -~ to make sure these words work. +~ (size -- pointer) +: allocate + here @ dup + ~ (size, here value, here value) + 3roll + 8 packalign here ! ; -~ ~ (-- start pointer) -~ : { here @ ; make-immediate -~ -~ ~ (start pointer -- start pointer, length) -~ : } dup here @ swap - ; make-immediate -~ +~ Allocate space by incrementing "here", and output a word entry header in +~ it. Also add it to the "latest" linked list. Use zero as the flag values; +~ accept a string pointer on the stack and use its contents as the name. 
~ -~ ~ (start pointer, length --) -~ : if 2dup swap dup 5 8 * + 3unroll swap +~ This is the first step of creating a new word. Its responsibility includes +~ everything up to the codeword, not including the codeword; it leaves things +~ all set up to start appending contents to the new word by calling ",". ~ -~ ~ (start pointer, length, adjusted start pointer, start pointer, length) -~ memmove -~ ~ (start pointer, length) -~ swap here @ swap here ! swap -~ ~ (old here, length) -~ ' lit entry-to-execution-token , 0 , -~ ' != entry-to-execution-token , -~ ~ The branch length needs to be one word longer than the block length, -~ ~ because the length field itself is part of the scope of the branch. -~ ' 0branch entry-to-execution-token , dup 8 + , -~ ~ (old here, length) -~ drop 5 8 * + here ! ; make-immediate +~ There's a handy diagram of the entry header format under "quick +~ reference", in the description of the exeuction model in exeuction.e. Create +~ is responsible for everything up to the codeword, not including it. ~ +~ When a word is created in interpret mode using s" to provide a string +~ literal, the temporary space that s" uses is in the same place as the +~ entry header we're going to write out. It really is very useful to have +~ that work. Fortunately, it does! We're able to avoid needing a special case +~ by doing things in a very careful way, as described below. ~ -~ ~ (start pointer, length --) -~ : unless 2dup swap dup 5 8 * + 3unroll swap -~ ~ (start pointer, length, start pointer, adjusted start pointer, length) -~ memmove -~ ~ (start pointer, length) -~ swap here @ swap here ! swap -~ ~ (old here, length) -~ ' lit entry-to-execution-token , 0 , -~ ' = entry-to-execution-token , -~ ~ The branch length needs to be one word longer than the block length, -~ ~ because the length field itself is part of the scope of the branch. -~ ' 0branch entry-to-execution-token , dup 8 + , -~ ~ (old here, length) -~ drop 5 8 * + here ! 
; make-immediate +~ (string pointer --) +: create + ~ We add one to the string length in order to include the trailing null + ~ terminator. This will be the length of our name field; we save an extra + ~ copy of it to help with packing later. + dup stringlen 1 + dup 3unroll + ~ (name field length, string pointer, name field length) + + ~ We use memmove to put the string in its final position, because it works + ~ correctly when the destination overlaps with the source. Notice that we + ~ do this before writing anything else in the entry header, to avoid + ~ stepping on it. The name string always starts ten bytes into the header, + ~ so we can use a fixed offset. + here @ 10 + 3unroll memmove + ~ (name field length) + + ~ Now we can get back to the fields that belong at the start of the entry + ~ header. We take the value of "here" and keep a working copy of it on the + ~ stack, which we'll advance every time we write more bytes. + here @ + ~ (name field length, updated "here" pointer) + + ~ Pack the old value of "latest" as the first field of the header, linking + ~ from the newly-defined word to the next-newest word. + ~ + ~ All the entries form a linked list, from newest to oldest. Since the + ~ link is the first field in the entry header, you can get from each entry + ~ to the one before it just by dereferencing the entry pointer. + latest @ pack64 + + ~ This is the flags byte. It starts at zero; our caller can change it if + ~ desired. + 0 pack8 + + ~ This is the "other" null terminator, used when traversing the name + ~ string backwards for execution-token-to-entry. Yes, the name is + ~ null-terminated at both ends. + 0 pack8 + + + ~ The name field is already populated, so just skip past it. + ~ (updated "here" pointer) + + ~ The codeword is aligned to a machine-word boundary, and the padding for + ~ it is create's responsibility. + ~ + ~ By adding the null terminator before adding alignment padding, we've + ~ made sure there's always at least one null byte. 
Otherwise we'd be missing + ~ the terminator if by chance the name were exactly the wrong length. + 8 packalign + ~ (updated "here" pointer) + + ~ Retrieve the value of "here", which still doesn't reflect our additions, + ~ and store it at the adddress of "latest". It's the start of our + ~ newly-defined word, which makes it the latest word. + here @ latest ! + + ~ Finally, we write our updated value of "here" back into the variable. + here ! ; + + +: self-codeword here @ 8 + , ; + + +~ A variable is simply a word that returns a specific address, always the +~ same one, at which a value can be stored. This word "variable" takes and +~ address and a word name, and defines the word. Allocating space is its +~ caller's responsibility. ~ +~ TODO the address is constant but the contents vary, confusing, write it up ~ -~ ~ (true start, true length, false start, false length --) -~ : if-else -~ dup 4 roll dup 5 unroll + -~ ~ -~ ~ First we slide the false-block forward, then the true-block. We slide -~ ~ them both directly into their final positions, leaving space at the start -~ ~ for a test and branch, and space in between for an unconditional branch. -~ ~ Those spaces will take five words, and two words, respectively. So the -~ ~ false-block gets moved by seven words, and the true-block gets moved by -~ ~ five words. -~ 2dup swap dup 7 8 * + swap 3roll memmove -~ 4 roll dup 5 unroll 4 roll dup 5 unroll -~ swap dup 5 8 * + swap 3roll memmove -~ ~ (true start, true length, false start, false length) -~ ~ -~ ~ Now we write out the initial test-and-branch. -~ 4 roll dup 5 unroll here @ 6 unroll here ! -~ ~ (old here, true start, true length, false start, false length) -~ ' lit entry-to-execution-token , 0 , -~ ' != entry-to-execution-token , -~ ~ Branch past the length field, the true-block, and the unconditional -~ ~ branch in the middle. 
-~ ' 0branch entry-to-execution-token , -~ 3roll dup 4 unroll 3 8 * + , -~ ~ -~ ~ Next, write out the unconditional branch in the middle. -~ swap dup 3unroll 5 8 * + here ! -~ ' branch entry-to-execution-token , -~ ~ Branch past the length field and the false-block. -~ dup 8 + , -~ ~ -~ ~ Set "here" to point to the true end. -~ drop drop drop drop 7 8 * + here ! -~ ; make-immediate +~ (address for new variable word to point to, string pointer --) +: variable + create + self-codeword + here @ + swap :rax mov-reg64-imm64 + :rax push-reg64 + pack-next + 8 packalign + here ! ; + + +~ A keyword is a word that evaluates to its own address, which makes it +~ suitable for use as a constant. By convention, all our keywords have names +~ starting with a colon, which imitates the way they work in Common Lisp. ~ +~ Specifically, it returns its own execution token. Thus, executing its +~ result repeatedly will keep giving the same value. We aren't in the habit of +~ doing quote-exec kinds of things in Evocation, but it seems as good as any +~ other unique value, so we might as well. ~ -~ ~ (start, length --) -~ : forever -~ ' branch entry-to-execution-token , 8 + -1 * , drop -~ ; make-immediate -~ -~ -~ ~ This slides the body forward, leaving the test where it is. It puts a -~ ~ conditional branch in-between them, then appends an unconditional branch -~ ~ at the end. -~ ~ -~ ~ (test start, test length, body start, body length --) -~ : while -~ ~ The conditional branch needs five words. -~ 2dup swap dup 5 8 * + swap 3roll memmove -~ here @ 5 unroll swap dup 3unroll here ! -~ ~ (old here, test start, test length, body start, body length) -~ ' lit entry-to-execution-token , 0 , -~ ' != entry-to-execution-token , -~ ~ Branch past the length field, the body, and the unconditional branch. -~ ' 0branch entry-to-execution-token , -~ dup 3 8 * + , -~ ~ Set "here" to the new end. -~ 5 8 * 6 roll + here ! 
-~ ~ (test start, test length, body start, body length) -~ ~ Unconditionally branch backwards past the branch word, the body, the -~ ~ conditional branch, and the test. -~ ' branch entry-to-execution-token , -~ 6 8 * + swap drop + swap drop -1 * , -~ ; make-immediate +~ Unlike CL, we don't currently have the lexer automatically create keywords +~ for us; we create them explicitly. That's likely to be added at some point, +~ but at the moment the feature is lying fallow to see whether it winds up +~ seeing a lot of use. ~ +~ (string pointer --) +: keyword + create + + ~ Before outputting our codeword, save a copy of the address where it's + ~ going to be. That will be the execution token we return. + here @ dup + ~ (self execution token, output point) + + ~ Now add a codeword. This is an assembly word, so it's a self-codeword, + ~ meaning it points to the word right after itself. + dup 8 + pack64 + ~ (self execution token, output point) + + ~ Now we consume the execution token, using it as part of this instruction. + :rax mov-reg64-imm64 + ~ (output point) + + ~ To return it, we push it to the stack. + :rax push-reg64 + + ~ Now just the normal stuff every assembly word ends with. + pack-next + 8 packalign + + here ! 
; + diff --git a/evoke.e b/evoke.e index c1ccebc..54fe3fb 100644 --- a/evoke.e +++ b/evoke.e @@ -2,7 +2,8 @@ ~ echo 262144 read-to-buffer; \ ~ cat core.e linux.e output.e amd64.e execution-support.e log-load.e; \ ~ echo pyrzqxgl 262144 read-to-buffer; \ -~ cat core.e linux.e output.e amd64.e execution-support.e dynamic.e; \ +~ cat core.e linux.e output.e amd64.e execution-support.e log-load.e \ +~ dynamic.e input.e ; \ ~ echo 0 sys-exit pyrzqxgl; \ ~ cat evoke.e) \ ~ | ./quine > evoke && chmod 755 evoke && ./evoke @@ -10,8 +11,6 @@ s" source-to-copy-to-log" variable s" source-to-precompile" variable -~ (output memory start, current output point -~ -- output memory start, current output point) ~ (output memory start, current output point ~ -- output memory start, current output point) @@ -20,12 +19,18 @@ s" source-to-precompile" variable ~ : all-contents 0x08000000 L!' origin + elf-file-header elf-program-header output-cold-start source-to-copy-to-log output-warm-start source-to-precompile label-transform + + ~ If we wanted words in the log to be able to call statically-linked + ~ words, we could set this to something nonzero. We don't, so we leave it + ~ alone. 0 L!' final-word-name + current-offset L!' total-size 0 L!' : 0 L!' ; diff --git a/flow-control.e b/flow-control.e new file mode 100644 index 0000000..a1b066d --- /dev/null +++ b/flow-control.e @@ -0,0 +1,129 @@ +~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +~ ~~ High-level flow-control ~~ +~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +~ +~ We use a novel suffix-based approach to flow control. We define words +~ { and } which describe the boundaries of blocks of code, leaving a +~ description on the value stack, while still compiling the contents +~ normally. +~ +~ Then follow-up words such as "if" can use that information to slide +~ the blocks around and insert any needed branches and other logic. +~ +~ Both the label transform and the log-load transform go out of their way +~ to make sure these words work. 
Because of that, we actually get to use these +~ before defining them... just keep in mind nothing under the transform is +~ calling THESE versions. + + +~ (-- start pointer) +: { here @ ; make-immediate + +~ (start pointer -- start pointer, length) +: } dup here @ swap - ; make-immediate + + +~ (start pointer, length --) +: if + 2dup swap dup 5 8 * + 3unroll swap + ~ (start pointer, length, adjusted start pointer, start pointer, length) + memmove + ~ (start pointer, length) + + swap here @ swap here ! swap + ~ (old here, length) + + ' lit entry-to-execution-token , 0 , + ' != entry-to-execution-token , + + ~ The branch length needs to be one word longer than the block length, + ~ because the length field itself is part of the scope of the branch. + ' 0branch entry-to-execution-token , dup 8 + , + ~ (old here, length) + + drop 5 8 * + here ! + ; make-immediate + + +~ (start pointer, length --) +: unless 2dup swap dup 5 8 * + 3unroll swap + ~ (start pointer, length, start pointer, adjusted start pointer, length) + memmove + ~ (start pointer, length) + swap here @ swap here ! swap + ~ (old here, length) + ' lit entry-to-execution-token , 0 , + ' = entry-to-execution-token , + ~ The branch length needs to be one word longer than the block length, + ~ because the length field itself is part of the scope of the branch. + ' 0branch entry-to-execution-token , dup 8 + , + ~ (old here, length) + drop 5 8 * + here ! ; make-immediate + + +~ (true start, true length, false start, false length --) +: if-else + dup 4 roll dup 5 unroll + + + ~ First we slide the false-block forward, then the true-block. We slide + ~ them both directly into their final positions, leaving space at the start + ~ for a test and branch, and space in between for an unconditional branch. + ~ Those spaces will take five words, and two words, respectively. So the + ~ false-block gets moved by seven words, and the true-block gets moved by + ~ five words. 
+ 2dup swap dup 7 8 * + swap 3roll memmove + 4 roll dup 5 unroll 4 roll dup 5 unroll + swap dup 5 8 * + swap 3roll memmove + ~ (true start, true length, false start, false length) + + ~ Now we write out the initial test-and-branch. + 4 roll dup 5 unroll here @ 6 unroll here ! + ~ (old here, true start, true length, false start, false length) + ' lit entry-to-execution-token , 0 , + ' != entry-to-execution-token , + ~ Branch past the length field, the true-block, and the unconditional + ~ branch in the middle. + ' 0branch entry-to-execution-token , + 3roll dup 4 unroll 3 8 * + , + + ~ Next, write out the unconditional branch in the middle. + swap dup 3unroll 5 8 * + here ! + ' branch entry-to-execution-token , + ~ Branch past the length field and the false-block. + dup 8 + , + + ~ Set "here" to point to the true end. + drop drop drop drop 7 8 * + here ! + ; make-immediate + + +~ (start, length --) +: forever + ' branch entry-to-execution-token , 8 + -1 * , drop + ; make-immediate + + +~ This slides the body forward, leaving the test where it is. It puts a +~ conditional branch in-between them, then appends an unconditional branch +~ at the end. +~ +~ (test start, test length, body start, body length --) +: while + ~ The conditional branch needs five words. + 2dup swap dup 5 8 * + swap 3roll memmove + here @ 5 unroll swap dup 3unroll here ! + ~ (old here, test start, test length, body start, body length) + ' lit entry-to-execution-token , 0 , + ' != entry-to-execution-token , + ~ Branch past the length field, the body, and the unconditional branch. + ' 0branch entry-to-execution-token , + dup 3 8 * + , + ~ Set "here" to the new end. + 5 8 * 6 roll + here ! + ~ (test start, test length, body start, body length) + ~ Unconditionally branch backwards past the branch word, the body, the + ~ conditional branch, and the test. 
+ ' branch entry-to-execution-token , + 6 8 * + swap drop + swap drop -1 * , + ; make-immediate + diff --git a/input.e b/input.e new file mode 100644 index 0000000..c7093e6 --- /dev/null +++ b/input.e @@ -0,0 +1,127 @@ +~ ~~~~~~~~~~~~~~~~~~~ +~ ~~ Input streams ~~ +~ ~~~~~~~~~~~~~~~~~~~ + +~ (pointer to buffer metadata -- pointer to buffer "physical-start" field) +: buffer-physical-start ; + ~ The physical-start field happens to be the first thing in the metadata, so + ~ this is an nop, but it still exists as a word because having it reduces + ~ confusion. +~ (pointer to buffer metadata -- pointer to buffer "physical-length" field) +: buffer-physical-length 8 + ; +~ (pointer to buffer metadata -- pointer to buffer "logical-start" field) +: buffer-logical-start 2 8 * + ; +~ (pointer to buffer metadata -- pointer to buffer "logical-length" field) +: buffer-logical-length 3 8 * + ; +~ (pointer to input buffer metadata -- pointer to input buffer "refill" field) +: input-buffer-refill 4 8 * + ; +~ (pointer to input buffer metadata +~ -- pointer to input buffer "next-source" field) +: input-buffer-next-source 5 8 * + ; + + +~ Given an initialized buffer (input or otherwise), sets its logical-start +~ and logical-length fields to indicate the buffer is empty. This relies on +~ the buffer having a backing store attached, but does not alter the backing +~ store or its contents. +~ +~ (pointer to buffer metadata --) +: clear-buffer + dup buffer-physical-start @ swap + ~ (address of backing store, metadata pointer) + dup 3unroll + ~ (metadata pointer, address of backing store, metadata pointer) + buffer-logical-start ! + buffer-logical-length 0 swap ! ; + + +~ Sets all fields in an input buffer metadata structure to zero, +~ effectively detaching and leaking any backing store that had been attached +~ to it. Suitable for use during initialization. +~ +~ (pointer to input buffer metadata --) +: zero-input-buffer-metadata + dup buffer-physical-start 0 swap ! 
+ dup buffer-physical-length 0 swap ! + dup buffer-logical-start 0 swap ! + dup buffer-logical-length 0 swap ! + dup input-buffer-refill 0 swap ! + ~ Notice the absence of a dup this time. + input-buffer-next-source 0 swap ! ; + + +~ Allocates input-buffer metadata, with no backing store attached. +~ Initializes the metadata to all zeroes. +~ +~ (-- pointer to input buffer metadata) +: allocate-input-buffer-metadata + 6 8 * allocate + dup zero-input-buffer-metadata ; + + +~ Allocates input buffer metadata and a backing store, in one operation. +~ Points the metadata to the backing store. +~ +~ (buffer capacity in bytes -- pointer to input buffer metadata) +: allocate-input-buffer + dup 6 8 * + allocate + dup zero-input-buffer-metadata + ~ (capacity in bytes, metadata pointer) + dup dup 6 8 * + + ~ (capacity in bytes, metadata pointer, metadata pointer, physical start) + swap buffer-physical-start ! + ~ (capacity in bytes, metadata pointer) + dup 3unroll buffer-physical-length ! + ~ (metadata pointer) + dup clear-buffer ; + + +~ Sets the backing store of an input buffer to point at a null-teriminated +~ string and read from it. +~ +~ (buffer metadata pointer, string pointer --) +: attach-string-to-input-buffer + swap + ~ (string pointer, metadata pointer) + 2dup buffer-physical-start ! + ~ (string pointer, metadata pointer) + 2dup buffer-logical-start ! + ~ (string pointer, metadata pointer) + swap stringlen swap + ~ (string length, metadata pointer) + 2dup buffer-physical-length ! + ~ (string length, metadata pointer) + buffer-logical-length ! ; + + +~ Here we have some imperative code that runs immediately, to initialize +~ some runtime data structures. +~ +~ First, we insert a metadata word header to delimit the space. Otherwise +~ "describe" would crash when attempting to describe +~ "attach-string-to-input-buffer". + +s" main-input-buffer-metadata" create +s" main-input-buffer-metadata" find 0x01 entry-flags! 
+ +~ Having done that, now we do the runtime allocation. Then we also define +~ the variable "main-input-buffer" so we can find it again. +allocate-input-buffer-metadata +s" main-input-buffer" variable + +~ We also initialize the metadata here, pointing it to the boot source as +~ its backing store. We could do that later, but it's convenient to do it +~ here. +~ +~ TODO attach it to something +~ boot-source attach-string-to-input-buffer + + + +~ TODO +~ consume-from 00000010000187c0 +~ peek-from 0000001000018960 +~ key-from 0000001000018ab8 +~ peek 0000001000018d20 +~ consume 0000001000018d50 +~ key 0000001000018d88 diff --git a/interpret.e b/interpret.e index 7db86b7..d2bf10f 100644 --- a/interpret.e +++ b/interpret.e @@ -38,291 +38,37 @@ ~ It is primarily concerned with managing the contents of an area of memory ~ we call the "log"; see log-load.e for more detail on terminology. -~ TODO find a better place for this -: describe-compilation - ~ It's always in progress ;) We just need a header like this so it doesn't - ~ get confused with other kinds of debug output. - ." compilation in progress" newline - latest @ hexdump - newline - ." here " here @ .hex64 newline - ." latest " latest @ .hex64 newline - ." name of latest: " latest @ entry-to-name emitstring newline - newline ; - - -~ Allocate space by incrementing "here", and output a word entry header in -~ it. Also add it to the "latest" linked list. Use zero as the flag values; -~ accept a string pointer on the stack and use its contents as the name. -~ -~ This is the first step of creating a new word. Its responsibility includes -~ everything up to the codeword, not including the codeword; it leaves things -~ all set up to start appending contents to the new word by calling ",". -~ -~ There's a handy diagram of the entry header format under "quick -~ reference", in the description of the exeuction model in exeuction.e. Create -~ is responsible for everything up to the codeword, not including it. 
-~ -~ When a word is created in interpret mode using s" to provide a string -~ literal, the temporary space that s" uses is in the same place as the -~ entry header we're going to write out. It really is very useful to have -~ that work. Fortunately, it does! We're able to avoid needing a special case -~ by doing things in a very careful way, as described below. -~ -~ (string pointer --) -: create - ~ We add one to the string length in order to include the trailing null - ~ terminator. This will be the length of our name field; we save an extra - ~ copy of it to help with packing later. - dup stringlen 1 + dup 3unroll - ~ (name field length, string pointer, name field length) - - ~ We use memmove to put the string in its final position, because it works - ~ correctly when the destination overlaps with the source. Notice that we - ~ do this before writing anything else in the entry header, to avoid - ~ stepping on it. The name string always starts ten bytes into the header, - ~ so we can use a fixed offset. - here @ 10 + 3unroll memmove - ~ (name field length) - - ~ Now we can get back to the fields that belong at the start of the entry - ~ header. We take the value of "here" and keep a working copy of it on the - ~ stack, which we'll advance every time we write more bytes. - here @ - ~ (name field length, updated "here" pointer) - - ~ Pack the old value of "latest" as the first field of the header, linking - ~ from the newly-defined word to the next-newest word. - ~ - ~ All the entries form a linked list, from newest to oldest. Since the - ~ link is the first field in the entry header, you can get from each entry - ~ to the one before it just by dereferencing the entry pointer. - latest @ pack64 - - ~ This is the flags byte. It starts at zero; our caller can change it if - ~ desired. - 0 pack8 - - ~ This is the "other" null terminator, used when traversing the name - ~ string backwards for execution-token-to-entry. Yes, the name is - ~ null-terminated at both ends. 
- 0 pack8 - - + ~ The name field is already populated, so just skip past it. - ~ (updated "here" pointer) - - ~ The codeword is aligned to a machine-word boundary, and the padding for - ~ it is create's responsibility. - ~ - ~ By adding the null terminator before adding alignment padding, we've - ~ made sure there's always at least one null byte. Otherwise we'd be missing - ~ the terminator if by chance the name were exactly the wrong length. - 8 packalign - ~ (updated "here" pointer) - - ~ Retrieve the value of "here", which still doesn't reflect our additions, - ~ and store it at the adddress of "latest". It's the start of our - ~ newly-defined word, which makes it the latest word. - here @ latest ! - - ~ Finally, we write our updated value of "here" back into the variable. - here ! ; - -~ (value to append to current word-in-progress --) -: , here @ swap pack64 here ! ; - - -: self-codeword here @ 8 + , ; - - -~ A variable is simply a word that returns a specific address, always the -~ same one, at which a value can be stored. This word "variable" takes and -~ address and a word name, and defines the word. Allocating space is its -~ caller's responsibility. -~ -~ TODO the address is constant but the contents vary, confusing, write it up -~ -~ (address for new variable word to point to, string pointer --) -: variable - create - self-codeword - here @ - swap :rax mov-reg64-imm64 - :rax push-reg64 - pack-next - 8 packalign - here ! ; - - -~ A keyword is a word that evaluates to its own address, which makes it -~ suitable for use as a constant. By convention, all our keywords have names -~ starting with a colon, which imitates the way they work in Common Lisp. -~ -~ Specifically, it returns its own execution token. Thus, executing its -~ result repeatedly will keep giving the same value. We aren't in the habit of -~ doing quote-exec kinds of things in Evocation, but it seems as good as any -~ other unique value, so we might as well. 
-~ -~ Unlike CL, we don't currently have the lexer automatically create keywords -~ for us; we create them explicitly. That's likely to be added at some point, -~ but at the moment the feature is lying fallow to see whether it winds up -~ seeing a lot of use. -~ -~ (string pointer --) -: keyword - create - - ~ Before outputting our codeword, save a copy of the address where it's - ~ going to be. That will be the execution token we return. - here @ dup - ~ (self execution token, output point) - - ~ Now add a codeword. This is an assembly word, so it's a self-codeword, - ~ meaning it points to the word right after itself. - dup 8 + pack64 - ~ (self execution token, output point) - - ~ Now we consume the execution token, using it as part of this instruction. - :rax mov-reg64-imm64 - ~ (output point) - - ~ To return it, we push it to the stack. - :rax push-reg64 - - ~ Now just the normal stuff every assembly word ends with. - pack-next - 8 packalign - - here ! ; - - -~ Allocates bytes on the heap by incrementing the global "here" pointer. The -~ "here" pointer is kept aligned to an 8-byte boundary, regardless of the size -~ requested. -~ -~ This does not create dictionary entries, it's just a raw memory interface. -~ It's suitable for allocating data or scratch space. -: allocate - here @ dup - ~ (size, here value, here value) - 3roll + 8 packalign here ! ; - : hide-entry dup entry-flags@ 0x80 | entry-flags! ; : unhide-entry dup entry-flags@ 0x80 invert & entry-flags! ; -~ (pointer to buffer metadata -- pointer to buffer "physical-start" field) -: buffer-physical-start ; - ~ The physical-start field happens to be the first thing in the metadata, so - ~ this is an nop, but it still exists as a word because having it reduces - ~ confusion. 
-~ (pointer to buffer metadata -- pointer to buffer "physical-length" field) -: buffer-physical-length 8 + ; -~ (pointer to buffer metadata -- pointer to buffer "logical-start" field) -: buffer-logical-start 2 8 * + ; -~ (pointer to buffer metadata -- pointer to buffer "logical-length" field) -: buffer-logical-length 3 8 * + ; -~ (pointer to input buffer metadata -- pointer to input buffer "refill" field) -: input-buffer-refill 4 8 * + ; -~ (pointer to input buffer metadata -~ -- pointer to input buffer "next-source" field) -: input-buffer-next-source 5 8 * + ; - -~ Given an initialized buffer (input or otherwise), sets its logical-start -~ and logical-length fields to indicate the buffer is empty. This relies on -~ the buffer having a backing store attached, but does not alter the backing -~ store or its contents. -~ -~ (pointer to buffer metadata --) -: clear-buffer - dup buffer-physical-start @ swap - ~ (address of backing store, metadata pointer) - dup 3unroll - ~ (metadata pointer, address of backing store, metadata pointer) - buffer-logical-start ! - buffer-logical-length 0 swap ! ; - - -~ Sets all fields in an input buffer metadata structure to zero, -~ effectively detaching and leaking any backing store that had been attached -~ to it. Suitable for use during initialization. -~ -~ (pointer to input buffer metadata --) -: zero-input-buffer-metadata - dup buffer-physical-start 0 swap ! - dup buffer-physical-length 0 swap ! - dup buffer-logical-start 0 swap ! - dup buffer-logical-length 0 swap ! - dup input-buffer-refill 0 swap ! - ~ Notice the absence of a dup this time. - input-buffer-next-source 0 swap ! ; - - -~ Allocates input-buffer metadata, with no backing store attached. -~ Initializes the metadata to all zeroes. -~ -~ (-- pointer to input buffer metadata) -: allocate-input-buffer-metadata - 6 8 * allocate - dup zero-input-buffer-metadata ; - - -~ Allocates input buffer metadata and a backing store, in one operation. 
-~ Points the metadata to the backing store. -~ -~ (buffer capacity in bytes -- pointer to input buffer metadata) -: allocate-input-buffer - dup 6 8 * + allocate - dup zero-input-buffer-metadata - ~ (capacity in bytes, metadata pointer) - dup dup 6 8 * + - ~ (capacity in bytes, metadata pointer, metadata pointer, physical start) - swap buffer-physical-start ! - ~ (capacity in bytes, metadata pointer) - dup 3unroll buffer-physical-length ! - ~ (metadata pointer) - dup clear-buffer ; - - -~ Sets the backing store of an input buffer to point at a null-teriminated -~ string and read from it. -~ -~ (buffer metadata pointer, string pointer --) -: attach-string-to-input-buffer - swap - ~ (string pointer, metadata pointer) - 2dup buffer-physical-start ! - ~ (string pointer, metadata pointer) - 2dup buffer-logical-start ! - ~ (string pointer, metadata pointer) - swap stringlen swap - ~ (string length, metadata pointer) - 2dup buffer-physical-length ! - ~ (string length, metadata pointer) - buffer-logical-length ! ; - - ~ TODO -~ main-input-buffer-metadata 0000001000018738 I raw -~ main-input-buffer 0000001000018788 asm -~ consume-from 00000010000187c0 -~ peek-from 0000001000018960 -~ key-from 0000001000018ab8 -~ is-space 0000001000018b00 -~ peek 0000001000018d20 -~ consume 0000001000018d50 -~ key 0000001000018d88 ~ unroll-past-string 0000001000018db8 ~ swap-past-string 0000001000018ea0 ~ dropstring 0000001000018ee8 ~ dropstring-with-result 0000001000018f80 ~ accumulate-string 0000001000018fc8 +~ is-space 0000001000018b00 ~ word 00000010000194a0 - -~ (string pointer -- entry pointer or 0) -: find latest swap find-in ; +~ The word "'", often pronounced "tick", quotes the following word, looking +~ it up and treating it as a constant. In immediate mode, the constant winds +~ up on the stack; in compile mode it gets compiled. +~ +~ There are a few possible implementation strategies here. 
Running as an +~ immediate word means there's a clear and unambiguous concept of "the +~ following word", so that's what we do; otherwise we'd have to get clever +~ about somehow finding out where we were called from. That means we take on +~ what would otherwise be the interpreter's responsibility, of checking what +~ mode we're in. Happily, that's easy to do. +~ +~ There's a cyclic dependency where "if" relies on "'", and "'" relies on +~ "if". Fortunately both of them are treated as alternates by the log-load +~ transform, so we don't have to worry about it. +: ' word value@ find dropstring-with-result + interpreter-flags @ 1 & { literal } if + ; make-immediate ~ (character -- 1 for true or 0 for false) @@ -506,11 +252,8 @@ ~ harder for words that attempt to work with the contents of other words. So ~ we give it a name. -s" interpreter-flags-storage" create make-immediate -latest @ unhide-entry -here @ -0 , -s" interpreter-flags" variable +s" interpreter-flags-storage" create make-immediate make-visible +here @ 0 , s" interpreter-flags" variable ~ There's an important bootstrapping concern: If you're loading this ~ interpreter into a running Evocation, it's important to not use the wrong diff --git a/log-load.e b/log-load.e index 1406270..2b7cbd2 100644 --- a/log-load.e +++ b/log-load.e @@ -125,6 +125,8 @@ ~ "find". The warm-start routine (see execution.e and transform.e) has the ~ job of fixing that, and it makes extensive use of find-in to do so. ~ +~ TODO this probably deserves its own file? +~ ~ (dictionary pointer, string pointer -- entry pointer or 0) : find-in ~ It will be more convenient to have the entry pointer on top. @@ -210,7 +212,7 @@ } if-else ; -~ This is the same as "create", from interpret.e, except that it takes the +~ This is the same as "create", from dynamic.e, except that it takes the ~ log's address as a parameter rather than hardcoding it, so that it can be ~ used in situations where the normal compilation process isn't yet available. 
~ @@ -245,7 +247,7 @@ over log-load-here swap drop ! ; -~ This is the same as ",", from interpret.e, except that it takes the log's +~ This is the same as ",", from dynamic.e, except that it takes the log's ~ address as a parameter rather than hardcoding it, so that it can be used in ~ situations where the normal compilation process isn't yet available. ~ @@ -262,7 +264,7 @@ ! ; -~ This is the same as `;asm`, from interpret.e, except that it takes the +~ This is the same as `;asm`, from dynamic.e, except that it takes the ~ log's address as a parameter rather than hardcoding it, so that it can be ~ used in situations where the normal compilation process isn't yet available. ~ @@ -287,7 +289,7 @@ dup 8 + swap ! ; -~ This is the same as "variable", from interpret.e, except that it takes the +~ This is the same as "variable", from dynamic.e, except that it takes the ~ log's address as a parameter rather than hardcoding it, so that it can be ~ used in situations where the normal compilation process isn't yet available. ~ @@ -314,7 +316,7 @@ ~ A keyword is a word that evaluates to its own address, which makes it -~ suitable for use as a constant. See more detail on that in interpret.e, +~ suitable for use as a constant. See more detail on that in dynamic.e, ~ where "keyword" is defined. 
~ ~ Unlike Common Lisp, the lexer doesn't create keywords for us, we have to diff --git a/transform.e b/transform.e index d917601..61ef373 100644 --- a/transform.e +++ b/transform.e @@ -712,11 +712,33 @@ allocate-transform-state s" transform-state" variable dup s" describe-docol" stringcmp 0 = { drop -1 exit } if dup s" describe" stringcmp 0 = { drop -1 exit } if dup s" describe-all" stringcmp 0 = { drop 0 exit } if + dup s" describe-compilation" stringcmp 0 = { drop 0 exit } if dup s" forget" stringcmp 0 = { drop -1 exit } if dup s" ," stringcmp 0 = { drop -1 exit } if dup s" make-immediate" stringcmp 0 = { drop 0 exit } if dup s" make-hidden" stringcmp 0 = { drop 0 exit } if dup s" make-visible" stringcmp 0 = { drop 0 exit } if + dup s" recurse" stringcmp 0 = { drop 0 exit } if + dup s" recurse" stringcmp 0 = { drop 0 exit } if + dup s" find" stringcmp 0 = { drop 0 exit } if + dup s" allocate" stringcmp 0 = { drop 0 exit } if + dup s" create" stringcmp 0 = { drop -1 exit } if + dup s" self-codeword" stringcmp 0 = { drop 0 exit } if + dup s" variable" stringcmp 0 = { drop -2 exit } if + dup s" keyword" stringcmp 0 = { drop -1 exit } if + + ~ From input.e. + dup s" buffer-physical-start" stringcmp 0 = { drop 0 exit } if + dup s" buffer-physical-length" stringcmp 0 = { drop 0 exit } if + dup s" buffer-logical-start" stringcmp 0 = { drop 0 exit } if + dup s" buffer-logical-length" stringcmp 0 = { drop 0 exit } if + dup s" input-buffer-refill" stringcmp 0 = { drop 0 exit } if + dup s" input-buffer-next-source" stringcmp 0 = { drop 0 exit } if + dup s" clear-buffer" stringcmp 0 = { drop -1 exit } if + dup s" zero-input-buffer-metadata" stringcmp 0 = { drop -1 exit } if + dup s" allocate-input-buffer-metadata" stringcmp 0 = { drop 1 exit } if + dup s" allocate-input-buffer" stringcmp 0 = { drop 0 exit } if + dup s" attach-string-to-input-buffer" stringcmp 0 = { drop -2 exit } if ~ Created by warm-start in execution.e. 
dup s" log" stringcmp 0 = { drop 1 exit } if @@ -758,7 +780,7 @@ allocate-transform-state s" transform-state" variable ~ This is the alternate version of "create" for use with the label ~ transform. Its code is the same as the regular "create" except as noted ~ below. It is likely to be extremely useful to read and understand "create" -~ in interpret.e before attempting to understand label-create-alternate. +~ in dynamic.e before attempting to understand label-create-alternate. : label-create-alternate dup stringlen 1 + dup 3unroll here @ 10 + 3unroll memmove @@ -788,7 +810,7 @@ allocate-transform-state s" transform-state" variable ~ This is the alternate version of ":" for use with the label transform. Its ~ code is the same as the regular "create" except as noted below. It is likely -~ to be extremely useful to read and understand ":" in interpret.e before +~ to be extremely useful to read and understand ":" in dynamic.e before ~ attempting to understand label-colon-alternate. : label-colon-alternate ~ This calls label-create-alternate instead of "create". @@ -806,7 +828,7 @@ allocate-transform-state s" transform-state" variable ~ This is the alternate version of ";" for use with the label transform. Its ~ code is the same as the regular "create" except as noted below. It is likely -~ to be extremely useful to read and understand ";" in interpret.e before +~ to be extremely useful to read and understand ";" in dynamic.e before ~ attempting to understand label-semicolon-alternate. : label-semicolon-alternate ~ This looks up "exit" by label. @@ -825,7 +847,7 @@ allocate-transform-state s" transform-state" variable ~ This is the alternate version of ";asm" for use with the label transform. ~ Its code is the same as the regular "create" except as noted below. 
It is -~ likely to be extremely useful to read and understand ";asm" in interpret.e +~ likely to be extremely useful to read and understand ";asm" in dynamic.e ~ before attempting to understand label-semicolon-assembly-alternate. : label-semicolon-assembly-alternate here @ pack-next 8 packalign here ! @@ -1092,7 +1114,7 @@ allocate-transform-state s" transform-state" variable ~ This implements the label transform for a single word. It is directly -~ analogous to "interpret", and reading interpret.e may help in understanding +~ analogous to "interpret", and reading dynamic.e may help in understanding ~ it, though it's meant to still make sense on its own. ~ ~ It expects to be called from "label-transform", below, which loops. @@ -1292,7 +1314,7 @@ allocate-transform-state s" transform-state" variable ~ This implements the label transform for all words in a region given as an -~ input string. It is directly analogous to "quit", in interpret.e, but is far +~ input string. It is directly analogous to "quit", in interpet.e, but is far ~ more complex. ~ ~ (output buffer start, output point, input string pointer @@ -1594,26 +1616,34 @@ allocate-transform-state s" transform-state" variable ~ It's worth keeping in mind that this alternate only gets called for ~ manual invocations of "create". It isn't called from the colon alternate. : log-load-create-alternate - log-load-roll-log-address + ~ In immediate mode, we have special behavior because this is fundamental + ~ to bootstrapping the log. In compile mode, we compile a dynamic call, + ~ which is the same thing that would happen if we didn't have an alternate + ~ at all. + interpreter-flags @ 0x01 & { + s" ," log-load-compile-dynamic-word + } { + log-load-roll-log-address - swap-transform-variables - L@' log-load-create - L@' swap - swap-transform-variables + swap-transform-variables + L@' log-load-create + L@' swap + swap-transform-variables - ~ The overall stack delta of this sequence is 0. 
- offset-to-target-address-space , ~ swap - offset-to-target-address-space , ~ log-load-create + ~ The overall stack delta of this sequence is 0. + offset-to-target-address-space , ~ swap + offset-to-target-address-space , ~ log-load-create - ~ We've consumed a string pointer from the stack, so that's a delta of -1. - -1 transform-apply-stack-delta + ~ We've consumed a string pointer from the stack, so that's a delta of -1. + -1 transform-apply-stack-delta - log-load-unroll-log-address ; + log-load-unroll-log-address + } if-else ; ~ This is the alternate version of ":" for use with the log-load transform. ~ Its code is the same as the regular ":" except as noted below. It is likely -~ to be extremely useful to read and understand ":" in interpret.e before +~ to be extremely useful to read and understand ":" in dynamic.e before ~ attempting to understand "log-load-colon-alternate". : log-load-colon-alternate word value@ @@ -1652,7 +1682,7 @@ allocate-transform-state s" transform-state" variable ~ This is the alternate version of ";" for use with the log-load transform. ~ Its code is the same as the regular ";" except as noted below. It is -~ likely to be extremely useful to read and understand ";" in interpret.e +~ likely to be extremely useful to read and understand ";" in dynamic.e ~ before attempting to understand "log-load-semicolon-alternate". : log-load-semicolon-alternate ~ We generate code that looks up "exit" by name and appends it to the @@ -1685,7 +1715,7 @@ allocate-transform-state s" transform-state" variable ~ This is the alternate version of ";asm" for use with the log-load ~ transform. Its code is the same as the regular "create" except as noted ~ below. It is likely to be extremely useful to read and understand ";asm" in -~ interpret.e before attempting to understand "log-load;asm". +~ dynamic.e before attempting to understand "log-load;asm". 
: log-load-semicolon-assembly-alternate ~ We generate code that statically invokes log-load-semicolon-assembly, ~ which does all the actual work. There's quite a few steps, so it makes @@ -2098,6 +2128,7 @@ allocate-transform-state s" transform-state" variable swap drop ' log-load-left-square-brace-alternate swap } if dup s" ]" stringcmp 0 = { swap drop ' log-load-right-square-brace-alternate swap } if + dup s" '" stringcmp 0 = { swap drop ' log-load-tick-alternate swap } if dup s" ," stringcmp 0 = { swap drop ' log-load-comma-alternate swap } if dup s" keyword" stringcmp 0 = { swap drop ' log-load-keyword-alternate swap } if diff --git a/vim/syntax/evocation.vim b/vim/syntax/evocation.vim index 62042ae..bb5094f 100644 --- a/vim/syntax/evocation.vim +++ b/vim/syntax/evocation.vim @@ -44,7 +44,7 @@ syn match evocationArithmetic "\(^\|\s\)\zs\(negate\|max\|min\)\ze\($\|\s\)" syn match evocationBlock "\(^\|\s\)\zs[\[\]{};]\ze\($\|\s\)" syn match evocationBlock "\(^\|\s\)\zs;asm\ze\($\|\s\)" -syn match evocationFlow "\(^\|\s\)\zs\(if\|unless\|if-else\|while\|forever\|exit\|make-immediate\|make-hidden\)\ze\($\|\s\)" +syn match evocationFlow "\(^\|\s\)\zs\(if\|unless\|if-else\|while\|forever\|exit\|make-immediate\|make-hidden\|make-visible\)\ze\($\|\s\)" syn match evocationConstantWord "\(^\|\s\)\zs:\S\+\ze\($\|\s\)" |
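Reviewer note on the input-buffer words this diff moves out of dynamic.e (they are listed under "From input.e" in transform.e): the metadata is six 8-byte fields, and allocate-input-buffer places the backing store immediately after them. The following is a minimal Python model of that layout, not the Evocation implementation. The `Heap` class, its method names, the 4096-byte memory size, and the little-endian encoding are assumptions made for illustration; the field order and the arithmetic follow the removed words `allocate`, `clear-buffer` and `allocate-input-buffer`.

```python
import struct

# Field offsets, in the order given by the buffer-* accessor words.
FIELD_SIZE = 8
PHYSICAL_START = 0 * FIELD_SIZE
PHYSICAL_LENGTH = 1 * FIELD_SIZE
LOGICAL_START = 2 * FIELD_SIZE
LOGICAL_LENGTH = 3 * FIELD_SIZE
REFILL = 4 * FIELD_SIZE
NEXT_SOURCE = 5 * FIELD_SIZE
METADATA_SIZE = 6 * FIELD_SIZE


class Heap:
    """Hypothetical stand-in for the memory addressed through "here"."""

    def __init__(self, size=4096):
        self.memory = bytearray(size)
        self.here = 0

    def allocate(self, size):
        # Like "allocate": return the old "here", then bump it and
        # round up to an 8-byte boundary.
        start = self.here
        self.here = (start + size + 7) & ~7
        return start

    def store(self, address, value):
        # Assumes little-endian 64-bit cells.
        struct.pack_into("<Q", self.memory, address, value)

    def fetch(self, address):
        return struct.unpack_from("<Q", self.memory, address)[0]


def clear_buffer(heap, metadata):
    # Like "clear-buffer": the logical region becomes empty and starts
    # at the physical start; the backing store itself is untouched.
    heap.store(metadata + LOGICAL_START, heap.fetch(metadata + PHYSICAL_START))
    heap.store(metadata + LOGICAL_LENGTH, 0)


def allocate_input_buffer(heap, capacity):
    # Like "allocate-input-buffer": one allocation holds the metadata
    # followed directly by the backing store.
    metadata = heap.allocate(capacity + METADATA_SIZE)
    for offset in range(0, METADATA_SIZE, FIELD_SIZE):
        heap.store(metadata + offset, 0)
    heap.store(metadata + PHYSICAL_START, metadata + METADATA_SIZE)
    heap.store(metadata + PHYSICAL_LENGTH, capacity)
    clear_buffer(heap, metadata)
    return metadata
```

With a fresh `Heap()`, `allocate_input_buffer(heap, 100)` returns address 0, sets physical-start and logical-start to 48, sets physical-length to 100, and leaves `here` at 152 because of the 8-byte rounding. The model does no bounds checking against the size of `memory`.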