Marpa

Timeline

545 check-ins using file tests/grammars/discards/err-g1-definition/slif version 293560b5e0

2018-09-22
00:01
Added larger doctools examples from Tcllib. Modified dt helper to show new statistics. Untested. Leaf check-in: 3dbe995b99 user: andreask tags: io-statistics
2018-09-21
22:26
Completed IO statistics for rt-C. Fixed name clash in the new code (existing type vs new function, function renamed). Updated tests to match generator changes (cparse, clex). check-in: aa1749989d user: andreask tags: io-statistics
00:00
Added state and methods to collect and return IO statistics (#streams, #processed, #input). Implemented for rt-Tcl. Added tests. rt-C placeholders. check-in: 23a1ea88eb user: andreask tags: io-statistics
2018-09-20
01:54
Merge staging into main stream. Basic parse events (io, lexer) done. Basic input extension done. Doctools example working (macros, includes). json blocked out (issues with partial UTF-8, non-UTF-8, and non-standard encodings; Tcl vs. C pre-handling; strings vs. files). Leaf check-in: 7337a35e08 user: aku tags: trunk
2018-09-18
17:18
Extended multi-byte char counting test to full unicode range. Closed-Leaf check-in: 15feed5264 user: aku tags: trunk-stage
05:01
Fix trailing whitespace check-in: d336725a4a user: aku tags: trunk-stage
05:00
Fixed miscounting of multi-byte characters in rtC. Reworked `inbound_step()`. It now tests a character for correctness on its first byte, then arranges to (not) step the character location on the trailer bytes following, if any. The character location is always stepped on the first byte, and the byte/char index is updated at the same time. Further reworked and fixed the interface from `gate` to `inbound` for rewinding after a rejection (straight end, or when the matched lexeme is actually shorter than the number of bytes consumed before stopping). The API used (`moveby`) was character-based, which caused us to rewind too far, namely to the previous character. No trouble when that was a single-byte character. Bad for a multi-byte one, however: this could then trip the lexer if the first byte of the previous character is not acceptable at the given point. check-in: 44fee72091 user: aku tags: trunk-stage
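A minimal C sketch of the counting rule described above, assuming a plain UTF-8 buffer: the character index advances only on a lead byte (any byte not of the form `10xxxxxx`), and rewinding works in bytes rather than characters, so backing up over the trailers of a multi-byte character never overshoots into the previous one. The `location` type and the `step`/`rewind_bytes` helpers are hypothetical illustrations, not rtC's actual `inbound_step()` code, and validation of malformed sequences is omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cursor into a UTF-8 buffer. `chr` is the number of characters
 * whose lead byte lies before `byte`, i.e. the index of the character that
 * starts at `byte` when `byte` is on a lead byte. */
typedef struct {
    size_t byte;
    size_t chr;
} location;

/* A byte 10xxxxxx is a UTF-8 trailer; any other byte starts a character. */
static int is_trailer(unsigned char b) {
    return (b & 0xC0) == 0x80;
}

/* Step over one byte. The character counter moves on the lead byte only,
 * so a character is counted when it begins, not when it ends. */
static void step(location *loc, const unsigned char *buf) {
    if (!is_trailer(buf[loc->byte])) loc->chr++;
    loc->byte++;
}

/* Move back by `bytes` bytes (not characters). The character counter drops
 * only when we move back across a lead byte. */
static void rewind_bytes(location *loc, const unsigned char *buf, size_t bytes) {
    assert(bytes <= loc->byte);
    while (bytes--) {
        loc->byte--;
        if (!is_trailer(buf[loc->byte])) loc->chr--;
    }
}
```

For the input `a`, `é` (two bytes), `b`, four steps leave the cursor at byte 4 / character 3; rewinding two bytes from byte 3 lands on the lead byte of `é`, at character 1.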
2018-09-17
23:43
Added test demonstrating that rtC miscounts multi-byte character locations (because it counts at the end of a character). Added tools to help with generation of test input. check-in: 9337c00616 user: aku tags: trunk-stage
23:41
Tweaked test parser gen to leave a log file behind on grammar parse failure check-in: 1694246f20 user: aku tags: trunk-stage
2018-09-15
04:46
Trim trailing whitespace check-in: e64b244e12 user: aku tags: trunk-stage
2018-09-10
23:05
Added runtime tests. Note: These look to need some of the fixes on trunk-stage. Leaf check-in: 8b8a1cedf6 user: andreask tags: non-utf-bytes
22:23
Started implementation of non-utf byte support. check-in: 2d87d5c92b user: andreask tags: non-utf-bytes
2018-09-09
06:00
Fixed mindt|doctools crashes. Both were assertion failures. The problem with `mindt` was in the new `gather_events` function. It was not fully adapted to the changed data structures, with the split trigger and event arrays. It saved the loop index as event id, which was correct for the old structures. With the new structures, however, the event id is instead a field in the array we iterate over. The problem with `doctools` was as above, plus a previously unknown mismanagement of the `always` array by the shared rtC generator code for LTM-mode lexemes, i.e. LATM=false. It triggered a similar assertion on lexer setup after the fix above was applied. `mindt` avoided that issue because it did not use LTM-mode lexemes for its special forms. Reactivated the disabled tests. Passing now. Note: json still disabled. check-in: cd54178989 user: aku tags: trunk-stage
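The `gather_events` bug can be illustrated with a small C sketch. Assuming a hypothetical split layout where event declarations live in one array and triggers point into it by id, the id must be read from the trigger record; the loop index only identifies the trigger slot. The types and signature here are illustrative, not the generated rtC code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical split layout: declarations carry state, triggers reference them. */
typedef struct { int active; } event_decl;
typedef struct { int symbol; int event_id; } trigger;

/* Collect the ids of active events fired by `symbol` into `out` (capacity
 * `cap`). Returns the number of ids written. */
static size_t gather_events(const trigger *triggers, size_t ntriggers,
                            const event_decl *events, int symbol,
                            int *out, size_t cap) {
    size_t n = 0;
    for (size_t i = 0; i < ntriggers; i++) {
        if (triggers[i].symbol != symbol) continue;
        /* The event id is a field of the trigger. Using `i` here would be
         * the old-structure assumption that caused the crash. */
        int id = triggers[i].event_id;
        if (!events[id].active) continue;
        assert(n < cap);
        out[n++] = id;
    }
    return n;
}
```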
00:02
Integrated doctools example: tests, remeta. Regenerated doctools example parser to match runtime changes. Disabled tests with known problems: json (char mislocation); mindt (crash! rtC, mis-capacity for some symset); doctools (crash! rtC, mis-capacity for some symset). check-in: cba29d80fc user: aku tags: trunk-stage
2018-09-08
22:16
Integrate base event work. Stage before true merge. check-in: ecf881268e user: aku tags: trunk-stage
22:15
Integrate base doctools example. Closed-Leaf check-in: 77d4610d6a user: aku tags: parse-events
22:11
Brought trunk and event work. Closed-Leaf check-in: d912ded2f3 user: aku tags: language-doctools
22:07
Brought trunk work check-in: 6ceda1b56e user: aku tags: parse-events
21:38
Tkt [d9b47bbb78] completed check-in: f97a8147f6 user: aku tags: parse-events
09:04
Bring in the extended test suite which runs the example tests too. check-in: a48220cd05 user: aku tags: trunk
04:23
More integration of the language examples into the main: run example tests with the main; regenerate example parsers with the main parsers. Updated a few tests. Note: json tests have issues, parse location differences. Investigate. Tracked the location differences to commit [dd2dfa9d07530d8d] where the definition of lexeme start switched from byte-based LEX_START to character-based LEX_START_C. This commit is on branch [parse-events-rtc]. I suspect that the stepping of character locations is incorrect. Looking at the `step()` function I see the character increment done for the last byte of the character. This is ok for single-byte characters. For multi-byte characters, however, it looks to be too late. We should step at the first byte of a character. Complicating things: partial characters, i.e. missing trailers, and unexpected trailers. json is good at finding these issues; its test suite contains many of these cases. The fix will be done on a new branch. This commit is not responsible and can be merged. Closed-Leaf check-in: 26a81da805 user: aku tags: example-integration
2018-09-07
23:12
Integrated the language example tests into the main. Added the language example parsers to the `remeta` tool to ensure that they are updated after runtime API changes too, not just the system parsers. Dropped the replicated support files, redirected test suites to the main support as the only support. Note: The json example has issues, major testsuite fail. Bisect where this started and fix. - Looks to be lexeme location information differences between rtC and rtTcl. - Might be fixed in the parse-event branch. Leaf check-in: 67d5a8317b user: andreask tags: mistake, example-test-integration
19:05
Fixed handling of `args` parameter in wrong#args error messages. The text is dependent on core version. -- There is more going on -- Critcl differences, Core differences -- Try later again -- Closed-Leaf check-in: 7dbbfd79f5 user: andreask tags: mistake
04:42
The code generating symbol names for literals was still dependent on the exact content of the [:control:] charclass, causing differences in output depending on the version of Tcl used (seen between 8.5.18 and 8.5.19). Now ensured version-independent quoting in all the relevant places. Fixed order of conditions too. Cross-referenced these places. Regenerated the internal parsers (slif, literals). Updated tests. check-in: e71840b49a user: aku tags: trunk
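A C sketch of the idea, assuming a simplified ASCII-only view: decide what needs quoting with an explicit, fixed test (C0 controls, DEL, backslash) instead of a character class whose contents can vary between runtime versions. The actual fix lives in the Tcl generator; `needs_quote`, `quote_literal`, and the `\xHH` escape format are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Fixed definition of "needs quoting": ASCII C0 controls, DEL, backslash.
 * No character-class table is consulted, so the answer cannot change
 * between runtime versions. */
static int needs_quote(unsigned char c) {
    return c < 0x20 || c == 0x7F || c == '\\';
}

/* Write a quoted form of `lit` into `out` (size `cap`). Returns 0 on
 * success, -1 if the result (plus terminator) would not fit. */
static int quote_literal(const char *lit, char *out, size_t cap) {
    size_t n = 0;
    assert(cap > 0);
    for (const unsigned char *p = (const unsigned char *)lit; *p; p++) {
        char tmp[5]; /* "\xHH" plus NUL */
        int len = needs_quote(*p) ? snprintf(tmp, sizeof tmp, "\\x%02x", *p)
                                  : snprintf(tmp, sizeof tmp, "%c", *p);
        if (n + (size_t)len >= cap) return -1;
        memcpy(out + n, tmp, (size_t)len);
        n += (size_t)len;
    }
    out[n] = '\0';
    return 0;
}
```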
2018-09-06
19:29
Fix missed bogus `static const` in test results. Closed-Leaf check-in: 3b692749d7 user: andreask tags: tkt-d9b47bbb78-event-management
16:32
Ticket [d9b47bbb78]. Fix event management in the generators. C runtime and generator updated for the new data structures. Tests updated. check-in: e2021e3ef5 user: aku tags: tkt-d9b47bbb78-event-management
2018-09-04
19:53
Ticket [d9b47bbb78]. Fix event management in the generators. Tcl runtime and generator updated for the new data structures. Tests updated. C runtime tests broken. Next: C runtime and generator. check-in: 69f8e6628c user: aku tags: tkt-d9b47bbb78-event-management
2018-08-31
23:12
Ticket [d9b47bbb78]. Fix event management in the generators. Extended to fix the event data structures in general. - Modified container, grammar base, grammar classes - Event information split into declaration with state (events), and usage information (triggers). Events are managed in the main container class. Triggers are managed in the grammar base class. Code is common to L0 and G1. API used by the semantics is unchanged. Serialization structure changes. Event section is global. Trigger sections under L0, G1. Used a purpose-written tool to rewrite the old serializations in tests results (gcstate, gcr_*) to the new form. Reworked the SLIF generator to use the new form of serialization. Tests pass, no results changed. Fixed testsuite oops (generatec_gc(c|tcl)) not seen in trunk because everything with gc also had gc_c, etc. GC generators are ok as well. Extended testsuite to cover grammars with events. check-in: 5b0bdf9591 user: andreask tags: tkt-d9b47bbb78-event-management
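The resulting split can be sketched in C: one global table of named events carrying their activation state, plus per-grammar (L0, G1) trigger lists that only reference event ids. Because both trigger lists share the table, (de)activating an event by name affects both levels. All names (`event_table`, `event_declare`, `event_activate`, `trigger`) and the fixed capacity are assumptions for illustration; the real container is a set of Tcl classes.

```c
#include <assert.h>
#include <string.h>

#define MAX_EVENTS 16

/* One global event table: user-facing event names share a single scope. */
typedef struct {
    const char *name[MAX_EVENTS];
    int active[MAX_EVENTS];
    int count;
} event_table;

/* Per-grammar (L0 or G1) usage: which symbol fires which global event. */
typedef struct { int symbol; int event; } trigger;

static int event_find(const event_table *t, const char *name) {
    for (int i = 0; i < t->count; i++)
        if (strcmp(t->name[i], name) == 0) return i;
    return -1;
}

/* Declare an event, or return the existing id if the name is known. */
static int event_declare(event_table *t, const char *name, int initially_active) {
    int id = event_find(t, name);
    if (id >= 0) return id;
    assert(t->count < MAX_EVENTS);
    t->name[t->count] = name;
    t->active[t->count] = initially_active;
    return t->count++;
}

/* Runtime (de)activation goes through the shared table, so it applies to
 * every trigger referencing the event, in L0 and G1 alike. */
static void event_activate(event_table *t, const char *name, int on) {
    int id = event_find(t, name);
    if (id >= 0) t->active[id] = on;
}
```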
19:46
Ticket [d9b47bbb78]. Fix event management in the generators. rt-C generator ... scrap ... Working on this makes me realize that my data structures are still not right. Re-reading the documentation at https://metacpan.org/pod/distribution/Marpa-R2/pod/Event.pod I came back with the impression that the user-facing event names are indeed in a single scope. It is not spelled out explicitly, but I can't see it working otherwise. Especially with runtime (de)activation of events it must be. Which means that having separate structures for G1 and L0 is wrong. The core event information (name -> activation status) must be shared. The trigger tables can be separate, should be even. Saving this for posterity, and restarting (from the container outward) Closed-Leaf check-in: 5bf97382b1 user: andreask
19:36
Ticket [d9b47bbb78]. Fix event management in the generators. rt-Tcl generator updated. Tests updated. Split the more complex internal formatting commands into more composable pieces, and re-composed them. check-in: 83c23635b6 user: andreask
07:39
Ticket [d9b47bbb78]. Fix event management in the generators. Tests fully updated. Started update of generators. - slif ok - tparse in the works check-in: d4619268cc user: aku
2018-08-30
23:32
Ticket [d9b47bbb78]. Fix event management in the generators. Changed container data structures. Separated event definition from event usage by symbols (triggers). Attention: This changes the grammar serialization. Started update of tests. TODO: Update generators. check-in: 8c77fd896a user: andreask
21:21
Ticket [d9b47bbb78]. Fix event management in the generators. Prep work, a test. As part of the expected results this also contains an implied draft spec for various changed data structures (engine, slif container), and semantic processing. check-in: 4cd32b9b44 user: andreask tags: mistake
06:03
Merge latest to doctools work check-in: 783110305c user: aku tags: language-doctools
06:02
Merge latest to event work check-in: 783f0feb8b user: aku tags: parse-events
06:01
Merge work back to main. check-in: cf5be67f4a user: aku tags: trunk
06:00
Completed the pass. Closed-Leaf check-in: ef8ad60e6d user: aku tags: test-cleanup
00:07
Start some cleanup of testsuite, some simplifications for var handling of the loop helpers ... check-in: 4f1c1e6f02 user: andreask tags: test-cleanup
2018-08-29
06:59
Updated main line with continued event work. check-in: 36533140c7 user: aku tags: trunk
06:56
Updated example check-in: c5b2707fc3 user: aku tags: language-doctools
06:54
Completed work on the ticket, integrated with base. check-in: 16d33a63ac user: aku tags: parse-events
06:53
Added test for parser reset. Closed-Leaf check-in: 31becc141c user: aku tags: tkt-a5abdaacd4-shift-test-reset
2018-08-28
23:57
Document i-gen a bit more. Additional helpers for one-shot grammar tests outside of the big loops. check-in: d8bfe7d142 user: andreask tags: tkt-a5abdaacd4-shift-test-reset
22:37
Ticket [a5abdaacd4]. Shift engine reset to start of parsing. Keep state for introspection (statistics, progress reports, etc.) post parse. Needs testcases, before/after introduction of reset. check-in: 230aebacc3 user: andreask tags: tkt-a5abdaacd4-shift-test-reset
05:33
Get trunk and doc work into current example. check-in: 6563481eae user: aku tags: language-doctools
05:32
Pull doc work into official code. check-in: 1e2d209efa user: aku tags: trunk
04:40
Final grammar fixes based on processing the Tcllib manpages. 100% pass (429 pages. Note: three fewer than the earlier total, because the simple find had also caught include files) - Allow backslash in general text (and examples). - Allow continuation lines in markup commands. - Allow whitespace after the last element of `tclwords` sequences. - Fixed oops in the include file handling, failed to process the last character of the included file. This was ok when it was a closing newline. Bad for files without such, i.e. ending in the closing bracket of a command. - Allow section command with two arguments. Same as subsection. - Allow 'args' as alias for 'arguments' (list type). - Tweak character escapes, recognize double-backslash. - Allow empty list elements (no text between element markers) - Paragraph sequences are improper (allow trailing separator) Fixed ca. 20 Tcllib manpages which had issues the older doctools system does not catch (para before first list item, bad closing brackets, silent swallowing of bogus markup commands, ...). check-in: 4b364c62cf user: aku tags: language-doctools
2018-08-26
04:52
Added documentation for package `marpa::util`. Timing/Sequencing missing. Regenerated embedded docs. Leaf check-in: d8022c427d user: aku tags: more-documentation
2018-08-25
20:39
Completed documentation for `marpa::unicode`. Regenerated embedded docs to pull in this and previous changes. Completed ticket [a82dc50647]. check-in: 2f04a2d613 user: aku tags: more-documentation
05:05
More filling in of the manpage for `marpa::unicode`. check-in: ac96c4984d user: aku tags: more-documentation
2018-08-24
23:57
Started to document marpa::unicode package check-in: d590007f37 user: andreask tags: more-documentation
21:29
Doc work: Placed project name, titles and uris into variables for shorter references, and easier changes in the future. check-in: d390e27e1e user: andreask tags: trunk
06:26
More grammar fixes based on processing the Tcllib manpages. Implemented a lexeme barrier (lexer flush) for use in stop event handlers. It forces the lexeme sequence for the included file or similar to coincide with the end of that secondary input, with everything up to the border flushed, so that the border is crossed without any rewind after the last lexeme. This prevents rewinding in the outer input stream after returning from the include, which caused very bogus lexing. Note: this is only a lexer flush, without a parser eof. Updated doctools-base to use the new barrier in its include management. Grammar changes/fixes: - Dropped the special forms fully from the G1. - Configured the special forms for LTM, i.e. to always match. - Allow use of 'enum' as a list type, vs. just the official 'enumerated'. - Allow opening brace in example text, and quoted strings. - Fixed `subsection` syntax, may have two arguments. This fixed an issue with seeing include or vset in places not expected by the G1 from before the change. Now they can appear everywhere between lexemes. Fixed rt-Tcl's handling of LTM (= always-on) lexemes. Bad id conversion in a few places. Statistics: - 355 pass 82% - 77 fail 18% - 432 total 100% check-in: 21eb6e74f0 user: aku tags: language-doctools
2018-08-23
07:07
More grammar fixes based on processing the Tcllib manpages. Fixed `braced` vs `simple`. Fixed `nbsimplex` vs `simple`. (G1 ambiguities due to L0 lexeme language overlap) Fixed use of `subsections` in pseudo-section `description`. Allowed. Eight combinations for paragraphs, subsections, and sections. Added missing `manpage` markup command. Fixed handling of empty include files. Shortcircuit entry and return. Statistics: - 299 pass 69% - 133 fail 31% - 432 total 100% check-in: 417bd724ce user: aku tags: language-doctools
00:00
Trialing real-world manpages from Tcllib. Fixed issue in grammar around '\[' escapes. Removed `\` from ordinary set, treat separate, with \[, \]. Extended the `dt` helper to operate on multiple files, reference files. Added options `--help`, `-o`, and `-t` to `dt`. Extended `doctools::base` with a limited debug stream to show include processing (files, file size, returns). Tweaks in rt-Tcl parser's narrative tracing. Ditto in rt-Tcl lexer. Fixed ambiguity `simple` vs `simplex`. Because the latter was not strictly longer than the former, all g_text's with a single simple were also simplex, for 2^n blowup. By making `simplex` strictly longer (i.e. at least 2 simple in the series) this is gone. TODO: `simple` vs `braced` ambiguity where `simple` accepts `{...}` while it should not. That is a simple (sic!) grammar bug. check-in: a9c57a9594 user: andreask tags: language-doctools
2018-08-22
05:08
Added doctools helper application (`dt`) to show AST for arbitrary doctools input. Added demo tool. Refers to bogus example at the moment. check-in: 39ce9e09a5 user: aku tags: language-doctools
05:07
Fix use of `mymethod`, use direct `my`. check-in: a4eb59f54f user: aku tags: language-doctools
05:01
Fix path resolution for includes. Fully normalize all base paths. check-in: b2f700d6b7 user: aku tags: language-doctools
05:00
Fix missing requirements check-in: e708c6d8da user: aku tags: language-doctools
04:59
Grammar fixes - Allow `vset` (reference) at word level, not just as markup. (Should trial if `latm 0` would work for the special forms) - Kill multiple levels of quantification in various L0 symbols. This is just ambiguity where a series of N elements generates `partition(N)` (*) trees for the same lexeme. TODO: Write semantic checks for a SLIF container to detect and warn about this. (*) https://en.wikipedia.org/wiki/Partition_(number_theory) https://oeis.org/A000041 Very much beyond exponential. check-in: 32457c306a user: aku tags: language-doctools
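To get a feel for how fast `partition(N)` grows, here is a small C sketch computing it with the standard dynamic-programming recurrence (count multisets of parts, admitting part sizes 1, 2, ... in turn). The function name and the cap of 300 (chosen so results stay well inside 64 bits) are assumptions of this sketch.

```c
#include <assert.h>
#include <stdint.h>

#define PMAX 300

/* p(n): number of ways to write n as an unordered sum of positive integers
 * (OEIS A000041). p[t] accumulates the partitions of t using only the part
 * sizes admitted so far. */
static uint64_t partitions(unsigned n) {
    uint64_t p[PMAX + 1] = {1}; /* p[0] = 1: the empty sum */
    assert(n <= PMAX);
    for (unsigned part = 1; part <= n; part++)
        for (unsigned total = part; total <= n; total++)
            p[total] += p[total - part];
    return p[n];
}
```

For example, p(10) = 42 and p(100) = 190569292, so by the count given above a run of 100 elements would already produce about 1.9e8 trees for a single lexeme.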
04:51
C code generator tweaks and fixes: - Fix the issue with the bad indentation of per-symbol semantic information. New array configuration flag: `align`. Default true. - Change lexeme map for events to proper tabular. New array configuration flag: `padright`. Default false (i.e. pad left). check-in: b719ce15a0 user: aku tags: language-doctools
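A hypothetical C sketch of what the two flags mean when rendering one row of a generated array: with `align`, every cell is padded to the widest cell; `padright` selects whether padding goes after (left-justified) or before (right-justified) the text. This only illustrates the formatting choice; the real generator is Tcl code and its flag handling may differ.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Render `n` cells as one comma-separated row into `out` (size `cap`).
 * With `align`, cells are padded to the widest cell; `padright` chooses the
 * padding side ("%-*s" pads right, "%*s" pads left). */
static void format_row(char *out, size_t cap, const char *const *cells, size_t n,
                       int align, int padright) {
    int width = 0;
    size_t used = 0;
    assert(cap > 0);
    if (align)
        for (size_t i = 0; i < n; i++) {
            int len = (int)strlen(cells[i]);
            if (len > width) width = len;
        }
    out[0] = '\0';
    for (size_t i = 0; i < n && used < cap; i++) {
        int w = snprintf(out + used, cap - used, padright ? "%s%-*s" : "%s%*s",
                         i ? ", " : "", width, cells[i]);
        if (w < 0) break;
        used += (size_t)w; /* may exceed cap on truncation; loop then stops */
    }
}
```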
2018-08-21
23:51
Brought part of the `mindt` example, and associated experiences into the doctools example. Reworked the doctools grammar to properly handle the special forms. Further reworked to give it a structure similar to the `mindt` grammar. Started updating the tests. Oops! There is strong ambiguity in the innards of `tclword`, the G1 level sub-structuring of command arguments. It happens because at G1 level we have invisible whitespace discards, and so a text like `BBB CCC` can be parsed as two separate `unquoted` arguments (good), or as a __single__ `unquoted` argument of two elements (bad) because the spacing separating them went away at L0 level. Something similar happens in the definition for `quoted`. `braced` is fine because it maps to a single unbroken lexeme. This needs more thought ... Currently I only see bringing the argument spacing into the G1 as possible solution. The lexeme match would have precedence over the discards. Oh, and `quoted` already has `space` as element of the structure, so maybe it would not have the issue. That would be similar to the special forms, they handle their inner whitespace explicitly too, although in L0. ... A quick trial looks to be working ... The only forest remaining looks to be `breaker` vs `space` in example blocks. Fixed the ambiguity as well. Introduced non-breaking (nb) space, allow only that in the example text. Use non-breaking simplex variants as well to gather longer chunks of input. This looks good. Tests all ok, and ASTs look good enough. I may introduce the special action `::evens` however to drop unwanted spaces in the `tclwords` sub-tree. Explicit removal of the separators. Tests/examples for variable handling and includes have priority though. Also, get larger/real-world examples (Tcllib, Critcl, Kettle, Kinetcl, etc). check-in: ea5192243e user: andreask tags: language-doctools
17:50
Brought information from mini-dt and the new features for IO and parse events into the main doctools example. check-in: 8a648972d9 user: andreask tags: language-doctools
17:37
Integrated multi-stop support as new feature into parse-events. Mini doctools examples demonstrates usefulness (and worked out how to handle vset/include for doctools ;). check-in: 27f7f4624d user: andreask tags: parse-events
17:32
Integrated mini-dt into the branch it came from, multi-stop handling. Closed-Leaf check-in: 4a80913963 user: andreask tags: multi-stop
17:29
Tweaked comments in the mindt engines a bit. Reworked the class structure to better hide the internal composition with event handling and multiple parsers from the tests (and users). Updated test setup code. Added notes documenting the class structure. Tests themselves pass without change. The IO changes for from/to look to be a success. Removed the scratch IO logs from when the problems with overrun were detected. Closed-Leaf check-in: 8183573f2f user: andreask tags: mini-dt
07:54
Fixed the higher layers, adapted to the changes in inbound. Updated tests. Still have to document the higher match location methods. check-in: 5f55575b22 user: aku tags: mini-dt
2018-08-20
23:16
Reworked API of low-level inbound, documented start/stop location, and limit meanings. Updated tests. TODO: Update higher layers. check-in: 872ecc83c7 user: andreask tags: mini-dt
2018-08-17
23:32
Tentative change to inbound for `stop after a character`, with the controlling condition at a place in the loop where it can also `stop before the first character`, giving the full range. Considering tweaking the user-visible API to still mean `stop before character`, as a shifted form of the above, so that all user-specified locations are >= 0, never negative. This work will move to its own branch off `parse-events`, to work it out completely, including internal docs, before continuing on the (min)doctools. check-in: e66e52ad3c user: andreask tags: mini-dt
22:09
Scratch - Added narrative tracing to inbound around the IO loop, to track exactly where it is and how it moves. - X1, X2 - Demo of the `-1` difference for the include-iii test processing. It seems the issue is rooted in the fact that I stop the engine before the marked character is processed. Trying to set the stop marker after the last character of the stream, to stop after everything has been processed ... does not work. The possible fix would be to change the semantics and stop after the marked character has been processed. Then setting the marker on the last character will stop after it, before trying to move into the bad lands. Technically this shifts the impossible stop location to before the first character of the input. OTOH, if we choose to allow an absolute position -1, that might work (shift the controlling test to the beginning of the io loop). The in-band `no stop` signal would then have to be `-2`. check-in: 06e8e23383 user: andreask tags: mini-dt
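The shifted semantics considered above can be sketched in C: the stop test sits at the top of the loop and compares against the position of the last processed character, which starts at -1. A marker on the last character then stops right after it, -1 stops before the first character, and -2 serves as the in-band `no stop` value. `process_until` and `NO_STOP` are hypothetical names; the real loop also deals with bytes, events, and input limits.

```c
#include <assert.h>

#define NO_STOP (-2) /* in-band "no stop marker set" */

/* Process characters 0..len-1, stopping *after* the character at `stop`.
 * `pos` is the last processed character; -1 means none yet. Returns how
 * many characters were processed. */
static int process_until(int len, int stop) {
    int processed = 0;
    assert(stop >= NO_STOP && stop < len);
    for (int pos = -1; ; pos++) {
        if (pos == stop) break;    /* tested before touching pos + 1 */
        if (pos + 1 >= len) break; /* end of input, never run past it */
        processed++;               /* process character pos + 1 */
    }
    return processed;
}
```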
2018-08-16
22:37
Oh! :facepalm: `inbound_enter` copies just the pointer to the input string into the engine state. This pointer is (as critcl type `pstring`) just a pointer into the Tcl_Obj internals (GetStringFromObj). We are fine as long as we do not perform `inbound_enter_more`, because nothing does anything with the pointer but read from it. Not even basic events. But with `inbound_enter_more` we realloc that area. This means that the old space is freed whenever the system does not find enough room to expand in place, at which point we have pulled the rug out from under the Tcl_Obj* the string came from. Doing anything with that Tcl_Obj* (like releasing it, e.g. via `unset var`) will then smash that memory. Fixed: `inbound` now tracks whether it owns IN.bytes and acts accordingly when expanding the area in `enter_more`. TODO: Validate stop locations, must not allow anything beyond the end of the allocated memory; the system will try to run over into uninitialized cells at the moment. TODO: Alt: Force engine stop on max for total as measure against the above. check-in: 57ccd4001d user: andreask tags: mini-dt
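A C sketch of the ownership fix, assuming a hypothetical `input` struct: the initial entry only borrows the caller's bytes (as `inbound_enter` does with the Tcl_Obj string), and appending never hands borrowed memory to `realloc`; it copies into a fresh, owned block first. Only owned memory is ever reallocated or freed. Stop-location validation, listed as TODO above, is not covered.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical input buffer. `bytes` may borrow memory owned by someone
 * else (e.g. a Tcl_Obj's string rep); `owned` says whether we may
 * realloc/free it. */
typedef struct {
    char *bytes;
    size_t size;
    int owned;
} input;

/* Borrow the caller's bytes without copying. They are only read while
 * borrowed, never written, reallocated, or freed. */
static void input_enter(input *in, const char *borrowed, size_t size) {
    in->bytes = (char *)borrowed;
    in->size = size;
    in->owned = 0;
}

/* Append `n` bytes. Borrowed memory is copied into an owned block first,
 * so the caller's object is never freed out from under it. */
static int input_enter_more(input *in, const char *more, size_t n) {
    char *buf;
    if (n == 0) return 0;
    if (in->owned) {
        buf = realloc(in->bytes, in->size + n);
        if (!buf) return -1;
    } else {
        buf = malloc(in->size + n);
        if (!buf) return -1;
        memcpy(buf, in->bytes, in->size);
    }
    memcpy(buf + in->size, more, n);
    in->bytes = buf;
    in->size += n;
    in->owned = 1;
    return 0;
}

/* Free only what we own; borrowed memory stays with its owner. */
static void input_release(input *in) {
    if (in->owned) free(in->bytes);
    in->bytes = NULL;
    in->size = 0;
    in->owned = 0;
}
```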
19:58
Completed processing of includes. Completed tests. rtC crash with includes. check-in: e21f8235e9 user: andreask tags: mini-dt
08:07
Completed rtC state reset (all stages!) Fixes the aforementioned weird issue in the rtC/special forms, skips first character. Bogus char index information. Leftover test failures are the incomplete handling of includes. check-in: b889957b9c user: aku tags: mini-dt
07:37
Integrated parser for special forms into the main parser. vset processing done. include processing still to be done. Updated tests. Weird issue in the rtC/special forms, skips first character ? check-in: 79362316fe user: aku tags: mini-dt
07:34
Fix level error in intercept helper check-in: 85d64bf32c user: aku tags: mini-dt
07:33
Extended rtC to properly report errors from event callbacks. Extended both runtimes to reset the system after a parse ends. This allows multiple calls to `process(-file)` for a single parser. Each call is a new run. check-in: 0cf5429d76 user: aku tags: mini-dt
2018-08-15
23:53
Created grammar for the special forms. Testsuite for same. check-in: de9583426c user: andreask tags: mini-dt
19:27
Bring fix into main doctools branch check-in: 35a9eb534a user: andreask tags: language-doctools
17:10
Cherrypicked semvalue processing fix (allow for null) into trunk. check-in: d761e9ed2d user: andreask tags: trunk
07:09
Scratch work on processing the forms. Remove, replace with a small parser to get the internal structure. check-in: 8491fab778 user: aku tags: mini-dt
06:38
Handle possibility of the parser providing a null semvalue. check-in: 0810d9bbd8 user: aku tags: mini-dt
06:37
Fix forwarding in multi-stop check-in: 253f051635 user: aku tags: mini-dt
02:41
mini-dt testing continued. Fixed quote handling in G1 of unquoted tclwords, was missing. Base parser testing passes. check-in: 3d697cb567 user: aku tags: mini-dt
2018-08-14
23:49
Filled out test framework. Started adapter class to hold the parse event handling. First tests. Fixes to multi-stop. Issue with on-event and callback setup for Tcl runtime. Oops. Since rt::base this was not initialized. Have to repro on main/event branch for proper test. check-in: efbdda6404 user: andreask tags: mini-dt
20:59
Completed the grammar. Added examples for tests to come. Generated the base parsers. I believe I have a viable idea about processing the special forms, allowing us to get by with just L0 post-lexeme events. Main processing would be done by Tcl `subst`, plus a safe interpreter to limit the set of accessible commands. check-in: 49a999848b user: andreask tags: mini-dt
17:48
Name churn in directory organization of the new example. check-in: d12f7b2202 user: andreask tags: mini-dt
07:20
Fill in of the mini-grammar. Lots of the Tcl syntax still needed. check-in: 0fc17efcf2 user: aku tags: mini-dt
2018-08-13
23:57
mini-dt experiment start check-in: 467965138f user: andreask tags: mini-dt
22:49
Draft of adapter to provide management of multiple (stop) markers to any engine. No tests yet. Extended runtimes to remember and deliver the last location of the primary input stream. Moving to that location (`match from`) will force the engine to abort processing. As we have only a byte length at the beginning, we needed an additional function for the char index to map byte offsets into char offsets. Some more in-band signaling defers the conversion of the byte length to a char length until the information is actually requested for the first time. This ensures that parsers not doing IO shenanigans will not have to scan the entire input twice, once for the last char, and then for actual processing. Only parsers which do will have to expend that time, and only for the part of the input not yet processed. check-in: 85cce8e95a user: andreask tags: multi-stop
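A C sketch of the deferred byte-to-character conversion, assuming UTF-8 input and a hypothetical `primary_input` descriptor: the character length starts as an in-band `unknown` marker and is computed only on first request, and only the bytes beyond what the main loop already counted are scanned. Names and field layout are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* In-band marker: character length not computed yet. */
#define CHARS_UNKNOWN ((size_t)-1)

typedef struct {
    const unsigned char *bytes;
    size_t nbytes;     /* known up front */
    size_t done_bytes; /* prefix already consumed by the main loop ... */
    size_t done_chars; /* ... and its length in characters */
    size_t nchars;     /* CHARS_UNKNOWN until first requested */
} primary_input;

/* Count UTF-8 characters in `n` bytes by counting lead bytes, i.e. bytes
 * not of the form 10xxxxxx. */
static size_t count_chars(const unsigned char *bytes, size_t n) {
    size_t chars = 0;
    for (size_t i = 0; i < n; i++)
        if ((bytes[i] & 0xC0) != 0x80) chars++;
    return chars;
}

/* Character length of the primary input, computed lazily and cached.
 * Parsers that never ask never pay for the scan. */
static size_t char_length(primary_input *in) {
    if (in->nchars == CHARS_UNKNOWN) {
        assert(in->done_bytes <= in->nbytes);
        in->nchars = in->done_chars +
                     count_chars(in->bytes + in->done_bytes,
                                 in->nbytes - in->done_bytes);
    }
    return in->nchars;
}
```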
07:07
1. Clean up of stage `inbound`, and preparation for item 2. Removed `eof` method and function from `inbound`, and calls from the overall runtime. `Inbound` itself triggers all the necessary eof processing in the pipeline stages following it, since parse event support was added and its main loop restructured. Reworked the EOF trigger condition in `inbound` to trigger on exactly reaching the end of the primary input stream, instead of anywhere at the end and beyond. This is preparation for the upcoming support for extending the physical input stream with secondary input. Reworked the nested double loop back into a single loop by strategically hoisting the operations done by the outer loop into the relevant conditionals of the inner, making outer superfluous. 2. Added support for adding and use of secondary input streams. Placed in the physical input stream after the primary input it can only be used by explicitly moving the cursor into its span/range. New user-visible methods: `extend`, and `extend-file`. Internal methods and functions (inbound): `enter-more`, `read-more`. 3. Extended and updated test suite First to cover the accumulated public methods accessing and manipulating the current location cursor. Argument syntax, sequencing. One larger movement example demonstrating use of the secondary input (stream extension). Second to cover the new methods for adding secondary input. All tests pass. check-in: 687652d3d6 user: aku tags: parse-events
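A minimal C sketch of the reworked EOF condition, assuming a hypothetical `stream` whose physical bytes are the primary input followed by secondary input added later: EOF is signalled only when the cursor sits exactly at the end of the primary input, so positions inside the secondary span (reachable only by an explicit move) do not count as EOF. This is not the actual `inbound` code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical physical input: [0, primary_end) is the primary input,
 * [primary_end, total) is secondary input appended later. */
typedef struct {
    size_t primary_end;
    size_t total;
    size_t cursor;
} stream;

/* EOF on exactly reaching the end of the primary input. A test such as
 * `cursor >= primary_end` would wrongly fire inside secondary input. */
static int at_eof(const stream *s) {
    return s->cursor == s->primary_end;
}

/* Explicit cursor movement; the only way into the secondary span. */
static void move_to(stream *s, size_t pos) {
    assert(pos <= s->total);
    s->cursor = pos;
}
```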
2018-08-08
22:21
Test fixes. check-in: f301d37ca1 user: aku tags: language-doctools
22:11
Regenerated the doctools parsers, adapt to the runtime changes made for event support. check-in: 983593b1c8 user: aku tags: language-doctools
21:21
Brought the main enhanced with support for IO/L0 parse events into the language example which needs it (doctools: vset, include). check-in: 15eeacc70a user: aku tags: language-doctools
21:17
Pulled current working state of parse events (IO and L0 events) into main. Keeping the branch for future work on the G1 level event support. check-in: 2b796f8c7a user: aku tags: trunk
21:10
Pulled completed heredoc language example into the main parse-event work branch it was based on. check-in: 19009459df user: aku tags: parse-events
21:08
Completed the heredoc example - Tweaked grammar, regenerated parsers - Added narrative tracing to the wrapper completing the event handling. - Completed the testsuite, code and cases. Closed-Leaf check-in: 01cdb8aed0 user: aku tags: lang-heredoc
2018-08-04
03:51
Get fix check-in: 55fc94ae0a user: aku tags: lang-heredoc
03:49
Pull fix into main parse event branch. check-in: 226236cb49 user: aku tags: parse-events
2018-08-03
19:51
Fix event-list problem when using namespaced parser. Closed-Leaf check-in: cf55185633 user: andreask tags: fix-event-list
19:22
Pulled streamlined match API and updated to it. Filled in the helper/demo/example app. check-in: 9c438ff29f user: andreask tags: lang-heredoc
07:19
Continued match API work, fixed dependent files (tests). Removed `rewind`, `values`, `start:`, `length:`, `value:`, `values`, `symbols:`, and `sv:` Renamed `relative` to `from+`. Closed-Leaf check-in: cf1bc408d8 user: aku tags: match-api-streamline
2018-08-02
23:29
Rework the `match` API, streamline, simplify check-in: 081f8f2a56 user: andreask tags: match-api-streamline
2018-08-01
23:34
Started on heredoc demo. check-in: 920296d8ea user: aku tags: lang-heredoc
05:58
Merged the completed work on support for stop events, closed branch. check-in: 10238f62f5 user: aku tags: parse-events
05:10
Full test run, went over failed tests, fixed issues introduced in previous commits. (1) Unbreak engine_debug: progress-reports. Requires recognizer named RECCE. Undone rename to LRECCE, PRECCE in lexer, parser respectively. (2) Fixed bad references in the unconditional support for events. (..._event_list still conditional, name must be C identifier) (3) lexer.test updated to changed constructor signature. (4) Updated results expected by clex/cparse generator tests. (5) Extended parser tests to check the new process(-file) syntax (from, to, limit configuration). Regenerated the builtin SLIF and literal parsers. Closed-Leaf check-in: 8b31181319 user: aku tags: stop-events
2018-07-31
07:19
More rtC/Tcl equalization. check-in: e8fcf4e47f user: aku tags: stop-events
07:06
Trimmed whitespace. check-in: 2fa1c65bbd user: aku tags: stop-events
07:05
rtc and rt-Tcl are nearly matching again as per the tests; however, the tests have known holes. TODO: Extend test coverage (parser/lexer constructor options). Generated parsers/lexers now unconditionally support events. TODO: A number of current separate structures can be consolidated. TODO: Name churn, and rename `sem_tcl`. check-in: 8b36e7d0bb user: aku tags: stop-events
2018-07-27
23:30
Scratch commit. Compiles. Tests fail. Began implementation of initial from/to/limit for `process(-file)`. Required to test stop events (force an event from outside). Also a feature of the original. Fixed forgotten use of `location?` in the inbound tests. Need more inbound tests for all the new cursor movement methods. Bug: The ability to force a stop event on the engine from the outside implies that we have to support events even for grammars which do not declare any. The templates for rtC-based parsers/lexers must change accordingly. We might be able to optimize them down to `support only stop`. check-in: 3d17118a55 user: andreask tags: stop-events
21:43
Tweaked copyright info. Churn. check-in: 7ebed1837c user: andreask tags: stop-events
20:56
Added more narrative tracing. WTH ? This fixed rtc `sv:` ?! This needs more investigation. check-in: 6383f3328f user: andreask tags: stop-events
20:15
Replaced posint and posint0 definitions with aliases to new critcl argument types. (See critcl tag `scalar-basic-limits`). check-in: 0a54dfb149 user: andreask tags: stop-events
07:10
Implemented facade access based on event type. Updated tests. Name churn for the match methods: `match moveby` -> `match relative`, `match moveto` -> `match from`, `match stop-at` -> `match to`, `match stop?` -> `match stop`. New method `match dont-stop` to cancel a stop marker/event. Note: rt-Tcl and rtc diverge wrt `sv:`. check-in: 32e9e5d575 user: aku tags: stop-events
2018-07-26
23:46
Fixed the parse issues. Draft snapshots for all match tests. Implement the per-event access control next. check-in: 460323c2ae user: andreask tags: stop-events
22:53
Continued work on the facade tests. Syntax and coarse check ok. Per-event check still has troubles. All rtc. Triggered the assert for the move bug reported with commit [2450c40017]. Fixed the issue, reorganized `inbound` and `clindex` for this. check-in: 0ade3d5008 user: andreask tags: stop-events
07:55
Continued work on stop events and facade validation. New `stop?` query method. Still need a method to clear the stop marker. Basic access rejection for rtc (error when outside an event handler). check-in: defc583e62 user: aku tags: stop-events
2018-07-25
23:58
Started on match tests. Core operation. check-in: 9625d9a9bb user: andreask tags: stop-events
19:36
Assert a bug to fix check-in: 2450c40017 user: andreask tags: stop-events
19:36
Fix warning check-in: 5fc1c7b8e6 user: andreask tags: stop-events
19:36
Added in some argument validation. check-in: bb08f525de user: andreask tags: stop-events
18:49
Copyright updates. Added facade docs missing from previous commit. check-in: e26673daac user: andreask tags: stop-events
18:41
Continued work on stop events. Completing rt-Tcl, implementing rtc. Compiles, however not tested. Do not have any tests for stop yet. No tests available to check facade access control either. Draft facade argument types and access control now documented. check-in: 892e53a219 user: andreask tags: stop-events
15:31
Started implementation of stop events, and match state access validation. check-in: 1adbab056d user: aku tags: stop-events
04:02
Tweaked the match API, specifically location reporting and manipulation. The user visible current location always points to the character where (with which) processing will resume when the event handler returns. Updated the tests. check-in: 11a3929e08 user: aku tags: parse-events
2018-07-24
03:54
Merged the work on PE rtc support into the general PE branch. Nearly at the point where we can do heredocs. For that we still need a stack of input segments for more control over movement in the string. check-in: 8ff5c985cb user: aku tags: parse-events
03:49
Updated testcases for the changed generator core. Closed-Leaf check-in: 6c51252fb9 user: aku tags: parse-events-rtc
03:25
Extended rtc with implementation of the match method `alternate`. rtc lexer parse events complete. Lexer parse events complete (support in both runtimes). TODO: Create example (heredocs) demonstrating use. check-in: 265676058d user: aku tags: parse-events-rtc
03:22
Extended narrative tracing. check-in: de55050b65 user: aku tags: parse-events-rtc
2018-07-23
21:05
Extended event testing to match method `alternate`. check-in: b2a6a0ad23 user: aku tags: parse-events-rtc
19:54
Extended rtc with implementation of the sv/symbols match methods (set and get). Extended the generator to provide the necessary structures for the conversion of symbols to internal ids. Added a number of comments around the inspection of the rule bytecode for error information. check-in: 193d2ab03f user: aku tags: parse-events-rtc
19:40
Extended event testing to the sv/symbols match methods (set & get). check-in: bc770d489f user: aku tags: parse-events-rtc
2018-07-19
23:47
Name churn check-in: a2f348ee40 user: aku tags: parse-events-rtc
20:28
Fix trailing whitespace in generated files, due to the trimming of the templates. check-in: e5e54951cc user: aku tags: parse-events-rtc
19:40
Overdue trimming of trailing whitespace across the board. check-in: ee9b3c40b5 user: aku tags: parse-events-rtc
19:39
Ignore a few more work things in the src area. check-in: dea206a3fa user: aku tags: parse-events-rtc
07:08
Created input indexing per doc.1/INDEX_3.md. Integrated indexing with inbound. Demo tests now pass. Still to do: Passing in alternate symbols and sem values. check-in: 3d4c0b8e3a user: aku tags: parse-events-rtc
2018-07-18
20:49
Demonstrate location mis-tracking by rtC when moving in input with multi-byte characters from a parse event handler (non-ascii-b). Show general ok tracking without moves by the user (non-ascii-a). check-in: 0f203e8f47 user: aku tags: parse-events-rtc
18:59
Continued fill-in for before/after events. `Inbound`, `gate`, and `lexer` rewritten to match rt-Tcl behaviour with regard to stepping through the input, rewind, and flush. __Attention__ `Inbound` location tracking still handles only ASCII correctly, not multi-byte UTF-8. Still mulling over possible index structures to enable quick movement in the presence of multi-byte characters, with low memory overhead for very uniform input. Changed the `lexer` field `m_sv` to an int stack; the component tracks ids, with the associated SVs already remembered in the `store`. Forced changes to the implementation of the facade, and to the API between `lexer` and `parser` (changed signature and implementation of `marpatcl_rtc_parser_enter`). Fixed forgotten difference between discard and other events (ACS symbols vs G1 symbols) in the C generator code and the id/symbol conversion done by the facade. check-in: dc070a04f8 user: aku tags: parse-events-rtc
05:05
Fix comment typo check-in: 2a46239e90 user: aku tags: parse-events-rtc
2018-07-17
16:51
Started on before/after events. Will not compile, see all the XXX markers, just a checkpoint. check-in: 91ad806001 user: aku tags: parse-events-rtc
05:29
Filled in most of the facade. Still missing parts relevant to before/after events. Location accessors implemented, only correct for ASCII. UTF support is still to do. Moved the API functions into inbound and lexer, with pieces in the pedesc class. Dropped the separate pedesc header and C sources. Optimized moveto: add the delta to the positions before delivery to the engine. The lexer now manages all the new fields for the match state (initialization, update, reset). Lexeme data for SV now comes out of the new fields and accessors. Event testing now records any error we may receive from the match facade. check-in: dd2dfa9d07 user: aku tags: parse-events-rtc
2018-07-13
23:47
Attention: This commit will likely not even compile. It is saved scratch state of work on the rtc lexer match state to complete the parse event facade. Reworking the lexer state internals for cached access to the information, ability to modify. This will affect the lexer/parser interface, namely the transmission of found symbols and semantic values. check-in: 96dc913d9e user: andreask tags: parse-events-rtc
22:27
Added tracing of the new event matching and reporting code. Tweaked tracing of the EH functions (separate stream). Added detection and reporting of discard events. Fixed generated lexer event entries; the engine operates with the ACS symbols. Updated tests. Discard events are detected and reported. The testsuite fails because the PE descriptor facade is not completely filled out yet, and the incomplete methods have signatures that diverge from those expected. While the Tcl errors thrown by the event recorder callback used in the testsuite are ignored during execution, they are properly seen in the narrative tracing, proving that the discard callbacks work. Fill out structure and facade before implementing the before/after callbacks. check-in: 91aa29f8f2 user: andreask tags: parse-events-rtc
18:06
Linked the PE descriptor facade into the lexer/parser templates. Removed scratch notes from the facade. Updated tests. check-in: c20d03a3ef user: andreask tags: parse-events-rtc
2018-07-12
21:35
Extended the facade with set/get for the class rtc variable needed by the constructor check-in: 9e2ca3d617 user: andreask tags: parse-events-rtc
20:58
Started work on generic parse event descriptor access. Using a critcl::class as facade to the structures, ensemblified methods. Requires critcl::class 1.1 to disable tcl-api, generate c-api. check-in: 84fa7104f2 user: andreask tags: parse-events-rtc
20:01
Fixed typo in comment check-in: 915f151c40 user: andreask tags: parse-events-rtc
05:43
Fixed [f5e6063aeb] memory smash. Miscounted references to the `self` argument of parse event callbacks. Updated tests with all the generator changes. rtc-runtime tests: L0 parse event cases failing as expected, as event generation is not done yet. However we are now at the point where we can start on adding this in. Most of the foundations are now present. Notably still missing are the parse event descriptor structures and its linkage to the lexer state. check-in: 66339c7f20 user: aku tags: parse-events-rtc
2018-07-11
22:52
Filled in the marpatcl_rtc_eh_... functions and structures. Fixed bogus declaration syntax for the generated event structures. __Attention__: Seg.fault in core. Likely caused by the new functions, structures, and their use. Update: Fixed with [66339c7f20]. check-in: f5e6063aeb user: andreask tags: parse-events-rtc
20:28
Follow up to lex-only refactoring, updated tests. check-in: 64a75ee092 user: andreask tags: parse-events-rtc
19:58
Ripped the general structures and code for lex-only token/value handling out of the clex template and placed them into the marpa::runtime::c package for sharing. As part of that the result and event callbacks from RTC now have separate client data information. Note, the yet undefined `eh` structures and functions already referenced by the event handling code are a similar thing for that aspect. check-in: fce9c19274 user: andreask tags: parse-events-rtc
19:47
Always initialize variable check-in: f3f39059c2 user: andreask tags: parse-events-rtc
19:46
Added notes about uninitialized memory to the two set implementations. check-in: 8202ba0482 user: andreask tags: parse-events-rtc
19:43
Memory smash fixed. SV ref miscount in the lexer in lex-only mode when one SV is re-used for multiple tokens in the same location. Win for the (SEM_)REF_DEBUG functionality coming out of the mem-limit, memory-cleanup branches. Valgrind was no help. check-in: 1d1b5cb7d8 user: andreask tags: parse-events-rtc
04:36
Fixed missing cleanup of callback field, caused a bogus 2nd destroy on object destruction. Tweaked comment in test support code. check-in: afeb1c4ea6 user: aku tags: parse-events-rtc
00:07
Filling in the C-level data structures and API changes implied by the template changes for clex and cparse generators. ATTENTION: Manually modified the builtin parsers (slif, literals) to match the changed structures and function signatures. ATTENTION: Memory smash somewhere in the zeta-rtc-lexer tests. (Fixed with commit [1d1b5cb7d8]) check-in: 3df1d4cea4 user: andreask tags: parse-events-rtc
2018-07-10
23:13
Show i-gen critcl command in the log. Added main tclsh include to the set of include paths to search. check-in: e274d2eda6 user: aku tags: parse-events-rtc
22:21
Converted the eof/enter callbacks of clex to critcl::callback. check-in: c230bc5be1 user: andreask tags: parse-events-rtc
21:13
Extended asset management to allow more than one asset. Generate C event datastructures, lex & parse. Updated and extended tests. check-in: c7e33a1f26 user: andreask tags: parse-events-rtc
2018-06-26
18:20
Make rewind more robust. Tweak post-event input movement. Added event scratch docs check-in: 22e19935f3 user: andreask tags: parse-events
2018-04-20
21:28
inbound: Absolute movement, tweaked for consistency. Extended with optional delta arguments. gate: Extended with forwards to the input location accessor and modifiers. This provides parse event handlers with the ability to move in the input. lexer: Moved redo to before we handle parse events. This enables parse event handlers to modify the location without interference from the system itself. Modified pre-lexeme event generation to move input to the start of the lexeme. Fixed pe-fill method, forgot to set flag for the incremental rebuild of the symbol/sv tables. parse descriptor: See gate. Further fixed accessor setup. Extended view to report input location. testsuite: Report input location. Move input location to the end of the lexeme. Needed now that pre-lexeme events have the location at the start. Updated expected results. check-in: faebf8fba4 user: aku tags: parse-events
04:33
Pulled memory fixes into the feature branch implementing parse events. check-in: 07ca2e2be0 user: aku tags: parse-events
2018-04-19
19:47
Update main line with the accumulated memory fixes. check-in: dcf340cd32 user: aku tags: trunk
19:44
Merged series of fixes for memory issues (memory leaks, management mismatches, ...). Together with the push of memory-intensive tests into child processes done here the testsuite should now be properly constrained again wrt memory usage. Updated tests. Closed-Leaf check-in: dc34e1afbb user: aku tags: mem-limit
18:18
Added test against unbound parser memory usage in parser-core. Found and fixed SV ref-count mismanagement (RCM) in the lex-only code path of the RTC. Added narrative tracing to the code path. Cleanup of lexer-core testsuite with regard to memory debugging. Found and fixed Tcl_Obj* RCM in the lex-only critcl template. Added narrative tracing to the template. Closed-Leaf check-in: a29c4613b0 user: aku tags: memory-cleanup
2018-04-18
23:41
Do not mix malloc and Tcl allocation routines, even when the malloc is hidden, as here in `strdup`. Strings used to be allocated with strdup, then released with ckfree/Tcl_Free. This messed up memory management internals to the point of memory smashes. Fixed, now using our own implementation of strdup based on the allocation macros from environment.h. This ensures that alloc and free match. Thank you, valgrind. check-in: 714d438603 user: aku tags: memory-cleanup
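A minimal sketch of the idea behind this fix: a strdup replacement that allocates through the same macro family used for releasing, so the two can never mismatch. The macro names `ALLOC_N` and `FREE` and the function name `my_strdup` are hypothetical stand-ins. The actual macros in environment.h are not shown in this log. Here they map to plain malloc/free only so the sketch compiles on its own.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the allocation macros of environment.h.
 * In a Tcl-based runtime these would map to ckalloc/ckfree; plain
 * malloc/free are used here so the sketch is self-contained. */
#define ALLOC_N(type, n) ((type *) malloc (sizeof (type) * (n)))
#define FREE(p)          free (p)

/* strdup replacement whose memory comes from ALLOC_N, so that the
 * matching FREE is always the correct release function. */
static char *
my_strdup (const char *s)
{
    size_t len;
    char  *copy;

    assert (s != NULL);
    len  = strlen (s) + 1;          /* include the terminating NUL */
    copy = ALLOC_N (char, len);
    if (copy != NULL) {
        memcpy (copy, s, len);
    }
    return copy;
}
```

Any string obtained from `my_strdup` must then be released with `FREE`, never with a different allocator's free routine.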
03:08
Moved the main parts of `test/support/gen.tcl` into `bin/i-gen`. This new internal generator application uses the public `marpa-gen` as the underlying workhorse. The support code now invokes the internal generator instead of doing everything itself. With the operation confined to a child process, the memory taken by the test process is limited to the generated package itself instead of all the packages needed to perform the generation. Further modified lexer-core to place the sets of tests for each variant grammar/lexer-action into a child process as well. This ensures that the memory needed by each variant grammar is limited to that child process, instead of accumulating in the controlling test process. __Attention__: This change requires a Kettle with support for `kt::sub`, added to Kettle with commit [ef384673c5] (2018-04-18 02:28:17). check-in: ddc1e67640 user: aku tags: mem-limit
2018-04-17
20:36
Moved the main parts of `test/support/gen.tcl` into `bin/i-gen`. This new internal generator application uses the public `marpa-gen` as the underlying workhorse. The support code now invokes the internal generator instead of doing everything itself. With the operation confined to a child process, the memory taken by the test process is limited to the generated package instead of also keeping all the overhead of generating it. Note, this does not help with the test suites based on lexer-core, as that suite still loads/sources ten different lexer packages into the same process. Handling this requires more work, i.e. pushing the individual tests into their own child processes. check-in: ad4f1d4287 user: aku tags: mem-limit
2018-04-11
23:33
Extended the narrative tracing in `sem_tcl.c` to track refcounts. Found and fixed the cause for the orphaned Tcl_Obj*'s. The function `marpatcl_rtc_sv_astcl` did an extraneous refcount increment on the conversion result (SV tree to Tcl_Obj* tree). Parser operation based on RTC now does not leak anything anymore. check-in: 302f47227a user: aku tags: memory-cleanup
07:43
Added code (sem_debug.c) to track SV allocation and release, and dump orphan SVs at the end. All SVs were orphaned, nothing released. Tracked to a bad guard condition in function `marpatcl_rtc_sv_unref`. As the code checks the refcount before decrementing it both 0 and 1 must trigger destruction, not just 0. Fixed. All SV structures are now properly released at the end (parser instance destruction). Still orphaned things left, these however are Tcl_Obj's. check-in: 0d6e6c1dbe user: aku tags: memory-cleanup
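The guard condition described above can be sketched as follows. This is a simplified illustration, not the actual `marpatcl_rtc_sv_unref`. The `sv` struct, `sv_destroy`, and the `destroyed` counter are hypothetical. The point is that when the count is inspected *before* decrementing, a value holding its last reference still shows 1, so the destruction branch must cover both 0 and 1.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified semantic value with a reference count. */
typedef struct sv {
    int refCount;
    /* ... payload elided ... */
} sv;

static int destroyed = 0;   /* counts destructions, for illustration */

static void
sv_destroy (sv *v)
{
    destroyed++;
    free (v);
}

/* Release one reference. The count is checked before it is
 * decremented, so both the last reference (1) and an unowned value
 * (0) must trigger destruction. Testing only for 0 here would never
 * release a value that was properly referenced once. */
static void
sv_unref (sv *v)
{
    assert (v != NULL);
    if (v->refCount <= 1) {
        sv_destroy (v);
        return;
    }
    v->refCount--;
}
```

With a guard of `refCount == 0` instead, a value referenced exactly once is never freed, which matches the "all SVs orphaned" symptom described above.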
2018-04-10
04:55
container: Fixed leakage of priority and priority alternate objects. Test support extended with various diagnostic tools. Note, it looks like the RTC and the glue into Tcl are also leaking like mad, especially around the semantic values and the (partial) ASTs we are constructing. Continue investigation and fixing. Started a new branch for this. check-in: c3ff9015b6 user: aku tags: memory-cleanup
2018-04-06
06:20
inbound, gate, lexer: Added a back-link from gate to inbound, analogous to the gate/lexer and lexer/parser connections. Dropped history management from the gate and changed its redo method to simply rewind the input instead of re-entering the tail end of the history. This is the first use of the new cursor movement methods added to inbound. Under the old regime, using foreach and steadily marching forward in the input, all re-processing was done by remembering and recursively re-entering characters as needed, with additional loops at the various stages of the pipeline (`gate`). With the while-based cursor, on the other hand, we have only one (nested) processing loop (`inbound`), and all movement is handled by it. We cannot recurse, and we cannot have a loop in `gate`. The nested loop in `inbound` is required to handle the case where we reach eof and the later stages then tell us `not yet`, i.e. bounce us back. The inner loop is the main processor, and the outer loop restarts it until eof actually succeeds. In `gate` the flush-signalling changed. In `lexer`, eof signalling to the `parser` is prevented when it bounces the input away from eof. check-in: 065653b213 user: aku tags: parse-events
04:25
Pull gate readability changes into parse-event feature support. check-in: cc42a78587 user: aku tags: parse-events
2018-04-05
08:29
runtime-tcl, gate: Moved a few code blocks into their own methods to make their now-caller more readable. check-in: 0dff9cc32b user: aku tags: trunk
2018-04-04
17:57
inbound: Converted the `foreach`-loop processing the physical input stream into a `while`. The location information now is a cursor into the input, instead of a dependent variable. This allows us to move backwards in the input, or ahead, as we see fit. Added movement methods (absolute, relative) for the cursor. No stream expansion yet, nor virtual streams. check-in: 9c1fa89ac4 user: aku tags: parse-events
16:58
Added generation of pre- and post-lexeme events. Extended the testsuite to demonstrate them. Note, the pre-lexeme trigger location does not match Marpa::R2 yet. check-in: 7693f8601c user: aku tags: parse-events
06:44
Reworked the lexer somewhat. Moved match state into a nested object for easier access. Exposed to parse event handlers via a limiting facade. Outside entrypoint is parser method 'match', an ensemble. Moved to single event handler call bundling all relevant events. Internally also used to simplify GetSemanticValue. Updated discard event test. check-in: 97d4a5397c user: aku tags: parse-events
2018-04-03
20:42
Pull recent fixes into the language work check-in: da745b6e55 user: aku tags: parse-events
20:37
Fixed bug in the semantics' handling of :lexeme. Do not exclude the symbol from LATM fixup if the :lexeme adverbs do _not specify_ latm information. Facepalm. Found playing with lexeme events, and GC missing the crucial latm information, breaking generated test parsers. Updated tests to correct results. check-in: e5e442db2a user: aku tags: trunk
17:47
Tests: Remove a leftover break from debugging something, wrongly committed. check-in: 7c7353ad24 user: andreask tags: trunk
04:42
Get latest docs to work with. check-in: 1c73889373 user: aku tags: parse-events
04:41
Extended the Tcl runtime with basic support for events via callback. Set/unset/query, forwarding from the inner objects. Definition and storage of event maps. Proper pre-processing of such maps for the lexer, not yet for the parser. Added generation of discard events. Extended the testsuite to demonstrate the basic infrastructure, and discard events. Removal of trailing spaces. check-in: 0f5c6931e2 user: aku tags: parse-events
2018-03-30
23:42
Docs: More small fixes Leaf check-in: d89aeafc1a user: aku tags: docs
23:38
Get doc fixes check-in: d41168399d user: aku tags: trunk
23:37
Docs: Typo fixes, phrasing fixes. check-in: 2a1acc9132 user: aku tags: docs
23:24
Make recent documentation work official. check-in: 607761c40d user: aku tags: trunk
23:23
Docs work - Moved architecture from intro to dev guide. - Made intro a dispatcher to other documents based on readers' goals. - Added reference for marpagen. - Added placeholder for SLIF, referencing the upstream Marpa::R2 documentation. check-in: 9f10a242a4 user: aku tags: docs
2018-03-29
23:14
Docs: Completed addition of feedback sections. Added audience/target information sections. check-in: fdf9132f0d user: aku tags: docs
22:36
Docs: Added changes, license documents, libmarpa requisite for installer, factored welcome message, added feedback in parts check-in: 2db7ae6cd3 user: aku tags: docs
21:46
Updated docs work with trunk changes. check-in: 9ad599cfa5 user: aku tags: docs
05:34
Started implementation of parse events. rt-Tcl first. Implemented generation from container, with fixes to have access to the stored G1 events. Extended testsuite to show event information, and updated older tests. Some whitespace corrections (removal of trailing spaces). Some tracing tag fixes. check-in: f99351071e user: aku tags: parse-events
2018-03-27
19:58
Grammar edits: - Tweaked some of the formatting. - Main change: Redone the spine of the document structure. Moved the nullability around, enabling use of sequence rules. The price is paid by the paragraphs, having their separators in the AST (Cannot mask/hide the separator of a sequence from the AST). Explicit recursion for multiple separators in sequence however allows hiding that in a single separator. Still, AST nesting significantly reduced. Further doctoring is something for full custom command actions, or the semantics taking the AST. Regenerated parsers. Updated test suite to match. check-in: 1a2a8af59c user: aku tags: language-doctools
07:17
Grammar: - Edited to provide the keywords with proper lexeme symbols instead of the ugly automatic names. - Reworked command definitions to enforce space after a command lead-in. - Added lots of custom actions (::first), to simplify the returned AST. Regenerated parsers. Went over test suite again. Removed the fail cases from Tcllib, completely bogus in places against the more strict specification. Reworked the ok cases and added the first proper ok results. Still thinking about the main spine of man pages and how to express nicely. The current explicit recursive structure nests a bit deep. check-in: de0a504dfd user: aku tags: language-doctools
2018-03-26
21:35
Created parsers from draft. Made test suite functional. First runs, all results bad (different error messages on failure, different type of AST structure). check-in: d46d0a963a user: aku tags: language-doctools
21:08
Documentation, installation guide: Added instructions for setting up `libmarpa`. check-in: 254c53ee1f user: andreask tags: trunk
2018-03-25
05:15
Added untested draft grammar. check-in: ecdb604ac6 user: aku tags: language-doctools
02:53
Added test-vectors used by Tcllib. check-in: 7e961f36fa user: aku tags: language-doctools
2018-03-24
23:33
Pull alias support into the example. check-in: 26bb8496d5 user: aku tags: language-doctools
23:31
Activated alias support in main. check-in: d60e61a1ce user: aku tags: trunk
23:30
marpa::unicode - Added alias handling. Updated testsuite. Further: - Reworked the table generator, more separate phases, less intertwined operations. Split across several files now, with each a set of related commands to manage part of the data structures. - BMP/SMP are directly generated as aliases where possible - Fixed issue with long-form category names for aliases. Tcl has its own definition of `control` (cc+cf+co). - Updated boot parsers. Closed-Leaf check-in: 85df278cb8 user: aku tags: cc-aliases
04:23
marpa::unicode - Added alias handling. Updated testsuite. Further: - Tweaked generator output. - Fixed issues with missing :bmp/:smp information for some aliases. check-in: cb06be0e04 user: aku tags: cc-aliases
2018-03-23
23:59
Table generator: Convert `:bmp` and `:smp` CCs identical to their origin into aliases, reducing storage requirements. Output not usable anymore until the unicode layer gets extended to recognize and handle aliases. check-in: cc6426a1cc user: andreask tags: cc-aliases
23:47
Introduced char class aliases into the output of the table generator. check-in: 7771b3d5ad user: andreask tags: cc-aliases
05:31
Start on 2nd big example, doctools, of Tcllib check-in: ccb57223f3 user: aku tags: language-doctools
2018-03-22
03:48
Language example `JSON` is now official. check-in: c4414a4012 user: aku tags: trunk
03:46
Phase 1 documentation now official. check-in: 17146d5f3e user: aku tags: trunk
03:46
Intro and dev guide completed. Doc phase I ok. check-in: 0698f503c3 user: aku tags: docs
2018-03-21
22:12
Started on proper documentation. Basic guides, some placeholders, no package docs yet. check-in: 45c56e540d user: andreask tags: docs
06:55
The known bug was due to a subtle difference between the two parsers. The Tcl-based parser accepted standalone surrogates; the C-based one did not. This came down to a guard condition in the ASBR compiler, which excluded any surrogates found in the input CC from the result. A design bug, not an implementation bug. Removing the guard fixes the issues with the json parser. The first attempt at the fix, adding the standalone surrogates explicitly to the grammar, ran into the same guard, albeit in a different manner. The explicit range became an empty literal during reduction and got removed, and then the still-existing reference to it caused the generator backend to throw an error. Regenerated the C-based parser, and updated all test results to match the changes in the rule numeration. Closed-Leaf check-in: a72fb8f4d2 user: aku tags: language-json
06:54
Merged design fix to the json experiment. check-in: cc8068dba7 user: aku tags: language-json
06:53
Fixed a design issue: surrogate handling. The low-level unicode layer used a guard to prevent the addition of the surrogate codepoints when creating an ASBR from a CC. Thus, for any CC including one or more surrogates, the resulting ASBR actually represented the CC minus the surrogates. The thinking was roughly that we are working with characters at the high level, and while we have a 1:1 mapping to codepoints for most characters, this is not true for the surrogates, each of which is only half of a character. And the reducer targeting the C runtime based on Tcl ensures that characters in the SMP are properly converted into surrogate pairs. Working on the JSON parser has now driven home that there are situations where we want to accept standalone surrogates at the high level, and also that the low-level removal was a bad idea. The latter because a negated char class handled by Tcl does accept the standalone surrogate code points, whereas the ASBR for C is mangled to reject them. The result is a very unwanted difference in the behaviour of what should be equivalent parsers. So, lots of writing for a very small change, code-wise: the removal of the guard mentioned above, and undoing the removal of the surrogates as a named character class. Further, brought the bugfix from commit [546018b243] into the `unicode_ops.tcl` used by the table generator. Same issue, forgotten when the initial fix was made and committed. Updated tests to match results due to rule renumeration and CC changes. check-in: 1695c17f13 user: aku tags: trunk
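To make "each surrogate is only half of a character" concrete, here is the standard UTF-16 split of a supplementary-plane code point into a surrogate pair. It is a generic illustration of the encoding rule. It is not the reducer's actual code, and the function name `to_surrogates` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Split a supplementary-plane code point (U+10000..U+10FFFF) into a
 * UTF-16 surrogate pair. The high half lies in D800-DBFF, the low
 * half in DC00-DFFF; neither is a character on its own. */
static void
to_surrogates (uint32_t cp, uint16_t *hi, uint16_t *lo)
{
    uint32_t v;

    assert (cp >= 0x10000 && cp <= 0x10FFFF);
    v   = cp - 0x10000;                      /* 20 significant bits */
    *hi = (uint16_t) (0xD800 + (v >> 10));   /* top 10 bits */
    *lo = (uint16_t) (0xDC00 + (v & 0x3FF)); /* bottom 10 bits */
}
```

For example, U+1D11E (MUSICAL SYMBOL G CLEF) becomes the pair D834 DD1E. A lone D834 or DD1E is exactly the kind of standalone surrogate code point the removed guard used to filter out.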
2018-03-20
19:13
Continued work on the json test suite. Processed all the i_* cases. A single known bug to investigate for rtC. check-in: 84467cbcf6 user: aku tags: language-json
05:57
The unicode work is good enough to solve the known issues with the json parser. Make it officially available to trunk. check-in: 81404d4d77 user: aku tags: trunk
05:55
Unicode work good enough to solve the known issues with the json parser. Integrated. check-in: b333341f7c user: aku tags: language-json
05:50
Known bugs all settled. Test results updated for the changes in the rtC counting (characters, not just bytes). Closed-Leaf check-in: 3bf1d3e8b3 user: aku tags: json-unify
05:47
Update the json/unicode mix branch with the latest fixes on unicode. check-in: 2c4895d98b user: aku tags: json-unify
05:46
Update unicode work with the latest fixes on trunk. Closed-Leaf check-in: 78ab6a1f68 user: aku tags: reunification
05:45
rtC's mixed use of both `char` and `unsigned char` in various APIs and code, plus interaction with `int`, breaks lexing when attempting to go beyond ASCII, even when restricted to the BMP, due to bytes > 127 showing up as negative. Fixed by changing all uses of `char` to `unsigned char`. Further changed extraction of semantic values. - Lexeme length is now counted in characters, not bytes. - Similarly, lexeme end is now counted in characters from the start. - Input is now byte- and character-counted, for a proper lexeme start. Character counting in C strings pulled from tclUtf.c. Error messages now use the new char offsets, plus byte offsets for partially read characters. check-in: b0d7fa6f75 user: aku tags: trunk
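The following sketch illustrates both points of this fix. Bytes are read as `unsigned char`, and characters are counted separately from bytes. This is a generic technique, not the tclUtf.c code the commit refers to, and it assumes well-formed standard UTF-8 input. Every byte that is not a continuation byte (bit pattern `10xxxxxx`) starts a new character. The function name `utf8_char_count` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Count the characters in a UTF-8 byte sequence of known length.
 * The bytes are taken as unsigned char: with plain char, whose
 * signedness is implementation-defined, bytes above 127 can appear as
 * negative values and break comparisons against int constants. */
static size_t
utf8_char_count (const unsigned char *bytes, size_t len)
{
    size_t chars = 0;
    size_t i;

    assert (bytes != NULL || len == 0);
    for (i = 0; i < len; i++) {
        /* Continuation bytes (10xxxxxx) do not start a character. */
        if ((bytes[i] & 0xC0) != 0x80) {
            chars++;
        }
    }
    return chars;
}
```

For the input "aé€𝄞" (10 bytes: 1 + 2 + 3 + 4), this returns 4 characters. That is the difference between byte and character offsets that the lexeme start/length/end changes above have to track.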
05:42
Some cleanup in the core testsuite. check-in: 65b182f33e user: aku tags: json-unify
05:40
Added textual decodings of the knownBug y_* inputs, for readability. check-in: ca443ec660 user: aku tags: json-unify
2018-03-16
23:44
Specials done, make them available to the main line. check-in: 754111fe8e user: aku tags: trunk
23:43
Implemented special semantic action `::first`. Added support in the tcl and rtc runtimes, and the generators for these runtimes. Plus test. Closed-Leaf check-in: ef22cbb99b user: aku tags: specials
23:24
Updated work on specials with bugfixes for issues found with it. check-in: 8918fc0839 user: aku tags: specials
23:22
Added test triggering rtC code path where a new lexer starts out with the parser exhausted, i.e. nothing acceptable. Fixed missing closing of the earleme for that case, and missing handling of the `lexer exhausted` error from libmarpa. The missing closing operation also caused miscommunication between lexer and gate, ultimately crashing the latter with a symbol id outside of the byte range. The gate change is only a tweak to get better tracing, i.e. print the acceptables before working with them, not after. check-in: 64775966bd user: aku tags: trunk
06:13
Added test triggering the code path for the generation of semantic actions for grammars which have multiple actions across their G1 symbols. Fixed variable name typo in that code path. check-in: 2f455e21d2 user: aku tags: trunk
2018-03-15
07:04
Added description of the various files found in the test grammars, and their relationships. check-in: eaefc0b630 user: aku tags: specials
2018-03-14
23:00
Implemented special semantic action `::array`. This one was trivial: it maps to array action `values`. Mapping is done in the semantics; the backends do not see the special. Started on special semantic action `::first`. Tests, no implementation yet. check-in: 7626d0194b user: aku tags: specials
2018-03-13
17:38
Updated the json tests. The only tests directly affected by the merge were a number of expected-negatives where the error message no longer matched, due to changed symbol ids coming out of the generator. Furthermore, the `process-file` changes reduced the set of known bugs, eliminating all expected-negatives from it (process-file now sees the same data as process, making the error messages the same for the two methods). We still have issues with various expected-positives which still error. These are suspected to require changes to the json grammar. check-in: d4f0f54320 user: aku tags: json-unify
17:30
Bring the unicode work into the json experiment for eval. check-in: 3358e68801 user: aku tags: json-unify
15:01
Fix the `process-file` method of the rtC backends. `Tcl_Read` does not do encoding processing. Replaced with `Tcl_ReadChars` which does. Plus attendant changes to handle the different signature. Now `process-file` is equivalent to `process` in that it sees CESU-8 and MUTF-8. check-in: e43eff4a15 user: aku tags: reunification
2018-03-12
21:06
Updated the rtC backends (lexer, parser) to use `utf-8` as the encoding for `process-file`, and updated all places affected by this (test results, bootstrap and literal parser). check-in: 59665cfdcf user: aku tags: reunification
20:33
Updated the literal parser. The updated bootstrap parser was already pushed in the previous commit. Passes the entire testsuite. We are now mostly (*) ready to go back to branch `language-json` and evaluate whether the extended unicode support helps the json parser. (*) Still have to fix the `process-file` method in RTC (encoding `utf-8` = Tcl's internal encoding, like for the `process` method). check-in: edd9f06929 user: aku tags: reunification
19:59
literal::parse - Add code to handle compat `try`. Oops. Updated the bootstrap parser as well, already. check-in: 8fc12e03e6 user: aku tags: reunification
19:58
runtime::tcl - Move code for compat `try` into main package entry check-in: d503ddfb50 user: aku tags: reunification
17:11
Merged completed literal work into the general unicode work. check-in: 47e7cce082 user: aku tags: reunification
16:34
Tweaked the API between reducer core and callbacks, to handle symbol creation for the custom tags. Finalized the integration of the new reducers into the generator backends. Extended Tcl backend to refactor codepoint ranges used in ASSRs, like the rtC backend does for byte ranges. Updated tests. Mostly. See note below. Note: Boot parser has not been switched to the extended grammar yet (allowing for unicode references in the SMP). Closed-Leaf check-in: 7d263e069d user: aku tags: relit
2018-03-10
07:44
Brought case-expansion perf work into literal work. Tests for literal norm and reduce pass. Time spent on them is now a minute, versus originally 45 minutes. Running tests often is useful again, instead of a pain. check-in: b5a818af28 user: aku tags: relit
07:39
Moved up in the stack, to `unicode::unfold`, and `unicode::fold/c`. These are the charclass and string ops based on `data::fold` and `data::fold/c`. Added the workhorse function `marpatcl_scr_unfold` to `c/scr.[ch]`. Tests pass, and time to run them went down to a second. Target reached. Closed-Leaf check-in: 5321722a73 user: aku tags: refold
05:13
Reworked the table generator to put the case-folding information into simple C-level arrays of integers, plus an accessor function. Replaced the low-level `data::fold/c` and `data::fold` procedures with C implementations. Extended the tests a bit and updated them to match minor changes (error message, and the fact that the generator has the latest tweaks from the checkout of `relit`). The tests now take about 15 seconds. check-in: 696e160cbe user: aku tags: refold
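The commit above moves case-folding data into plain C integer arrays plus an accessor function. A minimal sketch of that kind of layout, assuming a table of simple one-to-one folds sorted by codepoint and searched with the standard `bsearch`. The names `fold_entry`, `fold_table` and `fold_lookup` are hypothetical, the real tables also serve unfolding (one codepoint to a set), and only four entries taken from Unicode's CaseFolding.txt are shown.

```c
#include <stdlib.h>

/* One simple case-folding entry: codepoint -> folded codepoint. */
typedef struct { int cp; int fold; } fold_entry;

/* Tiny illustrative excerpt; a generator would emit the full table,
 * sorted by cp so it can be binary-searched. */
static const fold_entry fold_table[] = {
    { 0x0041, 0x0061 },  /* LATIN CAPITAL LETTER A -> a */
    { 0x004B, 0x006B },  /* LATIN CAPITAL LETTER K -> k */
    { 0x00C0, 0x00E0 },  /* A WITH GRAVE -> a with grave */
    { 0x212A, 0x006B },  /* KELVIN SIGN -> k */
};

/* bsearch comparator: key is a pointer to the codepoint searched for. */
static int cmp_entry(const void *key, const void *elem)
{
    int cp = *(const int *)key;
    int other = ((const fold_entry *)elem)->cp;
    return (cp > other) - (cp < other);
}

/* Accessor: folded form of cp, or cp itself when it has no entry. */
int fold_lookup(int cp)
{
    const fold_entry *e = bsearch(&cp, fold_table,
                                  sizeof fold_table / sizeof fold_table[0],
                                  sizeof fold_table[0], cmp_entry);
    return e ? e->fold : cp;
}
```

A static sorted array keeps lookups at O(log n) with no per-process table construction, which is one plausible reason C-level tables beat the earlier script-level procedures.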
05:07
Added tests demonstrating the slow speed of case-expansion operations and related, first seen on branch `relit` with the much expanded set of literals to cover normalization and reduction. Each of the three .test files takes about fifteen minutes, with the majority taken by case-expansion of large character classes. The tests here take about 29 seconds. This is the beginning of branch `refold` to move the case-folding information and the operations on it into the C level. check-in: 3ceee77a53 user: aku tags: refold
2018-03-06
22:28
Continue previous commit ... check-in: 01ba5c2fe1 user: aku tags: relit
22:27
Rewrote the reductor for C engines linked to Tcl (mutf-8, cesu-8) based on the new framework. Redone testing, now using the same cases as for normalization. Extended the test cases to cover the custom tags as well. check-in: 6c97ea1a57 user: aku tags: relit
2018-03-05
20:22
Rewrote the reductor for Tcl engines based on the new framework. Redone testing, now using the same cases as for normalization. check-in: 9a39768f89 user: aku tags: relit
20:20
Moved highlevel methods for reduction of grammars into the reductor state class. See previous for the tests. check-in: 5146df36f3 user: aku tags: relit
20:18
Updated reductor state tests somewhat. Noted the parts not covered yet. check-in: dec03b44b2 user: aku tags: relit
20:17
Reworked testing of normalization using the new way of specifying cases and results. Systematic set of cases covering a large set of possible input. Fixed issues in normalization uncovered by the new set of cases. Modified normalization to pass literals using unknown/custom type tags through as-is. check-in: 845a016aa9 user: aku tags: relit
20:14
Added another way of specifying large amounts of test cases and related results (files, one line per case/result, except comments and empty lines) check-in: 0a1ccc69b3 user: aku tags: relit
20:11
Extended `negate-class` to allow limiting negation to SMP. No tests yet. check-in: fb0f7497c5 user: aku tags: relit
20:10
Remove leftover unused procs. Extended `data::cc` to handle `%foo` as case-unfolded form of `foo`. No tests for this yet. check-in: 30928789a2 user: aku tags: relit
20:09
Make the area limits available to scripts. Tests added. check-in: 4c80a213e5 user: aku tags: relit
20:07
Standardize on `SMP` as the shorthand for all codepoints beyond the BMP. Updated generator. check-in: 9354659bfb user: aku tags: relit
2018-03-02
06:33
Merged bugfix from unicode redo into literal redo check-in: e549aa5b94 user: aku tags: relit
06:30
Merged bugfix from trunk into the unicode redo. check-in: d437c9ded0 user: aku tags: reunification
06:23
unicode. Fixed bad condition for handling the final element in negate-class, after the main loop. Triggered by the next-to-final element ending just before the UNI_MAX. Test added, demonstrating bug and fix. check-in: 546018b243 user: aku tags: trunk
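Negating a character class means emitting the gaps between its sorted ranges and then handling the tail after the last range, which is where the bug above lived. The sketch below is a generic version of that algorithm, not MarpaTcl's `negate-class` code; `cp_range`, `negate_class` and the `UNI_MAX` value of U+10FFFF are assumptions for illustration. It assumes the input ranges are sorted, disjoint, and inside [0, UNI_MAX].

```c
#include <stddef.h>

#define UNI_MAX 0x10FFFF  /* assumed upper bound of the codepoint space */

typedef struct { int lo, hi; } cp_range;

/* Complement a sorted list of disjoint ranges within [0, UNI_MAX].
 * out must have room for n + 1 ranges. Returns the number written. */
size_t negate_class(const cp_range *in, size_t n, cp_range *out)
{
    size_t k = 0;
    int next = 0;  /* first codepoint not covered by any range so far */
    for (size_t i = 0; i < n; i++) {
        if (in[i].lo > next)                       /* gap before range i */
            out[k++] = (cp_range){ next, in[i].lo - 1 };
        next = in[i].hi + 1;
    }
    /* Tail after the last range. The condition must admit a gap of a
     * single codepoint, e.g. when the last range ends at UNI_MAX - 1. */
    if (next <= UNI_MAX)
        out[k++] = (cp_range){ next, UNI_MAX };
    return k;
}
```

Negating `[0, UNI_MAX - 1]` must yield exactly `[UNI_MAX, UNI_MAX]`; an off-by-one in the tail test silently drops that last codepoint.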
2018-02-28
19:52
Merged accumulated unicode rework to literal rework. check-in: d0a032f526 user: aku tags: relit
19:51
Tool tweak. For each named class FOO generate additional named classes `FOO:bmp` and `FOO:high` which are limited to the parts of FOO inside and outside the BMP, respectively. Note: Empty classes are not generated at all. Example: `adlam:bmp` does not exist because it would be empty. In turn `adlam:high` == `adlam`. Note 2: The `:` in the names of the new classes prevents their use from SLIF. This may change in the future, as it might be sensible to provide languages access to the limited classes. check-in: 975b4f5f90 user: aku tags: reunification
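Deriving `FOO:bmp` and `FOO:high` amounts to cutting each range of FOO at the U+FFFF/U+10000 boundary. A minimal sketch of that split over a sorted range list; `cp_range` and `split_bmp` are hypothetical names, and the real work happens in the `unidata` tool rather than in C. A zero count for either half corresponds to a class that would simply not be generated.

```c
#include <stddef.h>

#define BMP_MAX 0xFFFF  /* last codepoint of the Basic Multilingual Plane */

typedef struct { int lo, hi; } cp_range;

/* Split a sorted range list into its parts inside the BMP and above it.
 * A range straddling U+FFFF/U+10000 is cut in two. Each output array
 * needs room for n ranges. A count of 0 means that half is empty. */
void split_bmp(const cp_range *in, size_t n,
               cp_range *bmp, size_t *nbmp,
               cp_range *high, size_t *nhigh)
{
    *nbmp = *nhigh = 0;
    for (size_t i = 0; i < n; i++) {
        if (in[i].hi <= BMP_MAX) {            /* entirely in the BMP */
            bmp[(*nbmp)++] = in[i];
        } else if (in[i].lo > BMP_MAX) {      /* entirely above it */
            high[(*nhigh)++] = in[i];
        } else {                              /* straddles the boundary */
            bmp[(*nbmp)++]   = (cp_range){ in[i].lo, BMP_MAX };
            high[(*nhigh)++] = (cp_range){ BMP_MAX + 1, in[i].hi };
        }
    }
}
```

The Adlam block (U+1E900..U+1E95F) lies wholly outside the BMP, so its BMP half comes out empty, matching the `adlam:bmp` example above.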
2018-02-27
23:47
SCR datastructure tweak. While we keep allocating the ranges as part of the main struct the definitions now allow for separate allocation (looking ahead to char classes stored in constant/static C structures). check-in: 9a2f1aa29a user: aku tags: reunification
23:32
Oops. Fixed tracing support broken by [bd70f9d2de] (intro of separate .c/.h files). check-in: b19a1b0d5f user: aku tags: reunification
21:48
Implemented ASSR, an ASBR equivalent for char classes based on surrogate pairs (for codepoints outside the BMP). Added tests. check-in: be0e4cf1a2 user: aku tags: reunification
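An ASBR turns a codepoint range into alternatives of UTF-8 byte ranges; the surrogate-pair analogue turns a range above the BMP into alternatives of (high-surrogate range, low-surrogate range). The sketch below shows one way to do that split, using the standard UTF-16 surrogate formulas. It is an assumption-laden illustration, not the project's ASSR code, and `surr_alt` and `range_to_surrogates` are hypothetical names. It assumes 0x10000 <= a <= b <= 0x10FFFF.

```c
#include <stddef.h>

/* One alternative: a range of lead (high) surrogates followed by a
 * range of trail (low) surrogates. */
typedef struct { int lead_min, lead_max, trail_min, trail_max; } surr_alt;

static int lead_of(int cp)  { return 0xD800 + ((cp - 0x10000) >> 10); }
static int trail_of(int cp) { return 0xDC00 + ((cp - 0x10000) & 0x3FF); }

/* Convert [a, b] into at most three alternatives; returns the count. */
size_t range_to_surrogates(int a, int b, surr_alt out[3])
{
    int la = lead_of(a), ta = trail_of(a);
    int lb = lead_of(b), tb = trail_of(b);
    size_t k = 0;
    if (la == lb) {                      /* same lead surrogate */
        out[k++] = (surr_alt){ la, la, ta, tb };
        return k;
    }
    out[k++] = (surr_alt){ la, la, ta, 0xDFFF };              /* first lead */
    if (lb - la > 1)                                          /* full leads */
        out[k++] = (surr_alt){ la + 1, lb - 1, 0xDC00, 0xDFFF };
    out[k++] = (surr_alt){ lb, lb, 0xDC00, tb };              /* last lead */
    return k;
}
```

For instance, U+1F600..U+1F64F shares the lead surrogate D83D and collapses to a single alternative with trails DE00..DE4F, while the whole supplementary range needs all three pieces.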
17:41
Added `2char` to the low-level unicode support. Needed for the reworking of literal handling, see branch `relit`. Todo: `2ascr`, i.e. ASBR equivalent for char classes based on surrogate pairs. Todo: Intersections of named char classes with BMP and outside. Todo: Alias mechanism to save on storage space. check-in: 6c41cde6ea user: aku tags: reunification
2018-02-26
23:34
Moved main loop for reduction into the rstate class, with changes. The new main loop's API to reduction is a callback instead of a set of rule names. Redid the DSL commands of `reducer` as methods of `rstate`. Moved old public API to internal (renamed methods). Updated existing tests for the now internal methods. Added tests for the new public API. TODO: Reimplement the reducer rules as proper callbacks for the new state class. check-in: 35be2317de user: aku tags: relit
2018-02-23
06:58
Reworked internals of the normalizer. Split the big switch into a series of nicely contained procedures, each handling the type they are named for. Dispatch is dynamic on the type tag of the literal. Split the remainder (reducer, rstate, parser) out into their own packages. Updated users. Now ready to look into alternate implementations. check-in: f29be57f3c user: aku tags: relit
05:23
Split the remainder (reducer, rstate, parser) out into their own packages. Updated users. Now ready to look into alternate implementations. check-in: eae1cbc1d8 user: aku tags: relit
2018-02-22
23:35
Split utilities and normalization into their own packages. check-in: 682c4b1bb8 user: aku tags: relit
22:00
Fixed oops. Added the forgotten new test files. check-in: 0ac7c21442 user: aku tags: relit
21:55
Split the big literal package into several smaller, and more focused pieces. Started with slicing the testsuite. check-in: 1165c36601 user: aku tags: relit
06:09
Reworked the internals of the unicode layer. Strong split into C files and the C/Tcl glue. Exposed the codepoint validation as custom critcl type, and added use of it to 2utf. Extended 2utf and 2asbr with optional flag argument to control the coding (mutf, cesu, tcl is both). Due to the new argument being optional (and last) the existing call sites will not fail, and operate in full mode. Updated the layer's test suite demonstrating the mutf and cesu coding. Added more tests for invalid arguments. check-in: bd70f9d2de user: aku tags: reunification
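To make the `mutf` / `cesu` / `tcl` coding choice concrete, here is a sketch of encoding one codepoint under such flags. It assumes that "mutf" means encoding U+0000 as the two-byte form C0 80 and that "cesu" means writing codepoints above the BMP as two 3-byte encoded surrogates, with "tcl" being both. The flag names and `encode_cp` are hypothetical and do not reflect the actual `2utf` implementation or its arguments.

```c
#include <stddef.h>

/* Hypothetical flag bits; CODING_TCL combines both behaviours. */
enum { CODING_MUTF = 1, CODING_CESU = 2, CODING_TCL = CODING_MUTF | CODING_CESU };

/* Plain UTF-8 encoding of cp (1 to 4 bytes). */
static size_t put_utf8(int cp, unsigned char *out)
{
    if (cp < 0x80) { out[0] = (unsigned char)cp; return 1; }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Encode one codepoint; out needs room for 6 bytes. Returns bytes written. */
size_t encode_cp(int cp, int coding, unsigned char *out)
{
    if (cp == 0 && (coding & CODING_MUTF)) {   /* MUTF: NUL as C0 80 */
        out[0] = 0xC0; out[1] = 0x80;
        return 2;
    }
    if (cp > 0xFFFF && (coding & CODING_CESU)) {  /* CESU: surrogate pair */
        int v = cp - 0x10000;
        size_t n = put_utf8(0xD800 + (v >> 10), out);
        return n + put_utf8(0xDC00 + (v & 0x3FF), out + n);
    }
    return put_utf8(cp, out);                  /* otherwise plain UTF-8 */
}
```

Under these assumptions U+1F600 becomes F0 9F 98 80 in plain mode but ED A0 BD ED B8 80 in CESU mode, which is why an optional trailing flag defaulting to the full behaviour leaves existing call sites working.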
2018-02-20
18:40
Added more literals outside of the BMP to test cases. check-in: 8d53644d07 user: aku tags: reunification
07:35
Created a Marpa parser for the string and CC lexemes. This parser handles all the various forms of character escapes. Plus a semantics backend which generates the internal literal representation directly from the AST. The above replaced the entire existing literal processor (parse, decode, unescape, type, tags). This was needed because Tcl (especially `subst` as the core of the old `unescape`) was/is not able to handle the full set of Unicode codepoints (at this time). Updated the bootstrap slif parser to handle the extended escapes too (\u hex x5/x6, \U hex x8). Updated tests, although not all. With this commit the entire input side is now able to handle the full set of unicode, with suitable escape sequences for characters outside of the BMP as well. check-in: ac7dc5acdc user: aku tags: reunification
2018-02-19
05:57
Merge C code generator fix into unicode work. check-in: eac4dc279c user: aku tags: reunification
05:55
marpa::gen::runtime::c Fixed mishandling of zero-length chunks, which generated bad C syntax. Triggered by a grammar without any `:discard` clauses. The fix prevents insertion of discard chunks if there are none. Furthermore, the low-level ChunkedArray code now also errors out for zero-length chunks, to catch possible future problems. Reviewed all uses, made notes that none are zero-length now. Added a test demonstrating the possibility. check-in: 14698e1f84 user: aku tags: trunk
2018-02-18
10:27
Continued rework of the unicode layer. See first commit in the branch for the plan. Reworked the big tables of test cases in `literal.test` and moved their setup into separate files (`tests/cases/...`). check-in: a69c32992a user: aku tags: reunification
09:03
Continued rework of the unicode layer. See first commit in the branch for the plan. Created a wrapper around the foreach loops to make the tables of test cases look a bit nicer, and more semantic like. check-in: 60f9856faa user: aku tags: reunification
05:05
Continued rework of the unicode layer. See first commit in the branch for the plan. Removed full vs bmp from the `unidata` tool and generated tables. The tool now always generates tables covering the full unicode range. Updated some tests, but not all. The known test failures are in the various generators and the middleware, due to the CC differences coming from full coverage. Fixing them now does not make sense, because we will have to clean them up again after the introduction of MUTF-8 and CESU-8 support into the middle layers. We clean these up after that is all done. check-in: 9db9661dae user: aku tags: reunification
03:38
Continued rework of the unicode layer. See first commit in the branch for the plan. Dropped ASBR and grammar generation for the named classes from the `unidata` tool. With ASBR creation in the C level of the main Marpa package it is fast enough to not require caching. This also removes the cache of byte ranges shared among the classes. Remember also that the C code generator backends do their own automatic sharing of byte ranges and refactoring for sharing. check-in: 604b92a271 user: aku tags: reunification
02:37
Start a rework of the unicode layer. The overall plan is to remove the distinction between bmp and full in this layer, and move it into the generators, with some support in the middle, i.e. in literal handling and the transform from codepoints to byte sequences. Optimization: Moved the commands `2utf`, `mode`, and `max` into the C layer. check-in: b481c06f7c user: aku tags: reunification
2018-02-03
00:51
Merged fix for the issue of RT-C mishandling the `proper`-flag into the branch where it was found. Updated test results to match. Marked a number of tests touching on unicode/utf handling as known bugs. Address them when the general utf handling trouble is more solved. Still to address: All the `i_` tests. check-in: 9ed3031891 user: aku tags: language-json
00:28
Fixed mismatch between Tcl and C runtimes. Issue in the C runtime. Forgot to properly convert a boolean `proper` into the flag taken by `marpa_g_sequence_new()`. Conversion added, test cases added. check-in: b86ffae080 user: aku tags: trunk
00:27
Re-enable full set of lang/result checks check-in: 94232b7386 user: aku tags: trunk
2018-02-02
23:19
Fix bad phrasing in comment check-in: 90ce04b75a user: aku tags: trunk
19:50
Added a script to run a fixed demo, from grammar to parser to its use. Plus example json files to use as input. check-in: f82c10dea8 user: aku tags: language-json
18:23
Continued testsuite work. Fixed definition of JSON `whitespace`. Regenerated parsers. Updated parse failure results to match. check-in: eb17740e4d user: aku tags: language-json
2018-02-01
23:56
Continued testsuite work - Added the n_* cases (must reject), and 1st round of results. Reorganized the input/ and result/ directories to separate the various groups better (y|n|i, c|tcl, ...) - A number of failures to reject input. - 4x grammar error: \f is not whitespace. - 10x input accepted which should not be (c (bad) vs tcl (ok)) - process vs process-file differences in rt-c (encoding differences?) check-in: 20cb6ea243 user: aku tags: language-json
20:37
Pull rt-c bug fix into the branch which exposed the issue. check-in: 2945ca6cd1 user: aku tags: language-json
20:36
Testsuite work - Clean up of the support code, removed unused procedures. - Ensure that files are read with the proper encoding before fed into the string 'process' (See `fgetc` decl and use). - Allow setting of constraints, runtime-specific - Set __known bug__ constraint for eight y_* tests where rt-c currently diverges from rt-tcl (1). (1) These are all in the unicode/utf-8 handling, which differs between the available runtimes. * rt-tcl operates on chars and defers to Tcl's parsing of utf-8 sequences. * rt-c OTOH operates on bytes, does its own utf-8 parsing, and is more strict (invalid sequences are a parse error). I have to see if I can define a char class (:invalid:) to contain the invalid sequences. Using that would allow me to either accept or discard them (depending on context). Similarly I might have to allow the class of surrogates (:Cs:), as acceptable characters, and as sequences for the characters past BMP. That would allow such characters even in Marpa limited to BMP. These are all things in the MarpaTcl core however, and not something specific to JSON. JSON just exposed the issues here. check-in: 7ab2e4bf63 user: aku tags: language-json
20:17
Added tool similar to `od`, to decode and display utf8 sequences in the input (file, stdin). check-in: 669660f659 user: aku tags: trunk
20:15
Changed gate to lexer flush signaling from in-band `(byte) -1` to a separate function. This removes any possibility that a `(byte) -1` from actual input causes a bogus flush. Added debug function allowing INBOUND to properly print a batch of input bytes. Fixed a crash in RT-C where the loop searching for the end of the lexeme tried to pop a byte from the empty lexeme, triggering an underflow assert. This may happen when `lexer_complete` is called for an empty-valued lexeme, i.e. when the GATE rejects the first byte after the end of a lexeme as invalid and signals a flush before any byte was entered into the lexer at all. Note that this does not necessarily indicate a mismatch. The current set of acceptable lexemes may contain some which allow an empty value. We have to keep recognizing them. And after that the new context may have made the invalid byte valid. So we only skip the attempt of making an empty value even emptier. The deeper issue is that for LATM-mode symbols the earley-set id does not match the length of the lexeme due to the zero-width ACS guards in front, causing an additional round through the loop before it can declare a mismatch. The concrete examples which triggered the issue are the `string` and `lstring` symbols in the JSON grammar, for input `[""]`. check-in: 0de21b2314 user: aku tags: trunk
2018-01-31
21:16
Pull the Tcl lexer fix over into the branch where the issue was found. check-in: d411dda199 user: aku tags: language-json
21:09
Fixed typo in the spec of escaped characters in strings. Fixed definition of `control` characters for JSON. Updated the results to match the tweaked grammar. For the Tcl runtime all tests pass except a few showing mishandling of numeric lexemes. A fix for that is waiting on trunk. RT-C still crashing. check-in: 6dfddb13e8 user: aku tags: language-json
20:58
Fix mishandling of lexemes interpretable as Tcl number by the Tcl runtime (lexer component). By going through `expr` a lexeme which looks like a number can be shimmered and may change its string rep when printed. Example: For JSON the lexeme `1E-2` became `0.01`. check-in: 2a442c3255 user: aku tags: trunk
00:51
The json testsuite is becoming more functional. Of the must-accept inputs only 10 out of 95 fail. Some unexpected parse failures with bogus inputs. These are in part - Possibly due to reading of input with the wrong encoding (Need utf-8?). - Unexpected numeric reformatting reaching the AST (1E-2 vs 0.01) One crash in the RT-C to investigate. Tweaked the grammar a bit to have proper symbols for the constants, and to separate G1 and L0 better. check-in: dda6670b00 user: aku tags: language-json
2018-01-30
23:25
Pulled fix for Tcl code generator issue into the branch where it was discovered. check-in: 5f8cb41c75 user: aku tags: language-json
23:19
Fix issue in the core code generator for parsers and lexer using the Tcl-based runtime. A bug in package `char` (See `char quote tcl`) caused the generation of bogus Tcl charclass regexes from the internal data, when non-ASCII characters in [:control:] are involved. The generator now works around the issue. check-in: 65b1517840 user: aku tags: trunk
21:02
Added the first larger grammar example outside of the SLIF meta grammar: JSON. Known issues at this point: * Due to apparent trouble with Kettle (`build.tcl test` seems to ignore `--include-dir`) the testsuite is not yet functional. A basic test via `tools/trial` however works. * The generated Tcl parser is bogus. The main character class for string characters (`plain`) is bogus, it contains a bad range which is rejected by Tcl's `regexp` during parser construction. The C-based parser is ok however, modulo lurking unknowns. check-in: 5199afa673 user: aku tags: language-json
10:17
Fix oops, forgot to add test output for the slif meta grammar. check-in: 466c1ebc4d user: aku tags: trunk
10:16
Added formatter producing a SLIF grammar from a grammar container. Note, this is not fully round-trip at the moment (The special @LEX symbols can not be read back, violating identifier syntax). It is also sub-optimal with regard to LATM flags, g1 actions, etc. These are shown as attributes of each rule instead of making use of defaults to reduce duplication. It should be good enough however to serve as debugging aid. check-in: 3bfc0de63c user: aku tags: trunk
2018-01-29
19:28
Extended the set of formatters producing code initializing a grammar container (GC). Renamed the existing GC formatter to `gc-compact`. Added two formatters to generate non-compact human-readable code, using reduction rules for Tcl and C. check-in: 8d77fed34b user: aku tags: trunk
2017-10-17
16:30
README tweaks check-in: d2d1b00d53 user: aku tags: trunk
16:22
Updated the README to match the current organization of the (code in the) repository. check-in: f45f21924c user: aku tags: trunk
03:18
Merged fixes on flush behaviour to mainline. check-in: 62d99b6274 user: aku tags: trunk
03:13
Fixed demo grammar (wrong start symbol), then shown fix vs not in Tcl vs C runtimes. Then fixed C runtime flush behaviour. Further fixed mishandling of lexeme value and length in the presence of redo. Closed-Leaf check-in: a78dda3a4d user: aku tags: flush-fix
2017-10-16
23:17
Demonstrate the multi-flush bug. Fixed RT-C issue with actual lastchar lost/overwritten by redo, messing up the error message generated. check-in: 886eb6bb40 user: aku tags: trunk
22:24
And back check-in: ce762c6d5a user: aku tags: trunk
22:20
Pull trunk. Closed-Leaf check-in: f32641a83d user: aku tags: runtime-tests
22:12
Pull in the fix for L0 discard past G1 end, updated tests, fixed a few more things in the Tcl runtime (Premature destruction of the parser-level recognizer prevented generation of a proper error message for a non-discard token after G1 end). check-in: bbff87f317 user: aku tags: runtime-tests
21:17
Added tests demonstrating bad behaviour when exhausting a parser while still having input (discards and not). check-in: e288571010 user: aku tags: runtime-tests
20:30
Added foundation of testing the runtime with arbitrary grammar/input pairs, and highlevel test drivers for the Tcl and C runtimes. check-in: e7ab54549b user: aku tags: runtime-tests
2017-10-15
16:55
Use OSX fixes. They were done as separate branches to remember to check behaviour when back on linux. check-in: 97bbaff3f9 user: aku tags: trunk
16:54
Silence compiler complaint on OSX. Leaf check-in: 09b264fb4a user: aku tags: osx-complaints
16:53
Added return after assert to silence compiler complaint (OSX). check-in: 12ad722f66 user: aku tags: osx-complaints
16:50
Fixed problems in the handling of charclass as set of code-points and -ranges. Range validation was incomplete, allowing bad input to crash. Fixed, and tests added. Tracing as well, plus more notes when certain code paths will be reached. check-in: ac18987fd3 user: aku tags: trunk
04:46
Moving critcl after tcl solves OSX issue with install dependency order. Check if this breaks linux. Leaf check-in: 243e280f60 user: aku tags: build-order-trouble
2017-10-12
06:59
Tcl runtime. Fix flush issue where partial flush and redo needs recognition check-in: f26d4f328e user: aku tags: flush-fix
2017-10-11
05:28
Mark recognizer cons/dest points better check-in: 08e6e9634d user: aku tags: trunk
2017-10-06
22:01
Equivalent changes in the C runtime. 1. The C runtime already intertwined tree extraction, valuation and hand-over, which was added to the Tcl runtime in the previous commit. 2. Fixed same issue with possible L0 discards after G1 end. 3. Fixed bad assertions in symset, byteset, exposed by 2. check-in: 32c320340a user: aku tags: trunk
20:12
Reworked parser completion handling. Do not pull and save all possible parse trees into memory anymore. Instead eval each tree immediately after extraction and pass the resulting SV to the outer backend. Further a bug fix, tell the lexer about expected terminals (none), so that it can still handle any L0 discards which may occur after the G1 end symbol. I.e. while we are not expecting the G1 token stream to continue the L0 byte stream may still have input to process. TODO: Have to add test cases for this situation, both where only the expected discards occur, and where unexpected actual G1 tokens are present. check-in: 8c6bdade0a user: aku tags: trunk
19:36
Fix in Tcl runtime tracing. Bring necessary variable into scope. This was forgotten when placing various operations into their own methods for clarity. check-in: bbe2253bdb user: aku tags: trunk
19:33
Debugging enhancement, show actual semantic values in valuation steps. check-in: 4f1c755959 user: aku tags: trunk
19:31
Big tangle of single package sliced into several packages, each containing just related code. check-in: 5fde5977d2 user: aku tags: trunk
2017-10-05
21:51
Fix package meta data typo. Closed-Leaf check-in: d38f475f67 user: aku tags: slice
21:39
Reworked naming of the generator packages, and associated namespaces. Searching for plugins, i.e. more generators is now simpler (no special cases to exclude). check-in: 912cadf759 user: aku tags: slice
18:59
Updated marpa-gen to new sliced setup, and filled `list-plugins` in marpa::export::config. Next up, look into renaming packages for nicer structure. Start with exporters. check-in: 649487dd0c user: aku tags: slice
08:06
Heal fork, complete. check-in: 444c10e2e4 user: aku tags: slice
08:05
Heal fork Closed-Leaf check-in: 2175b86257 user: aku tags: slice-2
08:04
Split the remaining pieces into three packages: - C runtime - builtin parser (C runtime) - Low-level C wrapper for Tcl runtime foundation Updated tests to work again. More reshuffling. check-in: 77883b0ffd user: aku tags: slice-2
03:34
Fix missing requirements in the internal tool to re-create the builtin parser. check-in: ae36822717 user: aku tags: slice
2017-10-04
23:49
Took Tcl runtime out of the tangle. Left tangled are the low-level C wrapper and the C runtime. Some shuffling of parts. Note: Needs Kettle commit [kettle:c0f0b90c04] (kt::local* addition, scan fix, @owns fix) to work. check-in: 30d4d13ed3 user: aku tags: slice
22:14
Detangled precedence rewriting, and the exporters, mostly. Have places using an exporter where we need only part (gc formatting). Structure does not make for nice format/plugin discovery either. check-in: 4843e825d1 user: aku tags: slice
06:18
Extricated SLIF semantics, and general literal handling. check-in: 575ffcc030 user: aku tags: slice
05:20
Extricated SLIF container implementation and low-level Tcl utilities from the tangle. check-in: 51f54f5f55 user: aku tags: slice
2017-10-03
23:34
Carved the lowlevel unicode support (table access, char classes, case-folding) out of the tangle and placed into its own package. check-in: 2437f3ebb9 user: aku tags: slice
22:12
Extended `marpa-gen` with short options. check-in: 0b47aba686 user: aku tags: trunk
22:06
Remove superfluous initializer. check-in: 91d8cf0a47 user: aku tags: trunk
17:17
Removed generated qcs map, and code doing the generation. Made the information static in the `sem_tcl.c` glue. Made a few other functions static in there, renamed. check-in: bb205813ee user: aku tags: trunk
05:46
Pull completed base RTC work into mainline, and close. check-in: 5c42aa1913 user: aku tags: trunk
05:13
Bootstrap step 2. Switched builtin slif parser to RTC-based implementation. Tests pass. Closed-Leaf check-in: b5b432d6c4 user: aku tags: runtime-c
00:01
rtc lexer fixes for single-value SVs. Which are single, and single-element lists are reduced to their element. check-in: f0dd695aeb user: aku tags: runtime-c
2017-10-02
23:58
Separate lexer results by engine. Updated results. check-in: 7a41f6af9c user: aku tags: runtime-c
2017-09-30
22:43
Tweaked clex exporter to match lexing-only mode of RTC, particularly in its use of the C-level result callback. Updated export test results to match. Added testsuite for rtc lexing-only mode matching tlex. Does __not__ pass the latter yet, i.e. tlex/clex differences (quoting in part, values in part). check-in: cd68d425f1 user: aku tags: runtime-c
21:56
Commit [aa5c236ec3] was wrong. The file was used to find the export test cases. Tweaked the suites to now look for their result files instead of a separate flag file. check-in: 9d189e9390 user: aku tags: runtime-c
21:06
Tweaked RTC to enable execution in a lexing-only mode. Triggered when initialized without a G1. check-in: eb9b15eb66 user: aku tags: runtime-c
21:05
Run RTC generation through external `critcl` app. Using a separate process is the important point, preventing the differing parsers and lexers from interfering in memory (attempting to multiply define various custom arg/result types). check-in: 29220fb5fb user: aku tags: runtime-c
21:02
Fix argument assertions for set link at high end. `n` can and may reach `capacity`. check-in: ae9f6195e3 user: aku tags: runtime-c
2017-09-29
22:51
Replaced poking into sv/vector internals with a proper API function, and updated users. check-in: 80b087c9b9 user: aku tags: runtime-c
19:32
Remove bogus result file check-in: aa5c236ec3 user: aku tags: runtime-c
18:44
Moved requirement for marpa to places where it does not interfere with basic usage. Further changed default name for the output to be derived from the name of the grammar file. check-in: b4ce1a0d87 user: aku tags: runtime-c
18:42
Fix missing destruction of helper objects; interfered when used multiple times. check-in: e385825b01 user: aku tags: runtime-c
18:41
Added tests for cparse and clex exporters. check-in: 47a56e2331 user: aku tags: runtime-c
06:23
New exporter, clex-critcl. Reduced RTC-based engine, L0 only. Will require modifications to RTC to allow operation without G1 / parser. Renamed the rtc-* exporters to cparse-* Tweaked marpa::fqn utility (varname argument instead of value) and started using it (slif::semantics, exporters). Trouble during the work below shows the need to force objects to FQN form immediately on entry into a method, a later conversion may go wrong depending on context (namespaced procedure, vs global procedure, vs global code). Moved the core testsuites into subdirectory `common`. Reworked and renamed the code essentially implementing a testsuite-specific variant of `bin/marpa-gen`. Updated all users. Factored code out of the zeta-lexer in preparation for use with `clex-*`. check-in: 9c90529961 user: aku tags: runtime-c
2017-09-28
07:51
Renamed exporter for rtc-based parsers. Further moved common code out of various test suites into separate, shared files. check-in: 90c7278bc9 user: aku tags: runtime-c
2017-09-26
23:22
Add a few diagrams showing coarse architecture check-in: cdd3d6fd00 user: andreask tags: runtime-c
22:37
Moved testing of the slif semantics into a common core and added testing in conjunction with generated parsers. check-in: 8741cc1279 user: andreask tags: runtime-c
2017-09-20
23:56
Bring rtc work to main line. Near-parity. TODO: Specify, generate, and test RTC lexer. Test RTC with semantics and containers. Add engine perf testing check-in: 48b47bd193 user: aku tags: trunk
23:52
Pull RTC error generation and associated fixes into the main rtc work. check-in: 5ee3764313 user: aku tags: runtime-c
23:50
Fix constructor of parser generated by rtc-critcl. All tests pass. Closed-Leaf check-in: 809750bb3e user: aku tags: rtc-scratch
22:51
Get the fix for the memory smash. Tweak to vector handling, do not allocate element array for empty vector. Factored vector expansion into helper function, and avoid iterated expansion. check-in: efee7dadf9 user: aku tags: rtc-scratch
22:46
Fix memory smash. Code wrote to string[-1] when dealing with an empty vector. Added special-case code avoiding this. Closed-Leaf check-in: 8423e46502 user: aku tags: rtc-fix-smash
22:42
Added lexer mismatch information to error message, updated expected AST information. check-in: 3b78c7b49c user: andreask tags: rtc-scratch
05:01
Factored common parser testing into a single file driven by builtin and generated parsers (tcl, rtc). Allow for engine-specific results. Fixed expected error results for tcl to match the <>-bracketing added by [74bf29e7cc]. TODO: Complete error generation for RTC, and fix up its results. TODO: Hunt the smash. check-in: e2fa82d9c9 user: aku tags: rtc-scratch
2017-09-19
23:37
- Tweaked Tcl engine progress report (now <>-bracketing the lhs symbol). - Tweaked char quoting for RTC, using octal for 127+, to match the chars/bytes in the progress report. - Reworked the progress report to properly align the columns. Uses SV data structures (string, vector) to hold the interim data. check-in: 74bf29e7cc user: andreask tags: rtc-scratch
23:36
Crash fix: Skip progress report if there is no lexical recognizer. check-in: 906570029c user: andreask tags: rtc-scratch
21:14
(Crashing in tests, seg.fault) Added L0 progress reporting to the error message. The differences in the reduced grammar used to operate the engines make it clear that we cannot exactly match their error output. While we can get close, we still need per-engine results here. Redo the zeta to be the tcl engine and then the main testsuite can be made to match whichever engine we use as builtin. check-in: e46c42dfa0 user: andreask tags: rtc-scratch
19:35
Get the latest trunk fix. check-in: f2a0be70b1 user: andreask tags: rtc-scratch
19:31
Tweaked char quoting in error message, and reworked to assemble via a DString. check-in: 22a19ba8aa user: andreask tags: rtc-scratch
14:56
Expanded and tweaked generation of error message. Disabled most of it, have to track down a memory smash. check-in: 0cecbfe05b user: aku tags: rtc-scratch
05:22
Update test broken by the (:space:)-fix committed with [d00f6c9ec5], match results again. check-in: d28460b501 user: aku tags: trunk
00:14
Draft work on generation of error messages by the RTC - In the Tcl glue code. Raw engines are on their own for now. check-in: 6285fa4457 user: andreask tags: rtc-scratch
2017-09-18
22:14
Move the Tcl specific includes into a single header to allow easy replacement for other environments. TODO: Document the macros. TODO 2: Create a basic header for libc environments without tracing. check-in: 3a158ad5ae user: andreask tags: runtime-c
22:12
Remove rhs listification in progress reports. check-in: f5e2dc153a user: andreask tags: runtime-c
21:25
Moved the general completion processing out of the template into the RTC. check-in: 17e57b15d2 user: andreask tags: runtime-c
20:00
New testsuite to test the RTC engine using the SLIF grammar. Currently 29 fails, from a quick look all due to the missing generation of a proper error message. ASTs are ok. No crash (anymore, see sva dup fix [9e306213de]). check-in: 08dbff3993 user: andreask tags: runtime-c
19:58
Fix argument name mismatch in RTC template, vs Tcl engine. check-in: 815f5a7dcf user: andreask tags: runtime-c
19:57
Fix dup of empty vector, nothing to copy. check-in: 9e306213de user: andreask tags: runtime-c
19:47
Disable inadvertently committed tracing check-in: 690cffca93 user: andreask tags: runtime-c
2017-09-16
00:08
Reworked the template for rtc-critcl to capture the parser SVs, via the new callback. The Tcl level sees the last SV captured. That is identical to how the Tcl engine behaves. A test run using the slif meta grammar produces a proper Tcl AST, no crash. Known limitation: No proper error handling/value yet. Even so, testing against the suite of grammar examples can commence, using an rtc-*.test analogous to the zeta-*.test check-in: 6c632a56cb user: aku tags: runtime-c
2017-09-15
23:54
Extended rtc main with user-callback to handle the SVs generated at parser level. check-in: 138331137d user: aku tags: runtime-c
23:51
Implemented conversion of rtc sv structures to Tcl_Obj's check-in: 02454030d1 user: aku tags: runtime-c
23:50
Moved shorthands for access to SV structures into header for use by other parts. check-in: 472af77556 user: aku tags: runtime-c
21:59
Fixed issues with `marpatcl_rtc_sva_filter`, to wit: - Reordered testing of the possible cases, which were in the wrong order. - Modified filter guard, preventing access to the mask array beyond its size. - Added forgotten stepping of destination index in copy-down case. - Added nulling of the removed elements after their release. This enables the next. - Truncation of result now reduced to single assignment of new size (and fixed an off-by-one on the size). Tracing tweaks in filter and parser: - Rephrased filter tracing output, added index information. - Further extended to dump basic vector content (in, out, intermediates). - Extended parser valuation traces to show filter masks. - Tweaked spacing for step-nulling to align output with the other types. check-in: 00aeda6759 user: aku tags: runtime-c
15:20
Bring valuation fix and associated crash fixes into the work. check-in: 7ccc133594 user: aku tags: runtime-c
15:19
Previous commit is the fix for the issue with valuation. Fixing more issues now exposed: lexer/parser: Fix bad check for exhausted recognizer. lexer/parser: Extend field size for rule/token ids in trace. lexer/parser: Disabled marpa valuation tracer. parser: Fix double destroy of the recce. sem value: Enable string generation for null references. sv arrays: Fix index/size handling and assertions in copy. sv arrays: Handle nulls when freeing instances. Known issue: The parser seems to insert NULL into the generated sv at some places. Suspicion is on the sva filter functionality and around that. To be investigated. Closed-Leaf check-in: 614482ab63 user: aku tags: rtc-trial-val-fix
2017-09-14
20:48
Added "marpa_g_force_valued" used in the Tcl runtime. Check if this fixes the issues with valuation. check-in: 12fecaf845 user: andreask tags: rtc-trial-val-fix
20:46
Comment fix check-in: 73c064c5ef user: andreask tags: runtime-c
2017-08-29
20:02
Implemented the byte range refactorization TODO (sharing definitions of overlaps between ranges). #setup instructions are moderately up, #rules strongly down. (No effect on the issue with value iteration). check-in: 94417ce2bc user: aku tags: runtime-c
2017-08-28
21:44
Added big todo about global byte range optimization to the rtc generator. check-in: 1ce7257d39 user: aku tags: runtime-c
21:09
Add statistics about rules and rule instructions to the rtc generator output and templates. check-in: 9f491209c6 user: aku tags: runtime-c
20:39
Tweak readable rule output from rtc generator core to show alternates better. check-in: fa91643e0e user: aku tags: runtime-c
20:15
Added progress reporting to lexer and parser. Additional runtime data comes out of the grammar setup for this, i.e. mapping from rule ids to lhs and rhs symbols. Testing shows good output for both lexer and parser. Also tells us that our problem with the valuation retrieval is not based in bad recognizer operation. Rules get predicted, processed, and finalized as they should. check-in: 43951636ba user: aku tags: runtime-c
20:12
Added new tracing stream focussed on the gate input, i.e. the processed byte stream. check-in: 7c887af758 user: aku tags: runtime-c
20:10
Extended the rtc generator core to provide readable rule info for the rule instructions. check-in: e9168bb09a user: aku tags: runtime-c
2017-08-24
16:24
Get current state of the rtc work. Mainly to get all the tracing correct here in trunk. check-in: 79e0734010 user: aku tags: trunk
16:23
Get fixes from trunk. check-in: 24f27282be user: aku tags: runtime-c
16:12
Fix the `marpa_` prefix for functions of the binding: it should have been, and now is, `marpatcl_`. Intermixed some tracing changes from the `runtime-c` branch. check-in: 012110015d user: aku tags: trunk
2017-08-23
04:11
Replaced various ifdef trace conditionals with conditional trace macros. check-in: 1f3831b952 user: aku tags: runtime-c
2017-08-22
22:36
Moved old Tcl-level event handling support over to TRACE. check-in: 30f37e21a4 user: aku tags: runtime-c
22:36
Added basic event handling (just printing, via TRACE support). Seeing only e-exhausted from the lexer, in line with the Tcl-level engine. check-in: d43c89b0f8 user: aku tags: runtime-c
21:08
Add marpa tracing of valuation. Non-public function to activate. Unclear what parts of the value to print for trace steps. But they appear, which is something. check-in: 10d903f015 user: aku tags: runtime-c
19:41
Disable all tracing. Need some pre-commit hook to prevent check-in of active tracing. check-in: 49a76c16bb user: aku tags: runtime-c
19:03
Merge bugfix to `marpa::version`. check-in: 1d28a85391 user: aku tags: runtime-c
19:03
rtc critcl generator, template: tweak comments for clarity. Add debugging config commands. check-in: 828be518d7 user: aku tags: runtime-c
18:58
Assert order, tree, value objects after creation check-in: 16624fbdf9 user: aku tags: runtime-c
18:57
Move bocage tracer from fprintf over to TRACE. Assert order, tree, value objects after creation. check-in: 6d217aae8e user: aku tags: runtime-c
2017-08-18
21:26
Fix marpa::version - remove superfluous arguments copied from check-version. And the argument to the API function is an array of 3 ints, not a single int :( Stack smash fixed. check-in: 8fd67a60d2 user: aku tags: trunk
16:25
Fill out the tracing where not done yet. Tweak the trace messages somewhat (show types as cast where needed (pointers, mainly)) check-in: 6af1654715 user: aku tags: runtime-c
2017-08-17
04:17
Another redo of tracing, adapting to the latest work in critcl. check-in: 150f7ac23a user: aku tags: runtime-c
2017-08-04
04:33
Added code to stringify semantic values, use it in tracing. check-in: 288de21b91 user: aku tags: runtime-c
2017-08-02
23:04
Fix missing check post parser alternatives, fixed handling of lexeme length (get_lexeme is destructive). check-in: 848e47f0c7 user: aku tags: runtime-c
23:00
Reworked the critcl tracing support and adapted RTC to it (support for file/tag based activation of trace streams). General tweaking of trace output (more and more symbolic information). As part of that, get_parse() was rewritten as a switch. Redone the tag checks in the SV code. check-in: 19333843f4 user: aku tags: runtime-c
22:43
Keep D(isplay)Names as proper list, and fix mishandling of empty list check-in: e0bfc6da4b user: aku tags: runtime-c
07:39
Fix typo introduced by commit [f7d1fcaad9] check-in: 55d69982c2 user: aku tags: runtime-c
06:54
Pulled unicode CC fix (class :space:) into the RTC work check-in: f440b08f4b user: aku tags: runtime-c
06:53
Extended tracing, print symbol names in various places. check-in: 5af0029e73 user: aku tags: runtime-c
06:51
Tweak and extend tracing (print names of symbols accepted in gate) check-in: 17de50b73c user: aku tags: runtime-c
06:50
Extended bytesets with `size()` accessor function. check-in: 051e2c0880 user: aku tags: runtime-c
06:49
Fix indentation handling for rule code. check-in: deaf6d68d4 user: aku tags: runtime-c
06:41
Fix definition of CC `space`, match Tcl `string is space`. Found while working with RTC and a test not recognizing \n as a character in [[:space:]], contrary to Tcl `regexp`. check-in: d00f6c9ec5 user: aku tags: trunk
02:59
Switched to the enhanced critcl::literals, critcl::emap packages, with the ability to provide C-level access to pools and mappings (pre critcl 3.1.17 work). Updated the marpa pool and map definitions to provide Tcl and C level access. Added C-level tracing to lexer. Added missing grammar precompute. Added better libmarpa checking. Fixed handling of the accept symset (sync'd with generator core): - Store terminal symbols (and pseudo-terminals for the discards) - Convert to ACS on entry (marpa_r_alternative) into the recce. - Capacity limited to lexemes + discards now. - Original scheme would need 256 never-used entries for the byte symbols. Fixed mishandling of token extraction in `get_parse`. check-in: 1e9c44d234 user: aku tags: runtime-c
2017-07-31
19:24
Added C-level tracing to lexer. Added missing grammar precompute. Added better libmarpa checking. Fixed handling of the accept symset (sync'd with generator core): - Store terminal symbols (and pseudo-terminals for the discards) - Convert to ACS on entry (marpa_r_alternative) into the recce. - Capacity limited to lexemes + discards now. - Original scheme would need 256 never-used entries for the byte symbols. check-in: c495a7b25d user: aku tags: runtime-c
18:07
Added C-level tracing to parser and fixed various issues: - Added missing grammar `precompute`. - Added missing recognizer `start_input`. - Added better libmarpa checking. Similarly for the overall rtc object, added C-level tracing, added missing init of `fail` manager and fixed ordering of init for `store` manager. check-in: 812b8db505 user: aku tags: runtime-c
17:51
More C-level tracing, added to `byteset`, `fail`, `gate`, `inbound`, and `symset`. Further extended the `fail` manager with a function to check libmarpa results (TODO: proper texts for the error codes). Used this to extend the `gate` with better libmarpa checking. check-in: 40dbbbd2c8 user: aku tags: runtime-c
01:32
At this point the output from the rtc/critcl generator compiles and loads ok, with functional constructor/destructor methods. check-in: fce56a94ff user: aku tags: runtime-c
2017-07-30
23:13
Added assertions and traces to rtc grammar setup. Fixed bug in brange handling. check-in: 89bfe909c6 user: aku tags: runtime-c
23:12
Oops. Added new generator to overall package. check-in: 7c0e018f80 user: aku tags: runtime-c
23:11
Added rtc/critcl generator. Reworked the array emission once more, expose prefix as core configuration parameter for the two generators using it. check-in: fbd2f11257 user: aku tags: runtime-c
2017-07-28
08:37
Redid array formatting support code for more control over the formatting of chunked arrays. Fixed the quirks of the previous tweak. check-in: 268723c107 user: aku tags: runtime-c
07:32
Removed inadvertently committed debug output. check-in: cc5a8891c3 user: aku tags: runtime-c
07:30
Split the RTC generator into core and generator with asset, same as the others. Tweaked array formatting. Have to work out some quirks introduced by the tweak. check-in: 7d49a0ee9e user: aku tags: runtime-c
2017-07-27
19:44
Updated C runtime work with trunk work (generator refactoring) check-in: bf01f1bdd9 user: aku tags: runtime-c
19:42
Refactored the tparse and tlex generators. Moved the core operation (essentially identical in both) into a separate package. The generators now only invoke that core to obtain the configuration they then insert into their template. Their only difference now is in the templates they carry. check-in: 2af5734d2d user: aku tags: trunk
17:33
Fix comment typo in bocage. Extended rtc generator to check limits imposed by the data structures of spec.h. Extended further to handle RHS masking in G1 rules, and further to generate a map from rules to lhs symbols. Extended the spec structures for this as well. Fixed bug in the array chunker for empty labels (missed prefix, prevented line breaks and dropped separators). Fixed issues in the _S_PER_ generator and coder. Small optimization for rules without data (i.e. no mask, no semantics). Added decoder to spec functions. Tweaked rtc lexer semantics handling (using macros to capture the memoization and make it look more table-like). Used same structure in rtc parser semantic handling. Filled in parser forest handling and tree execution. Added descriptor codes for `end` and `g1end` - Backend support, no frontend support (yet). General fixes. Moved location of code performing G1 token entry somewhat, i.e. a shift in the lexer/parser API. Added supporting sv(a) functions. Tweaked SV functions. check-in: 3dc8b80196 user: aku tags: runtime-c
2017-07-25
16:37
Added fail handling and semantic store. More filling out of lexer re lexeme completion and semantics. Started to fill in parser, and semantics at that level. check-in: cab135cf5b user: aku tags: runtime-c
16:32
Tweaked string SVs. copy flag -> own flag. No copying, indicator of ownership -> auto-free or not check-in: cf0dffbe55 user: aku tags: runtime-c
16:31
Added helpers for string pool access. check-in: 1692e7393b user: aku tags: runtime-c
16:29
Block rtc export until we have limit checks added check-in: e98d51c252 user: aku tags: runtime-c
2017-07-24
16:40
Brought prettification back to work branch for the C engine check-in: edd2833b03 user: aku tags: runtime-c
16:38
Prettification completed, everything matches, exporter adapted. Closed-Leaf check-in: 8b2be68de7 user: aku tags: scratch-2
2017-07-22
00:28
More prettification. check-in: c96284ed90 user: andreask tags: scratch-2
2017-07-21
21:00
Cleanup of supporting low-level data structures (byte sets, symbol sets, integer stacks). -- Naming, code structure. TODO: Create critcl interfaces to these, for testing. NOTE: These might also be useful to boost the performance of the Tcl runtime. check-in: 12641c849b user: andreask tags: scratch-2
19:53
Merged the differing scratch-work check-in: 31a9980c8b user: andreask tags: scratch-2
05:39
scratch-2 check-in: 42aa6c16f4 user: aku tags: scratch-2
05:37
More compression for the grammar structures: Split length information from string pointers. Removed latter, added string offset in a single large pool string. Boxed the start/stop information byte ranges into a single entry. Added string transform to the pool handler, applied to the lex symbols. Drops all the @-prefixed tags, adds dashes and []-bracketing to ranges and char classes. This reduces the string pool further. Fixed bug in the names for array semantic tags. check-in: 5be3a63e1c user: aku tags: runtime-c
00:09
scratch Closed-Leaf check-in: 1353b0033f user: andreask tags: scratch
2017-07-20
16:03
Added a generator targeting the C runtime (RTC). Extended the configuration utility package with the ability to query keys directly. This is needed by the RTC generator to get access to the grammar name, from which it will derive a C identifier (prefix) for the generated structures. Lots of changes to the C grammar specification structures (spec.h) and engine setup (spec.c). Still not complete. Masking (hiding RHS elements from the semantics) is missing. Revamped the bytecode system used to encode the grammar rules a bit, for smaller structures. Byte ranges are now stored as specified, and expanded at setup-time (into their set of priority rules, alternations of bytes). Similarly limited the size of various things to shorts (16bit). Still more things possible: Separate string lengths from the pointers in the pool, i.e. 2 separate arrays instead of one array using a structure with 6 bytes of padding. Rework pool generation to reduce the amount and size of strings (strip various tags from symbol names, making them shorter, and causing more duplicates, i.e. fewer strings to keep around). check-in: b2de58d181 user: aku tags: runtime-c
2017-07-14
22:31
Filling out the gaps. The basic skeleton is now present, modeled like the Tcl runtime check-in: 0c5f1333fc user: aku tags: runtime-c
07:51
Start on the C runtime. check-in: 84db43e656 user: aku tags: runtime-c
2017-07-13
23:34
More optimizations in the Tcl runtime. The sub lexer structures (inbound, gate, lexer internals) use only simple semantic values (location, char+location). These can be transferred directly; there is no need to go through a store, and every reason to avoid it (lots of per-character method calls to put, get, box and unbox the values). The API changes required testsuite updates. Higher-level output was not changed, and passed. Fixed a bug in the SV handling of the lexer (KnownValue) exposed by the change. Benchmarking using `remeta` (which parses the meta grammar) found an improvement of about 7.5% in bytes read per second. This was calculated by comparing the max speed taken over one thousand runs, with and without going through the store. Further possibilities to consider: - Merging `inbound` into `gate` into `lexer` (reduce cross-instance method calls, reduce method calls through inlining). - Read larger chunks from files, channels (currently 1K chunks; what about 4K, 8K?). - Move pieces into C. This can inform a full-C runtime, or even be part of it. check-in: 5a1350dba9 user: aku tags: trunk
19:27
Cleanup: Moved internal doc notes out of the directory for future official docs. check-in: 98c21ecd1c user: aku tags: trunk
18:55
Added caches to the classes `gate` and `lexer`, to boost the performance of method `acceptable`. This method converts from sets of upstream symbols (parser for lexer, lexer for gate) to the sets of local symbols to use in gating. The new caches remember all incoming sets and their conversions. When sets recur the conversion is taken from the cache instead of computed. Benchmarking using `remeta` (which parses the meta grammar) found an improvement of about 2.3% in bytes read per second. This was calculated by comparing the max speed taken over one thousand runs, with and without caching. <pre>
./build.tcl uninstall install
rm TRACE
for i in $(seq 0 999)
do
    ./bootstrap/remeta X | grep done | tee -a TRACE
done
cat TRACE | projection 7 | maxcol
</pre> check-in: 13f7fa67a6 user: aku tags: trunk
2017-07-12
22:48
Trajectory and goal: Replace the manually-created boot parser with a generated parser, based on the meta grammar. That parser should regenerate itself, and pass the testsuite. - Fixed issue in the gate which prevented the very character causing trouble from being shown in error messages. The previous character (= last ok) was shown instead. - Extended gate, lexer, parser to provide more information on lexing/parsing failure (progress reports, pre-mismatch stream for current attempt, symbol maps, ...) - Modified the aggregated parser runtime (rt_parse) to use the new information when generating error messages. - Extended boot-parser, container and slif meta-grammar to support the naming of quantified rules. - Extended the export configuration to distinguish grammar (= package name) from debug tags. Latter is derived from the name. - Tweaked comments in the templates for generated lexers and parsers. - Extended the parser generator to insert action and rule names into its output. - Reworked sorting of rules in the parser generator, to prevent the special rules setting actions and names from being torn away from the rules they apply to. - -->> Goal reached <<-- Replaced the manual slif parser with parser generated from the slif meta-grammar. - -->> Milestone achieved <<-- - Fixed bugs and issues in the SLIF metagrammar - TAB is allowed in strings, without escape. - Added proper naming of the rules to match the boot-parser and existing semantics. - Updated the testsuites. - Created apps: - Regeneration of the boot-parser after changes to the slif. - A very basic parser generator. check-in: af6c661e9d user: aku tags: trunk
2017-07-11
20:30
Trajectory and goal: Replace the manually-created boot parser with a generated parser, based on the meta grammar. That parser should regenerate itself, and pass the testsuite. - Modified the low-level `parser` engine, split the non-standard rule ordinals placed into names out, as a new custom array action part (`ord`). Further added support for action and name definition through rules. - Modified the handler for builtin semantics to support `ord`. - Modified the SLIF container to support `ord` in the `action` attribute of g1 rules. - Modified `rt_parse` and boot parser, placed the ordinals back in as a regular part of the action, and moved the definition of the semantics out of the high-level runtime into parser definition, as action definition (see new support in the low-level `parser` engine). - Extended the `tparse` exporter to put action and name information into the rules of the generated parser. - Modified the meta grammar to match the boot parser with respect to `ord`. - Updated all dependent tests, and test support (AST formatting now sees the ordinals, and reconstructs the old form of output). check-in: 536c5e2eb0 user: aku tags: trunk
2017-07-10
22:09
Removed GetString, GetSymbol sem core instances from the lexer. Replaced by code using direct access to parse tree and engine state information to generate the semantic value. Updated test suite. Changes in the output for rule. The old output was a bug in the GetString setup, reporting the internal `@START ~ foo` rule, instead of whichever rule for `foo` was matched. General changes in the lexer output: it now reports separate semantic values for all matched symbols. That was also a bug in the previous setup. It worked only because the runtime was geared towards (start,llength,value), which is identical across symbols. With the proper reporting for other keys, like 'rule', this is not the case anymore. The parser class was updated to match this API change. This change further reflects in the expected results of the tests for lhs, name, rule, and symbol; the testsuite is updated. Some optimization is still done, i.e. it is made sure that identical textual semantic values are stored under the same id as well, i.e. only once. check-in: f7d1fcaad9 user: aku tags: trunk
2017-07-07
23:48
Added testing of the lexer pipeline, specifically the handling of the various semantic actions. NOTE: The result for g1start is a known issue with the current semcore/semstd. This is in preparation for the replacement of GetString with a more direct conversion. Some fixes in the existing machinery to have good results. Some bits of the new machinery already present, untested, and incomplete. check-in: ee9edafcbb user: aku tags: trunk
03:57
Simplified lexer core, replaced GetSymbol handler with simpler direct conversion from parse tree. Updated testsuite. Added some docs about lexeme semantics check-in: e0ce2184fc user: aku tags: trunk
00:11
Tweaks to the lexer-internal documentation. check-in: 37728b785b user: aku tags: trunk
2017-07-06
08:33
Reworked the meta grammar. Removed the ambiguities with exponential blowup of forest size due to a naive specification of string and char class lexemes (variable-length octal and unicode escape sequences). More details in the modified grammar. Updated all dependent files used by the test suites. check-in: 65180c1ee3 user: aku tags: trunk
07:50
Fix a blowup in the boot-parser's handling of posix char classes. Check slif meta grammar for same. check-in: ed2fc65f24 user: aku tags: trunk
2017-07-05
22:04
Moved progress reporting into a mixin (engine_debug.tcl). Added code for the reporting of marpa parse trees (valuation instruction list). Added stream reporting to gate, lexer, and parser. Added progress and forest reporting to lexer and parser. Added capture of complete rule information in the base engine, for the progress report (semantics might need this as well). Updated the semantics core to match the changes in the engine. Reworked lexer internals with regard to LATM, LTM lexemes, and discards. All of them get an ACS symbol. This means there is a 1:1 mapping between lexeme and discard rules, and the ACS symbols. The difference between LATM and LTM + discard is that the latter group is always acceptable/activated. Note, this means that the first instruction of any parse tree is the token for the ACS symbol, which gives us the matched rule. We do not need the entire parse tree, nor do we have to actually evaluate it. This will allow us to replace the GetSymbol and GetString semantics with simpler code. check-in: 87701775b3 user: aku tags: trunk
21:45
When generating classes write the codepoints 1 to 7 as full octal literals. This prevents misinterpretation as back-references by the gate regexp handling these char classes. check-in: fb088da186 user: aku tags: trunk
21:41
Ignore comments from the semantics. check-in: 566a3da2e2 user: aku tags: trunk
21:40
Fixed bug in boot parser present since beginning: Allow TAB in comments. Dropped duplication of LF. check-in: 8dc5fb4ba2 user: aku tags: trunk
21:37
Extended error message for bad range (start after end) check-in: 004994a2f5 user: aku tags: trunk
2017-06-26
22:21
Removed support for the `bless` adverb from the boot-parser and following stages. Updated tests. Left the cases using bless in the suite, for the parse errors they now cause. check-in: c1b1d00b8e user: aku tags: trunk
18:17
Fixes - Add missing 'create' method in the templates - tparse, tlex backends: - Initialize rules lists - Discards are symbols too - Do not listify the values of character and lexeme maps. The latter is a bool, operation superfluous. The characters are Tcl encoded (char quote tcl) and listifying breaks that. - Sort the words of a list before generating output (All parts are order-independent, order is for human readability and easier search). - tparse backend: - Fix mask generation. Semantics and backend use different data structures for the same information: List of bools over entire rhs, vs list of indices to hide. Updated tests check-in: 7fbda13fbc user: aku tags: trunk
2017-06-23
05:18
Created new engine assemblies for lexer-only and parser, using the boot_parser as starting point (tlex, tparse). These are the high-level runtimes assembling the low-level parts into a complete lexer/parser, configurable with grammar information. Created generator backends targeting these two runtimes. Their output are derived classes providing the grammar information missing in the runtime itself. Added testsuites. The APIs of the gate and lexer classes are modified (gate: def, lexer: latm (removed), export) to better suit the SLIF features. For basic testing of the new runtime the boot_parser has been rewritten to target it. The many incremental steps to keep everything running and testable during the rewrite are left out, this commit is an atomic switch-over. Test suite passes. Attention: The lexers and parsers generated by these backends are not tested yet, only the parse runtime (see above). The template classes are gone. The templates are now stored with the generator classes (attached to the file after the code, see new `marpa::asset` utility function for explanations). check-in: 225afc2f9d user: aku tags: trunk
2017-06-21
05:32
New backend: TLex. Dumps the lexer part of a grammar as the derived class from a tcl-based lexer engine. Extended the grammar container classes with the necessary accessor methods. Note: The engine base class for this still has to be written. Use the bootstrap_parser as the foundational example. check-in: 7773074adb user: aku tags: trunk
2017-06-20
04:40
Added first generator backends: Writing a derived GC container class which loads the grammar on construction. Tweaks to a few serialize methods to strip unwanted string rep, i.e. whitespace/formatting. check-in: ac35603331 user: aku tags: trunk
2017-06-16
05:56
Reworked "marpa_scr_rep_from_any" (charclass intrep creation (SCR)) to avoid unnecessary shimmering. The Tcl_ObjType*s needed for the direct type checks are determined during package context setup, taken from transient Tcl_Obj*s created just for that purpose. Switched the supported range back to BMP; FULL was incorrectly checked in after testing. check-in: 34b21da2aa user: aku tags: trunk
05:15
Made the semantics work official: pulled it into the main line. check-in: 0bc6a8787c user: aku tags: trunk
05:12
Now taking the differences between full and BMP (restricted) Unicode support into account. Where needed, tests now have separate results, one per supported range/mode. Moved the invocation of the generator tool out of the main marpa file into a utility file, so the main marpa file is now a bit more readable. The generated files are now keyed by the range they are for. Closed-Leaf check-in: 2b2cf63ab6 user: aku tags: new-slif-semantics
2017-06-15
22:01
Account for differences between 8.5 and 8.6: the wrong # args messages for special `args`, and the handling of `\x` sequences in input (8.6 properly takes at most 2 hex digits; 8.5 takes the last 2 of any hex sequence at least 2 characters long). check-in: 63aaa63395 user: aku tags: new-slif-semantics
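To make the `\x` difference concrete, here is a minimal C sketch of both decoding rules as described above (and in the Tcl 8.5 and 8.6 `Tcl(n)` manual pages). The functions `hex_escape_86` and `hex_escape_85` are hypothetical helpers for illustration, not part of Tcl or marpa.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Value of a hex digit; the caller guarantees isxdigit(c). */
static int hexval(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    return (tolower((unsigned char)c) - 'a') + 10;
}

/* Both helpers decode the digits following "\x" (p points just after
 * the 'x'), store the character value in *value, and return the number
 * of digits consumed (0 if no hex digit follows). */

/* Tcl 8.6 rule: one or two hex digits. */
static size_t hex_escape_86(const char *p, int *value) {
    size_t n = 0;
    int v = 0;
    while (n < 2 && isxdigit((unsigned char)p[n])) {
        v = v * 16 + hexval(p[n]);
        n++;
    }
    *value = v;
    return n;
}

/* Tcl 8.5 rule: any number of hex digits, all but the last two ignored. */
static size_t hex_escape_85(const char *p, int *value) {
    size_t n = 0;
    int v = 0;
    while (isxdigit((unsigned char)p[n])) {
        /* Masking to 8 bits keeps exactly the last two digits seen. */
        v = (v * 16 + hexval(p[n])) & 0xFF;
        n++;
    }
    *value = v;
    return n;
}
```

For the input `\x41BC`, the 8.6 rule yields `A` (0x41) followed by the literal text `BC`, while the 8.5 rule consumes all four digits and yields 0xBC.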
19:21
Enhance robustness: Create the possibly missing directory for generated files before invoking the generator. check-in: 63afd69b0b user: aku tags: new-slif-semantics
06:59
Pulled the performance work on char classes and ASBRs into the main development. Passes the tests, with no crashes for the entire suite. The new C implementation of <code>2asbr</code> is between 15 and over 1500 times faster than the Tcl implementation (i.e. one to three orders of magnitude). See below for the known numbers (output from bench/asbr.bench for both branches, manually merged into a single table).
<verbatim>
Range       :: Trunk/Tcl     :: Perf/C     :: Factor Tcl/C
{0}         :: 169.013       :: 10.683     :: 15.8207
{0 9}       :: 1506.600      :: 12.931     :: 116.5107
{0 99}      :: 14781.683     :: 35.007     :: 422.2494
{0 999}     :: 242996.860    :: 274.940    :: 883.8178
{0 9999}    :: 3233832.400   :: 2693.000   :: 1200.8290
{0 99999}   :: 36660675.000  :: 26660.000  :: 1375.1191
{0 999999}  :: 432152848.000 :: 277217.000 :: 1558.8974
{0 1114111} :: 484810715.000 :: 308173.000 :: 1573.1771
</verbatim> check-in: 3853490d5b user: aku tags: new-slif-semantics
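For readers unfamiliar with ASBRs, the following C sketch shows one standard way to turn a codepoint range into UTF-8 byte-range sequences, similar to the splitting done by Rust's `regex-syntax` `Utf8Sequences`: split the range where the encoded length changes, then split at aligned boundaries until the per-byte ranges can be combined independently. It assumes ASBR means an alternation of sequences of byte ranges; the names `to_asbr`, `Seq` and `Asbr` are hypothetical, this is not the marpa C code benchmarked above, and surrogates are not treated specially.

```c
#include <assert.h>
#include <stdint.h>

/* One byte-range sequence: byte i of the encoding lies in [lo[i], hi[i]]. */
typedef struct {
    int len;
    uint8_t lo[4], hi[4];
} Seq;

/* An alternation of such sequences (fixed capacity for the sketch). */
typedef struct {
    Seq seq[64];
    int n;
} Asbr;

/* Standard UTF-8 encoding of a codepoint; returns the byte count. */
static int utf8_encode(uint32_t cp, uint8_t out[4]) {
    if (cp < 0x80) { out[0] = (uint8_t)cp; return 1; }
    if (cp < 0x800) {
        out[0] = (uint8_t)(0xC0 | (cp >> 6));
        out[1] = (uint8_t)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (uint8_t)(0xE0 | (cp >> 12));
        out[1] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (uint8_t)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (uint8_t)(0xF0 | (cp >> 18));
    out[1] = (uint8_t)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (uint8_t)(0x80 | (cp & 0x3F));
    return 4;
}

/* Append the sequences matching exactly the codepoints lo..hi. */
static void to_asbr(Asbr *a, uint32_t lo, uint32_t hi) {
    static const uint32_t len_max[3] = {0x7F, 0x7FF, 0xFFFF};
    assert(hi <= 0x10FFFF);
    if (lo > hi) return;
    /* Step 1: split where the encoded length changes. */
    for (int i = 0; i < 3; i++) {
        if (lo <= len_max[i] && len_max[i] < hi) {
            to_asbr(a, lo, len_max[i]);
            to_asbr(a, len_max[i] + 1, hi);
            return;
        }
    }
    /* Step 2 (multi-byte only): split at aligned 2^(6i) boundaries so
     * that wherever a more significant byte varies, every less
     * significant byte spans its full range. */
    for (int i = 1; hi > 0x7F && i < 4; i++) {
        uint32_t m = (1u << (6 * i)) - 1;
        if ((lo & ~m) != (hi & ~m)) {
            if ((lo & m) != 0) {
                to_asbr(a, lo, lo | m);
                to_asbr(a, (lo | m) + 1, hi);
                return;
            }
            if ((hi & m) != m) {
                to_asbr(a, lo, (hi & ~m) - 1);
                to_asbr(a, hi & ~m, hi);
                return;
            }
        }
    }
    /* Step 3: the per-byte ranges now describe exactly lo..hi. */
    assert(a->n < 64);
    Seq *s = &a->seq[a->n++];
    s->len = utf8_encode(lo, s->lo);
    utf8_encode(hi, s->hi);
}
```

For `{0 1114111}` this produces seven sequences, from `[00-7F]` up to `[F4][80-8F][80-BF][80-BF]`; for the BMP range `{0 65535}` it produces four.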
00:14
Cleanup of ignore patterns. Workspace and checkout are fully separated.
Tcl_ObjTypes for char classes and ASBRs:
- Fully traced, with dump functions.
- Fixed a memory smash in the creation of the string rep (moved the functionality as a macro into critcl::cutil).
Now getting UNI_MAX out of the newly generated C declaration file.
Simplified adding a range to an SCR:
- No longer extending.
- Assert and fail when going over the allocated capacity.
- Reason: all places using it calculated the needed capacity beforehand and allocated that much. Thus extending is not required, and bounds-checking is good enough.
Removed the Tcl implementation of 2asbr, norm-class and negate-class. Updated the testsuite. Closed-Leaf check-in: 0fd6b229ab user: aku tags: asbr-perf
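The fixed-capacity policy for SCRs can be sketched as follows. The struct layout and the names `SCR`, `CR`, `scr_new` and `scr_add_range` are assumptions for illustration, not marpa's actual declarations.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical inclusive codepoint range. */
typedef struct { int first, last; } CR;

/* Hypothetical set of codepoint ranges (SCR) with a fixed capacity. */
typedef struct {
    int max;   /* allocated capacity, fixed at creation */
    int n;     /* ranges in use */
    CR  cr[];  /* flexible array member holding the ranges */
} SCR;

static SCR *scr_new(int capacity) {
    SCR *s = malloc(sizeof(SCR) + (size_t)capacity * sizeof(CR));
    if (!s) return NULL;
    s->max = capacity;
    s->n = 0;
    return s;
}

/* No reallocation: callers compute the needed capacity up front, so
 * exceeding it is a programming error caught by the assertion. */
static void scr_add_range(SCR *s, int first, int last) {
    assert(first <= last);
    assert(s->n < s->max);
    s->cr[s->n].first = first;
    s->cr[s->n].last = last;
    s->n++;
}
```

Dropping the growth path removes a realloc and its failure handling from every insertion, at the cost of requiring callers to size the SCR correctly.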
2017-06-14
08:00
Extended the unidata tool to generate its first C declarations (mode, max). Updated the main file to use them. Note that the generated files are now placed into a separate directory just for them, which will be ignored by fossil. check-in: 7009d4769d user: aku tags: asbr-perf
08:00
Drop the local helper script for testing from the repo. check-in: 75a243e45e user: aku tags: asbr-perf
07:55
Tcl 8.6 support. The `try` package is only required by 8.5. check-in: 9d9ef26c45 user: aku tags: asbr-perf
2017-06-09
18:00
Ignore more build artifacts. check-in: e3a30b4075 user: aku tags: asbr-perf
08:14
Fixed generation of ranges in the creation of string rep for ASBR check-in: 4beca374ec user: aku tags: asbr-perf
08:13
Moved CC validation into a helper function. Improved and fixed error messages. check-in: bfaa6cee8f user: aku tags: asbr-perf
08:12
Fixed handling of BMP/full in the Unicode tests: a different set of named classes, and skipping the tests for the full range when configured for BMP. check-in: 967653c919 user: aku tags: asbr-perf
08:10
Divorced the unidata tool from marpa itself. It is now the only place with a Tcl implementation of 2asbr and related commands: slow, but run only once, during build setup, and not limited by an installed marpa's range of supported codepoints. check-in: 9e2b1b63d7 user: aku tags: asbr-perf
2017-06-08
03:46
Fixed issues in low-level norm and negate. Added tracing and nicer assertions. Disabled the Tcl-level implementations. Started on the ASBR compile. Incomplete, broken, seg-faulting. check-in: 9ac3318345 user: aku tags: asbr-perf
03:44
Added use of critcl c-utilities. Requires unreleased critcl head. Removed beginnings of local c-utils. Added the ASBR obj type draft. check-in: 81100a5bf4 user: aku tags: asbr-perf
03:43
Put test installation into explicit directory. This prevents auto-removal by kettle. Keeping it is necessary for good C-level debugging (symbols). check-in: fc4cb59f20 user: aku tags: asbr-perf
2017-06-06
18:17
Brought up to date with main working branch check-in: 81bbe98e8f user: aku tags: asbr-perf
2017-05-25
18:14
Dropped local "dict-sort". Using "kt dictsort" instead. check-in: 832991391f user: aku tags: new-slif-semantics
18:13
Another whitespace change check-in: b6b6d9fa28 user: aku tags: new-slif-semantics
2017-05-21
02:53
The main directory accumulated files to the point where it became cluttered. Restructured it, moving each of the various groups of files into its own directory. check-in: a54dd5d2eb user: aku tags: new-slif-semantics
01:57
Replaced the fixed slif meta grammar under tests/grammars/ with a reference to the new slif meta grammar under bootstrap/. The new grammar removes the "bless" adverb. It further uses a redone syntax for strings and char classes which encodes more of Tcl's constraints into the syntax. Modified the semantics to handle the possibility of <single quoted name> using Tcl backslash-escapes, as now allowed by the new meta grammar. This has not yet arrived in the grammar used by "s_parser.tcl", but it may. Cleaned the bootstrap/ directory up a bit. (Still have to check and compare the various old meta grammars.) check-in: a4e5621bbf user: aku tags: new-slif-semantics
2017-05-19
06:20
Whitespace cleanup. Squashed irrelevant trailing whitespace in code and test. check-in: 74492e5609 user: aku tags: new-slif-semantics
06:13
Implemented the rewrite of rules with precedence into sets of rules without it, encoding precedence and associativity in the structure of the new rules. Extended the container and its subordinates with methods to support the transformation (retrieval of the necessary information, per-symbol and general). Updated and extended the testsuite. check-in: baddf989f3 user: aku tags: new-slif-semantics
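A common way to encode precedence and associativity structurally is to introduce one symbol per priority level. The C sketch below shows such a scheme under stated assumptions: level 0 binds tightest, `E_i` names the level-`i` symbol, group alternatives reset to the loosest level, and the tightest level uses itself as its "tighter" level. The names `rewrite_alt`, `chain_rule` and `Assoc` are hypothetical; marpa's actual rewrite and its symbol naming may differ.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef enum { ASSOC_LEFT, ASSOC_RIGHT, ASSOC_GROUP } Assoc;

/* Append text to the NUL-terminated buffer out (truncates if full). */
static void append(char *out, size_t size, const char *text) {
    size_t used = strlen(out);
    if (used + 1 < size) snprintf(out + used, size - used, "%s", text);
}

/* Rewrite one alternative `lhs ::= rhs...` found at priority `level`
 * (out of `levels`) into a rule without precedence. */
static void rewrite_alt(char *out, size_t size, const char *lhs,
                        const char *const rhs[], int rhs_len,
                        int level, int levels, Assoc assoc) {
    int first = -1, last = -1;
    for (int k = 0; k < rhs_len; k++) {
        if (strcmp(rhs[k], lhs) == 0) {
            if (first < 0) first = k;
            last = k;
        }
    }
    int tighter = level > 0 ? level - 1 : 0;
    char sym[64];
    out[0] = '\0';
    snprintf(sym, sizeof sym, "%s_%d ::=", lhs, level);
    append(out, size, sym);
    for (int k = 0; k < rhs_len; k++) {
        if (strcmp(rhs[k], lhs) != 0) {
            snprintf(sym, sizeof sym, " %s", rhs[k]);
        } else {
            int target;
            if (assoc == ASSOC_GROUP) {
                target = levels - 1;   /* anything goes inside a group */
            } else if ((assoc == ASSOC_LEFT && k == first) ||
                       (assoc == ASSOC_RIGHT && k == last)) {
                target = level;        /* the associative side recurses */
            } else {
                target = tighter;      /* other operands bind tighter */
            }
            snprintf(sym, sizeof sym, " %s_%d", lhs, target);
        }
        append(out, size, sym);
    }
}

/* Loosening chain E_i ::= E_{i-1} for i >= 1; the original symbol is
 * connected via E ::= E_{levels-1}. */
static void chain_rule(char *out, size_t size, const char *lhs, int i) {
    snprintf(out, size, "%s_%d ::= %s_%d", lhs, i, lhs, i - 1);
}
```

For example, a left-associative `E ::= E '*' E` at level 1 becomes `E_1 ::= E_1 '*' E_0`, while the right-associative variant becomes `E_1 ::= E_0 '*' E_1`.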
2017-05-18
07:21
Extended validation in semantics and containers to prevent recursive quantified rules. Added tests. check-in: 3627fce321 user: aku tags: new-slif-semantics
06:40
Typo fixes in docs for precedence rewrite. check-in: a7630b86a8 user: aku tags: new-slif-semantics
2017-05-17
20:02
Implemented basic container validation. As part of that, added some plumbing so that the objects internal to the container have a reference to their collection (container or grammar, depending on context). This is needed because some validations have to reach from the location performing them into other parts of the system to get associated data. Extended the testsuites to invoke the validator (results of ctrace -> gcstate, ser/des, and reduction). Pleased that all of them are good. Related things to do: checks for inaccessible symbols and non-generating symbols (not reaching terminals/atoms), and removal of same. Further to do: rewriting of precedenced priority rules into non-precedenced form. check-in: bc099e52e4 user: aku tags: new-slif-semantics
05:47
Modified reduction to merge identical literals under different symbols into a single definition. Extended and modified the API to pass the merging information up so that users of the removed definitions can be rewritten to refer to the new unified forms. Extended the grammar container and supporting classes to bear the main burden of the rewrite (See method "fixup"). Updated test results. check-in: 5bcb523050 user: aku tags: new-slif-semantics
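The merge-and-fixup idea can be sketched in C. `LitDef`, `merge_literals` and `fixup` are hypothetical names, and the first definition of a literal is assumed to be the survivor; the real container API (method "fixup") works on its own data structures.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical L0 literal definition: symbol name and literal spec. */
typedef struct { const char *symbol; const char *literal; } LitDef;

/* Merge definitions with identical literals. alias[k] receives the
 * index of the surviving definition for definition k (the first one
 * with the same literal). Returns the number of survivors. */
static int merge_literals(const LitDef *defs, int n, int alias[]) {
    int kept = 0;
    for (int k = 0; k < n; k++) {
        alias[k] = k;
        for (int j = 0; j < k; j++) {
            if (alias[j] == j && strcmp(defs[j].literal, defs[k].literal) == 0) {
                alias[k] = j;
                break;
            }
        }
        if (alias[k] == k) kept++;
    }
    return kept;
}

/* Fixup: map a symbol reference to the surviving definition's symbol.
 * Symbols that are not literal definitions are returned unchanged. */
static const char *fixup(const LitDef *defs, int n, const int alias[],
                         const char *sym) {
    for (int k = 0; k < n; k++)
        if (strcmp(defs[k].symbol, sym) == 0) return defs[alias[k]].symbol;
    return sym;
}
```

Passing `alias` upward is what lets every rule that referenced a removed definition be rewritten to the unified form.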
05:43
Fix lexeme rule generation where a single-element RHS was not constructed as a proper list. Updated test results. check-in: 2db8657abe user: aku tags: new-slif-semantics
2017-05-16
07:13
Reduction choices for a C backend based on bytes and byte ranges. Fixes for problems missed so far. Extended testsuite. Added the gcr_c result files. Optimizations for character and ASBR reduction to prevent rule chains. check-in: 0e9dd3b6ed user: aku tags: new-slif-semantics
2017-05-12
20:45
Fixes to the normalizer:
* Added bytes and byte ranges. Updated doc/atom.md.
Fixes to the reducer:
* During a redo of the implementation of rstate, the invocation of the normalizer on new pieces was inadvertently dropped. Added it back.
* Added the missed handling of %namedclass literals to the Tcl-specific reduction rules. Factored the common code into a separate proc (CC-TCL).
Fixes to the grammar container:
* Added the missing method `remove`.
Fixes to the container testsuite (reduce-tcl):
* Skip grammars without L0, or without literals.
* With `grammar` unpacking done early, it is not needed for getting the literals.
* Varname typo.
* Added the gcr_tcl result files. check-in: 5558558738 user: aku tags: new-slif-semantics
08:01
C-level implementations of negate- and norm-class. Exposes problems in my thinking about SCR, OTSCR and ref-counting. To be redone. check-in: 638a44dd41 user: aku tags: asbr-perf
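Norm-class and negate-class can be sketched over a plain array of inclusive codepoint ranges. The names `CR`, `cc_norm` and `cc_negate` are hypothetical, and `uni_max` stands for the upper end of the supported range (0xFFFF for BMP, 0x10FFFF for full Unicode); this is not marpa's C code.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct { int first, last; } CR;   /* inclusive codepoint range */

static int cr_cmp(const void *a, const void *b) {
    const CR *x = a, *y = b;
    return (x->first > y->first) - (x->first < y->first);
}

/* Normalize in place: sort by start, then merge overlapping or
 * adjacent ranges. Returns the new number of ranges. */
static int cc_norm(CR *cr, int n) {
    if (n == 0) return 0;
    qsort(cr, (size_t)n, sizeof *cr, cr_cmp);
    int out = 0;
    for (int k = 1; k < n; k++) {
        if (cr[k].first <= cr[out].last + 1) {
            if (cr[k].last > cr[out].last) cr[out].last = cr[k].last;
        } else {
            cr[++out] = cr[k];
        }
    }
    return out + 1;
}

/* Complement a normalized class within 0..uni_max. The result array
 * needs room for n + 1 ranges. Returns the number of result ranges. */
static int cc_negate(const CR *cr, int n, int uni_max, CR *res) {
    int m = 0, next = 0;
    for (int k = 0; k < n; k++) {
        if (cr[k].first > next) {
            res[m].first = next;
            res[m].last = cr[k].first - 1;
            m++;
        }
        next = cr[k].last + 1;
    }
    if (next <= uni_max) {
        res[m].first = next;
        res[m].last = uni_max;
        m++;
    }
    return m;
}
```

Negation relies on normalized input: the gaps between sorted, merged ranges are exactly the complement.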
07:49
Continued Tcl_ObjType, public constructor and accessor, plus cproc glue. Compiles without error. !Untested! check-in: 759de51b4b user: aku tags: asbr-perf
07:36
Added Tcl_ObjType for SCR-based char classes. Compiles without error. !Untested! check-in: 0c5716a4bc user: aku tags: asbr-perf
03:59
Implemented SCR functions. Compiles without error. !Untested! check-in: 2f3cf157f0 user: aku tags: asbr-perf
00:31
For more performance, started on C-ifying charclass-related functionality. check-in: 54eaf08a6a user: andreask tags: asbr-perf
2017-05-11
21:52
Removed perf output from the branch. Unwanted here. check-in: f9ea41e0f0 user: aku tags: new-slif-semantics
21:41
Integrate the perf-work for the ASBR compiler. check-in: 18fa980ae2 user: aku tags: new-slif-semantics
21:28
Rewrite of the ASBR compiler. Replaced use of TclOO with plain procedures directly working on Tcl lists. Speedup ~ 2.5-2.9 check-in: efdb819361 user: andreask tags: asbr-perf
16:12
Create new branch named "asbr-perf" check-in: 1da8aac13b user: andreask tags: asbr-perf
06:15
Implemented grammar container deserialization (bulk load). Fixed an issue in serialization (empty container). Extended testsuite. check-in: fba7f2ccdc user: aku tags: new-slif-semantics
04:42
Updated reduction rules. Implementation of new rules. Extended testsuite. More shortcuts in the structures specifying the expected results. Plus ability to reference larger blocks of content via file names. Also added tests for the internal ccranges proc. check-in: 04381efeca user: aku tags: new-slif-semantics
04:29
Slight tweaks to the testsuite support. check-in: 7bea4eb850 user: aku tags: new-slif-semantics
04:28
Added more Unicode accessors: checking support for a named class, and retrieving the list of supported names. Variants for both Tcl and Marpa Unicode Data. check-in: 7580c557d4 user: aku tags: new-slif-semantics
2017-05-10
07:50
Extended literal handling with normalization phase 2: reduction of complex constructs into simpler pieces, glued back together via priority rules. Extended testsuite. check-in: 393184ff2c user: aku tags: new-slif-semantics
07:46
Reworked the lowlevel framework for the slif testsuites. Moved the text utilities into their own file and made them more modular. check-in: 8b68b34fc3 user: aku tags: new-slif-semantics
07:46
Integrated the full/bmp choice into the main marpa code. check-in: ad5174fc03 user: aku tags: new-slif-semantics
07:42
Extended unicode tables with information about the supported range. Modified the tooling to allow generation of full and bmp-limited tables. Implemented class negation. Reworked the ASBR and GRAMMAR accessors to resolve byte range references in the result before returning it. Removed the accessors for byte ranges. These are now an implementation detail hidden from the user. Updated testsuites. check-in: c907ffa182 user: aku tags: new-slif-semantics
2017-04-26
19:27
Updated container tests. Another API change (l0 atom -> l0 literal), with corresponding updates to tests. Everything passes again. Next, deconstruction of literals, i.e. normalization phase II. check-in: 69b1ee0e4d user: aku tags: new-slif-semantics
08:32
Integrated new literal handling into semantics. Updated tests. Container API change, breaks container tests. To fix next. check-in: e596035c8a user: aku tags: new-slif-semantics
05:45
Third time's the charm? Another redo of literal handling. Not yet integrated with the semantics. Full tests. Some add-ons in the Unicode support to help, with tests. check-in: ab2cbccd88 user: aku tags: new-slif-semantics
2017-04-23
04:24
Dropped derived atom classes, replaced with an extended atom base class storing atom type and details. Updated all users. Tests unchanged, and passing. check-in: 9eed4cb141 user: aku tags: new-slif-semantics
02:32
Implemented negated char-classes (finally). Extended (literal-lowering) and updated (meta-grammar) tests. Added missing pieces for various other types of atoms. Noted:
* The number of classes for atoms is getting a bit much, especially as they are essentially all the same; only the data stored in them differs per type. Yet even that is not truly visible in the signatures. Therefore next:
* Collapse the various atom classes into a single class which stores data __and__ type.
* Further, write up the rules for the various atoms, their specification details, and the transformation rules, so as not to lose sight of them during implementation. check-in: 1fc5f75909 user: aku tags: new-slif-semantics
2017-04-21
21:04
Added tests for corner cases in char classes, and for the basic compression. check-in: d80850b630 user: aku tags: new-slif-semantics
19:01
Added basic string deconstruction. Updated tests to match. check-in: ca1a659087 user: aku tags: new-slif-semantics
07:15
Small simplification in state retrieval, disabled debug code. check-in: 58d0aa2893 user: aku tags: new-slif-semantics
06:29
Fixed a lurking bug in the use of the symbol state engine. When creating a :lexeme (or :discard), the literal's symbol has to be recorded with an l0-usage at that point. Forgetting that is no problem if the symbol is for an actual literal, as its state is literal either way. However, with the deconstruction of literals into simpler pieces, such symbols may stand for priority rules as well. With the l0-usage forgotten, these look like toplevel L0 symbols, i.e. lexemes, except that the engine also knows that they were not used in G1, and thus reports the situation as an error. check-in: 7d929144bf user: aku tags: new-slif-semantics
2017-04-20
23:28
Added structuring comments and indentation to test "literal-lowering" check-in: f81a58af27 user: aku tags: new-slif-semantics
23:07
Reduced comments generated by literal handling. Tests updated to match. check-in: 5059e70410 user: aku tags: new-slif-semantics
22:22
Added specific test demonstrating how literals are simplified. check-in: b50e77e610 user: aku tags: new-slif-semantics
22:00
Fix indentation nit. check-in: 2292cecda4 user: aku tags: new-slif-semantics
19:18
First simplification of literals. Single-element strings to characters, ditto for charclass whose element is a character. Updated tests to match. check-in: 85cd4f4c54 user: aku tags: new-slif-semantics
06:25
Moved literal handling out of the semantics into its own helper class and file. Further restructured the code to allow for easier insertion of simplification rules. Tweaks to symbol names, updated the tests to match. check-in: f2fc6e54fd user: aku tags: new-slif-semantics
2017-04-19
07:34
Comment tweaks for atoms. Fixed (c) year in places. Added atoms for ranges and named cc's. Started on pulling the literal handling out of the semantics into its own class. check-in: c7040480bb user: aku tags: new-slif-semantics
2017-04-18
23:19
Moved l0 literals into their own class, 'literal'. Added character atoms as supported literal type. Updated the tests to match. check-in: 8f33a9c72c user: aku tags: new-slif-semantics
19:48
Reject identical RHS for priority rules. Prevent repeated creation of :discard priority rules for the same literal. New tests added. check-in: ac94875686 user: aku tags: new-slif-semantics
06:29
Big set of related changes:
- Implemented the grammar container.
- Implemented a state dump (serialization) for the same.
- Pretty printer for the state dump, for use in tests.
- Created a testsuite using SLIF ctraces to drive a container and check the resulting state.
- Tweaked various test cases (slif, derived ast, ctrace, container): removed superfluous :start statements.
- Changes to semantics, updated test cases to match:
  - Moved passing of the start symbol to near the end (symbol existence ensured).
  - Create literal/lexeme definitions only once. check-in: 79414a2533 user: aku tags: new-slif-semantics
2017-04-11
05:52
Dropped "symbol" from container interface. Not needed. Some text tweaks. Updated the tests to match. check-in: 150fa90af9 user: aku tags: new-slif-semantics
2017-04-08
07:09
Filling in the container. General attribute manager, plus first custom subclasses. check-in: 7f1e1389a1 user: aku tags: new-slif-semantics
04:51
Reworked the grammar framework a bit to be more scalable, i.e. not requiring a change with every new key. Updated suites to match. Further added suite and placeholders for container testing. Currently the new suite is a full fail (container completely different from the API coming from the semantics). check-in: 281636de75 user: aku tags: new-slif-semantics
2017-04-07
22:53
Added the slif meta grammar as test case. Note, this grammar has some TODOs to make it play nicer for Tcl. check-in: 4b61c1c2e7 user: aku tags: new-slif-semantics
22:05
Fixed mishandling of quoted symbol in separator adverb during normalization. check-in: a63be89730 user: aku tags: new-slif-semantics
20:42
Oops. Forgot to add the new charclass tests. Fixed. check-in: 46b45891f1 user: aku tags: new-slif-semantics
20:41
Changed the string/charclass rep conveyed from semantics to container: removed tagging, as the type can be inferred unambiguously from the value. Updated tests to match. Further fixed ingestion of named (inverted) char classes. Added tests. check-in: 993a1e814c user: aku tags: new-slif-semantics
19:24
Added handling of escaped chars and nocase to charclass operation. Updated tests to match, also tweaked to actually demo nocase where needed. check-in: a546ce312d user: aku tags: new-slif-semantics
07:28
Reworked the handling of escapes, do it first. Simplifies the processing loop coming after. Tweaked the derivation of symbols, slightly different quoting. Updated tests to match. Added testcase for escaped character forms in strings. check-in: ca988b6b2f user: aku tags: new-slif-semantics
06:24
Moved to decimal codepoints in literal specs. Added handling of nocase for chars, strings. Updated tests to match. check-in: 5b27a4891f user: aku tags: new-slif-semantics
05:05
Grammar tweak to make handling of discard variants easier: split the discard rule to inline the variants of <single symbol>. Completely reworked the handling of symbols (g1, l0, lexeme, discard, literals, etc.). Now using a state machine to keep track of the contexts symbols are found in, and of the conclusions about which future contexts are thus still legal. Redid the handling of fixups for `lexeme default` and `discard default`. User-specified settings of "latm" and "discard events" now form an exclusion list of symbols; the regular list they are excluded from is pulled out of the new state tracking. Literal handling now performs normalization and derivation of symbols in the semantics; the container no longer knows anything about it. FIX BUG: Tcl escapes for strings and char classes are not handled yet, neither by the semantics nor by the syntax. FIX BUG: Normalization of specs for "nocase" is still missing. Updated tests to match. Added tests for all the cases missed by the old tracking scheme. check-in: 2d3c378396 user: aku tags: new-slif-semantics