Marpa

Timeline

228 check-ins using file tests/grammars/l0-rules/priority/precedenced-disjoint/ctrace_tcl version cfd7b0bb1f

2018-09-22
00:01
Added larger doctools examples from Tcllib. Modified dt helper to show new statistics. Untested. Leaf check-in: 3dbe995b99 user: andreask tags: io-statistics
2018-09-21
22:26
Completed IO statistics for rt-C. Fixed name clash in the new code (existing type vs new function, function renamed). Updated tests to match generator changes (cparse, clex). check-in: aa1749989d user: andreask tags: io-statistics
00:00
Added state and methods to collect and return IO statistics (#streams, #processed, #input). Implemented for rt-Tcl. Added tests. rt-C placeholders. check-in: 23a1ea88eb user: andreask tags: io-statistics
2018-09-20
01:54
Merge staging into main stream. Basic parse events (io, lexer) done. Basic input extension done. Doctools example working (macros, includes) json blocked out (issues with partial utf, non-utf, non-standard codings - Tcl vs. C pre-handling, strings vs. file) Leaf check-in: 7337a35e08 user: aku tags: trunk
2018-09-18
17:18
Extended multi-byte char counting test to full unicode range. Closed-Leaf check-in: 15feed5264 user: aku tags: trunk-stage
05:01
Fix trailing whitespace check-in: d336725a4a user: aku tags: trunk-stage
05:00
Fixed miscounting of multi-byte characters in rtC. Reworked `inbound_step()`. Changed to test a character for correctness on the first byte, then arrange to (not) step the character location on the trailer bytes following, if any. Always steps the character location on the first byte. Updates the byte/char index at the same time. Further reworked and fixed the interface from `gate` to `inbound` for rewinding after a rejection (straight end, or when the matched lexeme is actually shorter than the number of bytes consumed before stopping). The used API (`moveby`) was character based, which caused us to rewind too far, namely to the previous character. No trouble when it was a single-byte character. Bad for a multi-byte character, however. This could then trip the lexer if the first byte of the previous character is not acceptable at the given point. check-in: 44fee72091 user: aku tags: trunk-stage
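The stepping rule described in this check-in can be illustrated with a minimal Python sketch. It is not the rtC `inbound_step()` code; the function names are hypothetical, and the input is assumed to be well-formed UTF-8. The character location is stepped on the first byte of each character and left alone on the trailer bytes.

```python
def is_trailer(byte: int) -> bool:
    """UTF-8 trailer (continuation) bytes have the bit pattern 10xxxxxx."""
    return (byte & 0xC0) == 0x80


def char_locations(data: bytes) -> list[int]:
    """Return, for every byte, the index of the character it belongs to.

    Hypothetical helper: the character location is stepped on the first
    byte of a character and is not stepped on the trailers that follow.
    """
    locations = []
    char_index = -1  # no character seen yet
    for byte in data:
        if not is_trailer(byte):
            char_index += 1  # first byte of a character: step now
        locations.append(char_index)
    return locations


sample = "a\u00e9\u20acb".encode("utf-8")  # 1-, 2-, 3- and 1-byte characters
print(char_locations(sample))
```

With this rule every byte of a multi-byte character reports the same character index, so a lexeme that starts on the first byte of such a character is located at that character and not at its predecessor.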
2018-09-17
23:43
Added test demonstrating that rtC miscounts multi-byte character locations (because counting at end). Added tools to help with generation of test input check-in: 9337c00616 user: aku tags: trunk-stage
23:41
Tweaked test parser gen to leave a log file behind on grammar parse failure check-in: 1694246f20 user: aku tags: trunk-stage
2018-09-15
04:46
Trim trailing whitespace check-in: e64b244e12 user: aku tags: trunk-stage
2018-09-10
23:05
Added runtime tests. Note: These look to need some of the fixes on trunk-stage. Leaf check-in: 8b8a1cedf6 user: andreask tags: non-utf-bytes
22:23
Started implementation of non-utf byte support. check-in: 2d87d5c92b user: andreask tags: non-utf-bytes
2018-09-09
06:00
Fixed mindt|doctools crashes. Two for asserts. Problem with `mindt` was in the new `gather_events` function. It was not fully adapted to the changed data structures, with the split trigger and event arrays. It saved the loop index as event id, which was correct for the old structures. With the new structures OTOH the event id is a field in the array we iterate over instead. Problem with `doctools` was as above, plus also an unknown mismanagement of the `always` array by the shared rtC generator code for LTM-mode lexemes, i.e. LATM=false. It triggered a similar assertion on lexer setup after the fix above was applied. `mindt` avoided that issue because it did not use LTM-mode lexemes for its special forms. Reactivated the disabled tests. Passing now. Note: json still disabled. check-in: cd54178989 user: aku tags: trunk-stage
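The `gather_events` problem (loop index saved where the event id was needed) can be shown with a small Python sketch. The `Trigger` record and both functions are hypothetical stand-ins for illustration, not the rtC data structures.

```python
from typing import NamedTuple


class Trigger(NamedTuple):
    """Hypothetical trigger entry: which symbol fires which event."""
    symbol: int
    event_id: int


def gather_events_old(triggers: list[Trigger], matched: set[int]) -> list[int]:
    """Old behaviour: the loop index doubles as the event id.
    Only correct while entry N of the array describes event N."""
    return [index for index, t in enumerate(triggers) if t.symbol in matched]


def gather_events(triggers: list[Trigger], matched: set[int]) -> list[int]:
    """Fixed behaviour: the event id is a field of the entry iterated over."""
    return [t.event_id for t in triggers if t.symbol in matched]


# Two symbols (10 and 12) trigger the same event 2; symbol 11 triggers event 0.
table = [Trigger(10, 2), Trigger(11, 0), Trigger(12, 2)]
print(gather_events_old(table, {10, 12}), gather_events(table, {10, 12}))
```

Once triggers and events live in separate arrays, several trigger entries may refer to one event, so the position in the trigger array no longer identifies the event.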
00:02
Integrated doctools example - tests - remeta Regenerated doctools example parser to match runtime changes. Disabled tests with known problems - json (char mislocation) - mindt (crash! (rtc, mis-capacity for some symset)) - doctools (crash! (rtc, mis-capacity for some symset)) check-in: cba29d80fc user: aku tags: trunk-stage
2018-09-08
22:16
Integrate base event work. Stage before true merge. check-in: ecf881268e user: aku tags: trunk-stage
22:15
Integrate base doctools example. Closed-Leaf check-in: 77d4610d6a user: aku tags: parse-events
22:11
Brought trunk and event work. Closed-Leaf check-in: d912ded2f3 user: aku tags: language-doctools
22:07
Brought trunk work check-in: 6ceda1b56e user: aku tags: parse-events
21:38
Tkt [d9b47bbb78] completed check-in: f97a8147f6 user: aku tags: parse-events
09:04
Bring in the extended test suite which runs the example tests too. check-in: a48220cd05 user: aku tags: trunk
04:23
More integration of the language examples into the main - Run example tests with the main - Regenerate example parsers with the main parsers Updated a few tests. Note: json tests have issues. Parse location differences. Investigate. Tracked the location differences to commit [dd2dfa9d07530d8d] where definition of lexeme start switched from byte-based LEX_START to character-based LEX_START_C. This commit is on branch [parse-events-rtc]. I suspect that the stepping of character locations is incorrect. Looking at the `step()` function I see the character increment done for the last byte of the character. This is ok for the singletons. For multi-byte characters OTOH it looks to be too late. We should step at the first byte of a character. Complicating things, partial characters, i.e. missing trailers, unexpected trailers. json is good to find these issues, its test suite contains many of these cases. Fix will be done on new branch. This commit is not responsible, can be merged. Closed-Leaf check-in: 26a81da805 user: aku tags: example-integration
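The partial-character cases mentioned above (missing trailers, unexpected trailers) can be made concrete with a short Python sketch. It is an illustration only, with hypothetical names, and it ignores overlong encodings and surrogates; it is not the check performed by the runtimes.

```python
def expected_trailers(lead: int) -> int:
    """Number of trailer bytes announced by a UTF-8 lead byte, -1 if invalid."""
    if lead < 0x80:
        return 0
    if (lead & 0xE0) == 0xC0:
        return 1
    if (lead & 0xF0) == 0xE0:
        return 2
    if (lead & 0xF8) == 0xF0:
        return 3
    return -1


def find_problems(data: bytes) -> list[tuple[int, str]]:
    """Report (byte offset, problem) pairs for malformed sequences."""
    problems = []
    pending = 0  # trailers still expected for the current character
    for offset, byte in enumerate(data):
        trailer = (byte & 0xC0) == 0x80
        if pending > 0:
            if trailer:
                pending -= 1
                continue
            # The character was cut short; this byte starts something new.
            problems.append((offset, "missing trailer"))
            pending = 0
        if trailer:
            problems.append((offset, "unexpected trailer"))
            continue
        count = expected_trailers(byte)
        if count < 0:
            problems.append((offset, "invalid lead"))
        else:
            pending = count
    if pending > 0:
        problems.append((len(data), "missing trailer"))
    return problems


print(find_problems(b"\xe2\x82b"))  # three-byte character cut short by 'b'
```

A decision made on the first byte of a character still has to cope with these cases, because the number of trailers that actually follow may differ from what the lead byte announced.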
2018-09-07
23:12
Integrated the language example tests into the main. Added the language example parsers to the `remeta` tool to ensure that they are updated after runtime API changes too, not just the system parsers. Dropped the replicated support files, redirected test suites to the main support as the only support. Note: The json example has issues, major testsuite fail. Bisect where this started and fix. - Looks to be lexeme location information differences between rtC and rtTcl. - Might be fixed in the parse-event branch. Leaf check-in: 67d5a8317b user: andreask tags: mistake, example-test-integration
19:05
Fixed handling of `args` parameter in wrong#args error messages. The text is dependent on core version. -- There is more going on -- Critcl differences, Core differences -- Try later again -- Closed-Leaf check-in: 7dbbfd79f5 user: andreask tags: mistake
04:42
The code generating symbol names for literals was still subject to the exact content of the [:control:] charclass, causing differences in output depending on the version of Tcl used (seen between 8.5.18 and 8.5.19). Now ensured version-independent quoting in all the relevant places. Fixed order of conditions too. Cross-referenced these places. Regenerated the internal parsers (slif, literals). Updated tests. check-in: e71840b49a user: aku tags: trunk
2018-09-06
19:29
Fix missed bogus `static const` in test results. Closed-Leaf check-in: 3b692749d7 user: andreask tags: tkt-d9b47bbb78-event-management
16:32
Ticket [d9b47bbb78]. Fix event management in the generators. C runtime and generator updated for the new data structures. Tests updated. check-in: e2021e3ef5 user: aku tags: tkt-d9b47bbb78-event-management
2018-09-04
19:53
Ticket [d9b47bbb78]. Fix event management in the generators. Tcl runtime and generator updated for the new data structures. Tests updated. C runtime tests broken. Next: C runtime and generator. check-in: 69f8e6628c user: aku tags: tkt-d9b47bbb78-event-management
2018-08-31
23:12
Ticket [d9b47bbb78]. Fix event management in the generators. Extended to fix the event data structures in general. - Modified container, grammar base, grammar classes - Event information split into declaration with state (events), and usage information (triggers). Events are managed in the main container class. Triggers are managed in the grammar base class. Code is common to L0 and G1. API used by the semantics is unchanged. Serialization structure changes. Event section is global. Trigger sections under L0, G1. Used a purpose-written tool to rewrite the old serializations in tests results (gcstate, gcr_*) to the new form. Reworked the SLIF generator to use the new form of serialization. Tests pass, no results changed. Fixed testsuite oops (generatec_gc(c|tcl)) not seen in trunk because everything with gc also had gc_c, etc. GC generators are ok as well. Extended testsuite to cover grammars with events. check-in: 5b0bdf9591 user: andreask tags: tkt-d9b47bbb78-event-management
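The split described in this check-in (one shared set of event declarations with state, separate trigger tables for L0 and G1) might look roughly like the following Python sketch. Class and method names are hypothetical and do not reflect the container or grammar base class APIs.

```python
class Events:
    """Single scope of event declarations: name -> activation state."""

    def __init__(self) -> None:
        self._active: dict[str, bool] = {}

    def declare(self, name: str, active: bool = True) -> None:
        # Declaring an already known event keeps its current state.
        self._active.setdefault(name, active)

    def activate(self, name: str, flag: bool) -> None:
        if name not in self._active:
            raise KeyError(name)
        self._active[name] = flag

    def is_active(self, name: str) -> bool:
        return self._active[name]


class Triggers:
    """Per-layer usage information: (symbol, kind) -> event names."""

    def __init__(self, events: Events) -> None:
        self._events = events
        self._table: dict[tuple[str, str], list[str]] = {}

    def add(self, symbol: str, kind: str, name: str) -> None:
        self._events.declare(name)
        self._table.setdefault((symbol, kind), []).append(name)

    def fired(self, symbol: str, kind: str) -> list[str]:
        names = self._table.get((symbol, kind), [])
        return [n for n in names if self._events.is_active(n)]


events = Events()
l0, g1 = Triggers(events), Triggers(events)  # one table per grammar layer
l0.add("word", "after", "seen")
g1.add("sentence", "completed", "seen")
events.activate("seen", False)  # one switch affects both layers
print(l0.fired("word", "after"), g1.fired("sentence", "completed"))
```

Because both trigger tables consult the same `Events` object, switching an event off at runtime silences it in L0 and G1 alike, which is the behaviour a single scope of user-facing event names requires.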
19:46
Ticket [d9b47bbb78]. Fix event management in the generators. rt-C generator ... scrap ... Working on this makes me realize that my data structures are still not right. Re-reading the documentation at https://metacpan.org/pod/distribution/Marpa-R2/pod/Event.pod I came back with the impression that the user-facing event names are indeed in a single scope. It is not spelled out explicitly, but I can't see it work if not. Especially with runtime (de)activation of events it must be. Which means that having separate structures for G1 and L0 is wrong. The core event information (name -> activation) status must be shared. The trigger tables can be separate, should be even. Saving this for posterity, and restarting (from the container outward) Closed-Leaf check-in: 5bf97382b1 user: andreask
19:36
Ticket [d9b47bbb78]. Fix event management in the generators. rt-Tcl generator updated. Tests updated. Split the more complex internal formatting commands into more composable pieces, and re-composed them. check-in: 83c23635b6 user: andreask
07:39
Ticket [d9b47bbb78]. Fix event management in the generators. Tests fully updated. Started update of generators. - slif ok - tparse in the works check-in: d4619268cc user: aku
2018-08-30
23:32
Ticket [d9b47bbb78]. Fix event management in the generators. Changed container data structures. Separated event definition from event usage by symbols (triggers). Attention: This changes the grammar serialization. Started update of tests. TODO: Update generators. check-in: 8c77fd896a user: andreask
21:21
Ticket [d9b47bbb78]. Fix event management in the generators. Prep work, a test. As part of the expected results this also contains an implied draft spec for various changed data structures (engine, slif container), and semantic processing. check-in: 4cd32b9b44 user: andreask tags: mistake
06:03
Merge latest to doctools work check-in: 783110305c user: aku tags: language-doctools
06:02
Merge latest to event work check-in: 783f0feb8b user: aku tags: parse-events
06:01
Merge work back to main. check-in: cf5be67f4a user: aku tags: trunk
06:00
Completed the pass. Closed-Leaf check-in: ef8ad60e6d user: aku tags: test-cleanup
00:07
Start some cleanup of testsuite, some simplifications for var handling of the loop helpers ... check-in: 4f1c1e6f02 user: andreask tags: test-cleanup
2018-08-29
06:59
Updated main line with continued event work. check-in: 36533140c7 user: aku tags: trunk
06:56
Updated example check-in: c5b2707fc3 user: aku tags: language-doctools
06:54
Completed work on the ticket, integrated with base. check-in: 16d33a63ac user: aku tags: parse-events
06:53
Added test for parser reset. Closed-Leaf check-in: 31becc141c user: aku tags: tkt-a5abdaacd4-shift-test-reset
2018-08-28
23:57
Document i-gen a bit more. Additional helpers for one-shot grammar tests outside of the big loops. check-in: d8bfe7d142 user: andreask tags: tkt-a5abdaacd4-shift-test-reset
22:37
Ticket [a5abdaacd4]. Shift engine reset to start of parsing. Keep state for introspection (statistics, progress reports, etc.) post parse. Needs testcases, before/after introduction of reset. check-in: 230aebacc3 user: andreask tags: tkt-a5abdaacd4-shift-test-reset
05:33
Get trunk and doc work into current example. check-in: 6563481eae user: aku tags: language-doctools
05:32
Pull doc work into official code. check-in: 1e2d209efa user: aku tags: trunk
04:40
Final grammar fixes based on processing the Tcllib manpages. 100% pass (429 pages. Note, three down in the total, had include files caught in the simple find) - Allow backslash in general text (and examples). - Allow continuation lines in markup commands. - Allow whitespace after the last element of `tclwords` sequences. - Fixed oops in the include file handling, failed to process the last character of the included file. This was ok when it was a closing newline. Bad for files without such, i.e. ending in the closing bracket of a command. - Allow section command with two arguments. Same as subsection. - Allow 'args' as alias for 'arguments' (list type). - Tweak character escapes, recognize double-backslash. - Allow empty list elements (No text between element markers) - Paragraph sequences are improper (allow trailing separator) Fixed ca. 20 tcllib manpages which had issues the older doctools system is not catching (para before first list item, bad closing brackets, silent swallowing of bogus markup commands, ...). check-in: 4b364c62cf user: aku tags: language-doctools
2018-08-26
04:52
Added documentation for package `marpa::util`. Timing/Sequencing missing. Regenerated embedded docs. Leaf check-in: d8022c427d user: aku tags: more-documentation
2018-08-25
20:39
Completed documentation for `marpa::unicode`. Regenerated embedded docs to pull in this and previous changes. Completed ticket [a82dc50647]. check-in: 2f04a2d613 user: aku tags: more-documentation
05:05
More filling in of the manpage for `marpa::unicode`. check-in: ac96c4984d user: aku tags: more-documentation
2018-08-24
23:57
Started to document marpa::unicode package check-in: d590007f37 user: andreask tags: more-documentation
21:29
Doc work: Placed project name, titles and uris into variables for shorter references, and easier changes in the future. check-in: d390e27e1e user: andreask tags: trunk
06:26
More grammar fixes based on processing the Tcllib manpages. Implemented a lexeme barrier (lexer flush) for use in stop event handlers. Forces the lexeme sequence for the included file or similar to coincide with the end of that secondary input, and everything going up to the border flushed, so that it will be crossed without any rewind after the last lexeme. Prevents rewinding in the outer input stream after returning from the include, causing very bogus lexing. Note, this is only a lexer flush, without a parser eof. Updated doctools-base to use the new barrier in its include management. Grammar changes/fixes: - Dropped the special forms fully from the G1. - Configured the special forms for LTM, i.e. to always match. - Allow use of 'enum' as a list type, vs. just the official 'enumerated'. - Allow opening brace in example text, and quoted strings. - Fixed `subsection` syntax, may have two arguments. This fixed an issue with seeing include or vset in places not expected by the G1 from before the change. Now they can appear everywhere between lexemes. Fixed rt-Tcl's handling of LTM (= always-on) lexemes. Bad id conversion in a few places. Statistics: - 355 pass 82% - 77 fail 18% - 432 total 100% check-in: 21eb6e74f0 user: aku tags: language-doctools
2018-08-23
07:07
More grammar fixes based on processing the Tcllib manpages. Fixed `braced` vs `simple`. Fixed `nbsimplex` vs `simple`. (G1 ambiguities due to L0 lexeme language overlap) Fixed use of `subsections` in pseudo-section `description`. Allowed. Eight combinations for paragraphs, subsections, and sections. Added missing `manpage` markup command. Fixed handling of empty include files. Shortcircuit entry and return. Statistics: - 299 pass 69% - 133 fail 31% - 432 total 100% check-in: 417bd724ce user: aku tags: language-doctools
00:00
Trialing real-world manpages from Tcllib. Fixed issue in grammar around '\[' escapes. Removed `\` from ordinary set, treat separate, with \[, \]. Extended the `dt` helper to operate on multiple files, reference files. Added options `--help`, `-o`, and `-t` to `dt`. Extended `doctools::base` with a limited debug stream to show include processing (files, file size, returns). Tweaks in rt-Tcl parser's narrative tracing. Ditto in rt-Tcl lexer. Fixed ambiguity `simple` vs `simplex`. Because the latter was not strictly longer than the former all g_text's with a single simple were also simplex, for 2^n blowup. Making `simplex` strictly longer (i.e. at least 2 simple in the series) this is gone. TODO: `simple` vs `braced` ambiguity where `simple` accepts `{...}` while it should not. That is a simple (sic!) grammar bug. check-in: a9c57a9594 user: andreask tags: language-doctools
2018-08-22
05:08
Added doctools helper application (`dt`) to show AST for arbitrary doctools input. Added demo tool. Refers to bogus example at the moment. check-in: 39ce9e09a5 user: aku tags: language-doctools
05:07
Fix use of `mymethod`, use direct `my`. check-in: a4eb59f54f user: aku tags: language-doctools
05:01
Fix path resolution for includes. Fully normalize all base paths. check-in: b2f700d6b7 user: aku tags: language-doctools
05:00
Fix missing requirements check-in: e708c6d8da user: aku tags: language-doctools
04:59
Grammar fixes - Allow `vset` (reference) at word level, not just as markup. (Should trial if `latm 0` would work for the special forms) - Kill multiple levels of quantification in various L0 symbols. This is just ambiguity where a series of N elements generates `partition(N)` (*) trees for the same lexeme. TODO: Write semantic checks for a SLIF container to detect and warn about this. (*) https://en.wikipedia.org/wiki/Partition_(number_theory) https://oeis.org/A000041 Grows faster than any polynomial in N. check-in: 32457c306a user: aku tags: language-doctools
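To give a feel for the `partition(N)` figure cited in this check-in, the following Python sketch computes the partition numbers (OEIS A000041) with the standard dynamic-programming recurrence. The function name is made up for illustration, and whether the tree count for a given grammar matches `partition(N)` exactly is taken from the commit message, not verified here.

```python
def partition_counts(limit: int) -> list[int]:
    """Return [p(0), p(1), ..., p(limit)], the integer partition numbers."""
    counts = [1] + [0] * limit  # p(0) = 1: the empty partition
    for part in range(1, limit + 1):
        # Allow 'part' as an additional summand for every reachable total.
        for total in range(part, limit + 1):
            counts[total] += counts[total - part]
    return counts


print(partition_counts(10))
print(partition_counts(50)[50])
```

A lexeme of only 50 elements would already have 204226 such trees, so even moderately long lexemes make the ambiguity expensive.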
04:51
C code generator tweaks and fixes: - Fix the issue with the bad indentation of per-symbol semantic information. New array configuration flag: `align`. Default true. - Change lexeme map for events to proper tabular. New array configuration flag: `padright`. Default false (i.e. pad left). check-in: b719ce15a0 user: aku tags: language-doctools
2018-08-21
23:51
Brought part of the `mindt` example, and associated experiences into the doctools example. Reworked the doctools grammar to properly handle the special forms. Further reworked to give it a structure similar to the `mindt` grammar. Started updating the tests. Oops! There is strong ambiguity in the innards of `tclword`, the G1 level sub-structuring of command arguments. It happens because at G1 level we have invisible whitespace discards, and so a text like `BBB CCC` can be parsed as two separate `unquoted` arguments (good), or as a __single__ `unquoted` argument of two elements (bad) because the spacing separating them went away at L0 level. Something similar happens in the definition for `quoted`. `braced` is fine because it maps to a single unbroken lexeme. This needs more thought ... Currently I only see bringing the argument spacing into the G1 as possible solution. The lexeme match would have precedence over the discards. Oh, and `quoted` already has `space` as element of the structure, so maybe it would not have the issue. That would be similar to the special forms, they handle their inner whitespace explicitly too, although in L0. ... A quick trial looks to be working ... Only forest remaining looks to be `breaker` vs `space` in example blocks. Fixed the ambiguity as well. Introduced non-breaking (nb) space, allow only that in the example text. Use non-breaking simplex variants as well to gather longer chunks of input. This looks good. Tests all ok, and ASTs look good enough. I may introduce the special action `::evens` however to drop unwanted spaces in the `tclwords` sub-tree. Explicit removal of the separators. Tests/examples for variable handling and includes have priority though. Also, get larger/real-world examples (Tcllib, Critcl, Kettle, Kinetcl, etc). check-in: ea5192243e user: andreask tags: language-doctools
17:50
Brought information from mini-dt and the new features for IO and parse events into the main doctools example. check-in: 8a648972d9 user: andreask tags: language-doctools
17:37
Integrated multi-stop support as new feature into parse-events. Mini doctools examples demonstrates usefulness (and worked out how to handle vset/include for doctools ;). check-in: 27f7f4624d user: andreask tags: parse-events
17:32
Integrated mini-dt into the branch it came from, multi-stop handling. Closed-Leaf check-in: 4a80913963 user: andreask tags: multi-stop
17:29
Tweaked comments in the mindt engines a bit. Reworked the class structure to better hide the internal composition with event handling and multiple parsers from the tests (and users). Updated test setup code. Added notes documenting the class structure. Tests themselves pass without change. The IO changes for from/to look to be a success. Removed the scratch IO logs from when the problems with overrun were detected. Closed-Leaf check-in: 8183573f2f user: andreask tags: mini-dt
07:54
Fixed the higher layers, adapted to the changes in inbound. Updated tests. Still have to document the higher match location methods. check-in: 5f55575b22 user: aku tags: mini-dt
2018-08-20
23:16
Reworked API of low-level inbound, documented start/stop location, and limit meanings. Updated tests. TODO: Update higher layers. check-in: 872ecc83c7 user: andreask tags: mini-dt
2018-08-17
23:32
Tentative change to inbound for `stop after a character`. With controlling condition at a place in the loop where it can `stop before the first character` also, giving the full range. Considering to tweak the user-visible api to still mean `stop before character`, as a shifted form of the above, so that all user-specified locations are >= 0, never negative. This work will move to its own branch off `parse-events`, to work it out completely, including internal docs before continuing on the (min)doctools. check-in: e66e52ad3c user: andreask tags: mini-dt
22:09
Scratch - Added narrative tracing to inbound around the IO loop, track exactly where it is and how it moves. - X1, X2 - Demo of the `-1` difference for the include-iii test processing. It seems the issue is rooted in the fact that I stop the engine before the marked character is processed. Trying to set the stop marker after the last character of the stream, to stop after everything has been processed ... does not work. The possible fix would be to change the semantics and stop after the marked character has been processed. Then setting the marker on the last character will stop after it, before trying to move into the bad lands. Technically this shifts the impossible stop location to before the first character of the input. OTOH, if we choose to allow an absolute position -1, that might work (shift the controlling test to the beginning of the io loop). The inband `no stop`-signal would then have to be `-2`. check-in: 06e8e23383 user: andreask tags: mini-dt
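The `stop after the marked character` semantics considered here, with `-1` meaning `stop before the first character` and `-2` as the in-band `no stop` signal, can be sketched in Python as follows. This is an assumption-laden toy loop with hypothetical names, not the `inbound` IO loop.

```python
NO_STOP = -2  # in-band signal: no stop marker set


def process(chars: str, stop_after: int = NO_STOP) -> tuple[list[str], str]:
    """Consume characters until the stop marker or the end of input.

    The controlling test sits at the top of the loop, so a marker of -1
    stops before the first character and a marker on the last character
    stops after it, without stepping past the end of the input.
    """
    consumed: list[str] = []
    location = -1  # location of the last processed character
    while True:
        if location == stop_after:
            return consumed, "stopped"
        if location + 1 >= len(chars):
            return consumed, "eof"
        location += 1
        consumed.append(chars[location])


print(process("abc", -1))  # stops before anything is processed
print(process("abc", 2))   # stops after the last character
print(process("abc"))      # no marker: runs into end of input
```

Every location from `-1` to the last character index is a usable stop marker under this scheme, which is the full range the check-in is after.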
2018-08-16
22:37
Oh! :facepalm: `inbound_enter` copies just the pointer to the input string into the engine state. This pointer is (as critcl type `pstring`) just a pointer into the Tcl_Obj internals (GetStringFromObj). We are fine as long as we do not perform `inbound_enter_more`, because nothing does anything with the pointer, but reading from it. Not even basic events. But with `inbound_enter_more` we realloc that area. Means that the old space will be freed when the system did not find enough space to expand in place. At which point we pulled the rug out from under the Tcl_Obj* the string came from. Doing anything with that Tcl_Obj* (like releasing it, example `unset var`) will then smash that memory. Fixed, `inbound` now tracks if it owns IN.bytes and acts accordingly when expanding the area in `enter_more`. TODO: Validate stop locations, must not allow anything beyond the end of the allocated memory, system will try to run over into uninitialized cells at the moment. TODO: Alt: Force engine stop on max for total as measure against the above. check-in: 57ccd4001d user: andreask tags: mini-dt
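The ownership tracking described in this check-in can be approximated in Python, where a shared `bytearray` stands in for the borrowed C buffer. The class below is a hypothetical analogue; in particular, making a private copy on the first extension is an assumption about how "acts accordingly" could be realised, not a description of the rtC code.

```python
class Inbound:
    """Toy input stage that may borrow its buffer from the caller."""

    def __init__(self) -> None:
        self.data = bytearray()
        self.owned = True  # an empty private buffer is ours

    def enter(self, borrowed: bytearray) -> None:
        # Keep a reference only; the caller still owns this storage.
        self.data = borrowed
        self.owned = False

    def enter_more(self, more: bytes) -> None:
        if not self.owned:
            # Never resize storage owned by someone else: take a private
            # copy first, then grow the copy.
            self.data = bytearray(self.data)
            self.owned = True
        self.data.extend(more)


caller_buffer = bytearray(b"abc")
stage = Inbound()
stage.enter(caller_buffer)
stage.enter_more(b"def")
print(caller_buffer, stage.data, stage.owned)
```

The caller's buffer stays untouched after the extension, which is the property the C fix needs so that releasing the originating Tcl_Obj remains safe.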
19:58
Completed processing of includes. Completed tests. rtC crash with includes. check-in: e21f8235e9 user: andreask tags: mini-dt
08:07
Completed rtC state reset (all stages!) Fixes the aforementioned weird issue in the rtC/special forms, skips first character. Bogus char index information. Leftover test failures are the incomplete handling of includes. check-in: b889957b9c user: aku tags: mini-dt
07:37
Integrated parser for special forms into the main parser. vset processing done. include processing still to be done. Updated tests. Weird issue in the rtC/special forms, skips first character ? check-in: 79362316fe user: aku tags: mini-dt
07:34
Fix level error in intercept helper check-in: 85d64bf32c user: aku tags: mini-dt
07:33
Extended rtC to properly report errors from event callbacks. Extended both runtimes to reset the system after a parse ends. This allows multiple calls to `process(-file)` for a single parser. Each call is a new run. check-in: 0cf5429d76 user: aku tags: mini-dt
2018-08-15
23:53
Created grammar for the special forms. Testsuite for same. check-in: de9583426c user: andreask tags: mini-dt
19:27
Bring fix into main doctools branch check-in: 35a9eb534a user: andreask tags: language-doctools
17:10
Cherrypicked semvalue processing fix (allow for null) into trunk. check-in: d761e9ed2d user: andreask tags: trunk
07:09
Scratch work on processing the forms. Remove, replace with a small parser to get the internal structure. check-in: 8491fab778 user: aku tags: mini-dt
06:38
Handle possibility of the parser providing a null semvalue. check-in: 0810d9bbd8 user: aku tags: mini-dt
06:37
Fix forwarding in multi-stop check-in: 253f051635 user: aku tags: mini-dt
02:41
mini-dt testing continued. Fixed quote handling in G1 of unquoted tclwords, was missing. Base parser testing passes. check-in: 3d697cb567 user: aku tags: mini-dt
2018-08-14
23:49
Filled out test framework. Started adapter class to hold the parse event handling. First tests. Fixes to multi-stop. Issue with on-event and callback setup for Tcl runtime. Oops. Since rt::base this was not initialized. Have to repro on main/event branch for proper test. check-in: efbdda6404 user: andreask tags: mini-dt
20:59
Completed the grammar. Added examples for tests to come. Generated the base parsers. Believe to have a viable idea about processing the special forms allowing us to get by with just L0 post-lexeme events. Main processing would be done by Tcl `subst`, plus a safe interpreter to limit the set of accessible commands. check-in: 49a999848b user: andreask tags: mini-dt
17:48
Name churn in directory organization of the new example. check-in: d12f7b2202 user: andreask tags: mini-dt
07:20
Fill in of the mini-grammar. Lots of the Tcl syntax still needed. check-in: 0fc17efcf2 user: aku tags: mini-dt
2018-08-13
23:57
mini-dt experiment start check-in: 467965138f user: andreask tags: mini-dt
22:49
Draft of adapter to provide management of multiple (stop) markers to any engine. No tests yet. Extended runtimes to remember and deliver the last location of the primary input stream. Moving to that location (`match from`) will force the engine to abort processing. As we have only a byte length at the beginning we needed an additional function for the char index to map byte offsets into char offsets. Some more in-band signaling to defer the conversion of the byte length to char length until the information is actually requested for the first time. This ensures that parsers not doing IO shenanigans will not have to scan the entire input twice, for the last char, and then for actual processing. Only parsers which do will have to expend that time, and only as much as is not yet processed. check-in: 85cce8e95a user: andreask tags: multi-stop
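The deferred byte-length to char-length conversion can be sketched in Python as below. All names are hypothetical, the input is assumed to be well-formed UTF-8, and `None` plays the role of the in-band "not yet computed" signal; the runtimes use their own representation.

```python
def count_chars(data: bytes) -> int:
    """Count UTF-8 characters by counting the non-trailer bytes."""
    return sum(1 for byte in data if (byte & 0xC0) != 0x80)


class PrimaryInput:
    """Toy input state that computes the last char location on demand."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.bytes_done = 0  # bytes consumed by normal processing so far
        self.chars_done = 0  # characters consumed so far
        self._last = None    # last char location, not yet computed

    def last_char_location(self):
        if self._last is None:
            # Scan only the part normal processing has not yet covered.
            rest = self.data[self.bytes_done:]
            self._last = self.chars_done + count_chars(rest) - 1
        return self._last


stream = PrimaryInput("a\u00e9\u20acb".encode("utf-8"))
stream.bytes_done, stream.chars_done = 3, 2  # 'a' and the 2-byte char done
print(stream.last_char_location())
```

A parser that never asks for the last location never pays for the scan, and one that asks late only scans the unprocessed tail.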
07:07
1. Clean up of stage `inbound`, and preparation for item 2. Removed `eof` method and function from `inbound`, and calls from the overall runtime. `Inbound` itself triggers all the necessary eof processing in the pipeline stages following it, since parse event support was added and its main loop restructured. Reworked the EOF trigger condition in `inbound` to trigger on exactly reaching the end of the primary input stream, instead of anywhere at the end and beyond. This is preparation for the upcoming support for extending the physical input stream with secondary input. Reworked the nested double loop back into a single loop by strategically hoisting the operations done by the outer loop into the relevant conditionals of the inner, making outer superfluous. 2. Added support for adding and use of secondary input streams. Placed in the physical input stream after the primary input it can only be used by explicitly moving the cursor into its span/range. New user-visible methods: `extend`, and `extend-file`. Internal methods and functions (inbound): `enter-more`, `read-more`. 3. Extended and updated test suite First to cover the accumulated public methods accessing and manipulating the current location cursor. Argument syntax, sequencing. One larger movement example demonstrating use of the secondary input (stream extension). Second to cover the new methods for adding secondary input. All tests pass. check-in: 687652d3d6 user: aku tags: parse-events
2018-08-08
22:21
Test fixes. check-in: f301d37ca1 user: aku tags: language-doctools
22:11
Regenerated the doctools parsers, adapt to the runtime changes made for event support. check-in: 983593b1c8 user: aku tags: language-doctools
21:21
Brought the main enhanced with support for IO/L0 parse events into the language example which needs it (doctools: vset, include). check-in: 15eeacc70a user: aku tags: language-doctools
21:17
Pulled current working state of parse events (IO and L0 events) into main. Keeping the branch for future work on the G1 level event support. check-in: 2b796f8c7a user: aku tags: trunk
21:10
Pulled completed heredoc language example into the main parse-event work branch it was based on. check-in: 19009459df user: aku tags: parse-events
21:08
Completed the heredoc example - Tweaked grammar, regenerated parsers - Added narrative tracing to the wrapper completing the event handling. - Completed the testsuite, code and cases. Closed-Leaf check-in: 01cdb8aed0 user: aku tags: lang-heredoc
2018-08-04
03:51
Get fix check-in: 55fc94ae0a user: aku tags: lang-heredoc
03:49
Pull fix into main parse event branch. check-in: 226236cb49 user: aku tags: parse-events
2018-08-03
19:51
Fix event-list problem when using namespaced parser. Closed-Leaf check-in: cf55185633 user: andreask tags: fix-event-list
19:22
Pulled streamlined match API and updated to it. Filled in the helper/demo/example app. check-in: 9c438ff29f user: andreask tags: lang-heredoc
07:19
Continued match API work, fixed dependent files (tests). Removed `rewind`, `values`, `start:`, `length:`, `value:`, `values`, `symbols:`, and `sv:` Renamed `relative` to `from+`. Closed-Leaf check-in: cf1bc408d8 user: aku tags: match-api-streamline
2018-08-02
23:29
Rework the `match` API, streamline, simplify check-in: 081f8f2a56 user: andreask tags: match-api-streamline
2018-08-01
23:34
Started on heredoc demo. check-in: 920296d8ea user: aku tags: lang-heredoc
05:58
Merged the completed work on support for stop events, closed branch. check-in: 10238f62f5 user: aku tags: parse-events
05:10
Full test run, went over failed tests, fixed issues introduced in previous commits. (1) Unbreak engine_debug: progress-reports. Requires recognizer named RECCE. Undone rename to LRECCE, PRECCE in lexer, parser respectively. (2) Fixed bad references in the unconditional support for events. (..._event_list still conditional, name must be C identifier) (3) lexer.test updated to changed constructor signature. (4) Updated results expected by clex/cparse generator tests. (5) Extended parser tests to check the new process(-file) syntax (from, to, limit configuration). Regenerated the builtin SLIF and literal parsers. Closed-Leaf check-in: 8b31181319 user: aku tags: stop-events
2018-07-31
07:19
More rtC/Tcl equalization. check-in: e8fcf4e47f user: aku tags: stop-events
07:06
Trimmed whitespace. check-in: 2fa1c65bbd user: aku tags: stop-events
07:05
rtc and rt-Tcl nearly matching again as per the tests, however the tests have known holes. TODO: Extend test coverage (parser/lexer constructor options). Generated parsers/lexers now unconditionally support events. TODO: A number of currently separate structures can be consolidated. TODO: Name churn, and rename `sem_tcl`. check-in: 8b36e7d0bb user: aku tags: stop-events
2018-07-27
23:30
Scratch commit. Compiles. Tests fail. Begun implementation of initial from/to/limit for `process(-file)`. Required to test stop events (force event from outside). Also a feature of the original. Fixed forgotten use of `location?` in the inbound tests. Need more inbound tests for all the new methods for cursor movement. Bug: The ability to force a stop event on the engine from the outside implies that we have to support events even for grammars which do not declare any. The templates for rtC-based parser/lexers must change accordingly. We might be able to optimize them down to `support only stop`. check-in: 3d17118a55 user: andreask tags: stop-events
21:43
Tweaked copyright info. Churn. check-in: 7ebed1837c user: andreask tags: stop-events
20:56
Added more narrative tracing. WTH ? This fixed rtc `sv:` ?! This needs more investigation. check-in: 6383f3328f user: andreask tags: stop-events
20:15
Replaced posint and posint0 definitions with aliases to new critcl argument types. (See critcl tag `scalar-basic-limits`). check-in: 0a54dfb149 user: andreask tags: stop-events
07:10
Implemented facade access based on event type. Updated tests. Name churn for the match methods: `match moveby` -> `match relative`, `match moveto` -> `match from`, `match stop-at` -> `match to`, `match stop?` -> `match stop`. New method `match dont-stop` to cancel a stop marker/event. Note: rt-Tcl and rtc divergent wrt `sv:`. check-in: 32e9e5d575 user: aku tags: stop-events
2018-07-26
23:46
Fixed the parse issues. Draft snapshots for all match tests. Implement the per-event access control next. check-in: 460323c2ae user: andreask tags: stop-events
22:53
Continued work on the facade tests. Syntax and coarse check ok. Per-event check still has troubles. All rtc. Triggered the assert for the move bug reported with commit [2450c40017]. Fixed the issue, reorganized `inbound` and `clindex` for this. check-in: 0ade3d5008 user: andreask tags: stop-events
07:55
Continued work on stop events and facade validation. New `stop?` query method. Still need a method to clear stop marker. Basic access rejection for rtc (error when outside event handler). check-in: defc583e62 user: aku tags: stop-events
2018-07-25
23:58
Started on match tests. Core operation. check-in: 9625d9a9bb user: andreask tags: stop-events
19:36
Assert a bug to fix check-in: 2450c40017 user: andreask tags: stop-events
19:36
Fix warning check-in: 5fc1c7b8e6 user: andreask tags: stop-events
19:36
Added in some argument validation. check-in: bb08f525de user: andreask tags: stop-events
18:49
Copyright updates. Added facade docs missing from previous commit. check-in: e26673daac user: andreask tags: stop-events
18:41
Continued work on stop events. Completing rt-Tcl, implementing rtc. Compiles, however not tested. Do not have any tests for stop yet. No tests available to check facade access control either. Draft facade argument types and access control now documented. check-in: 892e53a219 user: andreask tags: stop-events
15:31
Started implementation of stop events, and match state access validation. check-in: 1adbab056d user: aku tags: stop-events
04:02
Tweaked the match API, specifically location reporting and manipulation. The user visible current location always points to the character where (with which) processing will resume when the event handler returns. Updated the tests. check-in: 11a3929e08 user: aku tags: parse-events
2018-07-24
03:54
Merged the work on PE rtc support into the general PE branch. Nearly at the point where we can do heredocs. For that we still need a stack of input segments for more control over movement in the string. check-in: 8ff5c985cb user: aku tags: parse-events
03:49
Updated testcases for the changed generator core. Closed-Leaf check-in: 6c51252fb9 user: aku tags: parse-events-rtc
03:25
Extended rtc with implementation of the match method `alternate`. rtc lexer parse events complete. Lexer parse events complete (support in both runtimes). TODO: Create example (heredocs) demonstrating use. check-in: 265676058d user: aku tags: parse-events-rtc
03:22
Extended narrative tracing. check-in: de55050b65 user: aku tags: parse-events-rtc
2018-07-23
21:05
Extended event testing to match method `alternate`. check-in: b2a6a0ad23 user: aku tags: parse-events-rtc
19:54
Extended rtc with implementation of the sv/symbols match methods (set and get). Extended the generator to provide the necessary structures for the conversion of symbols to internal ids. Added a number of comments around the inspection of the rule bytecode for error information. check-in: 193d2ab03f user: aku tags: parse-events-rtc
19:40
Extended event testing to the sv/symbols match methods (set & get). check-in: bc770d489f user: aku tags: parse-events-rtc
2018-07-19
23:47
Name churn check-in: a2f348ee40 user: aku tags: parse-events-rtc
20:28
Fix trailing whitespace in generated files, due to the trimming of the templates. check-in: e5e54951cc user: aku tags: parse-events-rtc
19:40
Overdue trimming of trailing whitespace across the board. check-in: ee9b3c40b5 user: aku tags: parse-events-rtc
19:39
Ignore a few more work things in the src area. check-in: dea206a3fa user: aku tags: parse-events-rtc
07:08
Created input indexing per doc.1/INDEX_3.md. Integrated indexing with inbound. Demo tests now pass. Still to do: Passing in alternate symbols and sem values. check-in: 3d4c0b8e3a user: aku tags: parse-events-rtc
2018-07-18
20:49
Demonstrate location mis-tracking by rtC when moving in input with multi-byte characters from a parse event handler (non-ascii-b). Show general ok tracking without moves by the user (non-ascii-a). check-in: 0f203e8f47 user: aku tags: parse-events-rtc
18:59
Continued fill-in for before/after events. `Inbound`, `gate`, and `lexer` rewritten to match rt-Tcl behaviour with regard to stepping through the input, rewind, and flush. __Attention__ `Inbound` location tracking handles only ASCII correctly, still, not multi-byte UTF. Still mulling over possible index structures to enable quick movement in the presence of multi-bytes with low memory overhead for very uniform input. Changed the `lexer` field `m_sv` to an int stack, the component tracks ids, with the associated SVs already remembered in the `store`. Forced changes to the implementation of the facade, and to the API between `lexer` and `parser` (changed signature and implementation of `marpatcl_rtc_parser_enter`). Fixed forgotten difference between discard and other events (ACS symbols vs G1 symbols) in the C generator code and the id/symbol conversion done by the facade. check-in: dc070a04f8 user: aku tags: parse-events-rtc
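The ASCII-only limitation noted above comes from treating byte offsets and character offsets as the same thing. A minimal sketch of the difference, assuming well-formed UTF-8 input; the function names are hypothetical, and this is a plain linear scan, not the index structure the project later added:

```c
#include <assert.h>
#include <stddef.h>

/* A UTF-8 trailer (continuation) byte has the bit pattern 10xxxxxx. */
static int is_trailer(unsigned char b) { return (b & 0xC0) == 0x80; }

/* Number of characters that start within the first nbytes bytes of s.
 * Only lead bytes are counted, trailer bytes are skipped. */
size_t char_count(const char* s, size_t nbytes) {
    size_t chars = 0;
    for (size_t i = 0; i < nbytes; i++) {
        if (!is_trailer((unsigned char) s[i])) chars++;
    }
    return chars;
}

/* Byte offset of the character with index target (0-based), or nbytes
 * when target lies at or beyond the end of the input. */
size_t byte_offset(const char* s, size_t nbytes, size_t target) {
    size_t chars = 0;
    for (size_t i = 0; i < nbytes; i++) {
        if (is_trailer((unsigned char) s[i])) continue;
        if (chars == target) return i;
        chars++;
    }
    return nbytes;
}
```

For pure ASCII both functions are the identity, which is why the mismatch only shows up once a multi-byte character precedes the location being moved to.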
05:05
Fix comment typo check-in: 2a46239e90 user: aku tags: parse-events-rtc
2018-07-17
16:51
Started on before/after events. Will not compile, see all the XXX markers, just a checkpoint. check-in: 91ad806001 user: aku tags: parse-events-rtc
05:29
Filled in most of the facade. Still missing parts relevant to before/after events. Location accessors implemented, only correct for ASCII. UTF support is still to do. Moved the API functions into inbound and lexer, with pieces in the pedesc class. Dropped the separate pedesc header and C sources. Optimized moveto, add the delta to the positions before delivery to the engine. Lexer now manages all the new fields for the match state (initialization, update, reset). lexeme data for SV now coming out of the new fields and accessors. Event testing now recording any error we may receive from the match facade. check-in: dd2dfa9d07 user: aku tags: parse-events-rtc
2018-07-13
23:47
Attention: This commit will likely not even compile. It is saved scratch state of work on the rtc lexer match state to complete the parse event facade. Reworking the lexer state internals for cached access to the information, ability to modify. This will affect the lexer/parser interface, namely the transmission of found symbols and semantic values. check-in: 96dc913d9e user: andreask tags: parse-events-rtc
22:27
Added tracing of the new event matching and reporting code. Tweaked tracing of the EH functions (separate stream). Added detection and reporting of discard events. Fixed generated lexer event entries, engine operates with the ACS symbols. Updated tests. Discard events are detected and reported. Testsuite fails because the PE descriptor facade is not completely filled out yet, and the incomplete methods have divergent signatures from the expected. While the Tcl errors thrown by the event recorder callback used in the testsuite are ignored during execution, they are properly seen in the narrative tracing, proving that the discard callbacks work. Fill out structure and facade before implementing the before/after callbacks. check-in: 91aa29f8f2 user: andreask tags: parse-events-rtc
18:06
Linked the PE descriptor facade into the lexer/parser templates. Removed scratch notes from the facade. Updated tests. check-in: c20d03a3ef user: andreask tags: parse-events-rtc
2018-07-12
21:35
Extended the facade with set/get for the class rtc variable needed by the constructor check-in: 9e2ca3d617 user: andreask tags: parse-events-rtc
20:58
Started work on generic parse event descriptor access. Using a critcl::class as facade to the structures, ensemblified methods. Requires critcl::class 1.1 to disable tcl-api, generate c-api. check-in: 84fa7104f2 user: andreask tags: parse-events-rtc
20:01
Fixed typo in comment check-in: 915f151c40 user: andreask tags: parse-events-rtc
05:43
Fixed [f5e6063aeb] memory smash. Miscounted references to the `self` argument of parse event callbacks. Updated tests with all the generator changes. rtc-runtime tests: L0 parse event cases failing as expected, as event generation is not done yet. However we are now at the point where we can start on adding this in. Most of the foundations are now present. Notably still missing are the parse event descriptor structures and its linkage to the lexer state. check-in: 66339c7f20 user: aku tags: parse-events-rtc
2018-07-11
22:52
Filled in the marpatcl_rtc_eh_... functions and structures. Fixed bogus declaration syntax for the generated event structures. __Attention__: Seg.fault in core. Likely caused by the new functions, structures, and their use. Update: Fixed with [66339c7f20]. check-in: f5e6063aeb user: andreask tags: parse-events-rtc
20:28
Follow up to lex-only refactoring, updated tests. check-in: 64a75ee092 user: andreask tags: parse-events-rtc
19:58
Ripped the general structures and code for lex-only token/value handling out of the clex template and placed them into the marpa::runtime::c package for sharing. As part of that the result and event callbacks from RTC now have separate client data information. Note, the yet undefined `eh` structures and functions already referenced by the event handling code are a similar thing for that aspect. check-in: fce9c19274 user: andreask tags: parse-events-rtc
19:47
Always initialize variable check-in: f3f39059c2 user: andreask tags: parse-events-rtc
19:46
Added notes about uninitialized memory to the two set implementations. check-in: 8202ba0482 user: andreask tags: parse-events-rtc
19:43
Memory smash fixed. SV ref miscount in the lexer in lex-only mode when one SV is re-used for multiple tokens in the same location. Win for the (SEM_)REF_DEBUG functionality coming out of the mem-limit, memory-cleanup branches. Valgrind was no help. check-in: 1d1b5cb7d8 user: andreask tags: parse-events-rtc
04:36
Fixed missing cleanup of callback field, caused a bogus 2nd destroy on object destruction. Tweaked comment in test support code. check-in: afeb1c4ea6 user: aku tags: parse-events-rtc
00:07
Filling in the C-level data structures and API changes implied by the template changes for clex and cparse generators. ATTENTION: Manually modified the builtin parsers (slif, literals) to match the changed structures and function signatures. ATTENTION: Memory smash somewhere in the zeta-rtc-lexer tests. (Fixed with commit [1d1b5cb7d8]) check-in: 3df1d4cea4 user: andreask tags: parse-events-rtc
2018-07-10
23:13
Show i-gen critcl command in the log. Added main tclsh include to the set of include paths to search. check-in: e274d2eda6 user: aku tags: parse-events-rtc
22:21
Converted the eof/enter callbacks of clex to critcl::callback. check-in: c230bc5be1 user: andreask tags: parse-events-rtc
21:13
Extended asset management to allow more than one asset. Generate C event datastructures, lex & parse. Updated and extended tests. check-in: c7e33a1f26 user: andreask tags: parse-events-rtc
2018-06-26
18:20
Make rewind more robust. Tweak post-event input movement. Added event scratch docs check-in: 22e19935f3 user: andreask tags: parse-events
2018-04-20
21:28
inbound: Absolute movement, tweaked for consistency. Extended with optional delta arguments. gate: Extended with forwards to the input location accessor and modifiers. This provides parse event handlers with the ability to move in the input. lexer: Moved redo to before we handle parse events. This enables parse event handlers to modify the location without interference from the system itself. Modified pre-lexeme event generation to move input to the start of the lexeme. Fixed pe-fill method, forgot to set flag for the incremental rebuild of the symbol/sv tables. parse descriptor: See gate. Further fixed accessor setup. Extended view to report input location. testsuite: Report input location. Move input location to the end of the lexeme. Needed now that pre-lexeme events have the location at the start. Updated expected results. check-in: faebf8fba4 user: aku tags: parse-events
04:33
Pulled memory fixes into the feature branch implementing parse events. check-in: 07ca2e2be0 user: aku tags: parse-events
2018-04-19
19:47
Update main line with the accumulated memory fixes. check-in: dcf340cd32 user: aku tags: trunk
19:44
Merged series of fixes for memory issues (memory leaks, management mismatches, ...). Together with the push of memory-intensive tests into child processes done here the testsuite should now be properly constrained again wrt memory usage. Updated tests. Closed-Leaf check-in: dc34e1afbb user: aku tags: mem-limit
18:18
Added test against unbounded parser memory usage in parser-core. Found and fixed SV ref-count mismanagement (RCM) in the lex-only code path of the RTC. Added narrative tracing to the code path. Cleanup of lexer-core testsuite with regard to memory debugging. Found and fixed Tcl_Obj* RCM in the lex-only critcl template. Added narrative tracing to the template. Closed-Leaf check-in: a29c4613b0 user: aku tags: memory-cleanup
2018-04-18
23:41
Do not mix malloc and Tcl allocation routines. Even when the malloc is hidden, here in `strdup`. Used to allocate strings with strdup, then release with ckfree/Tcl_Free. This messed up memory mgmt internals to the point of memory smashes. Fixed, now using our own implementation of strdup based on the allocation macros from environment.h. This ensures that alloc and free match. Thank you, valgrind. check-in: 714d438603 user: aku tags: memory-cleanup
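The fix described above can be sketched as a `strdup` replacement that allocates through the same macro family later used to free the string. This is an illustration only: `MY_ALLOC`, `MY_FREE`, and `my_strdup` are hypothetical names, not the macros in `environment.h`, and they map to plain `malloc`/`free` here so the sketch compiles without Tcl headers.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical allocation macros. In a Tcl build they would presumably
 * map to the Tcl allocator; malloc/free are used here as an assumption
 * so that the example is self-contained. */
#define MY_ALLOC(n) malloc(n)
#define MY_FREE(p)  free(p)

/* strdup replacement: the copy is allocated via MY_ALLOC, so releasing it
 * with MY_FREE never crosses allocator families. */
char* my_strdup(const char* s) {
    size_t n = strlen(s) + 1;              /* include the terminating NUL */
    char* copy = (char*) MY_ALLOC(n);
    assert(copy != NULL);
    memcpy(copy, s, n);
    return copy;
}
```

The point of the design is that allocation and release are switched together by changing two macros, instead of relying on the libc `strdup` happening to use the same heap as the release routine.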
03:08
Moved the main parts of `test/support/gen.tcl` into `bin/i-gen`. This new internal generator application uses the public `marpa-gen` as the underlying workhorse. The support code now invokes the internal generator instead of doing everything itself. With the operation confined to a child process the amount of memory taken by the test process is limited to the generated package itself instead of all the packages needed to perform the generation. Further modified lexer-core to place the sets of tests for each variant grammar/lexer-action into a child process as well. This ensures that the memory needed by each variant grammar is limited to that child process, instead of accumulating in the controlling test process. __Attention__: This change requires a Kettle with support for `kt::sub`, added to Kettle with commit [ef384673c5] (2018-04-18 02:28:17). check-in: ddc1e67640 user: aku tags: mem-limit
2018-04-17
20:36
Moved the main parts of `test/support/gen.tcl` into `bin/i-gen`. This new internal generator application uses the public `marpa-gen` as the underlying workhorse. The support code now invokes the internal generator instead of doing everything itself. With the operation confined to a child process the test process' amount of memory taken is limited to the generated package instead of also keeping all the overhead of generating. Note, this does not help with the test suites based on lexer-core as that suite still loads/sources ten different lexer packages into the same process. Handling this requires more work, i.e. pushing the individual tests into their own child processes. check-in: ad4f1d4287 user: aku tags: mem-limit
2018-04-11
23:33
Extended the narrative tracing in `sem_tcl.c` to track refcounts. Found and fixed the cause for the orphaned Tcl_Obj*'s. The function `marpatcl_rtc_sv_astcl` did an extraneous refcount increment on the conversion result (SV tree to Tcl_Obj* tree). Parser operation based on RTC now does not leak anything anymore. check-in: 302f47227a user: aku tags: memory-cleanup
07:43
Added code (sem_debug.c) to track SV allocation and release, and dump orphan SVs at the end. All SVs were orphaned, nothing released. Tracked to a bad guard condition in function `marpatcl_rtc_sv_unref`. As the code checks the refcount before decrementing it both 0 and 1 must trigger destruction, not just 0. Fixed. All SV structures are now properly released at the end (parser instance destruction). Still orphaned things left, these however are Tcl_Obj's. check-in: 0d6e6c1dbe user: aku tags: memory-cleanup
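The guard condition described above can be illustrated with a small reference-counting sketch. The `sv` type, `sv_new`, `sv_ref`, `sv_unref`, and the `sv_live` counter are hypothetical stand-ins, not the actual `marpatcl_rtc_sv` code; the sketch assumes fresh values start with a count of 0.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a semantic value. */
typedef struct sv {
    int refcount;
    int payload;
} sv;

static int sv_live = 0;   /* allocated and not yet released values */

sv* sv_new(int payload) {
    sv* v = (sv*) malloc(sizeof(sv));
    assert(v != NULL);
    v->refcount = 0;      /* assumption: fresh values start unowned */
    v->payload  = payload;
    sv_live++;
    return v;
}

sv* sv_ref(sv* v) { v->refcount++; return v; }

/* The count is checked BEFORE it is decremented. A count of 1 means the
 * caller holds the last reference, a count of 0 means nobody ever took
 * one. Both must destroy the value. A guard testing only for 0 would let
 * the count drop from 1 to 0 and leave the value allocated. */
void sv_unref(sv* v) {
    if (v->refcount <= 1) {
        sv_live--;
        free(v);
        return;
    }
    v->refcount--;
}
```

A live-object counter like `sv_live` plays the role the commit assigns to the tracking code: if it is non-zero at shutdown, something was orphaned.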
2018-04-10
04:55
container: Fixed leakage of priority and priority alternate objects. Test support extended with various diagnostic tools. Note, it looks like the RTC and glue into Tcl are also leaking like mad, especially around the semantic values and the (partial) ASTs we are constructing. Continue investigation and fixing. Started a new branch for this. check-in: c3ff9015b6 user: aku tags: memory-cleanup
2018-04-06
06:20
inbound, gate, lexer: Added a back-link from gate to inbound, analogous to the gate/lexer and lexer/parser connections. Dropped history management from the gate and changed its redo method to simply rewind the input instead of re-entering the tail end of the history. This is the first use of the new cursor movement methods added to inbound. Under the old regime using foreach, and steadily marching forward in the input all re-processing was done by remembering and recursively re-entering characters as needed, with additional loops at the various stages of the pipeline (`gate`). With the while-based cursor on the other hand we have only one (nested) processing loop (`inbound`), and all movement is handled by it. We cannot recurse, and we can't have a loop in `gate`. The nested loop in `inbound` is required to handle the case where we reach eof and the later stages then tell us `not yet`, i.e. bounce us back. The inner loop is the main processor, and the outer loop restarts it until eof actually succeeds. In `gate` the flush-signalling changed. In 'lexer' eof signalling to the `parser` is prevented when it bounces the input away from eof. check-in: 065653b213 user: aku tags: parse-events
04:25
Pull gate readability changes into parse-event feature support. check-in: cc42a78587 user: aku tags: parse-events
2018-04-05
08:29
runtime-tcl, gate: Moved a few code blocks into their own methods to make their now-caller more readable. check-in: 0dff9cc32b user: aku tags: trunk
2018-04-04
17:57
inbound: Converted the `foreach`-loop processing the physical input stream into a `while`. The location information now is a cursor into the input, instead of a dependent variable. This allows us to move backwards in the input, or ahead, as we see fit. Added movement methods (absolute, relative) for the cursor. No stream expansion yet, nor virtual streams. check-in: 9c1fa89ac4 user: aku tags: parse-events
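The difference between the two loop styles can be sketched as follows. The original change is in the Tcl runtime; this C rendering is only an illustration, and every name in it (`inbound`, `inbound_moveto`, `inbound_moveby`, `inbound_run`, `rewind_once`) is hypothetical. The key point is that the location is the loop's own state, so a handler can change it and the loop simply continues from wherever the cursor now points.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cursor over an input buffer. */
typedef struct inbound {
    const char* input;
    size_t      length;
    size_t      location;   /* index of the next character to process */
} inbound;

/* Absolute movement. */
void inbound_moveto(inbound* in, size_t pos) {
    assert(pos <= in->length);
    in->location = pos;
}

/* Relative movement, forward or backward. */
void inbound_moveby(inbound* in, long delta) {
    long pos = (long) in->location + delta;
    assert(pos >= 0 && (size_t) pos <= in->length);
    in->location = (size_t) pos;
}

/* A handler consumes one character and may move the cursor itself. */
typedef void (*step_fn)(inbound* in, char ch, void* cd);

/* Cursor-driven loop: unlike a foreach over the characters, the next
 * character is always looked up through the (possibly modified) location.
 * Returns the number of characters handed to the handler. */
size_t inbound_run(inbound* in, step_fn step, void* cd) {
    size_t steps = 0;
    while (in->location < in->length) {
        char ch = in->input[in->location];
        in->location++;       /* default: advance by one */
        step(in, ch, cd);     /* handler may rewind or skip ahead */
        steps++;
    }
    return steps;
}

/* Example handler: the first time it sees 'b' it rewinds two characters,
 * so the prefix up to and including that 'b' is processed twice.
 * cd points at an int flag recording that the rewind happened. */
void rewind_once(inbound* in, char ch, void* cd) {
    int* done = (int*) cd;
    if (ch == 'b' && !*done) {
        *done = 1;
        inbound_moveby(in, -2);
    }
}
```

With the input `abc` and `rewind_once` as handler, the loop hands out `a`, `b`, `a`, `b`, `c`, something a `foreach` over the characters cannot express without re-entering them recursively.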
16:58
Added generation of pre- and post-lexeme events. Extended the testsuite to demonstrate them. Note, the pre-lexeme trigger location does not match Marpa::R2 yet. check-in: 7693f8601c user: aku tags: parse-events
06:44
Reworked the lexer somewhat. Moved match state into a nested object for easier access. Exposed to parse event handlers via a limiting facade. Outside entrypoint is parser method 'match', an ensemble. Moved to single event handler call bundling all relevant events. Internally also used to simplify GetSemanticValue. Updated discard event test. check-in: 97d4a5397c user: aku tags: parse-events
2018-04-03
20:42
Pull recent fixes into the language work check-in: da745b6e55 user: aku tags: parse-events
20:37
Fixed bug in the semantics' handling of :lexeme. Do not exclude the symbol from LATM fixup if the :lexeme adverbs do _not specify_ latm information. Facepalm. Found playing with lexeme events, and GC missing the crucial latm information, breaking generated test parsers. Updated tests to correct results. check-in: e5e442db2a user: aku tags: trunk
17:47
Tests: Remove a leftover break from debugging something, wrongly committed. check-in: 7c7353ad24 user: andreask tags: trunk
04:42
Get latest docs to work with. check-in: 1c73889373 user: aku tags: parse-events
04:41
Extended the Tcl runtime with basic support for events via callback. Set/unset/query, forwarding from the inner objects. Definition and storage of event maps. Proper pre-processing of such maps for the lexer, not yet for the parser. Added generation of discard events. Extended the testsuite to demonstrate the basic infrastructure, and discard events. Removal of trailing spaces. check-in: 0f5c6931e2 user: aku tags: parse-events
2018-03-30
23:42
Docs: More small fixes Leaf check-in: d89aeafc1a user: aku tags: docs
23:38
Get doc fixes check-in: d41168399d user: aku tags: trunk
23:37
Docs: Typo fixes, phrasing fixes. check-in: 2a1acc9132 user: aku tags: docs
23:24
Make recent documentation work official. check-in: 607761c40d user: aku tags: trunk
23:23
Docs work - Moved architecture from intro to dev guide. - Made intro a dispatcher to other documents based on readers goals. - Added reference for marpagen. - Added placeholder for SLIF, referencing the upstream Marpa::R2 documentation. check-in: 9f10a242a4 user: aku tags: docs
2018-03-29
23:14
Docs: Completed addition of feedback sections. Added audience/target information sections. check-in: fdf9132f0d user: aku tags: docs
22:36
Docs: Added changes, license documents, libmarpa requisite for installer, factored welcome message, added feedback in parts check-in: 2db7ae6cd3 user: aku tags: docs
21:46
Updated docs work with trunk changes. check-in: 9ad599cfa5 user: aku tags: docs
05:34
Started implementation of parse events. rt-Tcl first. Implemented generation from container, with fixes to have access to the stored G1 events. Extended testsuite to show event information, and updated older tests. Some whitespace corrections (removal of trailing spaces). Some tracing tag fixes. check-in: f99351071e user: aku tags: parse-events
2018-03-27
19:58
Grammar edits: - Tweaked some of the formatting. - Main change: Redone the spine of the document structure. Moved the nullability around, enabling use of sequence rules. The price is paid by the paragraphs, having their separators in the AST (Cannot mask/hide the separator of a sequence from the AST). Explicit recursion for multiple separators in sequence however allows hiding that in a single separator. Still, AST nesting significantly reduced. Further doctoring is something for full custom command actions, or the semantics taking the AST. Regenerated parsers. Updated test suite to match. check-in: 1a2a8af59c user: aku tags: language-doctools
07:17
Grammar: - Edited to provide the keywords with proper lexeme symbols instead of the ugly automatic names. - Reworked command definitions to enforce space after a command lead-in. - Added lots of custom actions (::first), to simplify the returned AST. Regenerated parsers. Went over test suite again. Removed the fail cases from Tcllib, completely bogus in places against the more strict specification. Reworked the ok cases and added the first proper ok results. Still thinking about the main spine of man pages and how to express it nicely. The current explicit recursive structure nests a bit deep. check-in: de0a504dfd user: aku tags: language-doctools
2018-03-26
21:35
Created parsers from draft. Made test suite functional. First runs, all results bad (different error messages on failure, different type of AST structure). check-in: d46d0a963a user: aku tags: language-doctools
21:08
Documentation, installation guide: Added instructions for setting up `libmarpa`. check-in: 254c53ee1f user: andreask tags: trunk
2018-03-25
05:15
Added untested draft grammar. check-in: ecdb604ac6 user: aku tags: language-doctools
02:53
Added test-vectors used by Tcllib. check-in: 7e961f36fa user: aku tags: language-doctools
2018-03-24
23:33
Pull alias support into the example. check-in: 26bb8496d5 user: aku tags: language-doctools
23:31
Activated alias support in main. check-in: d60e61a1ce user: aku tags: trunk
23:30
marpa::unicode - Added alias handling. Updated testsuite. Further: - Reworked the table generator, more separate phases, less intertwined operations. Split across several files now, with each a set of related commands to manage part of the data structures. - BMP/SMP are directly generated as aliases where possible - Fixed issue with long-form category names for aliases. Tcl has its own definition of `control` (cc+cf+co). - Updated boot parsers. Closed-Leaf check-in: 85df278cb8 user: aku tags: cc-aliases
04:23
marpa::unicode - Added alias handling. Updated testsuite. Further: - Tweaked generator output. - Fixed issues with missing :bmp/:smp information for some aliases. check-in: cb06be0e04 user: aku tags: cc-aliases
2018-03-23
23:59
Table generator: Convert `:bmp` and `:smp` CCs identical to their origin into aliases, reducing storage requirements. Output not usable anymore until the unicode layers gets extended to recognize and handle aliases. check-in: cc6426a1cc user: andreask tags: cc-aliases
23:47
Introduced char class aliases into the output of the table generator. check-in: 7771b3d5ad user: andreask tags: cc-aliases
05:31
Start on 2nd big example, doctools, of Tcllib check-in: ccb57223f3 user: aku tags: language-doctools
2018-03-22
03:48
Language example `JSON` is now official. check-in: c4414a4012 user: aku tags: trunk
03:46
Phase 1 documentation now official. check-in: 17146d5f3e user: aku tags: trunk
03:46
Intro and dev guide completed. Doc phase I ok. check-in: 0698f503c3 user: aku tags: docs
2018-03-21
22:12
Started on proper documentation. Basic guides, some placeholders, no package docs yet. check-in: 45c56e540d user: andreask tags: docs
06:55
The known bug was due to a subtle difference in the two parsers. The Tcl-based parser accepted standalone surrogates, the C-based one did not. This came down to a guard condition in the ASBR compiler, which excluded any surrogates found in the input CC from the result. A design bug, not an implementation bug. Removing the guard fixes the issues with the json parser. The first attempt at the fix, adding the standalone surrogates explicitly to the grammar, ran into the same guard, albeit in a different manner. The explicit range became an empty literal during reduction, got removed, and then the still-existing reference to it caused the generator backend to throw an error. Regenerated the C-based parser, and updated all test results to match the changes in the rule numeration. Closed-Leaf check-in: a72fb8f4d2 user: aku tags: language-json
06:54
Merged design fix to the json experiment. check-in: cc8068dba7 user: aku tags: language-json
06:53
Fixed a design issue. Surrogate handling. The low-level unicode layer used a guard to prevent the addition of the surrogate codepoints when creating an ASBR from a CC. Thus for any CC including one or more surrogates the resulting ASBR actually represented the CC minus surrogates. The thinking was sort of that we are working with characters at the high level, and while we have a 1:1 mapping to codepoints for most, this is not true for the surrogates, each of which is only half of a character. And the reducer targeting the C runtime based on Tcl ensures that characters in SMP are properly converted into surrogate pairs. Working on the JSON parser now has driven home that there are situations where we want to accept standalone surrogates, at the high level, and also that the low-level removal was a bad idea as well. The latter because a negated char class handled by Tcl does accept the standalone surrogate code points, whereas the ASBR for C is mangled to reject them. The result is a very unwanted difference in the behaviour of what should be equivalent parsers. So, lots of writing for a very small change, code-wise, the removal of the guard mentioned in the 2nd paragraph. And undoing the removal of the surrogates as a named character class. Further, brought the bugfix from commit [546018b243] into the `unicode_ops.tcl` used by the table generator. Same issue, and forgotten when the initial fix was made and committed. Updated tests to match results due to rule renumeration and CC changes. check-in: 1695c17f13 user: aku tags: trunk
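For readers unfamiliar with the terms: a surrogate is a codepoint in the range U+D800..U+DFFF, and a character above the BMP is represented in UTF-16 by a pair of them. The sketch below shows the standard UTF-16 arithmetic only; the function names are hypothetical and this is not the project's reducer code.

```c
#include <assert.h>
#include <stdint.h>

/* True for the codepoints reserved as UTF-16 surrogates. */
int is_surrogate(uint32_t cp) { return cp >= 0xD800 && cp <= 0xDFFF; }

/* Split a codepoint above the BMP (0x10000..0x10FFFF) into its high and
 * low surrogate. Each half carries 10 of the 20 payload bits. */
void to_surrogates(uint32_t cp, uint16_t* high, uint16_t* low) {
    assert(cp >= 0x10000 && cp <= 0x10FFFF);
    uint32_t v = cp - 0x10000;
    *high = (uint16_t)(0xD800 + (v >> 10));
    *low  = (uint16_t)(0xDC00 + (v & 0x3FF));
}
```

A standalone surrogate is one of these halves appearing without its partner, which is the case the removed guard silently excluded from character classes on the C side.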
2018-03-20
19:13
Continued work on the json test suite. Processed all the i_* cases. A single known bug to investigate for rtC. check-in: 84467cbcf6 user: aku tags: language-json
05:57
The unicode work is good enough to solve the known issues with the json parser. Make it officially available to trunk. check-in: 81404d4d77 user: aku tags: trunk
05:55
Unicode work good enough to solve the known issues with the json parser. Integrated. check-in: b333341f7c user: aku tags: language-json
05:50
Known bugs all settled. Test results updated for the changes in the rtC counting (characters, not just bytes). Closed-Leaf check-in: 3bf1d3e8b3 user: aku tags: json-unify
05:47
Update the json/unicode mix branch with the latest fixes on unicode. check-in: 2c4895d98b user: aku tags: json-unify
05:46
Update unicode work with the latest fixes on trunk. Closed-Leaf check-in: 78ab6a1f68 user: aku tags: reunification
05:42
Some cleanup in the core testsuite. check-in: 65b182f33e user: aku tags: json-unify
05:40
Added textual decodings of the knownBug y_* inputs, for readability. check-in: ca443ec660 user: aku tags: json-unify
2018-03-13
17:38
Updated the json tests. The only tests directly affected by the merge were a number of expected-negatives where the error message did not match anymore, due to changed symbol ids coming out of the generator. Furthermore, the `process-file` changes reduced the set of known bugs, eliminating all expected-negatives from it (process-file sees the same data as process now, making the error messages the same for the two methods). We still have issues with various expected-positive which still error. These are suspected to require changes to the json grammar. check-in: d4f0f54320 user: aku tags: json-unify
17:30
Bring the unicode work into the json experiment for eval. check-in: 3358e68801 user: aku tags: json-unify
15:01
Fix the `process-file` method of the rtC backends. `Tcl_Read` does not do encoding processing. Replaced with `Tcl_ReadChars` which does. Plus attendant changes to handle the different signature. Now `process-file` is equivalent to `process` in that it sees CESU-8, and MUTF-8. check-in: e43eff4a15 user: aku tags: reunification
2018-03-12
21:06
Updated the rtC backends (lexer, parser) to use `utf-8` as the encoding for `process-file`, and updated all places affected by this (test results, bootstrap and literal parser). check-in: 59665cfdcf user: aku tags: reunification
20:33
Updated the literal parser. The updated bootstrap parser was already pushed in the previous commit. Passes the entire testsuite. We are now mostly (*) ready to go back to branch `languages-json` and evaluate if the extended unicode support helps the json parser. (*) Have to fix the `process-file` method in RTC (encoding = `utf-8` = Tcl's internal encoding, like for the `process` method). check-in: edd9f06929 user: aku tags: reunification
19:59
literal::parse - Add code to handle compat `try`. Oops. Updated the bootstrap parser as well, already. check-in: 8fc12e03e6 user: aku tags: reunification
19:58
runtime::tcl - Move code for compat `try` into main package entry check-in: d503ddfb50 user: aku tags: reunification
17:11
Merged completed literal work into the general unicode work. check-in: 47e7cce082 user: aku tags: reunification
16:34
Tweaked the API between reducer core and callbacks, to handle symbol creation for the custom tags. Finalized the integration of the new reducers into the generator backends. Extended Tcl backend to refactor codepoint ranges used in ASSRs, like the rtC backend does for byte ranges. Updated tests. Mostly. See note below. Note: Boot parser has not been switched to the extended grammar yet (allowing for unicode references in the SMP). Closed-Leaf check-in: 7d263e069d user: aku tags: relit