Marpa

Timeline

71 check-ins using file languages/doctools/tests/sf/result/var-ref-007 or languages/min-dt/tests/sf/result/var-ref-007 version d38987489a

2018-09-22
00:01
Added larger doctools examples from Tcllib. Modified dt helper to show new statistics. Untested. Leaf check-in: 3dbe995b99 user: andreask tags: io-statistics
2018-09-21
22:26
Completed IO statistics for rt-C. Fixed name clash in the new code (existing type vs new function, function renamed). Updated tests to match generator changes (cparse, clex). check-in: aa1749989d user: andreask tags: io-statistics
00:00
Added state and methods to collect and return IO statistics (#streams, #processed, #input). Implemented for rt-Tcl. Added tests. rt-C placeholders. check-in: 23a1ea88eb user: andreask tags: io-statistics
2018-09-20
01:54
Merge staging into main stream. Basic parse events (io, lexer) done. Basic input extension done. Doctools example working (macros, includes) json blocked out (issues with partial utf, non-utf, non-standard codings - Tcl vs. C pre-handling, strings vs. file) Leaf check-in: 7337a35e08 user: aku tags: trunk
2018-09-18
17:18
Extended multi-byte char counting test to full unicode range. Closed-Leaf check-in: 15feed5264 user: aku tags: trunk-stage
05:01
Fix trailing whitespace check-in: d336725a4a user: aku tags: trunk-stage
05:00
Fixed miscounting of multi-byte characters in rtC. Reworked `inbound_step()`. Changed to test a character for correctness on the first byte, then arrange to (not) step the character location on the trailer bytes following, if any. Always steps the character location on the first byte. Updates the byte/char index at the same time. Further reworked and fixed the interface from `gate` to `inbound` for rewinding after a rejection (straight end, or when the matched lexeme is actually shorter than the number of bytes consumed before stopping). The used API (`moveby`) was character based, which caused us to rewind too far, namely to the previous character. No trouble when it was a single-byte character. Bad for a multi-byte however. This could then trip the lexer if the first byte of the previous character is not acceptable at the given point. check-in: 44fee72091 user: aku tags: trunk-stage
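The miscount fixed here can be illustrated with a small model. This is a sketch, not the rtC `inbound_step()` code: the function names are made up, the input is assumed to be well-formed UTF-8, and the "step on last byte" variant is only an assumed reading of the old behaviour. It reports the character location seen at each byte under both stepping rules.

```python
def is_trailer(byte):
    """True for UTF-8 continuation bytes (bit pattern 10xxxxxx)."""
    return (byte & 0xC0) == 0x80


def locations_step_on_first(data):
    """Character location per byte, stepping when a lead byte is seen."""
    locs, char = [], -1
    for b in data:
        if not is_trailer(b):
            char += 1          # step on the first byte of each character
        locs.append(char)
    return locs


def locations_step_on_last(data):
    """Character location per byte, stepping only on a character's last byte."""
    locs, char = [], -1
    for i, b in enumerate(data):
        last = i + 1 == len(data) or not is_trailer(data[i + 1])
        if last:
            char += 1          # too late for the lead byte of a multi-byte char
        locs.append(char)
    return locs


data = "a\u20acb".encode("utf-8")     # bytes: 61 | E2 82 AC | 62
print(locations_step_on_first(data))  # [0, 1, 1, 1, 2]
print(locations_step_on_last(data))   # [0, 0, 0, 1, 2]
```

In the second result a lexeme starting at the lead byte `E2` is reported at character 0 instead of 1, which is the kind of location difference the json tests exposed.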
2018-09-17
23:43
Added test demonstrating that rtC miscounts multi-byte character locations (because counting at end). Added tools to help with generation of test input check-in: 9337c00616 user: aku tags: trunk-stage
23:41
Tweaked test parser gen to leave a log file behind on grammar parse failure check-in: 1694246f20 user: aku tags: trunk-stage
2018-09-15
04:46
Trim trailing whitespace check-in: e64b244e12 user: aku tags: trunk-stage
2018-09-10
23:05
Added runtime tests. Note: These look to need some of the fixes on trunk-stage. Leaf check-in: 8b8a1cedf6 user: andreask tags: non-utf-bytes
22:23
Started implementation of non-utf byte support. check-in: 2d87d5c92b user: andreask tags: non-utf-bytes
2018-09-09
06:00
Fixed mindt|doctools crashes. Two for asserts. Problem with `mindt` was in the new `gather_events` function. It was not fully adapted to the changed data structures, with the split trigger and event arrays. It saved the loop index as event id, which was correct for the old structures. With the new structures OTOH the event id is a field in the array we iterate over instead. Problem with `doctools` was as above, plus also an unknown mismanagement of the `always` array by the shared rtC generator code for LTM-mode lexemes, i.e. LATM=false. It triggered a similar assertion on lexer setup after the fix above was applied. `mindt` avoided that issue because it did not use LTM-mode lexemes for its special forms. Reactivated the disabled tests. Passing now. Note: json still disabled. check-in: cd54178989 user: aku tags: trunk-stage
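The `gather_events` mistake described above can be sketched as follows. The record layout and all names here are hypothetical stand-ins, not the rtC structures; the only point carried over from the note is that the event id must be read from the trigger entry rather than taken from the loop index.

```python
from typing import NamedTuple


class Trigger(NamedTuple):
    symbol: int   # symbol the trigger is attached to (hypothetical field)
    event: int    # id of the event declared in the separate event array


def gather_events_buggy(triggers, fired_symbols):
    """Old habit: the position in the array is taken to be the event id."""
    return [i for i, t in enumerate(triggers) if t.symbol in fired_symbols]


def gather_events_fixed(triggers, fired_symbols):
    """With split arrays the event id is a field of the trigger entry."""
    return [t.event for t in triggers if t.symbol in fired_symbols]


triggers = [Trigger(10, 2), Trigger(11, 0), Trigger(12, 2)]
print(gather_events_buggy(triggers, {10, 12}))  # [0, 2]
print(gather_events_fixed(triggers, {10, 12}))  # [2, 2]
```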
00:02
Integrated doctools example - tests - remeta Regenerated doctools example parser to match runtime changes. Disabled tests with known problems - json (char mislocation) - mindt (crash! (rtc, mis-capacity for some symset)) - doctools (crash! (rtc, mis-capacity for some symset)) check-in: cba29d80fc user: aku tags: trunk-stage
2018-09-08
22:16
Integrate base event work. Stage before true merge. check-in: ecf881268e user: aku tags: trunk-stage
22:15
Integrate base doctools example. Closed-Leaf check-in: 77d4610d6a user: aku tags: parse-events
22:11
Brought trunk and event work. Closed-Leaf check-in: d912ded2f3 user: aku tags: language-doctools
22:07
Brought trunk work check-in: 6ceda1b56e user: aku tags: parse-events
21:38
Tkt [d9b47bbb78] completed check-in: f97a8147f6 user: aku tags: parse-events
09:04
Bring in the extended test suite which runs the example tests too. check-in: a48220cd05 user: aku tags: trunk
04:23
More integration of the language examples into the main - Run example tests with the main - Regenerate example parsers with the main parsers Updated a few tests. Note: json tests have issues. Parse location differences. Investigate. Tracked the location differences to commit [dd2dfa9d07530d8d] where definition of lexeme start switched from byte-based LEX_START to character-based LEX_START_C. This commit is on branch [parse-events-rtc]. I suspect that the stepping of character locations is incorrect. Looking at the `step()` function I see the character increment done for the last byte of the character. This is ok for the singletons. For multi-byte characters OTOH it looks to be too late. We should step at the first byte of a character. Complicating things, partial characters, i.e. missing trailers, unexpected trailers. json is good to find these issues, its test suite contains many of these cases. Fix will be done on new branch. This commit is not responsible, can be merged. Closed-Leaf check-in: 26a81da805 user: aku tags: example-integration
2018-09-07
23:12
Integrated the language example tests into the main. Added the language example parsers to the `remeta` tool to ensure that they are updated after runtime API changes too, not just the system parsers. Dropped the replicated support files, redirected test suites to the main support as the only support. Note: The json example has issues, major testsuite fail. Bisect where this started and fix. - Looks to be lexeme location information differences between rtC and rtTcl. - Might be fixed in the parse-event branch. Leaf check-in: 67d5a8317b user: andreask tags: mistake, example-test-integration
19:05
Fixed handling of `args` parameter in wrong#args error messages. The text is dependent on core version. -- There is more going on -- Critcl differences, Core differences -- Try later again -- Closed-Leaf check-in: 7dbbfd79f5 user: andreask tags: mistake
04:42
The code generating symbol names for literals was still subject to the exact content of the [:control:] charclass, causing differences in output depending on the version of Tcl used (seen between 8.5.18 and 8.5.19). Now ensured version-independent quoting in all the relevant places. Fixed order of conditions too. Cross-referenced these places. Regenerated the internal parsers (slif, literals). Updated tests. check-in: e71840b49a user: aku tags: trunk
2018-09-06
19:29
Fix missed bogus `static const` in test results. Closed-Leaf check-in: 3b692749d7 user: andreask tags: tkt-d9b47bbb78-event-management
16:32
Ticket [d9b47bbb78]. Fix event management in the generators. C runtime and generator updated for the new data structures. Tests updated. check-in: e2021e3ef5 user: aku tags: tkt-d9b47bbb78-event-management
2018-09-04
19:53
Ticket [d9b47bbb78]. Fix event management in the generators. Tcl runtime and generator updated for the new data structures. Tests updated. C runtime tests broken. Next: C runtime and generator. check-in: 69f8e6628c user: aku tags: tkt-d9b47bbb78-event-management
2018-08-31
23:12
Ticket [d9b47bbb78]. Fix event management in the generators. Extended to fix the event data structures in general. - Modified container, grammar base, grammar classes - Event information split into declaration with state (events), and usage information (triggers). Events are managed in the main container class. Triggers are managed in the grammar base class. Code is common to L0 and G1. API used by the semantics is unchanged. Serialization structure changes. Event section is global. Trigger sections under L0, G1. Used a purpose-written tool to rewrite the old serializations in tests results (gcstate, gcr_*) to the new form. Reworked the SLIF generator to use the new form of serialization. Tests pass, no results changed. Fixed testsuite oops (generatec_gc(c|tcl)) not seen in trunk because everything with gc also had gc_c, etc. GC generators are ok as well. Extended testsuite to cover grammars with events. check-in: 5b0bdf9591 user: andreask tags: tkt-d9b47bbb78-event-management
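The split described above, shared event declarations with state plus per-grammar trigger tables, might look roughly like this. It is a minimal sketch under assumptions: the class and method names are invented for illustration and do not mirror the actual container or grammar base classes.

```python
class Grammar:
    """Per-grammar (L0 or G1) trigger table: symbol -> event names."""

    def __init__(self, container):
        self.container = container
        self.triggers = {}

    def add_trigger(self, symbol, event):
        self.container.declare(event)          # declaration lives in the container
        self.triggers.setdefault(symbol, []).append(event)

    def fired(self, symbol):
        """Names of the active events triggered by `symbol`."""
        state = self.container.events
        return [e for e in self.triggers.get(symbol, []) if state[e]]


class Container:
    """Single scope for event names and their activation state."""

    def __init__(self):
        self.events = {}                       # name -> active?
        self.l0 = Grammar(self)
        self.g1 = Grammar(self)

    def declare(self, name, active=True):
        self.events.setdefault(name, active)

    def activate(self, name, active):
        self.events[name] = active


c = Container()
c.l0.add_trigger("word", "seen")
c.g1.add_trigger("sentence", "seen")
c.activate("seen", False)                      # one switch affects both grammars
print(c.l0.fired("word"), c.g1.fired("sentence"))  # [] []
```

Keeping the activation flag in one place is what makes runtime (de)activation apply to both L0 and G1 triggers at once.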
19:46
Ticket [d9b47bbb78]. Fix event management in the generators. rt-C generator ... scrap ... Working on this makes me realize that my data structures are still not right. Re-reading the documentation at https://metacpan.org/pod/distribution/Marpa-R2/pod/Event.pod I came back with the impression that the user-facing event names are indeed in a single scope. It is not spelled out explicitly, but I can't see it work if not. Especially with runtime (de)activation of events it must be. Which means that having separate structures for G1 and L0 is wrong. The core event information (name -> activation) status must be shared. The trigger tables can be separate, should be even. Saving this for posterity, and restarting (from the container outward) Closed-Leaf check-in: 5bf97382b1 user: andreask
19:36
Ticket [d9b47bbb78]. Fix event management in the generators. rt-Tcl generator updated. Tests updated. Split the more complex internal formatting commands into more composable pieces, and re-composed them. check-in: 83c23635b6 user: andreask
07:39
Ticket [d9b47bbb78]. Fix event management in the generators. Tests fully updated. Started update of generators. - slif ok - tparse in the works check-in: d4619268cc user: aku
2018-08-30
23:32
Ticket [d9b47bbb78]. Fix event management in the generators. Changed container data structures. Separated event definition from event usage by symbols (triggers). Attention: This changes the grammar serialization. Started update of tests. TODO: Update generators. check-in: 8c77fd896a user: andreask
21:21
Ticket [d9b47bbb78]. Fix event management in the generators. Prep work, a test. As part of the expected results this also contains an implied draft spec for various changed data structures (engine, slif container), and semantic processing. check-in: 4cd32b9b44 user: andreask tags: mistake
06:03
Merge latest to doctools work check-in: 783110305c user: aku tags: language-doctools
06:02
Merge latest to event work check-in: 783f0feb8b user: aku tags: parse-events
06:01
Merge work back to main. check-in: cf5be67f4a user: aku tags: trunk
06:00
Completed the pass. Closed-Leaf check-in: ef8ad60e6d user: aku tags: test-cleanup
00:07
Start some cleanup of testsuite, some simplifications for var handling of the loop helpers ... check-in: 4f1c1e6f02 user: andreask tags: test-cleanup
2018-08-29
06:59
Updated main line with continued event work. check-in: 36533140c7 user: aku tags: trunk
06:56
Updated example check-in: c5b2707fc3 user: aku tags: language-doctools
06:54
Completed work on the ticket, integrated with base. check-in: 16d33a63ac user: aku tags: parse-events
06:53
Added test for parser reset. Closed-Leaf check-in: 31becc141c user: aku tags: tkt-a5abdaacd4-shift-test-reset
2018-08-28
23:57
Document i-gen a bit more. Additional helpers for one-shot grammar tests outside of the big loops. check-in: d8bfe7d142 user: andreask tags: tkt-a5abdaacd4-shift-test-reset
22:37
Ticket [a5abdaacd4]. Shift engine reset to start of parsing. Keep state for introspection (statistics, progress reports, etc.) post parse. Needs testcases, before/after introduction of reset. check-in: 230aebacc3 user: andreask tags: tkt-a5abdaacd4-shift-test-reset
05:33
Get trunk and doc work into current example. check-in: 6563481eae user: aku tags: language-doctools
04:40
Final grammar fixes based on processing the Tcllib manpages. 100% pass (429 pages. Note, three down in the total, had include files caught in the simple find) - Allow backslash in general text (and examples). - Allow continuation lines in markup commands. - Allow whitespace after the last element of `tclwords` sequences. - Fixed oops in the include file handling, failed to process the last character of the included file. This was ok when it was a closing newline. Bad for files without such, i.e. ending in the closing bracket of a command. - Allow section command with two arguments. Same as subsection. - Allow 'args' as alias for 'arguments' (list type). - Tweak character escapes, recognize double-backslash. - Allow empty list elements (No text between element markers) - Paragraph sequences are improper (allow trailing separator) Fixed ca 20 tcllib manpages which had issues the older doctools system is not catching (para before first list item, bad closing brackets, silent swallowing of bogus markup commands, ...). check-in: 4b364c62cf user: aku tags: language-doctools
2018-08-24
06:26
More grammar fixes based on processing the Tcllib manpages. Implemented a lexeme barrier (lexer flush) for use in stop event handlers. Forces the lexeme sequence for the included file or similar to coincide with the end of that secondary input, and everything going up to the border flushed, so that it will be crossed without any rewind after the last lexeme. Prevents rewinding in the outer input stream after returning from the include, causing very bogus lexing. Note, this is only a lexer flush, without a parser eof. Updated doctools-base to use the new barrier in its include management. Grammar changes/fixes: - Dropped the special forms fully from the G1. - Configured the special forms for LTM, i.e. to always match. - Allow use of 'enum' as a list type, vs. just the official 'enumerated'. - Allow opening brace in example text, and quoted strings. - Fixed `subsection` syntax, may have two arguments. This fixed an issue with seeing include or vset in places not expected by the G1 from before the change. Now they can appear everywhere between lexemes. Fixed rt-Tcl's handling of LTM (= always-on) lexemes. Bad id conversion in a few places. Statistics: - 355 pass 82% - 77 fail 18% - 432 total 100% check-in: 21eb6e74f0 user: aku tags: language-doctools
2018-08-23
07:07
More grammar fixes based on processing the Tcllib manpages. Fixed `braced` vs `simple`. Fixed `nbsimplex` vs `simple`. (G1 ambiguities due to L0 lexeme language overlap) Fixed use of `subsections` in pseudo-section `description`. Allowed. Eight combinations for paragraphs, subsections, and sections. Added missing `manpage` markup command. Fixed handling of empty include files. Shortcircuit entry and return. Statistics: - 299 pass 69% - 133 fail 31% - 432 total 100% check-in: 417bd724ce user: aku tags: language-doctools
00:00
Trialing real-world manpages from Tcllib. Fixed issue in grammar around '\[' escapes. Removed `\` from ordinary set, treat separate, with \[, \]. Extended the `dt` helper to operate on multiple files, reference files. Added options `--help`, `-o`, and `-t` to `dt`. Extended `doctools::base` with a limited debug stream to show include processing (files, file size, returns). Tweaks in rt-Tcl parser's narrative tracing. Ditto in rt-Tcl lexer. Fixed ambiguity `simple` vs `simplex`. Because the latter was not strictly longer than the former all g_text's with a single simple were also simplex, for 2^n blowup. Making `simplex` strictly longer (i.e. at least 2 simple in the series) this is gone. TODO: `simple` vs `braced` ambiguity where `simple` accepts `{...}` while it should not. That is a simple (sic!) grammar bug. check-in: a9c57a9594 user: andreask tags: language-doctools
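The 2^n blowup mentioned above can be made concrete with a toy count. This is only a simplified model, not the doctools grammar: it assumes n independent texts that each consist of a single `simple`, and counts the ways they can be labelled. The helper name and constants are invented for the example.

```python
from itertools import product


def count_readings(n, alternatives):
    """Count the ways to label n single-`simple` texts with the alternatives."""
    return sum(1 for _ in product(alternatives, repeat=n))


OLD = ("simple", "simplex")   # a lone simple also matched simplex
NEW = ("simple",)             # simplex now needs at least two simples

print(count_readings(10, OLD))  # 1024
print(count_readings(10, NEW))  # 1
```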
2018-08-22
05:08
Added doctools helper application (`dt`) to show AST for arbitrary doctools input. Added demo tool. Refers to bogus example at the moment. check-in: 39ce9e09a5 user: aku tags: language-doctools
05:07
Fix use of `mymethod`, use direct `my`. check-in: a4eb59f54f user: aku tags: language-doctools
05:01
Fix path resolution for includes. Fully normalize all base paths. check-in: b2f700d6b7 user: aku tags: language-doctools
05:00
Fix missing requirements check-in: e708c6d8da user: aku tags: language-doctools
04:59
Grammar fixes - Allow `vset` (reference) at word level, not just as markup. (Should trial if `latm 0` would work for the special forms) - Kill multiple levels of quantification in various L0 symbols. This is just ambiguity where a series of N elements generates `partition(N)` (*) trees for the same lexeme. TODO: Write semantic checks for a SLIF container to detect and warn about this. (*) https://en.wikipedia.org/wiki/Partition_(number_theory) https://oeis.org/A000041 Very much beyond exponential. check-in: 32457c306a user: aku tags: language-doctools
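For reference, the partition numbers cited above (OEIS A000041) can be computed with a short dynamic program. This only computes the sequence; it does not model the grammar or a SLIF container, and the function name is made up for the example.

```python
def partition_counts(limit):
    """Return p[0..limit], where p[n] is the number of ways to write n
    as an unordered sum of positive integers."""
    p = [1] + [0] * limit
    for part in range(1, limit + 1):          # allow parts of size `part`
        for total in range(part, limit + 1):
            p[total] += p[total - part]
    return p


print(partition_counts(10))        # [1, 1, 2, 3, 5, 7, 11, 15, 22, 30, 42]
print(partition_counts(100)[100])  # 190569292
```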
04:51
C code generator tweaks and fixes: - Fix the issue with the bad indentation of per-symbol semantic information. New array configuration flag: `align`. Default true. - Change lexeme map for events to proper tabular. New array configuration flag: `padright`. Default false (i.e. pad left). check-in: b719ce15a0 user: aku tags: language-doctools
2018-08-21
23:51
Brought part of the `mindt` example, and associated experiences into the doctools example. Reworked the doctools grammar to properly handle the special forms. Further reworked to give it a structure similar to the `mindt` grammar. Started updating the tests. Oops! There is strong ambiguity in the innards of `tclword`, the G1 level sub-structuring of command arguments. It happens because at G1 level we have invisible whitespace discards, and so a text like `BBB CCC` can be parsed as two separate `unquoted` arguments (good), or as a __single__ `unquoted` argument of two elements (bad) because the spacing separating them went away at L0 level. Something similar happens in the definition for `quoted`. `braced` is fine because it maps to a single unbroken lexeme. This needs more thought ... Currently I only see bringing the argument spacing into the G1 as possible solution. The lexeme match would have precedence over the discards. Oh, and `quoted` already has `space` as element of the structure, so maybe it would not have the issue. That would be similar to the special forms, they handle their inner whitespace explicitly too, although in L0. ... A quick trial looks to be working ... Only forest remaining looks to be `breaker` vs `space` in example blocks. Fixed the ambiguity as well. Introduced non-breaking (nb) space, allow only that in the example text. Use non-breaking simplex variants as well to gather longer chunks of input. This looks good. Tests all ok, and ASTs look good enough. I may introduce the special action `::evens` however to drop unwanted spaces in the `tclwords` sub-tree. Explicit removal of the separators. Tests/examples for variable handling and includes have priority though. Also, get larger/real-world examples (Tcllib, Critcl, Kettle, Kinetcl, etc). check-in: ea5192243e user: andreask tags: language-doctools
17:50
Brought information from mini-dt and the new features for IO and parse events into the main doctools example. check-in: 8a648972d9 user: andreask tags: language-doctools
17:37
Integrated multi-stop support as new feature into parse-events. Mini doctools examples demonstrates usefulness (and worked out how to handle vset/include for doctools ;). check-in: 27f7f4624d user: andreask tags: parse-events
17:32
Integrated mini-dt into the branch it came from, multi-stop handling. Closed-Leaf check-in: 4a80913963 user: andreask tags: multi-stop
17:29
Tweaked comments in the mindt engines a bit. Reworked the class structure to better hide the internal composition with event handling and multiple parsers from the tests (and users). Updated test setup code. Added notes documenting the class structure. Tests themselves pass without change. The IO changes for from/to look to be a success. Removed the scratch IO logs from when the problems with overrun were detected. Closed-Leaf check-in: 8183573f2f user: andreask tags: mini-dt
07:54
Fixed the higher layers, adapted to the changes in inbound. Updated tests. Still have to document the higher match location methods. check-in: 5f55575b22 user: aku tags: mini-dt
2018-08-20
23:16
Reworked API of low-level inbound, documented start/stop location, and limit meanings. Updated tests. TODO: Update higher layers. check-in: 872ecc83c7 user: andreask tags: mini-dt
2018-08-17
23:32
Tentative change to inbound for `stop after a character`. With controlling condition at a place in the loop where it can `stop before the first character` also, giving the full range. Considering to tweak the user-visible api to still mean `stop before character`, as a shifted form of the above, so that all user-specified locations are >= 0, never negative. This work will move to its own branch off `parse-events`, to work it out completely, including internal docs before continuing on the (min)doctools. check-in: e66e52ad3c user: andreask tags: mini-dt
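The stop semantics under discussion can be sketched with a toy input loop. This is an assumed model, not the `inbound` implementation: the function names are invented, and the `-2` "no stop" value is taken from the scratch notes rather than from any final API. The stop test sits at the top of the loop, so a marker of `-1` stops before the first character, and a marker on the last character stops after all input has been processed.

```python
NO_STOP = -2   # in-band "no stop" marker, as floated in the scratch notes


def run(data, stop_after=NO_STOP):
    """Process characters until the one at index `stop_after` is done.

    stop_after == -1 stops before the first character.
    Returns the processed characters and the reason the loop ended.
    """
    processed = []
    pos = -1                          # index of the last processed character
    while True:
        if pos == stop_after:         # test at loop start covers -1 as well
            return processed, "stopped"
        if pos + 1 >= len(data):
            return processed, "eof"
        pos += 1
        processed.append(data[pos])


def run_stop_before(data, stop_before):
    """User-facing form: stop before character `stop_before` (always >= 0)."""
    return run(data, stop_before - 1)


print(run("abc", 2))              # (['a', 'b', 'c'], 'stopped')
print(run_stop_before("abc", 0))  # ([], 'stopped')
print(run("abc"))                 # (['a', 'b', 'c'], 'eof')
```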
22:09
Scratch - Added narrative tracing to inbound around the IO loop, track exactly where it is and how it moves. - X1, X2 - Demo of the `-1` difference for the include-iii test processing. It seems the issue is rooted in the fact that I stop the engine before the marked character is processed. Trying to set the stop marker after the last character of the stream, to stop after everything has been processed ... does not work. The possible fix would be to change the semantics and stop after the marked character has been processed. Then setting the marker on the last character will stop after it, before trying to move into the bad lands. Technically this shifts the impossible stop location to before the first character of the input. OTOH, if we choose to allow an absolute position -1, that might work (shift the controlling test to the beginning of the io loop). The inband `no stop`-signal would then have to be `-2`. check-in: 06e8e23383 user: andreask tags: mini-dt
2018-08-16
22:37
Oh! :facepalm: `inbound_enter` copies just the pointer to the input string into the engine state. This pointer is (as critcl type `pstring`) just a pointer into the Tcl_Obj internals (GetStringFromObj). We are fine as long as we do not perform `inbound_enter_more`, because nothing does anything with the pointer, but reading from it. Not even basic events. But with `inbound_enter_more` we realloc that area. Means that the old space will be freed when the system did not find enough space to expand in place. At which point we pulled the rug out from under the Tcl_Obj* the string came from. Doing anything with that Tcl_Obj* (like releasing it, example `unset var`) will then smash that memory. Fixed, `inbound` now tracks if it owns IN.bytes and acts accordingly when expanding the area in `enter_more`. TODO: Validate stop locations, must not allow anything beyond the end of the allocated memory, system will try to run over into uninitialized cells at the moment. TODO: Alt: Force engine stop on max for total as measure against the above. check-in: 57ccd4001d user: andreask tags: mini-dt
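The ownership tracking described above can be modelled loosely in Python. This is an analogy under assumptions, not the C code: Python has no `realloc`, so mutating the caller's buffer stands in for freeing memory the Tcl_Obj still refers to, and the class and attribute names are invented.

```python
class Inbound:
    """Toy input holder that tracks whether it owns its byte buffer."""

    def __init__(self):
        self.data = None
        self.owned = False

    def enter(self, buf):
        """Borrow the caller's buffer; reading from it is safe."""
        self.data = buf
        self.owned = False

    def enter_more(self, extra):
        """Append input, taking a private copy first if the buffer is borrowed."""
        if not self.owned:
            self.data = bytearray(self.data)   # copy instead of growing in place
            self.owned = True
        self.data += extra                     # now only touches our own buffer


caller = bytearray(b"abc")
ib = Inbound()
ib.enter(caller)
ib.enter_more(b"def")
print(bytes(caller))   # b'abc'
print(bytes(ib.data))  # b'abcdef'
```

Without the `owned` check the in-place append would change `caller` as well, which is the Python-level counterpart of expanding memory that belongs to someone else.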
19:58
Completed processing of includes. Completed tests. rtC crash with includes. check-in: e21f8235e9 user: andreask tags: mini-dt
08:07
Completed rtC state reset (all stages!) Fixes the aforementioned weird issue in the rtC/special forms, skips first character. Bogus char index information. Leftover test failures are the incomplete handling of includes. check-in: b889957b9c user: aku tags: mini-dt
07:37
Integrated parser for special forms into the main parser. vset processing done. include processing still to be done. Updated tests. Weird issue in the rtC/special forms, skips first character ? check-in: 79362316fe user: aku tags: mini-dt
07:34
Fix level error in intercept helper check-in: 85d64bf32c user: aku tags: mini-dt
07:33
Extended rtC to properly report errors from event callbacks. Extended both runtimes to reset the system after a parse ends. This allows multiple calls to `process(-file)` for a single parser. Each call is a new run. check-in: 0cf5429d76 user: aku tags: mini-dt
2018-08-15
23:53
Created grammar for the special forms. Testsuite for same. check-in: de9583426c user: andreask tags: mini-dt