Ticket UUID: | 1735601 | |||
Title: | grammar::fa::op::toRegexp | |||
Type: | RFE | Version: | None | |
Submitter: | lars_h | Created on: | 2007-06-12 09:34:58 | |
Subsystem: | grammar_fa | Assigned To: | andreas_kupries | |
Priority: | 5 Medium | Severity: | ||
Status: | Closed | Last Modified: | 2007-08-25 00:17:43 | |
Resolution: | Closed By: | andreas_kupries | ||
Closed on: | 2007-08-22 22:03:58 | |||
Description: |
Just wanted to log this natural companion of fromRegexp. I don't need it myself right now, but I might do in the future, and about a month ago two colleagues mentioned it would be nice if their program could also convert the automata it generated to regexps... I'll probably fill this RFE myself, eventually, unless someone beats me to it. :-) | |||
User Comments: |
andreas_kupries added on 2007-08-25 00:17:43:
Logged In: YES user_id=75003 Originator: NO Note: I already added some minimal testing for toRegexp, toRegexp2, simplifyRegexp, and toTclRegexp, based on a FA I found in some online material about FA theory. Of course, more testing is welcome, very much so. lars_h added on 2007-08-24 18:52:26: Logged In: YES user_id=938835 Originator: YES I'm currently away at a conference and have very shaky internet connection, but I'll see what I can do. A catch concerning testing toRegexp (did it get spelt that way?) is that there can be many REs representing the same language, so it's a bit difficult to compare against. Maybe the best way is to have a set of test words and verify that Tcl's RE engine recognises the same subset of them as the grammar::fa engine does. lars_h added on 2007-08-24 18:52:09: Logged In: YES user_id=938835 Originator: YES I'm currently away at a conference and have very shaky internet connection, but I'll see what I can do. A catch concerning testing toRegexp (did it get spelt that way?) is that there can be many REs representing the same language, so it's a bit difficult to compare against. Maybe the best way is to have a set of test words and verify that Tcl's RE engine recognises the same subset of them as the grammar::fa engine does. andreas_kupries added on 2007-08-23 05:03:58: Logged In: YES user_id=75003 Originator: NO Accepted, and integrated. small change: toRegexp(2) takes an FA instead of a serialization of one. The serialization is done inside of the proc. Also wrote a wrapper command to the tclre namespace for infic translation => toTclRegexp. I would now really like to have examples and test-cases for this. andreas_kupries added on 2007-08-23 00:06:15: Logged In: YES user_id=75003 Originator: NO Note: The nullary .| has been submitted and can be used for this now. andreas_kupries added on 2007-08-15 04:48:17: Logged In: YES user_id=75003 Originator: NO We actually had a method which nearly did what you wanted, symbols@. Was not really documented however. This method has been extended for what you need: FA symbols@ s1 s2 returns the set of symbols which transition from state s1 to state s2. The set is empty if there are no transitions between the two states. The empty symbol in the set indicates epsilon-transitions. That should make the generation of the matrix simpler. andreas_kupries added on 2007-08-14 03:41:32: Logged In: YES user_id=75003 Originator: NO Thanks. I will have a look over the next days. lars_h added on 2007-08-11 22:47:45: File Added - 240844: toregexp.pdf Logged In: YES user_id=938835 Originator: YES In case you're not completely discouraged by the power series interpretation of formal languages that underlies the algorithm I used, the book Werner Kuich and Arto Salomaa: Semirings, automata, languages, Springer-Vlg, 1986 (ISBNs 0-387-13716-5 and 3-540-13716-5) develops this theory in more detail. File Added: toregexp.pdf lars_h added on 2007-08-11 22:38:07: File Added - 240843: toregexp.dtx Logged In: YES user_id=938835 Originator: YES OK, I now have a working implementation, but it's nowhere near being integrated with the grammar::fa package. I probably won't do further work on this for a while, so I just wanted to make the results available. To load the code as provided, do: package require docstrip docstrip::sourcefrom toregexp.dtx pkg Top-level commands made available: grammar::fa::op::toRegexp $serialization Returns RE for the language accepted by the automaton whose serialisation is $serialization. grammar::fa::op::toRegexp2 $serialization Same, but tries to expand choices in the regexp. grammar::fa::op::simplifyRegexp $RE Seeks to simplify a regexp by converting it and all its subexpressions to an automaton (which is minimized) and back, picking the shorter of the two. (Seems to find some simplifications, but not others.) namespace inscope grammar::fa::op::tclre $RE $dict Converts a grammar::fa-style regular expression $RE to a Tcl-style regular expression (or rather a pair of a "surface type" and the actual regular expression). The $dict maps symbol names to such pairs. If a symbol is not in the dict then single-character symbols are interpreted as those characters and other symbols are interpreted as character class names. I will also attach a typeset form of the source. It might help explain the algorithm used. File Added: toregexp.dtx |