Tk Library Source Code

View Ticket
Login
Ticket UUID: 1735601
Title: grammar::fa::op::toRegexp
Type: RFE Version: None
Submitter: lars_h Created on: 2007-06-12 09:34:58
Subsystem: grammar_fa Assigned To: andreas_kupries
Priority: 5 Medium Severity:
Status: Closed Last Modified: 2007-08-25 00:17:43
Resolution: Closed By: andreas_kupries
    Closed on: 2007-08-22 22:03:58
Description:
Just wanted to log this natural companion of fromRegexp.

I don't need it myself right now, but I might do in the 
future, and about a month ago two colleagues mentioned 
it would be nice if their program could also convert the 
automata it generated to regexps...

I'll probably fill this RFE myself, eventually, unless 
someone beats me to it. :-)
User Comments: andreas_kupries added on 2007-08-25 00:17:43:
Logged In: YES 
user_id=75003
Originator: NO

Note: I already added some minimal testing for toRegexp, toRegexp2, simplifyRegexp, and toTclRegexp, based on a FA I found in some online material about FA theory.
Of course, more testing is welcome, very much so.

lars_h added on 2007-08-24 18:52:26:
Logged In: YES 
user_id=938835
Originator: YES

I'm currently away at a conference and have very shaky internet connection, but I'll see what I can do.

A catch concerning testing toRegexp (did it get spelt that way?) is that there can be many REs representing the same language, so it's a bit difficult to compare against. Maybe the best way is to have a set of test words and verify that Tcl's RE engine recognises the same subset of them as the grammar::fa engine does.

lars_h added on 2007-08-24 18:52:09:
Logged In: YES 
user_id=938835
Originator: YES

I'm currently away at a conference and have very shaky internet connection, but I'll see what I can do.

A catch concerning testing toRegexp (did it get spelt that way?) is that there can be many REs representing the same language, so it's a bit difficult to compare against. Maybe the best way is to have a set of test words and verify that Tcl's RE engine recognises the same subset of them as the grammar::fa engine does.

andreas_kupries added on 2007-08-23 05:03:58:
Logged In: YES 
user_id=75003
Originator: NO

Accepted, and integrated. small change: toRegexp(2) takes an FA instead of a serialization of one. The serialization is done inside of the proc.
Also wrote a wrapper command to the tclre namespace for infic translation => toTclRegexp.
I would now really like to have examples and test-cases for this.

andreas_kupries added on 2007-08-23 00:06:15:
Logged In: YES 
user_id=75003
Originator: NO

Note: The nullary .| has been submitted and can be used for this now.

andreas_kupries added on 2007-08-15 04:48:17:
Logged In: YES 
user_id=75003
Originator: NO

We actually had a method which nearly did what you wanted, symbols@. Was not really documented however.
This method has been extended for what you need:

    FA symbols@ s1 s2

returns the set of symbols which transition from state s1 to state s2. The set is empty if there are no transitions between the two states. The empty symbol in the set indicates epsilon-transitions.

That should make the generation of the matrix simpler.

andreas_kupries added on 2007-08-14 03:41:32:
Logged In: YES 
user_id=75003
Originator: NO

Thanks. I will have a look over the next days.

lars_h added on 2007-08-11 22:47:45:

File Added - 240844: toregexp.pdf

Logged In: YES 
user_id=938835
Originator: YES

In case you're not completely discouraged by the power series interpretation of formal languages that underlies the algorithm I used, the book

Werner Kuich and Arto Salomaa:
Semirings, automata, languages,
Springer-Vlg, 1986 (ISBNs 0-387-13716-5 and 3-540-13716-5)

develops this theory in more detail.
File Added: toregexp.pdf

lars_h added on 2007-08-11 22:38:07:

File Added - 240843: toregexp.dtx

Logged In: YES 
user_id=938835
Originator: YES

OK, I now have a working implementation, but it's nowhere near being integrated with the grammar::fa package. I probably won't do further work on this for a while, so I just wanted to make the results available.

To load the code as provided, do:
 package require docstrip
 docstrip::sourcefrom toregexp.dtx pkg

Top-level commands made available:

   grammar::fa::op::toRegexp $serialization
Returns RE for the language accepted by the automaton whose serialisation is $serialization.

   grammar::fa::op::toRegexp2 $serialization
Same, but tries to expand choices in the regexp.

   grammar::fa::op::simplifyRegexp $RE
Seeks to simplify a regexp by converting it and all its subexpressions to an automaton (which is minimized) and back, picking the shorter of the two. (Seems to find some simplifications, but not others.)

   namespace inscope grammar::fa::op::tclre $RE $dict
Converts a grammar::fa-style regular expression $RE to a Tcl-style regular expression (or rather a pair of a "surface type" and the actual regular expression). The $dict maps symbol names to such pairs. If a symbol is not in the dict then single-character symbols are interpreted as those characters and other symbols are interpreted as character class names.

I will also attach a typeset form of the source. It might help explain the algorithm used.
File Added: toregexp.dtx

Attachments: