Tcl Library Source Code

NAME

pt_parser_api - Parser API

SYNOPSIS

package require Tcl 8.5

className ?objectName?
objectName destroy
objectName parse chan
objectName parset text

DESCRIPTION

Are you lost? Do you have trouble understanding this document? In that case please read the overview provided by the Introduction to Parser Tools. That document is the entry point to the whole system the current package is a part of.

This document describes the API for accessing the actual parsing functionality. This API is shared by the grammar interpreter provided by the package pt::peg::interp and by the parsers the pt application generates for the result formats critcl, snit, and oo.

Its intended audience is people who wish to create a parser for some language of their own and then use that parser within a Tcl-based package or application.

It resides in the User Layer of Parser Tools.

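Class API

The parser classes generated by the pt application for the result formats critcl, snit, and oo all provide the class command shown below.

className ?objectName?
    This command creates a new parser object and returns the name of its object command. If objectName is given, it is used as the name of the new object command. The object command is used to invoke the methods described in the section Instance API.

Instance API

All parser instances provide at least the methods shown below:

objectName destroy
    This method destroys the parser instance, releasing all claimed memory and other resources, and deletes the instance command.

objectName parse chan
    This method runs the parser on the contents of the channel chan, starting at the channel's current location, and returns the abstract syntax tree of the input in the form specified in the section AST serialization format.

objectName parset text
    This method runs the parser on the string text. In all other respects it behaves like the method parse.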

Usage

A generated parser is used like this:

package require the-parser-package ;# Generated by result formats 'critcl', 'snit' or 'oo' of 'pt'.
set parser [the-parser-class]

set ast [$parser parse $channel]
$parser destroy

# ... process the abstract syntax tree ...
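To exercise code written against this API before a real parser has been generated, a hand-written stand-in can help. The sketch below is a toy and not output of pt: it assumes Tcl 8.6 (for built-in TclOO), recognizes only Number <- Digit+ with Digit being a single decimal digit, and its error message is its own invention. The names ToyNumberParser and toy-number-parser are hypothetical; only the class command shape and the methods parse, parset and destroy mirror this document.

```tcl
package require TclOO   ;# built into Tcl 8.6

# Toy stand-in for a generated parser (hypothetical): recognizes Number <- Digit+ only.
oo::class create ToyNumberParser {
    # parset: parse the string $text and return an AST of the form
    # {Number start end {Digit i i}...} with inclusive, 0-based offsets.
    method parset {text} {
        if {![regexp {^[0-9]+$} $text]} {
            error "toy parser: expected one or more digits"
        }
        set last [expr {[string length $text] - 1}]
        set ast [list Number 0 $last]
        for {set i 0} {$i <= $last} {incr i} {
            lappend ast [list Digit $i $i]
        }
        return $ast
    }
    # parse: read the rest of the channel from its current location,
    # then behave exactly like parset.
    method parse {chan} {
        return [my parset [read $chan]]
    }
    # destroy is inherited from oo::object.
}

# Hypothetical class command in the shape "className ?objectName?".
proc toy-number-parser {{objectName ""}} {
    if {$objectName eq ""} {
        return [ToyNumberParser new]
    }
    return [ToyNumberParser create $objectName]
}

set parser [toy-number-parser]
puts [$parser parset 120]
# prints: Number 0 2 {Digit 0 0} {Digit 1 1} {Digit 2 2}
$parser destroy
```

Because the stand-in only reproduces the calling conventions, swapping it for a real generated parser should require changing just the package require and the class command name.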

When a grammar interpreter is used for parsing instead, some differences creep in: the grammar has to be loaded as a container object and handed to the interpreter through its use method:

package require the-grammar-package ;# Generated by result format 'container' of 'pt'.
set grammar [the-grammar-class]

package require pt::peg::interp
set parser [pt::peg::interp]

$parser use $grammar

set ast [$parser parse $channel]
$parser destroy

# ... process the abstract syntax tree ...
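Neither flow above releases the parser if parse raises an error, for example on a syntax error. A minimal sketch of one way to guarantee cleanup, assuming Tcl 8.6 for try/finally. The helper name parse-file is hypothetical, and parserClass stands for any class command with the className ?objectName? shape described in this document.

```tcl
# Hypothetical helper: parse the file at $path with a fresh parser created
# from the class command $parserClass. The channel is always closed and the
# parser always destroyed, whether parsing succeeds or raises an error.
proc parse-file {parserClass path} {
    set parser [$parserClass]
    try {
        set chan [open $path r]
        try {
            return [$parser parse $chan]
        } finally {
            close $chan
        }
    } finally {
        $parser destroy
    }
}
```

A call such as `set ast [parse-file the-parser-class input.txt]` (both names being placeholders) then behaves like the first flow, except that errors propagate only after the channel and the parser have been released.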

AST serialization format

Here we specify the format used by the Parser Tools to serialize Abstract Syntax Trees (ASTs) as immutable values for transport, comparison, etc.

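Each node in an AST represents a nonterminal symbol of a grammar and the range of tokens/characters in the input covered by it. ASTs do not contain terminal symbols, i.e. tokens/characters; these can be recovered from the input given a symbol's location. As the example below shows, a node is serialized as a Tcl list holding the name of the nonterminal and the start and end locations of the covered range (both inclusive, counted from 0), followed by the serializations of its children.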

We distinguish between regular and canonical serializations. While a tree may have more than one regular serialization, exactly one of them is canonical.
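The examples in this document present values "except for whitespace". Assuming, as that phrasing suggests, that for ASTs a regular serialization differs from the canonical one only in superfluous whitespace, a canonical form can be sketched by rebuilding every node with list, so that the result is the string form of a pure Tcl list. The proc name ast_canonical is hypothetical; for production use, check whether Tcllib's pt::ast package provides an equivalent command.

```tcl
# Rebuild an AST with [list] at every level, so that its string form is the
# canonical string of a pure Tcl list: single spaces, no newlines or indentation.
# Assumes nodes of the form {symbol start end child...}.
proc ast_canonical {ast} {
    set children [lassign $ast symbol start end]
    set result [list $symbol $start $end]
    foreach child $children {
        lappend result [ast_canonical $child]
    }
    return $result
}

set regular {Number 0 2
    {Digit 0 0}
    {Digit 1 1}
    {Digit 2 2}
}
puts [ast_canonical $regular]
# prints: Number 0 2 {Digit 0 0} {Digit 1 1} {Digit 2 2}
```

Two regular serializations of the same tree can then be compared with a plain string comparison of their canonical forms.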

Example

Assuming the parsing expression grammar below

PEG calculator (Expression)
    Digit      <- '0'/'1'/'2'/'3'/'4'/'5'/'6'/'7'/'8'/'9'       ;
    Sign       <- '-' / '+'                                     ;
    Number     <- Sign? Digit+                                  ;
    Expression <- Term (AddOp Term)*                            ;
    MulOp      <- '*' / '/'                                     ;
    Term       <- Factor (MulOp Factor)*                        ;
    AddOp      <- '+'/'-'                                       ;
    Factor     <- '(' Expression ')' / Number                   ;
END;

and the input string

120+5

then a parser should deliver the abstract syntax tree below (except for whitespace)

set ast {Expression 0 4
    {Term 0 2
        {Factor 0 2
            {Number 0 2
                {Digit 0 0}
                {Digit 1 1}
                {Digit 2 2}
            }
        }
    }
    {AddOp 3 3}
    {Term 4 4
        {Factor 4 4
            {Number 4 4
                {Digit 4 4}
            }
        }
    }
}
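Because an AST carries only locations, the text each node covers has to be recovered from the input. A minimal sketch, assuming the node layout seen in the example ({symbol start end child...} with inclusive, 0-based character offsets). The proc name ast_covered is hypothetical, and the tree used is the one for the input 120+5 nested as the grammar's rules prescribe (Expression, Term, Factor, Number, Digit).

```tcl
# Return a flat list of {depth symbol text} triples, one per AST node,
# in depth-first order. A node is assumed to be {symbol start end child...}
# with inclusive, 0-based character offsets into $input.
proc ast_covered {ast input {depth 0}} {
    set children [lassign $ast symbol start end]
    # Terminals are not stored in the AST; recover them from the input.
    set result [list [list $depth $symbol [string range $input $start $end]]]
    foreach child $children {
        lappend result {*}[ast_covered $child $input [expr {$depth + 1}]]
    }
    return $result
}

# AST for the input "120+5", nested as the grammar's rules prescribe.
set input "120+5"
set ast {Expression 0 4
    {Term 0 2 {Factor 0 2 {Number 0 2 {Digit 0 0} {Digit 1 1} {Digit 2 2}}}}
    {AddOp 3 3}
    {Term 4 4 {Factor 4 4 {Number 4 4 {Digit 4 4}}}}
}

# Print each node indented by depth, with the text it covers.
foreach entry [ast_covered $ast $input] {
    lassign $entry depth symbol text
    puts "[string repeat {  } $depth]$symbol \"$text\""
}
```

The output starts with `Expression "120+5"` followed by `  Term "120"`, and the AddOp node reports `"+"`, which is exactly the terminal the AST itself omits.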

PE serialization format

Here we specify the format used by the Parser Tools to serialize Parsing Expressions as immutable values for transport, comparison, etc.

We distinguish between regular and canonical serializations. While a parsing expression may have more than one regular serialization, exactly one of them is canonical.

Example

Assuming the parsing expression shown on the right-hand side of the rule

Expression <- Term (AddOp Term)*

then its canonical serialization (except for whitespace) is

{x {n Term} {* {x {n AddOp} {n Term}}}}
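To make the prefix notation concrete, the sketch below turns a PE serialization back into PEG text. It handles the operators seen in the example (x for sequence, n for nonterminal, * for zero-or-more) plus / (choice), ? (optional), + (one-or-more) and t (terminal character), which it assumes use the same prefix form; any other operator raises an error. The proc names pe_to_text and pe_group are hypothetical.

```tcl
# Render a PE serialization as PEG text. Handles x, n and * as in the
# example, plus /, ?, + and t, assumed to use the same prefix form.
proc pe_to_text {pe} {
    set operands [lassign $pe op]
    switch -exact -- $op {
        n { return [lindex $operands 0] }
        t { return "'[lindex $operands 0]'" }
        x {
            # A choice nested in a sequence needs parentheses.
            set parts {}
            foreach p $operands { lappend parts [pe_group $p /] }
            return [join $parts " "]
        }
        / {
            set parts {}
            foreach p $operands { lappend parts [pe_to_text $p] }
            return [join $parts " / "]
        }
        * - + - ? {
            # Suffix operators bind tightest, so group sequences and choices.
            return "[pe_group [lindex $operands 0] x /]$op"
        }
        default { error "operator \"$op\" is not handled by this sketch" }
    }
}

# Render $pe, wrapping it in parentheses if its operator is one of $args.
proc pe_group {pe args} {
    set text [pe_to_text $pe]
    if {[lindex $pe 0] in $args} { return "($text)" }
    return $text
}

puts [pe_to_text {x {n Term} {* {x {n AddOp} {n Term}}}}]
# prints: Term (AddOp Term)*
```

Parentheses are added only where precedence requires them, so the serialization of the example round-trips to the right-hand side of the rule as written.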

Bugs, Ideas, Feedback

This document, and the package it describes, will undoubtedly contain bugs and other problems. Please report such in the category pt of the Tcllib Trackers. Please also report any ideas for enhancements you may have for either package and/or documentation.

When proposing code changes, please provide unified diffs, i.e. the output of diff -u.

Note further that attachments are strongly preferred over inlined patches. Attachments can be made by going to the Edit form of the ticket immediately after its creation, and then using the left-most button in the secondary navigation bar.

KEYWORDS

EBNF, LL(k), PEG, TDPL, context-free languages, expression, grammar, matching, parser, parsing expression, parsing expression grammar, push down automaton, recursive descent, state, top-down parsing languages, transducer

CATEGORY

Parsing and Grammars

COPYRIGHT

Copyright © 2009 Andreas Kupries