Tcl Library Source Code

Documentation
Login


[ Main Table Of Contents | Table Of Contents | Keyword Index | Categories | Modules | Applications ]

NAME

grammar::me_vm - Virtual machine for parsing token streams

Table Of Contents

DESCRIPTION

Please go and read the document grammar::me_intro first for an overview of the various documents and their relations.

This document specifies a virtual machine for the controlled matching and parsing of token streams, creating an abstract syntax tree (short AST) reflecting the structure of the input. Special machine features are the caching and reuse of partial results, caching of the encountered input, and the ability to backtrack in both input and AST creation.

These features make the specified virtual machine especially useful to packrat parsers based on parsing expression grammars. It is however not restricted to this type of parser. Normal LL and LR parsers can be implemented with it as well.

The following sections will discuss first the abstract state kept by ME virtual machines, and then their instruction set.

MACHINE STATE

A ME virtual machine manages the following state:

MACHINE INSTRUCTIONS

With the machine state specified it is now possible to explain the instruction set of ME virtual machines. They are grouped roughly by the machine state they influence and/or query.

TERMINAL MATCHING

First the instructions to match tokens from the input stream, and by extension all terminal symbols.

These instructions are the only ones which may retrieve a new token from the input stream. This is a may and not a will because the instructions will a retrieve new token if, and only if the current location CL is at the head of the stream. If the machine has backtracked (see icl_rewind) the instructions will retrieve the token to compare against from the internal cache.

NONTERMINAL MATCHING

The instructions in this section handle the matching of nonterminal symbols. They query the nonterminal cache NC for saved information, and put such information into the cache.

The usage of the cache is a performance aid for backtracking parsers, allowing them to avoid an expensive rematch of complex nonterminal symbols if they have been encountered before.

UNCONDITIONAL MATCHING

The instructions in this section are the remaining match operators. They change the match status OK directly and unconditionally.

CONTROL FLOW

The instructions in this section implement both conditional and unconditional control flow. The conditional jumps query the match status OK.

INPUT LOCATION HANDLING

The instructions in this section are for backtracking, they manipulate the current location CL of the machine state. They allow a user of the machine to query and save locations in the input, and to rewind the current location CL to saved locations, making them one of the components enabling the implementation of backtracking parsers.

ERROR HANDLING

The instructions in this section provide read and write access to the error status ER of the machine.

SEMANTIC VALUES

The instructions in this section manipulate the semantic value SV.

AST STACK HANDLING

The instructions in this section manipulate the AST stack AS, and the AST Marker stack MS.

Bugs, Ideas, Feedback

This document, and the package it describes, will undoubtedly contain bugs and other problems. Please report such in the category grammar_me of the Tcllib Trackers. Please also report any ideas for enhancements you may have for either package and/or documentation.

When proposing code changes, please provide unified diffs, i.e the output of diff -u.

Note further that attachments are strongly preferred over inlined patches. Attachments can be made by going to the Edit form of the ticket immediately after its creation, and then using the left-most button in the secondary navigation bar.

KEYWORDS

grammar, parsing, virtual machine

CATEGORY

Grammars and finite automata

COPYRIGHT

Copyright © 2005 Andreas Kupries