TDParser: a simple parsing library in Python

Getting started

Installation

First, you’ll need to get the latest version of TDParser.

TDParser is compatible with all Python versions from 2.6 to 3.2.

The simplest way to install it is from PyPI:

$ pip install tdparser

You may also fetch the latest development version from https://github.com/rbarrois/tdparser:

$ git clone git://github.com/rbarrois/tdparser.git
$ cd tdparser
$ python setup.py install

Defining the tokens

TDParser provides a simple framework for building parsers; thus, it doesn’t provide default token kinds.

A token type is described by up to four elements; a given token only defines the ones that apply to it:

  • Input for the token: a regexp, stored in the tdparser.Token.regexp attribute
  • Precedence (left binding power): an integer, stored in the tdparser.Token.lbp attribute
  • Value of the token when it appears at the beginning of an expression: defined in the tdparser.Token.nud() method
  • Behavior of the token when it appears between two expressions: defined in the tdparser.Token.led() method

An example definition of a simple arithmetic parser that returns the expression’s value would be:

from tdparser import Token

class Integer(Token):
    regexp = r'\d+'
    def nud(self, context):
        return int(self.text)

class Addition(Token):
    regexp = r'\+'
    lbp = 10

    def led(self, left, context):
        return left + context.expression(self.lbp)

class Multiplication(Token):
    regexp = r'\*'
    lbp = 20

    def led(self, left, context):
        return left * context.expression(self.lbp)
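
To make the roles of lbp, nud() and led() more concrete, here is a minimal sketch of a top-down operator precedence loop. It does not use tdparser and is not its implementation: End, Num, Plus, Times and SketchParser are hypothetical stand-ins written only for this illustration, and the tokens are built by hand instead of being lexed from text.

```python
# Minimal sketch of a top-down operator precedence loop.
# Every name below is an illustrative stand-in, not part of tdparser.

class End(object):
    """Sentinel marking the end of the input; binds to nothing."""
    lbp = 0


class Num(object):
    lbp = 0

    def __init__(self, value):
        self.value = value

    def nud(self, context):
        # At the start of an expression, a number is its own value.
        return self.value


class Plus(object):
    lbp = 10

    def led(self, left, context):
        # Combine the left operand with whatever binds tighter than '+'.
        return left + context.expression(self.lbp)


class Times(object):
    lbp = 20

    def led(self, left, context):
        return left * context.expression(self.lbp)


class SketchParser(object):
    def __init__(self, tokens):
        self.tokens = iter(list(tokens) + [End()])
        self.current = next(self.tokens)

    def advance(self):
        """Return the current token and move on to the next one."""
        token = self.current
        self.current = next(self.tokens, token)
        return token

    def expression(self, rbp=0):
        # Start from the prefix value of the first token...
        left = self.advance().nud(self)
        # ...then fold in every operator that binds tighter than rbp.
        while rbp < self.current.lbp:
            left = self.advance().led(left, self)
        return left


# Hand-built token stream for "2 * 3 + 4".
tokens = [Num(2), Times(), Num(3), Plus(), Num(4)]
result = SketchParser(tokens).expression()
```

In this sketch, result is 10: the multiplication asks for its right operand with rbp=20, so the following addition (lbp=10) is left for the outer loop. Because each led() passes its own lbp as rbp, operators of equal precedence group to the left.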

Building the Lexer/Parser

The parser has a simple interface: it takes an iterable of tdparser.Token instances as input and returns whatever value the tokens’ nud() and led() methods produce.

The lexer simply needs to get a list of valid tokens:

import tdparser

lexer = tdparser.Lexer(with_parens=True)
lexer.register_tokens(Integer, Addition, Multiplication)

The with_parens=True option adds a pair of builtin tokens, tdparser.LeftParen and tdparser.RightParen, which provide left/right parenthesis behavior.
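
As a rough illustration of how a pair of grouping tokens can be expressed with nud(), the sketch below builds a tiny stand-alone parser. It does not use tdparser: OpenParen, CloseParen, SketchParser and its consume() method are assumptions made for this example, and the builtin tdparser.LeftParen and tdparser.RightParen may be implemented differently.

```python
# Sketch only: stand-in classes showing one way grouping tokens can work.
# None of these names come from tdparser.

class End(object):
    lbp = 0


class Num(object):
    lbp = 0

    def __init__(self, value):
        self.value = value

    def nud(self, context):
        return self.value


class Plus(object):
    lbp = 10

    def led(self, left, context):
        return left + context.expression(self.lbp)


class Times(object):
    lbp = 20

    def led(self, left, context):
        return left * context.expression(self.lbp)


class CloseParen(object):
    # Lowest precedence, so it ends the enclosed expression.
    lbp = 0


class OpenParen(object):
    lbp = 0

    def nud(self, context):
        # Parse everything up to the matching close paren...
        inner = context.expression(0)
        # ...then require that close paren and discard it.
        context.consume(CloseParen)
        return inner


class SketchParser(object):
    def __init__(self, tokens):
        self.tokens = list(tokens) + [End()]
        self.pos = 0

    @property
    def current(self):
        return self.tokens[self.pos]

    def consume(self, expected=None):
        """Return the current token, optionally checking its class."""
        token = self.current
        if expected is not None and not isinstance(token, expected):
            raise ValueError("expected %s" % expected.__name__)
        if self.pos < len(self.tokens) - 1:
            self.pos += 1
        return token

    def expression(self, rbp=0):
        left = self.consume().nud(self)
        while rbp < self.current.lbp:
            left = self.consume().led(left, self)
        return left


# Hand-built token stream for "2 * (3 + 4)".
grouped = SketchParser(
    [Num(2), Times(), OpenParen(), Num(3), Plus(), Num(4), CloseParen()]
).expression()
```

Here grouped is 14, because the opening token restarts parsing with rbp=0 and so lets the addition run to completion before the multiplication uses its result. An unclosed group makes consume() raise ValueError in this sketch.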

Note

The default lexer skips spaces and tabs. This can be changed by setting the blank_chars argument when initializing the lexer.
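
The idea of regexp-driven lexing with skipped blank characters can be sketched in a few lines of plain Python. The sketch_lex function and the PATTERNS list below are hypothetical and independent of tdparser; in particular, picking the longest match when several patterns apply is a choice made for this sketch, not a statement about how tdparser resolves such cases.

```python
import re

# Hypothetical (name, regexp) pairs mirroring the tokens defined earlier.
PATTERNS = [
    ('int', r'\d+'),
    ('plus', r'\+'),
    ('times', r'\*'),
]


def sketch_lex(text, patterns, blank_chars=' \t'):
    """Yield (name, matched_text) pairs, skipping any character in blank_chars."""
    compiled = [(name, re.compile(rx)) for name, rx in patterns]
    pos = 0
    while pos < len(text):
        if text[pos] in blank_chars:
            # Blank characters separate tokens but produce none.
            pos += 1
            continue
        # Keep the longest non-empty match among all patterns.
        best_name, best_match = None, None
        for name, rx in compiled:
            m = rx.match(text, pos)
            if m and m.end() > pos:
                if best_match is None or m.end() > best_match.end():
                    best_name, best_match = name, m
        if best_match is None:
            raise ValueError("no token matches at position %d" % pos)
        yield best_name, best_match.group(0)
        pos = best_match.end()


pairs = list(sketch_lex('2 * 3 + 4', PATTERNS))
```

With an empty blank_chars, the same input is rejected at the first space, which shows why the set of skipped characters matters for whitespace-sensitive grammars.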

We now only need to feed our text to the lexer:

>>> lexer.parse('1 + 1')
2
>>> lexer.parse('2 * 3 + 4')
10
