Packrat Parsing: Simple, Powerful, Lazy, Linear Time

Bryan Ford
Massachusetts Institute of Technology

Presented at

International Conference on Functional Programming, October 4-6, 2002, Pittsburgh

Abstract

Packrat parsing is a novel technique for implementing parsers in a lazy functional programming language. A packrat parser provides the power and flexibility of top-down parsing with backtracking and unlimited lookahead, but nevertheless guarantees linear parse time. Any language defined by an LL(k) or LR(k) grammar can be recognized by a packrat parser, in addition to many languages that conventional linear-time algorithms do not support. This additional power simplifies the handling of common syntactic idioms such as the widespread but troublesome longest-match rule, enables the use of sophisticated disambiguation strategies such as syntactic and semantic predicates, provides better grammar composition properties, and allows lexical analysis to be integrated seamlessly into parsing. Yet despite its power, packrat parsing shares the same simplicity and elegance as recursive descent parsing; in fact converting a backtracking recursive descent parser into a linear-time packrat parser often involves only a fairly straightforward structural change. This paper describes packrat parsing informally with emphasis on its use in practical applications, and explores its advantages and disadvantages with respect to the more conventional alternatives.

Full Paper

In PDF or PostScript

Slides from ICFP Presentation

In HTML

Example Parsers

Complete and working versions of the example parsers described in the paper are available on this page.

ArithRecurse.hs: Recursive descent parser described in Section 2.1, for the trivial arithmetic expression language of Figure 1.

ArithPackrat.hs: Equivalent packrat parser for the same trivial language, Section 2.4.

ArithLeft.hs: Left recursion example for Section 3.1, which extends the above packrat parser with properly left-associative subraction, division, and modulo operators.

ArithLex.hs: Integrated lexical analysis example for Section 3.2, which extends the previous packrat parser with support for multiple-digit decimal literals and optional whitespace padding between literals, operators, and punctuation.

ArithMonad.hs: Example packrat parser, equivalent to ArithPackrat.hs, but using monadic combinators to express the parsing functions more succinctly and provide support for user-friendly error detection and reporting. Discussed in Section 3.3 of the paper. The following two library modules are required:

Enjoy!


Bryan Ford's Home Page