Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Introductions

Iota is a PEG parser combinator library for Rust. You can build parsers by composing small, reusable expressions that map directly to grammar rules. There is no code generation, no macros, and no seperate grammar files.

Features

  • Composable expressions with + and | operators
  • Memoised parsing for guaranteed linear time on most grammars.
  • Generioc over any AST type via the ParseValue trait
  • Helpers for common patterns
  • Recursive grammars with lazy()

Getting Started

Installation

[dependencies]
iota-parser = "0.1.0"

ParseValue Trait

This trait is extremely important for ensuring your types work with Iota. A parser is generic over a type T which implements ParseValue.

#![allow(unused)]
fn main() {
pub trait ParseValue: Clone {
    fn from_str(s: String) -> Self;
    fn as_str(&self) -> String;
}
}

For simple string parsing, String already implements ParseValue natively.

For your own AST types, you need to implement it yourself. from_str is called for every raw character or literal match, so it’s default needs to be sensible, and the real construction of your type happens inside action callbacks.

#![allow(unused)]
fn main() {
#[derive(Debug, Clone)]
struct JsonObject {
    key: String,
    value: String,
    raw: String,
}

impl ParseValue for JsonObject {
    fn from_str(s: String) -> Self {
        JsonObject { key: String::new(), value: String::new(), raw: s }
    }
    
    fn as_str(&self) -> String {
        self.raw.clone()
    }
}
}

Example Parser

use iota::{IotaParser, action, one_or_more, digit};

fn number() -> Expression {
    action(
        one_or_more(digit),
        |vals| vals.join(""),
    )
}

fn main() {
    let mut parser = IotaParser::::new("123".to_string());
    match parser.parse(&number(), 0) {
        Some((_, vals)) => println!("{:?}", vals[0]),
        None => println!("Parse failed"),
    }
}

Running Examples

Iota comes with some examples you can run directly so you can get a feel for it.

cargo run --example sexpr
cargo run --example json
  • sexpr is a simple Lisp s-expression parser e.g. (+ 1 2)
  • json is a simple key-value parser e.g. {"name": "blinx"}

Core Concepts

Expressions

Everything in Iota is an Expression<T>. Expressions are composable, you build complex parsers by combining simple ones.

An Expression is an enum under the hood, but you’ll never have to construct them yourself, which is why helpers exist in Iota.

Combining Expressions

Iota overloads + and | so your parser code reads like a grammar rule.

+ creates a sequence, both expressions must match in order:

#![allow(unused)]
fn main() {
let key_value = quoted_string() + lit(":") + quoted_string();
}

This matches:

"key":"value"

| creates an ordered choice, the left will be tried first, if that fails the right is tried.

#![allow(unused)]
fn main() {
let flag = lit("true") | lit("false");
}

This matches:

"true" or "false"

Actions

An action transforms the matched values of an expression into a single value, this is where your construct your AST nodes.

#![allow(unused)]
fn main() {
let number = action(
    one_or_more(digit),
    |vals| vals.join("")
);
}

This matches:

"123" will return "123" as a String

vals is a flattened vector of any matched sub-expressions. values() can be used to filter out any sentinels from ignore().

Ignoring Values

Some parts of your grammar may be structural, such as brackets, colons, commas etc. and you wouldn’t want them showing up in the vals vector. You can use ignore() to discard them.

#![allow(unused)]
fn main() {
let parens = ignore(ch('(')) + expr() + ignore(ch(')'));
}

Always use values() in your action when using ignore()

#![allow(unused)]
fn main() {
    ...
    |vals| {
        let v = values(vals);
    }
    ...
}

Recursion

PEG parsers like Iota are recursive descent, a rule can refer to itself. But in Rust, constructing a recursive expression tree eagerly causes infinite recursion, so we have to use lazy() to defer the construction until parse time.

#![allow(unused)]
fn main() {
fn expr() -> Expression {
    let parenthesised = action(
        ch('(') + lazy(expr) + ch(')'),
        |vals| vals[1].clone(),
    );
    
    parenthesised | number()
}
}

lazy(expr) takes a function pointer, since expr is a function with no arguments it can be passed directly without a closure.

Memoisation

IotaParser maintains a memo table keyed on (express pointer, position). If the same expression is tried at the same position twice, the cached result is returned instead of a recomputation. This guarentees that even complex recursive grammars run efficiently and do not re parse the same input.

This is the core of packrat parsing, the technique that gives PEG parsers their performance guarantees.

IotaParser

IotaParser owns the input string and the memory table. Create one per parse:

parse returns Option<(usize, Vec<T>)>. The position after the match and the matched values, or None if the expression failed to match.