Introductions
Iota is a PEG parser combinator library for Rust. You can build parsers by composing small, reusable expressions that map directly to grammar rules. There is no code generation, no macros, and no seperate grammar files.
Features
- Composable expressions with
+and|operators - Memoised parsing for guaranteed linear time on most grammars.
- Generioc over any AST type via the
ParseValuetrait - Helpers for common patterns
- Recursive grammars with
lazy()
Getting Started
Installation
[dependencies]
iota-parser = "0.1.0"
ParseValue Trait
This trait is extremely important for ensuring your types work with Iota. A parser is generic over a type T which implements ParseValue.
#![allow(unused)]
fn main() {
pub trait ParseValue: Clone {
fn from_str(s: String) -> Self;
fn as_str(&self) -> String;
}
}
For simple string parsing, String already implements ParseValue natively.
For your own AST types, you need to implement it yourself. from_str is called for every raw character or literal match, so it’s default needs to be sensible, and the real construction of your type happens inside action callbacks.
#![allow(unused)]
fn main() {
#[derive(Debug, Clone)]
struct JsonObject {
key: String,
value: String,
raw: String,
}
impl ParseValue for JsonObject {
fn from_str(s: String) -> Self {
JsonObject { key: String::new(), value: String::new(), raw: s }
}
fn as_str(&self) -> String {
self.raw.clone()
}
}
}
Example Parser
use iota::{IotaParser, action, one_or_more, digit};
fn number() -> Expression {
action(
one_or_more(digit),
|vals| vals.join(""),
)
}
fn main() {
let mut parser = IotaParser::::new("123".to_string());
match parser.parse(&number(), 0) {
Some((_, vals)) => println!("{:?}", vals[0]),
None => println!("Parse failed"),
}
}
Running Examples
Iota comes with some examples you can run directly so you can get a feel for it.
cargo run --example sexpr
cargo run --example json
sexpris a simple Lisp s-expression parser e.g.(+ 1 2)jsonis a simple key-value parser e.g.{"name": "blinx"}
Core Concepts
Expressions
Everything in Iota is an Expression<T>. Expressions are composable, you build complex parsers by combining simple ones.
An Expression is an enum under the hood, but you’ll never have to construct them yourself, which is why helpers exist in Iota.
Combining Expressions
Iota overloads + and | so your parser code reads like a grammar rule.
+ creates a sequence, both expressions must match in order:
#![allow(unused)]
fn main() {
let key_value = quoted_string() + lit(":") + quoted_string();
}
This matches:
"key":"value"
| creates an ordered choice, the left will be tried first, if that fails the right is tried.
#![allow(unused)]
fn main() {
let flag = lit("true") | lit("false");
}
This matches:
"true" or "false"
Actions
An action transforms the matched values of an expression into a single value, this is where your construct your AST nodes.
#![allow(unused)]
fn main() {
let number = action(
one_or_more(digit),
|vals| vals.join("")
);
}
This matches:
"123" will return "123" as a String
vals is a flattened vector of any matched sub-expressions. values() can be used to filter out any sentinels from ignore().
Ignoring Values
Some parts of your grammar may be structural, such as brackets, colons, commas etc. and you wouldn’t want them showing up in the vals vector. You can use ignore() to discard them.
#![allow(unused)]
fn main() {
let parens = ignore(ch('(')) + expr() + ignore(ch(')'));
}
Always use
values()in your action when usingignore()
#![allow(unused)]
fn main() {
...
|vals| {
let v = values(vals);
}
...
}
Recursion
PEG parsers like Iota are recursive descent, a rule can refer to itself. But in Rust, constructing a recursive expression tree eagerly causes infinite recursion, so we have to use lazy() to defer the construction until parse time.
#![allow(unused)]
fn main() {
fn expr() -> Expression {
let parenthesised = action(
ch('(') + lazy(expr) + ch(')'),
|vals| vals[1].clone(),
);
parenthesised | number()
}
}
lazy(expr) takes a function pointer, since expr is a function with no arguments it can be passed directly without a closure.
Memoisation
IotaParser maintains a memo table keyed on (express pointer, position). If the same expression is tried at the same position twice, the cached result is returned instead of a recomputation. This guarentees that even complex recursive grammars run efficiently and do not re parse the same input.
This is the core of packrat parsing, the technique that gives PEG parsers their performance guarantees.
IotaParser
IotaParser owns the input string and the memory table. Create one per parse:
parse returns Option<(usize, Vec<T>)>. The position after the match and the matched values, or None if the expression failed to match.