Athanasius Kircher, The Tower of Babel, 1679.

Babel Bridge

Memoizing Parsing Expression Grammar generator in Ruby

Parse Tree Class Structure

Once you are familiar with basic parsing in Babel Bridge, you'll want to do something with the results. The first step is understanding the parse tree class structure.

Defining a parser automatically generates several classes. For example:

class MyParser < BabelBridge::Parser
  rule :foo, "foo"
end

Generates:

MyParser < BabelBridge::Parser
MyParser::FooNode < BabelBridge::Node
MyParser::FooNode1 < MyParser::FooNode

FooNode was generated by the :foo rule and inherits from the BabelBridge::Node class. FooNode1 represents the first (and only) variant of :foo. FooNode itself is never instantiated; a FooNode1 is created whenever the first variant of :foo matches.

irb example:

>> MyParser.new.parse("foo").class
=> MyParser::FooNode1
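To make the naming scheme concrete, here is a minimal plain-Ruby sketch of how a rule declaration could generate one base class per rule name and one numbered subclass per variant. It does not use Babel Bridge and is not its actual implementation; ToyNode, ToyParser, and ToyFooParser are hypothetical names invented for this illustration, and the sketch ignores the pattern entirely.

```ruby
# Plain-Ruby illustration only; ToyNode and ToyParser are hypothetical
# stand-ins for BabelBridge::Node and BabelBridge::Parser.
class ToyNode; end

class ToyParser
  # Each call registers one variant of the named rule.
  # The pattern is accepted but ignored in this sketch.
  def self.rule(name, *pattern)
    base_name = "#{name.to_s.capitalize}Node"

    # Create the rule's base class the first time the rule name is seen.
    base =
      if const_defined?(base_name, false)
        const_get(base_name, false)
      else
        const_set(base_name, Class.new(ToyNode))
      end

    # Every declaration adds one numbered variant subclass.
    @variant_counts ||= Hash.new(0)
    @variant_counts[name] += 1
    const_set("#{base_name}#{@variant_counts[name]}", Class.new(base))
  end
end

class ToyFooParser < ToyParser
  rule :foo, "foo"
end

p ToyFooParser::FooNode1.superclass  # ToyFooParser::FooNode
p ToyFooParser::FooNode.superclass   # ToyNode
```

Declaring a second rule with the same name in this sketch would add a FooNode2 under the same FooNode base class.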

You can examine the children of FooNode1 with the matches method:

>> MyParser.new.parse("foo").matches
=> ["foo"]
>> MyParser.new.parse("foo").matches[0].class
=> BabelBridge::TerminalNode

Let's do a more complex example. Below is a parser that recognizes any number of non-negative integers joined by plus signs. Note that the :add rule has two variants, which create two variant subclasses, AddNode1 and AddNode2, of the rule's parse-tree-node class AddNode.

class MyMathParser < BabelBridge::Parser
  rule :add, :number, "+", :add
  rule :add, :number
  rule :number, /[0-9]+/
end

puts MyMathParser.new.parse("34+12").inspect

Running the code above outputs:

MyMathParser::AddNode1
  MyMathParser::NumberNode1 > "34"
  "+"
  MyMathParser::AddNode2 > MyMathParser::NumberNode1 > "12"

If you inspect the classes of the child matches of the root AddNode1, you'll get:

>> MyMathParser.new.parse("34+12").matches.collect {|m|m.class}
=> [MyMathParser::NumberNode1,
    BabelBridge::TerminalNode,
    MyMathParser::AddNode2]

Every rule consists of one or more pattern elements, which must match in order. The index of each pattern element corresponds directly to the index of its parse-tree-node in the matches list.

There are several ways to access the child matches of a Node. All of the examples below return the parse-tree-node for the first number:

# returns the first matched pattern-element
MyMathParser.new.parse("34+12").matches[0]

# shortcut that also returns the first pattern-element
# '.matches' is optional
# Nodes implement Enumerable over their matches
MyMathParser.new.parse("34+12")[0]

# matched sub-rules can also be accessed by name
MyMathParser.new.parse("34+12").number
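The three access styles above all come down to the same index correspondence between pattern elements and matches. The following plain-Ruby sketch models that correspondence; it is not Babel Bridge code. IndexedNode and child_named are hypothetical names, and plain strings stand in for the child parse-tree-nodes.

```ruby
# Plain-Ruby illustration only; IndexedNode is a hypothetical stand-in
# for a parse-tree-node, not part of Babel Bridge.
class IndexedNode
  include Enumerable
  attr_reader :matches

  # pattern: the rule's pattern elements
  # matches: one child per pattern element, in the same order
  def initialize(pattern, matches)
    @pattern = pattern
    @matches = matches
  end

  # Enumerable iterates over the matches.
  def each(&block)
    matches.each(&block)
  end

  # Index shortcut, equivalent to matches[index].
  def [](index)
    matches[index]
  end

  # Find a child by the rule name of its pattern element.
  def child_named(name)
    index = @pattern.index(name)
    index && matches[index]
  end
end

node = IndexedNode.new([:number, "+", :add], ["34", "+", "12"])
p node.matches[0]           # "34"
p node[0]                   # "34"
p node.child_named(:number) # "34"
p node.child_named(:add)    # "12"
```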

Adding Functionality to the Parse Tree

Manually walking the parse tree is nice and all, but things really start to get fun when we add methods to the rule-variant parse-tree-nodes. This is done by adding a Ruby do-block to the end of a rule declaration. Inside this do-block you can add anything you want to that rule variant's class definition.

Example:

class MyMathParser < BabelBridge::Parser
  rule :add, :number, "+", :add do
    def result
      number.result + add.result
    end
  end

  rule :add, :number
  rule :number, /[0-9]+/ do
    def result
      to_s.to_i
    end
  end
end

puts MyMathParser.new.parse("34+12").result
# outputs "46"

There is a little bit of magic going on here. First, for the first variant of :add (AddNode1), we define a method "result". The result is just the sum of the results of the left-hand and right-hand sides of the add operator. We can access the sub-matched parse-tree-nodes by their rule names, in this case "number" and "add". Then we just recursively call "result" on them and add their return values.

The second bit of magic is in :number's "result" method, where we call to_s on self. The to_s method on a Node just returns the string of characters that the rule matched. In this case, a string of digits is returned, and calling to_i on it gives us the integer value.

The last bit of magic is that we never define a "result" method for the second variant of :add (AddNode2). By convention, if a Node doesn't know how to respond to a method, it forwards the method call to its first sub-match. In this case, calling "result" on AddNode2 automatically calls "result" on the sub-matched NumberNode1.
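The forwarding convention can be modeled in plain Ruby with method_missing. The sketch below is one way such forwarding could work, under the assumption that unknown methods go to the first sub-match; it is not taken from Babel Bridge's source. ForwardingNode, ToyNumber, ToyAdd1, and ToyAdd2 are hypothetical names, and the tree for "34+12" is built by hand rather than parsed.

```ruby
# Plain-Ruby illustration only; none of these classes come from Babel Bridge.
class ForwardingNode
  attr_reader :matches

  def initialize(text, matches = [])
    @text = text
    @matches = matches
  end

  # The matched text.
  def to_s
    @text
  end

  # Forward unknown methods to the first sub-match when it can handle them.
  def method_missing(name, *args, &block)
    first = matches.first
    if first && first.respond_to?(name)
      first.public_send(name, *args, &block)
    else
      super
    end
  end

  def respond_to_missing?(name, include_private = false)
    first = matches.first
    (!first.nil? && first.respond_to?(name, include_private)) || super
  end
end

class ToyNumber < ForwardingNode
  def result
    to_s.to_i
  end
end

class ToyAdd1 < ForwardingNode
  def result
    matches[0].result + matches[2].result
  end
end

# Like AddNode2, this class defines no "result" of its own.
class ToyAdd2 < ForwardingNode; end

right = ToyAdd2.new("12", [ToyNumber.new("12")])
tree  = ToyAdd1.new("34+12", [ToyNumber.new("34"), ForwardingNode.new("+"), right])

puts right.result # 12, forwarded to the ToyNumber sub-match
puts tree.result  # 46
```

A node with no sub-matches, such as the "+" terminal here, still raises NoMethodError for an unknown method because there is nothing to forward to.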