Athanasius Kircher, The Tower of Babel, 1679.

Babel Bridge

Memoizing Parsing Expression Grammar generator in Ruby

Rule Basics

Rules are declared inside parser classes using the following form:

rule [rule name], [pattern]

The rule's name is a Ruby symbol. You can create variants of a rule by declaring more than one rule with the same name. The variants are tried in the order they were declared.

For example:

class MyParser < BabelBridge::Parser
  rule :foo, "foo"
  rule :foo, "bar"
end

Here the rule :foo has two variants. It can match either the string "foo" or the string "bar".
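
The "tried in order" behavior can be pictured with a small plain-Ruby sketch. This is only an illustration built on the standard-library StringScanner, not Babel Bridge's implementation; the helper match_first_variant and the constant FOO_VARIANTS are hypothetical names introduced here.

```ruby
require "strscan"

# Hypothetical helper (not part of Babel Bridge): try each variant in
# declaration order and return the first literal that matches at the
# scanner's current position, or nil if none of them match.
def match_first_variant(scanner, variants)
  variants.each do |literal|
    matched = scanner.scan(/#{Regexp.escape(literal)}/)
    return matched if matched
  end
  nil
end

# Mirrors the two `rule :foo` declarations above.
FOO_VARIANTS = ["foo", "bar"]

p match_first_variant(StringScanner.new("foo"), FOO_VARIANTS)
p match_first_variant(StringScanner.new("bar"), FOO_VARIANTS)
p match_first_variant(StringScanner.new("baz"), FOO_VARIANTS)
```

The first two calls return the matched text and the last returns nil, since neither variant matches "baz".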

Pattern Basics

A pattern consists of one or more pattern elements. The simplest pattern element is a Ruby string, which must be matched exactly. If a rule has more than one pattern element, each must match in order. A multi-element pattern can be written in two ways, as separate arguments or as a single array:

class MyParser < BabelBridge::Parser
  rule :foo, "foo", "bar"
  rule :boo, ["foo", "bar"]
end

Above, rule :foo and rule :boo are equivalent. They both match "foo" followed by "bar": "foobar".
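
As a rough model of "each must match in order", the sketch below matches a list of literals one after another and rewinds if any of them fails. It is plain Ruby using StringScanner, not Babel Bridge code; match_sequence is a hypothetical helper name.

```ruby
require "strscan"

# Hypothetical helper (not part of Babel Bridge): match every literal in
# order. If any element fails, restore the starting position and return
# nil, so a failed sequence consumes no input.
def match_sequence(scanner, elements)
  start = scanner.pos
  pieces = elements.map do |literal|
    piece = scanner.scan(/#{Regexp.escape(literal)}/)
    unless piece
      scanner.pos = start
      return nil
    end
    piece
  end
  pieces.join
end

p match_sequence(StringScanner.new("foobar"), ["foo", "bar"])
p match_sequence(StringScanner.new("foobaz"), ["foo", "bar"])
```

The first call returns the whole matched text; the second returns nil because "bar" does not follow "foo".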

Pattern Elements

Below are the basic pattern elements:

"string"
A Ruby string is matched exactly.
/regexp/
Regular expressions are matched using Ruby Regexp.
:my_rule
Rule: Symbols match the named rule.
:my_rule?
Optional Rule: Symbols ending in "?" optionally match the rule.
:my_rule!
Not Rule: Symbols ending in "!" succeed only if the rule does not match.
true
True always matches the empty string. A handy no-op.

In addition to the basic pattern elements, you can construct more complicated patterns. Internally these are represented as a Hash, but the easy way to build these advanced patterns is with these chainable pattern-constructor methods:

In the examples below, pe is any basic pattern element.

match(pe)
Alone, this doesn't do anything interesting
match?(pe)
Optional: conditionally match pattern
match!(pe)
Not: succeed only if pattern does not match
conditionally.match(pe)
Optional: optionally match the pattern
equivalent to match?(pe)
dont.match(pe)
Not: succeed only if the pattern doesn't match
equivalent to match!(pe)
could.match(pe)
Could: succeeds only if the pattern is matched, but does not consume any input

Some examples:

class MyParser < BabelBridge::Parser
  rule :foo_a, many("foo")
  rule :foo_b, match?(:foo_a)
  rule :foo_c, match!(/foo/)
  rule :foo_d, could.match("foo")
end
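
The difference between optional, not, and could is easiest to see by watching how much input each one consumes. The sketch below models them in plain Ruby with StringScanner. It assumes the usual parsing-expression-grammar semantics, in which "not" (like "could") consumes nothing; the helper names match_optional, match_not, and match_could are hypothetical and are not Babel Bridge methods.

```ruby
require "strscan"

# Optional: always succeeds. Consumes the pattern when it is present,
# otherwise matches the empty string.
def match_optional(scanner, pattern)
  scanner.scan(pattern) || ""
end

# Not: succeeds (with an empty match) only when the pattern does NOT
# match at the current position. Never consumes input.
def match_not(scanner, pattern)
  scanner.match?(pattern) ? nil : ""
end

# Could: succeeds (with an empty match) only when the pattern DOES match
# at the current position. Never consumes input.
def match_could(scanner, pattern)
  scanner.match?(pattern) ? "" : nil
end

scanner = StringScanner.new("foobar")
p match_could(scanner, /foo/)     # succeeds, position unchanged
p match_not(scanner, /foo/)       # fails, because "foo" is next
p match_optional(scanner, /foo/)  # consumes "foo"
p match_optional(scanner, /foo/)  # still succeeds, consuming nothing
p scanner.pos
```

In this model a failure is represented by nil and a successful zero-width match by the empty string.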

The Many Pattern Element

The many pattern element adds conveniences that are possible but awkward to express in a plain parsing expression grammar. Any of the many forms accepts an optional second argument: a pattern that must match as the delimiter between occurrences of the primary pattern.

many(pe)
Match the pattern one or more times
many?(pe)
Optional: Match the pattern zero or more times
many!(pe)
Not: succeed only if pattern does not match
many(pe1,pe2)
Match one or more of pe1 delimited by pe2

Examples:

class MyParser < BabelBridge::Parser
  rule :foo_a, many("foo"," ")
    # match one or more "foo"s delimited by spaces
    # Ex matches: "foo", "foo foo", "foo foo foo"
    # Ex non-matches: "foofoo"

  rule :foo_b, many("foo",match?(" "))
    # match one or more "foo"s optionally delimited by spaces
    # Ex matches: "foo", "foo foo", "foofoo", "foo foofoo"
    # Ex non-matches: "foo  foo"
end
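
To make the delimiter behavior concrete, here is a plain-Ruby model of "one or more, separated by a delimiter", checked against the example inputs listed in the comments above. It is a sketch built on StringScanner, not Babel Bridge's own implementation; match_many and full_match? are hypothetical helpers, and the optional delimiter match?(" ") is approximated by the regular expression / ?/.

```ruby
require "strscan"

# Hypothetical helper (not part of Babel Bridge): match `element` one or
# more times, requiring `delimiter` (when given) between occurrences.
# Returns the array of matched elements, or nil if there is no first match.
def match_many(scanner, element, delimiter = nil)
  first = scanner.scan(element)
  return nil unless first
  matches = [first]
  loop do
    checkpoint = scanner.pos
    break if delimiter && !scanner.scan(delimiter)
    piece = scanner.scan(element)
    # Stop when the element fails or no progress was made, and give back
    # any delimiter consumed after the last successful element.
    if piece.nil? || scanner.pos == checkpoint
      scanner.pos = checkpoint
      break
    end
    matches << piece
  end
  matches
end

# True when match_many consumes the entire input.
def full_match?(input, element, delimiter = nil)
  scanner = StringScanner.new(input)
  !match_many(scanner, element, delimiter).nil? && scanner.eos?
end

p full_match?("foo foo foo", /foo/, / /)   # required space delimiter
p full_match?("foofoo", /foo/, / /)        # delimiter missing
p full_match?("foo foofoo", /foo/, / ?/)   # optional space delimiter
p full_match?("foo  foo", /foo/, / ?/)     # two spaces is one too many
```

The four calls print true, false, true, false, which agrees with the matches and non-matches listed for :foo_a and :foo_b.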

Real World Example

Below is the complete code for the markup parser I used to convert Ruby code into the syntax-highlighted examples on this page.

Download: code_markup.rb

require "rubygems"
require "babel_bridge"

class CodeMarkup < BabelBridge::Parser
  rule :file, many(:element) do
    def markup
      "<pre><code>"+
      element.collect{|a| a.markup}.join.strip+
      "</code></pre>"
    end
  end

  rule :element, "<", :space do
    def markup; "<symbol>&lt;</symbol>#{space}" end
  end

  rule :element, ">", :space do
    def markup; "<symbol>&gt;</symbol>#{space}" end
  end

  rule :element, :comment, :space do
    def markup; "<comment>#{comment}</comment>#{space}" end
  end

  rule :element, :keyword, :space do
    def markup; "<keyword>#{keyword}</keyword>#{space}" end
  end

  rule :element, :string, :space do
    def markup
      str=string.to_s.gsub("<","&lt;").gsub(">","&gt;")
      "<string>#{str}</string>#{space}"
    end
  end

  rule :element, :regex, :space do
    def markup; "<regex>#{regex}</regex>#{space}" end
  end

  rule :element, :identifier, :space do
    def markup; "<identifier>#{identifier}</identifier>#{space}" end
  end

  rule :element, :symbol, :space do
    def markup; "<symbol>#{symbol}</symbol>#{space}" end
  end

  rule :element, :number, :space do
    def markup; "<number>#{number}</number>#{space}" end
  end

  rule :element, :non_space, :space do
    def markup; "#{non_space}#{space}" end
  end

  rule :space, /\s*/
  rule :number, /[0-9]+(\.[0-9]+)?/
  rule :comment, /#[^\n]*/
  rule :string, /"(\\.|[^\\"])*"/
  rule :string, /:[_a-zA-Z0-9]+[?!]?/
  rule :regex, /\/(\\.|[^\\\/])*\//
  rule :symbol, /[-!@\#$%^&*()_+={}|\[\];:<>\?,\.\/~]+/
  rule :keyword, /class|end|def|and|or|do|if|then/
  rule :keyword, /else|elsif|case|when|require/
  rule :identifier, /[_a-zA-Z][0-9_a-zA-Z]*/
  rule :non_space, /[^\s]+/
end

puts CodeMarkup.new.parse(File.read(ARGV[0])).markup