Athanasius Kircher, The Tower of Babel, 1679.

Babel Bridge

Rule Basics

Rules are declared inside parser classes using the following form:

rule [rule name], [pattern]

The rule's name is just a Ruby symbol. You can create variants of a rule by declaring more than one rule with the same name. The variants are tried in the order they were declared.

For example:

class MyParser < BabelBridge::Parser
  rule :foo, "foo"
  rule :foo, "bar"
end

Here the rule :foo has two variants. It can match either the string "foo" or the string "bar".
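The following plain-Ruby toy model makes "tried in the order they were declared" concrete. It is not BabelBridge's implementation: ToyRules, rule, and match are hypothetical names, and variants are limited to literal strings. It assumes ordered choice in the usual parsing-expression-grammar sense, where the first variant that matches wins even if a later variant would match more text.

```ruby
# A toy model of ordered rule variants; this is NOT BabelBridge's code.
# Variants here are plain string literals.
class ToyRules
  def initialize
    # rule name => list of variants, kept in declaration order
    @rules = Hash.new { |hash, name| hash[name] = [] }
  end

  # Declare another variant for `name`.
  def rule(name, literal)
    @rules[name] << literal
  end

  # Try the variants of `name` in declaration order at `pos` and return the
  # first literal that matches there (ordered choice), or nil if none does.
  def match(name, input, pos = 0)
    @rules[name].find { |literal| input[pos, literal.length] == literal }
  end
end

rules = ToyRules.new
rules.rule :foo, "foo"
rules.rule :foo, "bar"

p rules.match(:foo, "foo") # => "foo"
p rules.match(:foo, "bar") # => "bar"
p rules.match(:foo, "baz") # => nil

# Declaration order decides overlaps: "fo" is tried before "foo".
rules.rule :short_first, "fo"
rules.rule :short_first, "foo"
p rules.match(:short_first, "foo") # => "fo"
```

The last case shows why order matters in this model: because "fo" is declared first, it shadows the longer "foo" variant.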

Pattern Basics

A pattern consists of one or more pattern elements. The simplest pattern element is just a Ruby string, which must be matched exactly. If a rule has more than one pattern element, each must match in order. Patterns can be expressed in two ways:

class MyParser < BabelBridge::Parser
  rule :foo, "foo", "bar"
  rule :boo, ["foo", "bar"]
end

Above, rule :foo and rule :boo are equivalent. They both match "foo" followed by "bar": "foobar".
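One way to see why the two forms are equivalent is that both reduce to the same ordered list of elements, which must then match back to back. The sketch below models that in plain Ruby; normalize_pattern and match_sequence are hypothetical helpers, not part of BabelBridge, and elements are limited to literal strings.

```ruby
# Hypothetical helper: both `"foo", "bar"` and `["foo", "bar"]` are
# flattened into the same ordered list of pattern elements.
def normalize_pattern(*elements)
  elements.flatten
end

# Match every element in order, each starting where the previous one ended.
# Returns the end position on success, or nil at the first mismatch.
def match_sequence(elements, input, pos = 0)
  elements.each do |literal|
    return nil unless input[pos, literal.length] == literal
    pos += literal.length
  end
  pos
end

splat_form = normalize_pattern("foo", "bar")
array_form = normalize_pattern(["foo", "bar"])

p splat_form == array_form             # => true
p match_sequence(splat_form, "foobar") # => 6
p match_sequence(splat_form, "barfoo") # => nil
```

In this sketch nothing is skipped between elements, so "foo bar" does not match either.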

Pattern Elements

Below are the basic pattern elements:

In addition to the basic pattern elements, you can construct more complicated patterns. Internally these are represented as a Hash, but the easy way to build these advanced patterns is with these chainable pattern-constructor methods:

In the examples below, pe stands for any basic pattern element.

Some examples:

class MyParser < BabelBridge::Parser
  rule :foo_a, many("foo")
  rule :foo_b, match?(:foo_a)
  rule :foo_c, match!(/foo/)
  rule :foo_d, could.match("foo")
end
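Assuming the usual parsing-expression-grammar meanings, match? makes an element optional, match! succeeds only when its element does not match, and could.match succeeds only when its element would match; the two lookaheads consume no input. These readings are assumptions based on the names. The plain-Ruby sketch below models them with the standard StringScanner class; literal, optional, negative, and lookahead are hypothetical names, not BabelBridge methods.

```ruby
require "strscan"

# Toy combinators for the *assumed* semantics of match?, match! and
# could.match. Each is a lambda that takes a StringScanner and returns
# true or false; these are not BabelBridge's real methods.

# Match a regex at the current position, advancing on success.
def literal(regex)
  ->(s) { !s.scan(regex).nil? }
end

# Optional (like match?): try pe, but succeed either way.
def optional(pe)
  lambda do |s|
    start = s.pos
    s.pos = start unless pe.call(s) # undo any partial progress on failure
    true
  end
end

# Negative lookahead (like match!): succeed only if pe does NOT match.
def negative(pe)
  lambda do |s|
    start = s.pos
    matched = pe.call(s)
    s.pos = start # a lookahead never consumes input
    !matched
  end
end

# Positive lookahead (like could.match): succeed only if pe matches.
def lookahead(pe)
  lambda do |s|
    start = s.pos
    matched = pe.call(s)
    s.pos = start # a lookahead never consumes input
    matched
  end
end

foo = literal(/foo/)

s = StringScanner.new("bar")
p optional(foo).call(s)  # => true (nothing consumed)
p negative(foo).call(s)  # => true ("foo" is absent)

s = StringScanner.new("foobar")
p lookahead(foo).call(s) # => true
p s.pos                  # => 0 (lookahead consumed nothing)
p negative(foo).call(s)  # => false
```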

The Many Pattern Element

The many pattern element has some extra conveniences that are possible, but awkward, to express with plain parsing expression grammars. Any of the many patterns can take a second, optional argument specifying the delimiter pattern that must match between occurrences of the primary pattern element.

Examples:

class MyParser < BabelBridge::Parser
  rule :foo_a, many("foo"," ")
    # match one or more "foo"s delimited by spaces
    # Ex matches: "foo", "foo foo", "foo foo foo"
    # Ex non-matches: "foofoo"

  rule :foo_b, many("foo",match?(" "))
    # match one or more "foo"s optionally delimited by spaces
    # Ex matches: "foo", "foo foo", "foofoo", "foo foofoo"
    # Ex non-matches: "foo  foo"
end
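The delimiter logic can be modeled in a few lines of plain Ruby. This is a sketch, not BabelBridge's code: many_delimited is a hypothetical helper, elements and delimiters are regexes, and an optional delimiter such as match?(" ") is approximated by a regex that can match the empty string, / ?/. It requires the whole input to be consumed, which is how the example matches and non-matches above are read.

```ruby
require "strscan"

# Toy version of "one or more `element`, separated by `delimiter`".
# Not BabelBridge's implementation; both arguments are regexes here.
def many_delimited(input, element, delimiter)
  s = StringScanner.new(input)
  return false unless s.scan(element) # at least one element is required
  loop do
    start = s.pos
    # Continue only if a delimiter AND another element both match;
    # otherwise rewind to before the delimiter and stop repeating.
    unless s.scan(delimiter) && s.scan(element)
      s.pos = start
      break
    end
  end
  s.eos? # the whole input must be consumed
end

p many_delimited("foo foo", /foo/, / /)     # => true
p many_delimited("foofoo", /foo/, / /)      # => false
p many_delimited("foo foofoo", /foo/, / ?/) # => true
p many_delimited("foo  foo", /foo/, / ?/)   # => false
```

Note that an empty match of / ?/ returns "", which is truthy in Ruby, so the optional delimiter lets adjacent elements through while two spaces still fail.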

Real World Example

Below is the complete code for the markup parser I used to convert Ruby code into the syntax-highlighted examples on this page.

Download: code_markup.rb

require "rubygems"
require "babel_bridge"

class CodeMarkup < BabelBridge::Parser
  rule :file, many(:element) do
    def markup
      "<pre><code>"+
      element.collect{|a| a.markup}.join.strip+
      "</code></pre>"
    end
  end

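  # The :element variants are tried in order; the first one that matches wins.
  # Literal < and > are emitted as HTML entities so the output stays valid HTML.
  rule :element, "<", :space do
    def markup; "<symbol>&lt;</symbol>#{space}" end
  end

  rule :element, ">", :space do
    def markup; "<symbol>&gt;</symbol>#{space}" end
  end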

  rule :element, :comment, :space do
    def markup; "<comment>#{comment}</comment>#{space}" end
  end

  rule :element, :keyword, :space do
    def markup; "<keyword>#{keyword}</keyword>#{space}" end
  end

  rule :element, :string, :space do
    def markup
      # Escape & first, then < and >, so string literals render as text.
      str=string.to_s.gsub("&","&amp;").gsub("<","&lt;").gsub(">","&gt;")
      "<string>#{str}</string>#{space}"
    end
  end

  rule :element, :regex, :space do
    def markup; "<regex>#{regex}</regex>#{space}" end
  end

  rule :element, :identifier, :space do
    def markup; "<identifier>#{identifier}</identifier>#{space}" end
  end

  rule :element, :symbol, :space do
    def markup; "<symbol>#{symbol}</symbol>#{space}" end
  end

  rule :element, :number, :space do
    def markup; "<number>#{number}</number>#{space}" end
  end

  rule :element, :non_space, :space do
    def markup; "#{non_space}#{space}" end
  end

  rule :space, /\s*/
  rule :number, /[0-9]+(\.[0-9]+)?/
  rule :comment, /#[^\n]*/
  rule :string, /"(\\.|[^\\"])*"/
  rule :string, /:[_a-zA-Z0-9]+[?!]?/
  rule :regex, /\/(\\.|[^\\\/])*\//
  rule :symbol, /[-!@\#$%^&*()_+={}|\[\];:<>\?,\.\/~]+/
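  # The trailing \b keeps keywords from matching the start of identifiers
  # such as "double" or "order".
  rule :keyword, /(?:class|end|def|and|or|do|if|then)\b/
  rule :keyword, /(?:else|elsif|case|when|require)\b/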
  rule :identifier, /[_a-zA-Z][0-9_a-zA-Z]*/
  rule :non_space, /[^\s]+/
end

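# Stop with a message if parsing fails (parse is assumed to return nil then)
# instead of calling markup on nil.
tree = CodeMarkup.new.parse(File.read(ARGV[0]))
abort "could not parse #{ARGV[0]}" unless tree
puts tree.markup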
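In the parser above, only the :string variant post-processes its matched text with gsub, yet comments, regexes, and symbols can also contain <, >, or &. Ruby's standard library already provides an HTML escaper, ERB::Util.html_escape, which covers &, <, >, and both quote characters. A minimal sketch, where escape_markup is a hypothetical helper name:

```ruby
require "erb"

# Hypothetical helper: escape any matched text for safe inclusion in HTML.
# ERB::Util.html_escape (standard library) escapes &, <, >, " and '.
def escape_markup(text)
  ERB::Util.html_escape(text.to_s)
end

puts escape_markup('"a" < "b" && c > d')
# prints &quot;a&quot; &lt; &quot;b&quot; &amp;&amp; c &gt; d
```

Wrapping each interpolated value in such a helper (for example "<comment>#{escape_markup(comment)}</comment>") would be one way to keep the generated HTML well-formed for every token type.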