Babel Bridge
Rule Basics
Rules are declared inside parser classes using the following form:
rule [rule name], [pattern]
The rule's name is just a Ruby symbol. You can create variants of a rule by declaring more than one rule with the same name. The variants are tried in the order they were declared.
For example:
class MyParser < BabelBridge::Parser
  rule :foo, "foo"
  rule :foo, "bar"
end
Here the rule :foo has two variants. It can match either the string "foo" or the string "bar".
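The "tried in order" behaviour can be pictured with a small stand-alone sketch. It does not use Babel Bridge and is not how the library is implemented; match_variants is a hypothetical helper, and it assumes each variant is a plain string tested against the start of the input.

```ruby
# Hypothetical helper, not part of Babel Bridge: try each variant in
# declaration order and stop at the first one that matches the start of
# the input. Returns [matched_variant_or_nil, variants_tried].
def match_variants(variants, input)
  tried = []
  variants.each do |literal|
    tried << literal
    return [literal, tried] if input.start_with?(literal)
  end
  [nil, tried]
end

p match_variants(["foo", "bar"], "foo")  # => ["foo", ["foo"]]
p match_variants(["foo", "bar"], "bar")  # => ["bar", ["foo", "bar"]]
p match_variants(["foo", "bar"], "baz")  # => [nil, ["foo", "bar"]]
```

The second call shows the fall-through: "foo" is tried first, fails, and only then is "bar" tried.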
Pattern Basics
A pattern consists of one or more pattern elements. The simplest pattern element is just a Ruby string, which must be matched exactly. If there is more than one pattern element in a rule, each must match in order. Patterns can be expressed in two ways, either as separate arguments or as a single array:
class MyParser < BabelBridge::Parser
  rule :foo, "foo", "bar"
  rule :boo, ["foo", "bar"]
end
Above, rule :foo and rule :boo are equivalent. They both match "foo" followed by "bar": "foobar".
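To make "each must match in order" concrete, here is a minimal sketch built on Ruby's standard StringScanner rather than on Babel Bridge. The helper match_sequence is hypothetical, and it assumes every element is either a String or a Regexp.

```ruby
require "strscan"

# Hypothetical helper, not part of Babel Bridge: match every element in
# order at the scanner's current position. Strings are matched literally,
# Regexps as given. On failure the scanner is rewound and nil is returned.
def match_sequence(scanner, elements)
  start = scanner.pos
  pieces = []
  elements.each do |element|
    regexp = element.is_a?(Regexp) ? element : Regexp.new(Regexp.escape(element))
    piece = scanner.scan(regexp)
    if piece.nil?
      scanner.pos = start  # undo any partial progress
      return nil
    end
    pieces << piece
  end
  pieces
end

p match_sequence(StringScanner.new("foobar"), ["foo", "bar"])  # => ["foo", "bar"]
p match_sequence(StringScanner.new("foobaz"), ["foo", "bar"])  # => nil
```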
Pattern Elements
Below are the basic pattern elements:
"string"
  A Ruby string is matched exactly.
/regexp/
  Regular expressions are matched using Ruby Regexp.
:my_rule
  Rule: a symbol matches the named rule.
:my_rule?
  Optional Rule: a symbol ending in "?" optionally matches the rule.
:my_rule!
  Not Rule: a symbol ending in "!" succeeds only if the rule does not match.
true
  Always matches the empty string. A handy no-op.
In addition to the basic pattern elements, you can construct more complicated patterns. Internally these are represented as a Hash, but the easy way to build these advanced patterns is with these chainable pattern-constructor methods:
In the examples below, pe is any basic pattern element.
match(pe)
  Alone, this doesn't do anything interesting.
match?(pe)
  Optional: optionally match the pattern.
match!(pe)
  Not: succeed only if the pattern does not match.
conditionally.match(pe)
  Optional: optionally match the pattern. Equivalent to match?(pe).
dont.match(pe)
  Not: succeed only if the pattern doesn't match. Equivalent to match!(pe).
could.match(pe)
  Could: succeed only if the pattern matches, but do not consume any input.
Some examples:
class MyParser < BabelBridge::Parser
  rule :foo_a, many("foo")
  rule :foo_b, match?(:foo_a)
  rule :foo_c, match!(/foo/)
  rule :foo_d, could.match("foo")
end
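The difference between the optional, not, and could forms is mostly about whether input is consumed. The sketch below illustrates that with Ruby's standard StringScanner. It is an assumption-laden illustration, not Babel Bridge code: optional, dont, and could here are hypothetical top-level helpers that only mirror the descriptions above.

```ruby
require "strscan"

# Hypothetical helpers, not part of Babel Bridge.

# Optional: consume the pattern if it is present; succeed either way.
def optional(scanner, regexp)
  scanner.scan(regexp) || ""
end

# Not: succeed only if the pattern does not match here; nothing is consumed.
def dont(scanner, regexp)
  scanner.check(regexp).nil?
end

# Could: succeed only if the pattern matches here, but consume nothing.
def could(scanner, regexp)
  !scanner.check(regexp).nil?
end

s = StringScanner.new("foobar")
p could(s, /foo/)     # => true
p s.pos               # => 0
p dont(s, /bar/)      # => true
p optional(s, /foo/)  # => "foo"
p s.pos               # => 3
p optional(s, /foo/)  # => ""
```

StringScanner#check reports a match without advancing, which is what makes it a convenient stand-in for the non-consuming forms.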
The Many Pattern Element
The many pattern element has some extra conveniences that are possible but awkward to express with simple parsing expression grammars. Any of the many patterns can take a second, optional argument that specifies a pattern to match as the delimiter between occurrences of the primary element pattern.
many(pe)
  Match the pattern one or more times.
many?(pe)
  Optional: match the pattern zero or more times.
many!(pe)
  Not: succeed only if the pattern does not match.
many(pe1, pe2)
  Match one or more of pe1 delimited by pe2.
Examples:
class MyParser < BabelBridge::Parser
  rule :foo_a, many("foo", " ")
  # match one or more "foo"s delimited by spaces
  # Ex matches: "foo", "foo foo", "foo foo foo"
  # Ex non-matches: "foofoo"

  rule :foo_b, many("foo", match?(" "))
  # match one or more "foo"s optionally delimited by spaces
  # Ex matches: "foo", "foo foo", "foofoo", "foo foofoo"
  # Ex non-matches: "foo  foo" (two spaces)
end
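The delimiter behaviour in the comments above can be checked with a stand-alone sketch. This is not Babel Bridge's implementation; many and full_match? below are hypothetical helpers written against Ruby's standard StringScanner, and they assume the element pattern never matches the empty string.

```ruby
require "strscan"

# Hypothetical helper, not part of Babel Bridge: match one or more
# `element`s, separated by `delimiter` when one is given. Returns the
# matched elements, or nil if not even one element matches.
def many(scanner, element, delimiter = nil)
  first = scanner.scan(element)
  return nil if first.nil?
  matches = [first]
  loop do
    mark = scanner.pos
    break if delimiter && scanner.scan(delimiter).nil?
    following = scanner.scan(element)
    if following.nil?
      scanner.pos = mark  # give back a trailing delimiter
      break
    end
    matches << following
  end
  matches
end

# True when the whole input is one-or-more elements with delimiters.
def full_match?(input, element, delimiter = nil)
  scanner = StringScanner.new(input)
  !many(scanner, element, delimiter).nil? && scanner.eos?
end

p full_match?("foo foo foo", /foo/, / /)  # => true
p full_match?("foofoo", /foo/, / /)       # => false
p full_match?("foo foofoo", /foo/, / ?/)  # => true
p full_match?("foo  foo", /foo/, / ?/)    # => false
```

The optional delimiter is written here as the regexp / ?/, which plays the role that match?(" ") plays in the rule above.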
Real World Example
Below is the complete code for the markup parser I used to convert Ruby code into the pretty syntax-highlighted examples on this page.
Download: code_markup.rb
require "rubygems"
require "babel_bridge"

class CodeMarkup < BabelBridge::Parser
  rule :file, many(:element) do
    def markup
      "<pre><code>" +
      element.collect { |a| a.markup }.join.strip +
      "</code></pre>"
    end
  end

  # Angle brackets are emitted as HTML entities so they render literally.
  rule :element, "<", :space do
    def markup; "<symbol>&lt;</symbol>#{space}" end
  end
  rule :element, ">", :space do
    def markup; "<symbol>&gt;</symbol>#{space}" end
  end
  rule :element, :comment, :space do
    def markup; "<comment>#{comment}</comment>#{space}" end
  end
  rule :element, :keyword, :space do
    def markup; "<keyword>#{keyword}</keyword>#{space}" end
  end
  rule :element, :string, :space do
    def markup
      # Escape angle brackets inside string literals.
      str = string.to_s.gsub("<", "&lt;").gsub(">", "&gt;")
      "<string>#{str}</string>#{space}"
    end
  end
  rule :element, :regex, :space do
    def markup; "<regex>#{regex}</regex>#{space}" end
  end
  rule :element, :identifier, :space do
    def markup; "<identifier>#{identifier}</identifier>#{space}" end
  end
  rule :element, :symbol, :space do
    def markup; "<symbol>#{symbol}</symbol>#{space}" end
  end
  rule :element, :number, :space do
    def markup; "<number>#{number}</number>#{space}" end
  end
  rule :element, :non_space, :space do
    def markup; "#{non_space}#{space}" end
  end

  rule :space, /\s*/
  rule :number, /[0-9]+(\.[0-9]+)?/
  rule :comment, /#[^\n]*/
  rule :string, /"(\\.|[^\\"])*"/
  rule :string, /:[_a-zA-Z0-9]+[?!]?/
  rule :regex, /\/(\\.|[^\\\/])*\//
  rule :symbol, /[-!@\#$%^&*()_+={}|\[\];:<>\?,\.\/~]+/
  rule :keyword, /class|end|def|and|or|do|if|then/
  rule :keyword, /else|elsif|case|then|when|require/
  rule :identifier, /[_a-zA-Z][0-9_a-zA-Z]*/
  rule :non_space, /[^\s]+/
end

puts CodeMarkup.new.parse(File.read(ARGV[0])).markup