I'm tossing around ideas about how to implement a parser for the language I described in the previous post, where matrices are represented by vertically lined up square brackets, like
A more complicated example: let L be the set of all strings over the alphabet {U, X, O, \n} where every O is enclosed in a "ring" of X's. This means, if you trace vertically or horizontally from the O, you will find an X before you find a U or a newline or the start/end of the string. Example of a string in this language:
H -> BH | E
E -> AE | CE | e
and using the verticalproductions
V -> G | J
G -> BG | e
J -> AJ | CJ | e
where e represents epsilon (the empty string). The intention is that if every horizontal row can be produced using the horizontal productions, and every vertical row can be produced (independently) using the vertical productions, then the 2-d string is in the language.
I can generate the {X, O, U} language using the horizontal productions
H -> UH | HXEXH | e
E -> XE | OE | e
and using the vertical productions
V -> UV | VXGXV | e
G -> XG | OG | e
(note that in this case, the vertical productions happen to be the same as the horizontal ones, up to renaming variables).
A more interesting 2-d language would be the set of 2-d strings over {X, O} where the X's form a single connected region. I don't think this can be recognized using vertical and horizontal productions, but such functionality might let you do things such as trace the edges of arbitrary shapes. What extension might we use to recognize such a language?
A different way of looking at production rules is allowing the right hand side of a production to itself be a 2d string. For example,
E ->
Here E can produce a 2 dimensional "+" shaped string, or it can produce an empty string. The difficulty is that with 2-d strings, it's not always clear how to expand a nonterminal that is already surrounded by other symbols. For example, suppose we have the string
[a b c d]So this got me thinking about 2-d grammars in general. Define a 2-dimensional string to be a matrix with entries chosen from some alphabet. For example, one language is the set of all 2-dimensional strings over the alphabet {A, B, C} where each row starts with the same number of B's as every other row, and then ends in some number of A's and C's. If we wrote out the rows and separated them by some newline character, would the resulting 1-dimensional string be context free? Strings in this language are like
[e f g h]
[i j k l]
BBAACbut not like
BBCAA
BBCCA
BAACIt's very easy to parse algorithmically and it does satisfy the pumping lemma for context free grammars but it's difficult to write an actual context free grammar for it.
BBBA
BBCA
A more complicated example: let L be the set of all strings over the alphabet {U, X, O, \n} where every O is enclosed in a "ring" of X's. This means, if you trace vertically or horizontally from the O, you will find an X before you find a U or a newline or the start/end of the string. Example of a string in this language:
UUUUUUUUand a string not in this language:
UXXXUXXX
XOOXXXOX
XOOXOOOX
XOOOOOOX
UXXXXXXX
UUUXXXUX
UUUUAn interesting fact about both of those previous languages is that if you consider their rows and columns independently, each is a context free language. I can generate the {B, A, C} language using the horizontal productions
UXUX
UXOX
UXXX
H -> BH | E
E -> AE | CE | e
and using the verticalproductions
V -> G | J
G -> BG | e
J -> AJ | CJ | e
where e represents epsilon (the empty string). The intention is that if every horizontal row can be produced using the horizontal productions, and every vertical row can be produced (independently) using the vertical productions, then the 2-d string is in the language.
I can generate the {X, O, U} language using the horizontal productions
H -> UH | HXEXH | e
E -> XE | OE | e
and using the vertical productions
V -> UV | VXGXV | e
G -> XG | OG | e
(note that in this case, the vertical productions happen to be the same as the horizontal ones, up to renaming variables).
A more interesting 2-d language would be the set of 2-d strings over {X, O} where the X's form a single connected region. I don't think this can be recognized using vertical and horizontal productions, but such functionality might let you do things such as trace the edges of arbitrary shapes. What extension might we use to recognize such a language?
A different way of looking at production rules is allowing the right hand side of a production to itself be a 2d string. For example,
E ->
A| e
AEA
A
Here E can produce a 2 dimensional "+" shaped string, or it can produce an empty string. The difficulty is that with 2-d strings, it's not always clear how to expand a nonterminal that is already surrounded by other symbols. For example, suppose we have the string
ABCand we want to expand E according to the first production above. What should we get? Here are some possibilities:
DEF
GHI
-
ABC
A DAEAF
A GHI -
B
A A C
DAEAF
G A I
H -
B
AAC
DAEAF
GAI
H -
A B C
A DAEAF
A G H I
© Bart Parkis
Last modified:Tue May 19 19:07:51 2009