views:

343

answers:

4

Are there any good (or at least interesting but flawed) analogs to regular expressions in two dimensions?

In one dimension I can write something like /aaac?(bc)*b?aaa/ to quickly pull out a region of alternating bs and cs with a border of at least three as. Perhaps as importantly, I can come back a month later and see at a glance what it's looking for.

I'm finding myself writing custom code for analogous problems in 2d (some much more complicated / constrained) and it would be nice to have a more concise and standardized notation, even if I have to write the engine behind it myself.

A second example might be called "find the +". The goal is to locate a column of 3 or more as, a b bracketed by 3 or more as with three or more as below. It should match:

..7...hkj.k f
7...a  h o j 
----a--------
 j .a,g- 8 9 
.aaabaaaaa7 j
 k .a,,g- h j
 hh a----?  j
    a   hjg

and might be written as [b^(a{3})v(a{3})>(a{3})<(a{3})] or...

Suggestions?

+2  A: 

Regular expressions are designed to model patterns in one dimension. As I understand it, you want to match patterns in a two dimensional array of characters.

Can you use regular expressions? That depends on whether the property that you are searching for is decomposable into components which can be matched in one dimension or not. If so, you can run your regular expressions on columns and rows, and look for intersections in the solution sets from those. Depending on the particular problem you have, this may either solve the problem, or it may find areas in your 2d array which are likely to be solutions.

Whether your problem is decomposable or not, I think writing some custom code will be inevitable. But at least it sounds like an interesting problem, so the work should be pleasant.

Rob Lachlan
+2  A: 

You're essentially talking about a spatial query language. There are plenty out there if you look up spatial query, geographic query and graphic querying. The spatial part generally comes down to points, lines and objects in a region that have other given attributes. Regions can be specified as polygons, distance from a point (e.g. circles), distance from a linear feature such as a road, all points on one side of a linear feature, etc... You can then get into more complex queries like set of all nearest neighbours, shortest path, travelling salesman, and tesselations like Delaunay TINs and Voronoi diagrams.

Shane MacLaughlin
While interesting, that doesn't seem to be quite what I'm looking for. What I've found seems to be about geometric properties in a continuum, while I'm more interested in structural features on a lattice.
MarkusQ
+2  A: 

Nice problem.

First, ask yourself if you can constrain the pattern to a "+" pattern, or if you would it need to test/match rectangles also. For instance, a rectangle of [bc] with a border of a would match the center rectangle below, but would also match a "+" shape of [c([bc]*a})v([bc]*a)>([bc]*a)<([bc]*a)] (in your syntax)

xxxaaaaaxxx
yzyabcba12d
defabcbass3
oreabcba3s3
s33aaaaas33
k388x.egiee

If you can constrain it to a "+" then your task is much easier. As ShuggyCoUk said, parsing a RE is usually equivalent to a DFSM -- but for a single, serial input which simplifies things greatly.

In your "RE+" engine, you'll have to debug not only the engine, but also each place that the expressions are entered. I'd like the compiler to catch any errors it could. Given that, you could also use something that was explicitly four RE's, like:

REPlus engine = new REPlus('b').North("a{3}")
   .East("a{3}").South("a{3}").West("a{3}");

However, depending on your implementation this may be cumbersome.

And with regard to the traversal engine -- would the North/West patterns match RtL or LtR? It could matter if the patterns match differently with or w/o greedy sub-matches.

Incidentally, I think the '^' in your example is supposed to be one character to the left, outside the parenthesis.

NVRAM
Fixed the `^` typo, thanks. The "+" example is just one--the border and pattern fill is another. Also I've got something that finds closed circuits with restricted path widths, various "bracketing" structures, and probably a few others (once you recognize a tool you often find unexpected uses).
MarkusQ
+6  A: 

Not being a regex expert, but finding the problem interesting, I looked around and found this interesting blog entry. Especially the syntax used there for defining the 2D regex looks appealing. The paper linked there might tell you more than me.

MicSim
Now why didn't google turn that up for me? Oh well, no worries. This sounds like exactly what I was looking for, thanks!
MarkusQ
After browsing through the paper I'm declaring this the solution. If anyone is interested, the paper behind the blog entry is linked from the primary author's page here http://www.mat.uniroma2.it/~giammarr/Research/pubbl.html without needing to pay a publisher for the privilege of downloading it.
MarkusQ