ansaurus

Question

Idea on parsing character syntax diagram.

Answer 1

+2 A:

I've never done anything like that before, but this is how I would approach it.

First, I'd start off with something like this:

class CharGrid(object):
    def __init__(self, text):
        self.lines = text.split('\n')

    def __getitem__(self, pos):
        try:
            col, row = pos
        except (TypeError, ValueError):
            raise KeyError('%r not a 2-tuple' % (pos,))
        if row >= len(self.lines):
            return ' '
        line = self.lines[row]
        if col >= len(line):
            return ' '
        return line[col]

so that I can access the characters in the text via 2D coordinates:

grid = CharGrid("""Creating a No-Conversion Job

>>-onpladm create job--job--+--------------+-- -n--------------->
                            '- -p--project-'

>-- -d--device-- -D--database-- -t--table----------------------->

   .---------------------------------------------------------------------.
   V                                                                     |
>----+-----------------------------------------------------------------+-+-><
     |                                                            (1)  |
     '-+-------------+--+-------------+--| Setting the Run Mode |------'
       '- -S--server-'  '- -T--target-'
""")

print ''.join((grid[0,0], grid[1,0], grid[2,0]))
print ''.join((grid[0,2], grid[1,2]))

(yielding)

Cre
>>

After that, the task would be converting the 2D grid of characters into a 1D sequence of symbols:

read the label off the first line
scan down the first column until you find >>
scan right from the current position until you find [whatever]

... etc. Follow the chart in eyeball-order.

Once you have a 1D sequence of symbols, you can use a conventional parsing technique on that.

Matt Anderson 2010-06-10 17:02:00

Matt, thanks for the pretty neat code, there are a few things I still need to consider like being sensible to optional / mandatory parameter and determination of parameter name / value, you class is a good start :)

Ripley 2010-06-11 06:26:02

Answer 2

+1 A:

The "character grid" idea for accessing single characters seems like a foundation step; another answer shows how to do that just fine. Now you can access the grid randomly and follow horizontal or vertical lines easily.

The real problem is that you want to construct a graph representing what the character grid says. Such a graph will consist of (duh), nodes, arcs, and annotations.

Probably the easiest thing to find are the nodes, which are likely indicated (see other answer) by characters representing branching points in the diagram (e.g. +). Each arc will be a string of characaters leading to a bend in the arc, or to another node. Following such strings of characters should be pretty straighforward (:-) ) and can produce a string representing the arc even if it has bends in it.

You'll likely want to enumerate all nodes (just scan the array). Node annotations must reasonably be nearby and you can simply scan a small radious around the node locations.

You'll want to enumerate each arc leaving a node, and collect the string representing the arc.

I'd feed the arc-string to a lexer to tear it apart; it may have interesting content (e.g., an annotation as in inline sequence of characters).

At this point you have nodes and arcs with associated annotations. Constructing the corresponding graph from these should be pretty easy.

Ira Baxter 2010-06-10 19:52:27

Ira, your approach makes perfect sense, while something else seems need to be considered as well. Like some arcs are not topological movable (they imply the optional or mandatory parameters), and due to this the parser have to be sensible if the current node is on the main trunk or a side branch, or how deep it's been nested. My feeling is traversal the nodes from left to right will require fewer efforts, coding to figure these out ...

Ripley 2010-06-11 06:41:57

"Some details left as an exercise for the reader" :-}

Ira Baxter 2010-06-11 09:53:26

ansaurus

tags:

views:

answers:

Idea on parsing character syntax diagram.

related questions