ansaurus

Question

Is there a need for a more declarative way of expressing regular expressions? :)

Answer 1

+5 A:

This is actually pretty similar (identical?) to how a lexer/parser works. If you had a defined grammar then you could probably write a parser with not too much trouble. For instance, you could write something like this:

<expression> ::= <rule> | <rule> <expression> | <rule> " followed by " <expression>
<rule>       ::= <val> | <qty> <val>
<qty>        ::= "literal" | "one" | "one of" | "one or more of" | "zero or more of"
<val>        ::= "a" | "b" | "c" | "d" | ... | "Z"

That's nowhere near a perfect description. For more info, take a look at this BNF of the regex language. You could then look at lexing and parsing the expression.

If you did it this way you could probably get a little closer to Natural Language/English versions of regexes.
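As a rough illustration of the idea, here is a minimal sketch that turns phrases shaped like the grammar above into a regex. The names `QUANTIFIERS`, `rule_to_regex` and `to_regex` are made up for this example, and the sketch only splits on " followed by " rather than doing real lexing and parsing.

```python
import re

# Quantity phrases from the grammar sketch, mapped to regex suffixes.
# Longer phrases come first so "one or more of" is not mistaken for "one".
QUANTIFIERS = [
    ("one or more of", "+"),
    ("zero or more of", "*"),
    ("one of", ""),
    ("one", ""),
    ("literal", ""),
]


def rule_to_regex(rule):
    """Translate a single '<qty> <val>' or bare '<val>' rule."""
    rule = rule.strip()
    for phrase, suffix in QUANTIFIERS:
        if rule.startswith(phrase + " "):
            value = rule[len(phrase) + 1:].strip()
            escaped = re.escape(value)
            # A quantifier must apply to the whole value, not its last character.
            if suffix and len(value) > 1:
                escaped = f"(?:{escaped})"
            return escaped + suffix
    # No quantity phrase: treat the rule as a literal value.
    return re.escape(rule)


def to_regex(description):
    """Join the rules of an expression separated by ' followed by '."""
    return "".join(rule_to_regex(part) for part in description.split(" followed by "))


pattern = to_regex("one or more of a followed by zero or more of b followed by c")
print(pattern)  # a+b*c
```

Even this toy version hints at how much translation code a fuller language (groups, alternation, anchors) would need.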

I can see a tool like this being useful, but as was previously said, mainly for beginners. The main limitation to this approach would be in the amount of code you have to write to translate the language into regex (and/or vice versa). On the other hand, I think a two-way translation tool would actually be more ideal and see more use. Being able to take a regex and turn it into English might be a lot more helpful to spot errors.

Of course it doesn't take too long to pick up regex, as the syntax is usually terse and most of the meanings are pretty self-explanatory, at least if you use | or || as OR in your language, and you think of * as multiplying by 0-N, + as adding 0-N.

Though sometimes I wouldn't mind typing "find one or more 'a' followed by three digits or 'b' then 'c'"

Wayne Werner 2010-08-09 12:50:08

In reply to your `Being able to take a regex and turn it into English might be a lot more helpful to spot errors.`, try passing the `re.DEBUG` flag in the Python REPL.

Daenyth 2010-08-09 21:21:57

@Daenyth - I'm aware of that mode, though I've not had cause to use it, and I can't say it's much better than the original regex, unless it's an extremely complicated regex.

Wayne Werner 2010-08-09 22:11:03
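For anyone who wants to try the `re.DEBUG` flag mentioned in these comments, a small example follows. The pattern is only an illustration, loosely based on the "one or more 'a' followed by three digits or 'b' then 'c'" phrase from the answer; the layout of the debug dump differs between Python versions, so it is not reproduced here.

```python
import re

# Compiling with re.DEBUG prints the parsed structure of the pattern to
# standard output as a side effect of re.compile().
pattern = re.compile(r"a+\d{3}|bc", re.DEBUG)

# The compiled pattern behaves like any other.
print(bool(pattern.fullmatch("aa123")))
print(bool(pattern.fullmatch("bc")))
```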

Answer 2

+4 A:

Please take a look at pyparsing. Many of the issues that you describe with RE's are the same ones that inspired me to write that package.

Here are some specific features of pyparsing from the O'Reilly e-book chapter "What's so special about pyparsing?".

Paul McGuire 2010-08-09 13:04:03

You beat me by a second! BTW, thanks for writing pyparsing :)

Roberto Bonvallet 2010-08-09 13:06:16

Answer 3

+2 A:

Maybe not exactly what you are asking for, but there is a way to write regexes more readably: the VERBOSE flag (X for short).

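import re

# re.X is the short alias for re.VERBOSE: whitespace in the pattern is
# ignored and "#" starts a comment.
rex_name = re.compile(r"""
    [A-Za-z]    # first letter
    [a-z]+      # the rest
""", re.X)

print(rex_name.match('Joe'))  # a match object covering 'Joe'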

mykhal 2010-08-09 13:09:28

Answer 4

+1 A:

For developers trying to write regular expressions that are easy to grok and maintain, I wonder whether this sort of approach would offer anything that re.VERBOSE does not provide already.

For beginners, your idea might have some appeal. However, before you go down this path, you might try to mock up what your declarative syntax would look like for more complicated regular expressions using capturing groups, anchors, look-ahead assertions, and so forth. One challenge is that you might end up with a declarative syntax that is just as difficult to remember as the regex language itself.
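For comparison, here is roughly what `re.VERBOSE` already allows for a pattern that uses an anchor, capturing groups and a look-ahead assertion. The key=value pattern is invented for this example and is not from the question.

```python
import re

# Illustrative pattern: a key=value pair whose numeric value must be
# followed by a semicolon.
pair = re.compile(r"""
    ^                    # anchor: start of the string
    (?P<key>[a-z_]+)     # named capturing group: the key
    \s*=\s*              # equals sign with optional surrounding whitespace
    (?P<value>\d+)       # named capturing group: a numeric value
    (?=;)                # look-ahead: a semicolon must follow, not consumed
""", re.VERBOSE)

m = pair.match("timeout = 30;")
print(m.group("key"), m.group("value"))  # timeout 30
```

Any declarative syntax would need to express each of these pieces at least as clearly as the commented form does.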

You might also think about alternative ways to express things. For example, the first thought that occurred to me was to express a regex using functions with short, easy-to-remember names. For example:

from refunc import *

pattern = Compile(
    'a',
    Capture(
        Choices('b', 'c'),
        N_of( 'd', 1, Infin() ),
        N_of( 'e', 0, Infin() ),
    ),
    Look_ahead('foo'),
)

But when I see that in action, it looks like a pain to me. There are many aspects of regex that are quite intuitive -- for example, + to mean "one or more". One option would be a hybrid approach, allowing your user to mix those parts of regex that are already simple with functions for the more esoteric bits.

pattern = Compile(
    'a',
    Capture(
        '[bc]',
        'd+',
        'e*',
    ),
    Look_ahead('foo'),
)
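To make the two mock-ups above concrete, here is one possible sketch of the helper functions. `refunc` is a hypothetical module, so everything below is an assumption about how it might behave: bare strings passed to `Compile` and `Capture` are treated as raw regex fragments, while arguments to `Choices`, `N_of` and `Look_ahead` are treated as literal text.

```python
import re


def Infin():
    """Hypothetical marker for 'no upper bound'."""
    return None


def _group(fragment):
    # Non-capturing group, so a following quantifier applies to the whole fragment.
    return f"(?:{fragment})"


def Choices(*options):
    """Match any one of the literal options."""
    return _group("|".join(re.escape(option) for option in options))


def N_of(text, low, high):
    """Match the literal text between low and high times (high=None means unbounded)."""
    upper = "" if high is None else str(high)
    return f"{_group(re.escape(text))}{{{low},{upper}}}"


def Capture(*fragments):
    """Wrap the concatenated fragments in a capturing group."""
    return "(" + "".join(fragments) + ")"


def Look_ahead(text):
    """Require the literal text to follow, without consuming it."""
    return f"(?={re.escape(text)})"


def Compile(*fragments):
    """Concatenate the fragments and compile the result."""
    return re.compile("".join(fragments))


functional = Compile(
    'a',
    Capture(
        Choices('b', 'c'),
        N_of('d', 1, Infin()),
        N_of('e', 0, Infin()),
    ),
    Look_ahead('foo'),
)
hybrid = Compile('a', Capture('[bc]', 'd+', 'e*'), Look_ahead('foo'))

print(functional.pattern)  # a((?:b|c)(?:d){1,}(?:e){0,})(?=foo)
print(hybrid.pattern)      # a([bc]d+e*)(?=foo)
```

Under these assumptions both versions accept the same strings; the hybrid form simply produces a shorter pattern.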

I would add that in my experience, regular expressions are about learning a thought process. Getting comfortable with the syntax is the easy part.

FM 2010-08-09 13:21:19
