I have a Python script that reads from a config file. The config file is going to contain some user-defined regex patterns. However, I'd like to let the user write either full regex patterns or shell wildcards, so I should be able to interpret both

*.txt
as well as
.*\.txt$
correctly. Those two should be equivalent.

However I'd like to be able to do this without making the user tell me which they're using. Is this even possible? Maybe allowing full regex is overkill.

+2  A: 

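You can't do this. What should prefix.* match? What about somefiles? ? As a glob, prefix.* matches names that begin with "prefix." while as a regex it matches "prefix" followed by anything. As a glob, somefiles? matches "somefiles" plus exactly one more character, while as a regex it matches "somefile" with an optional "s". These patterns have very different meanings in regex vs glob matching, but are common use cases in both.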

Zac Thompson
A: 

Consider, for example, the pattern

foo?.txt

In glob syntax, this will match foo1.txt and fooZ.txt, but not fo.txt, fob.txt or fooZtxt. In regexp syntax, it will match fo.txt and foQtxt, but not fooZ.txt.
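To see the difference concretely, here is a small sketch that runs the same pattern through both interpretations. The helper names glob_matches and regex_matches are made up for illustration, and it assumes the regex has to match the whole file name (re.fullmatch) rather than just part of it.

```python
import fnmatch
import re

PATTERN = "foo?.txt"
NAMES = ["foo1.txt", "fooZ.txt", "fo.txt", "fob.txt", "fooZtxt", "foQtxt"]

def glob_matches(pattern, name):
    # fnmatchcase skips the OS-dependent case normalisation of fnmatch.fnmatch.
    return fnmatch.fnmatchcase(name, pattern)

def regex_matches(pattern, name):
    # Assumption: the whole name must match, so fullmatch rather than search.
    return re.fullmatch(pattern, name) is not None

for name in NAMES:
    print(f"{name:10} glob={glob_matches(PATTERN, name)!s:5} "
          f"regex={regex_matches(PATTERN, name)}")
```

For this pattern the two columns never agree on a name that either one accepts, which is why guessing the syntax is risky.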

You can't unambiguously accept both syntaxes. The only option I can think of is to have the user prefix the expression, e.g.

regexp:foo?.txt
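
A rough sketch of how such a prefix could be handled in Python. The function name compile_pattern, the extra "glob:" prefix, and the choice to treat unprefixed entries as globs are all assumptions for illustration, not something the question specifies. Glob patterns are converted with fnmatch.translate so both kinds end up as compiled regexes.

```python
import fnmatch
import re

def compile_pattern(spec):
    """Turn a config entry such as 'regexp:foo?.txt' into a compiled regex."""
    kind, sep, body = spec.partition(":")
    if sep and kind == "regexp":
        return re.compile(body)
    if sep and kind == "glob":
        return re.compile(fnmatch.translate(body))
    # Assumption: no recognised prefix means the whole entry is a glob.
    return re.compile(fnmatch.translate(spec))

# The same text gives different results depending on the prefix.
print(bool(compile_pattern("regexp:foo?.txt").fullmatch("fo.txt")))
print(bool(compile_pattern("glob:foo?.txt").fullmatch("fo.txt")))
```

Using fullmatch on the result means the user's regex has to cover the whole name; switch to search if partial matches should count.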
Ivo van der Wijk
+1  A: 

One possible approach could be:

  1. Try to compile the given expression as a regex.

    a. If this fails (syntax error), use the expression as a glob string.

    b. If it doesn't fail to compile, use it as a regex.

  2. If it doesn't match anything, use it as a glob string.

In any case, tell the user what you did ("Interpreting pattern.* as a regular expression") and allow them to override whatever you have guessed. After all, as Zac Thompson wrote, some patterns will be both valid regexes and glob patterns.
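
The steps above could be sketched roughly like this. The function name interpret and its return value are invented for the example, and "doesn't match anything" is taken to mean that re.search finds no hit in the list of candidate names.

```python
import fnmatch
import re

def interpret(pattern, names):
    """Guess regex vs glob for pattern; return the guess and the matching names."""
    try:
        regex = re.compile(pattern)
    except re.error:
        kind = "glob"  # step 1a: not a valid regex
    else:
        # steps 1b and 2: keep the regex only if it matches at least one name
        kind = "regex" if any(regex.search(n) for n in names) else "glob"

    if kind == "glob":
        # fnmatch.translate yields an anchored regex meant for use with match()
        matcher = re.compile(fnmatch.translate(pattern)).match
    else:
        matcher = regex.search

    # Tell the user what was guessed so they can override it.
    print(f"Interpreting {pattern!r} as a {kind} pattern")
    return kind, [n for n in names if matcher(n)]

interpret("*.txt", ["a.txt", "b.py"])
interpret(r".*\.txt$", ["a.txt", "b.py"])
```

Note that the guess depends on which names happen to be present, so the same config line can be read differently on different runs. That is another reason to report the guess.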

Another thing to take into consideration is that a user can easily overload or crash your system with a regex through catastrophic backtracking. So unless it's the user's own machine, you might want to think twice about allowing regexes in the first place.

Tim Pietzcker
Yeah, this will be running on the user's personal machine.
Falmarri
A: 

Try not to leave the creation of regexes to the user. The user should have an easier means of configuring their files without needing to write a regex. For example, let users pick from a few choices:

  1. starts with
  2. ends with
  3. contains (OR and AND)
  4. etc

Then as the programmer, you use these choices to construct your regex.
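
As a rough illustration, the choices could be turned into a regex along these lines. The function build_regex and its keyword names are hypothetical. Every user-supplied string goes through re.escape, so the user never writes regex syntax, and each condition is a lookahead so the conditions can overlap.

```python
import re

def build_regex(starts_with=None, ends_with=None, contains_all=(), contains_any=()):
    """Build a regex from simple choices; use it with fullmatch()."""
    parts = []
    if starts_with:
        parts.append(f"(?={re.escape(starts_with)})")
    if ends_with:
        parts.append(f"(?=.*{re.escape(ends_with)}\\Z)")
    # AND: every word must appear somewhere in the name.
    for word in contains_all:
        parts.append(f"(?=.*{re.escape(word)})")
    # OR: at least one of the words must appear.
    if contains_any:
        alternatives = "|".join(re.escape(w) for w in contains_any)
        parts.append(f"(?=.*(?:{alternatives}))")
    parts.append(".*")  # consume the whole name once the checks pass
    return re.compile("".join(parts), re.DOTALL)

rx = build_regex(starts_with="report", ends_with=".txt", contains_all=["2023"])
print(bool(rx.fullmatch("report_2023.txt")))
print(bool(rx.fullmatch("report_2022.txt")))
```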

ghostdog74
That's sort of why I'm trying to use both. I'm going to be using this program personally as well. In fact, I'm mostly writing it for myself, so I'd like to support regex matches.
Falmarri