Does anyone have code for finding a file that contains a regular expression? I would assume you could have two different flavors, one for BREs and one for EREs.

You would think some test suite would have something like an isRegex() test. Does anyone have any code? I'm looking for something comprehensive, of course.

I see this was discussed here but didn't see any practical responses. If I want to grep for any file that contains a regular expression, perhaps bounded by the typical //, how would I do it?

+1  A: 

Beyond

egrep '/.+/' file

you're looking at a really involved exercise.
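For a rough idea of what the next step past that grep might look like, here is a minimal Python sketch. It pulls out the same /.../ candidates and keeps only those that Python's re module will compile. The function names are made up for illustration, and Python's syntax is neither BRE nor ERE, so treat the check as an approximation.

```python
import re

# Same idea as egrep '/.+/': anything between two slashes on one line.
CANDIDATE = re.compile(r"/(.+)/")


def is_python_regex(text):
    """Return True if Python's re module accepts text as a pattern."""
    try:
        re.compile(text)
        return True
    except re.error:
        return False


def slash_candidates(lines):
    """Yield (line_number, candidate) for each /.../ span that compiles."""
    for number, line in enumerate(lines, start=1):
        match = CANDIDATE.search(line)
        if match and is_python_regex(match.group(1)):
            yield number, match.group(1)
```

Almost any plain text is also a valid pattern, so this only weeds out malformed candidates such as /(abc/. A path like /usr/bin/ls still gets through, which is why the results need paring down by hand.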

chaos
+12  A: 

Regular expressions are themselves not a regular language. The clue is that they contain parentheses, square brackets, and the like that must be balanced, and groups can nest to arbitrary depth.

A regular expression itself can be described by a context-free grammar, and parsed with a recursive-descent parser.
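As a minimal sketch of that idea, the Python below is a recursive-descent checker for a deliberately small, assumed grammar: alternation with |, grouping with (), bracket expressions, backslash escapes, and the quantifiers *, + and ?. It is not a full POSIX BRE or ERE grammar (no {m,n} intervals, no [[:alpha:]] classes), and the class and function names are invented for the example.

```python
class RegexSyntaxError(ValueError):
    """Raised when the pattern does not fit the simplified grammar."""


class Parser:
    # Grammar (simplified, assumed for illustration):
    #   regex  := branch ('|' branch)*
    #   branch := piece*
    #   piece  := atom ('*' | '+' | '?')*
    #   atom   := '(' regex ')' | '[' bracket | '\' any-char | ordinary-char

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def peek(self):
        return self.text[self.pos] if self.pos < len(self.text) else None

    def take(self):
        ch = self.peek()
        if ch is None:
            raise RegexSyntaxError("unexpected end of pattern")
        self.pos += 1
        return ch

    def parse(self):
        self.regex()
        if self.peek() is not None:  # leftover input, e.g. a stray ')'
            raise RegexSyntaxError(f"unexpected {self.peek()!r} at {self.pos}")

    def regex(self):
        self.branch()
        while self.peek() == "|":
            self.take()
            self.branch()

    def branch(self):
        while self.peek() not in (None, "|", ")"):
            self.piece()

    def piece(self):
        self.atom()
        while self.peek() in ("*", "+", "?"):
            self.take()

    def atom(self):
        ch = self.take()
        if ch == "(":
            self.regex()  # recursion is what handles nested groups
            if self.peek() != ")":
                raise RegexSyntaxError("missing ')'")
            self.take()
        elif ch == "[":
            self.bracket()
        elif ch == "\\":
            self.take()  # an escape needs a character after it
        elif ch in "*+?":
            raise RegexSyntaxError(f"nothing to repeat at {self.pos - 1}")

    def bracket(self):
        if self.peek() == "^":
            self.take()
        if self.peek() == "]":  # a leading ']' is a literal
            self.take()
        while self.take() != "]":  # take() raises if ']' never arrives
            pass


def is_well_formed(text):
    """Return True if text parses under the simplified grammar above."""
    try:
        Parser(text).parse()
        return True
    except RegexSyntaxError:
        return False
```

With this sketch, is_well_formed("a(b|c)*[0-9]+") is True, while "((a)", "a)" and "[abc" are all rejected. Note that it only says whether a string is a well-formed pattern. It cannot tell you whether the string was meant to be one.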

Bill Karwin
+1 for correctness.
Devin Jeanpierre
+1  A: 

If you are looking specifically for files that contain only or mostly regular expressions, then statistics should tell you that a certain file contains more of that syntax than others. So you could define a set of indicators and combine their scores into a metric that rates how likely a file is to be of interest. Pick a cutoff and let it go. Some indicators:

  • Existence of more than one [0-9], [A-Z], + etc
  • Existence of /foo/
  • Not a standard code file
  • Less compressible (dodgy, I know, but compact regex syntax would intuitively be harder to compress than normal words)
  • etc
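
The indicators above could be combined along these lines. This is only a sketch: the patterns, the list of code-file suffixes, the 0.6 compression threshold, the equal weights and the cutoff of 3.0 are all assumptions picked for illustration and would need tuning against real files.

```python
import re
import zlib

CHAR_CLASS = re.compile(r"\[[^\]\n]+\]")      # [0-9], [A-Z], ...
SLASH_DELIMITED = re.compile(r"/[^/\s]+/")    # /foo/
CODE_SUFFIXES = (".c", ".h", ".java", ".py")  # assumed "standard code" files


def compression_ratio(text):
    """Compressed size over raw size; higher means harder to compress."""
    raw = text.encode("utf-8")
    if not raw:
        return 0.0
    return len(zlib.compress(raw)) / len(raw)


def indicator_scores(text, filename=""):
    """Score each indicator as 0.0 or 1.0 (equal weights, an assumption)."""
    return {
        "char_classes": 1.0 if len(CHAR_CLASS.findall(text)) > 1 else 0.0,
        "slash_delimited": 1.0 if SLASH_DELIMITED.search(text) else 0.0,
        "not_code_file": 0.0 if filename.endswith(CODE_SUFFIXES) else 1.0,
        # Short inputs always look incompressible, so this one is noisy.
        "hard_to_compress": 1.0 if compression_ratio(text) > 0.6 else 0.0,
    }


def likely_regex_file(text, filename="", cutoff=3.0):
    """Sum the indicator scores and compare against the cutoff."""
    return sum(indicator_scores(text, filename).values()) >= cutoff
```

You would read each file, call likely_regex_file(contents, name), and look at whatever clears the cutoff. Returning the individual scores as a dict makes it easier to see which indicator is pulling its weight before you settle on the weights.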

But if this is just a one-shot, then you're probably best off using chaos's answer and manually paring down the results. Is there anything in particular in the regex(es) you are looking for that might be easier to pick up on?

Phil H
