I have a config file in which the user can specify sections, and within each section they can specify regular expressions. I have to parse this config file and separate the regexes into their sections.

Is there an easy way to distinguish a regex from a section header? I was thinking of just using the standard

[section]
regex1
regex2

But I just realized that [section] is a valid regex. So I'm wondering if there's a way I can format a section header so that it can ONLY be understood as a section header and not a regex.

+4  A: 

There are unlimited ways of making an invalid regexp, but the first thing that comes to mind would be

*section*

You can't have a quantifier (*) at the start of the regexp.

(The other * is there just to satisfy my obsession for symmetry.)
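A minimal sketch of how this could be parsed, assuming Python 3 and its `re` module. The helper names `is_valid_regex` and `parse_star_config` and the sample data are made up for illustration; they are not from the question.

```python
import re


def is_valid_regex(text):
    """Return True if Python's re module can compile the text."""
    try:
        re.compile(text)
        return True
    except re.error:
        return False


def parse_star_config(text):
    """Split config text into {section name: [regex strings, in file order]}."""
    sections = {}
    current = None
    for raw_line in text.splitlines():
        # Note: stripping drops leading/trailing whitespace from regexes too.
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if len(line) > 2 and line.startswith("*") and line.endswith("*"):
            # A leading quantifier cannot compile as a regex, so this is a header.
            current = line.strip("*")
            sections[current] = []
        elif current is None:
            raise ValueError("regex found before any section header: %r" % line)
        else:
            sections[current].append(line)
    return sections


# Hypothetical sample config; the section names and regexes are invented.
SAMPLE = r"""
*urls*
^/user/\d+$
[a-z]+\.html

*names*
[section]
"""

print(is_valid_regex("*section*"))  # the header form is rejected by re
print(parse_star_config(SAMPLE))
```

Because the header test only looks at the first and last character, a line such as `[section]` inside a section is kept as an ordinary regex.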

Matti Virkkunen
Nice, simple solution. `*` is better anyway, since I can just `.strip('*')` it away instead of using the slightly more complicated `.strip('[]')`.
Falmarri
Please don't make up your own weird little configuration format with such a bizarre design as "uses * at the start of the line to distinguish from regular expressions". Use XML or JSON or Lua; don't litter the world with yet another unnecessary custom file format.
Glenn Maynard
So I should rather use an awkward XML format when what I'm trying to do can be done with a simple line?
Falmarri
A: 

There are easy ways, but they all require changing your format:

  1. Use indentation, similar to how Python source is interpreted. Leading spaces would need special handling, e.g. "(?: )abc" instead of " abc".
  2. Use an INI format, where each item in a section requires a name=value pair.
  3. Use some sort of list syntax. ast.literal_eval will be helpful.

    section1 = [
      "regex 1",
      "2",
      "3",
    ]
    section2 = ["..."]
    

Above all, don't invent your own format; if you must, make it as close to a known format as you can. The third option is a subset of Python syntax, for example, and you could even use raw string literals naturally.

JSON or YAML may be useful for you.
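One way the list syntax (option 3) might be read, sketched for Python 3. `ast.literal_eval` only evaluates literal expressions, not assignment statements, so this sketch first uses `ast.parse` to find each `name = [...]` statement and then evaluates only the right-hand side. The function name `parse_list_config` and the sample text are assumptions for illustration.

```python
import ast


def parse_list_config(text):
    """Read 'name = [...]' statements into {name: list of regex strings}."""
    sections = {}
    for node in ast.parse(text).body:
        is_simple_assignment = (
            isinstance(node, ast.Assign)
            and len(node.targets) == 1
            and isinstance(node.targets[0], ast.Name)
        )
        if not is_simple_assignment:
            raise ValueError("expected 'name = [...]' at line %d" % node.lineno)
        # literal_eval accepts an AST node and refuses anything but literals,
        # so no code from the config file is executed.
        sections[node.targets[0].id] = ast.literal_eval(node.value)
    return sections


# Hypothetical sample config using raw string literals for the regexes.
LIST_SAMPLE = r'''
section1 = [
  r"^foo\d+$",
  "2",
  "3",
]
section2 = ["..."]
'''

print(parse_list_config(LIST_SAMPLE))
```

The lists keep the regexes in file order, and raw string literals avoid doubling every backslash.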

Roger Pate
+1  A: 

I don't know your problem domain, so I don't know what forms of regex you're expecting, but it seems to me you should keep your section formatting as it is. A regex that starts with [ and ends with ] and has no square brackets in between is quite unusual. It can only match a single character. So leave the section headers as they are. Strictly speaking, they are valid regexes, but they probably aren't interesting regexes.

Also, why not use ConfigParser from the standard library, and let it do the parsing for you?
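A rough sketch of the "keep the headers as they are" rule, assuming Python 3: treat a line as a header only if it is a single pair of square brackets with no brackets inside. `HEADER_RE` and `header_name` are illustrative names, not part of any library.

```python
import re

# A header is '[' + one or more non-bracket characters + ']', and nothing else.
HEADER_RE = re.compile(r"^\[([^\[\]]+)\]$")


def header_name(line):
    """Return the section name if the line looks like a header, else None."""
    match = HEADER_RE.match(line.strip())
    return match.group(1) if match else None


print(header_name("[section]"))   # treated as a header
print(header_name("[a-z]+"))      # has a trailing quantifier, so a regex
print(header_name("[abc][def]"))  # brackets inside, so a regex

# Read as a regex, "[section]" is one character class: it matches exactly
# one character out of s, e, c, t, i, o, n.
print(bool(re.fullmatch("[section]", "s")))
print(bool(re.fullmatch("[section]", "se")))
```

The trade-off is that a bare character class such as `[a-z]` on its own line would be read as a header, so this only works if such regexes are not expected.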

Ned Batchelder
I thought about using ConfigParser, except that it expects key/value pairs. And while I could work around that, I need to maintain the order of the options under each section. I realize there are Python 3 solutions using OrderedDict, but I need it to work under 2.6.5. And it's not THAT complex that it's worth having to use custom or futures modules when I can just write it myself.
Falmarri
A: 

As others have said, please don't invent yet another config format. Use the Python Standard Library's ConfigParser, which will be able to parse the [section] notation exactly as you have shown it.

EDIT: The allow_no_value option allows you to just have a single entry, rather than a key/value pair. And the default dict type is OrderedDict, so it will maintain order.
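A hedged sketch of what this could look like, written for Python 3, where the module is named `configparser`. The function name `parse_with_configparser` and the sample config are invented for illustration. Some caveats apply under these settings: a regex containing `=` would be split at the delimiter, a line starting with `#` or `;` is treated as a comment, and a regex that begins with a bracketed group such as `[abc]+` would be taken for a section header.

```python
import configparser


def parse_with_configparser(text):
    """Return {section: [regex strings, in file order]} using configparser."""
    parser = configparser.ConfigParser(
        allow_no_value=True,   # accept bare entries without '= value'
        interpolation=None,    # leave '%' in regexes untouched
        delimiters=("=",),     # do not split entries on ':'
    )
    parser.optionxform = str   # keep the case of each entry as written
    parser.read_string(text)
    # Iterating a section yields its option names, here the regexes.
    return {name: list(parser[name]) for name in parser.sections()}


# Hypothetical sample config; the section names and regexes are invented.
INI_SAMPLE = r"""
[urls]
^foo\d+$
bar.*baz

[names]
^[A-Z][a-z]+$
"""

print(parse_with_configparser(INI_SAMPLE))
```

By default the parser is strict, so the same regex listed twice in one section raises `DuplicateOptionError`.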

Mark Thomas