I have a Python script that analyzes a set of error messages and checks whether each message matches a certain pattern (a regular expression) in order to group the messages. For example, "file x does not exist" and "file y does not exist" would both match "file .* does not exist" and be counted as two occurrences of the "file not found" category.
As the number of patterns and categories is growing, I'd like to put these "regular expression/display string" pairs in a configuration file, basically a dictionary serialization of some sort.
I would like this file to be editable by hand, so I'm ruling out any form of binary serialization, and I'd also rather not resort to XML serialization, to avoid problems with characters that need escaping (&, <, > and so on).
Do you have any idea what would be a good way of accomplishing this?

Update: thanks to Daren Thomas and Federico Ramponi, but I cannot have an external Python file that could contain arbitrary code.

+3  A: 

I think you want the ConfigParser module in the standard library (renamed configparser in Python 3). It reads and writes INI-style files. The examples in the standard documentation I've linked to are very comprehensive.
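
For the patterns in the question, a minimal sketch could look like the following. It assumes Python 3 (module name configparser), a hypothetical [patterns] section with the display string as the key and the regular expression as the value, and a hypothetical helper called categorize. The INI text is inlined here only to keep the example self-contained.

```python
import configparser
import re

# Hypothetical INI content; in practice it would live in a file such as
# "patterns.ini" and be loaded with parser.read("patterns.ini").
INI_TEXT = """
[patterns]
file not found = file .* does not exist
authorization error = user .* not found
"""

# interpolation=None keeps "%" in a regex from being treated as a placeholder.
parser = configparser.ConfigParser(interpolation=None)
parser.optionxform = str  # keep option names as written (default lowercases them)
parser.read_string(INI_TEXT)

# Build (compiled regex, category) pairs in file order.
rules = [(re.compile(regex), category)
         for category, regex in parser.items("patterns")]


def categorize(message):
    """Return the first matching category, or None if nothing matches."""
    for regex, category in rules:
        if regex.search(message):
            return category
    return None
```

Putting the display string on the left keeps "=" and ":" inside a regular expression from being read as the key/value delimiter, at the cost of allowing only one pattern per category in this layout.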

Andrew Wilkinson
+5  A: 

I've heard that ConfigObj is easier to work with than ConfigParser. It is used by a lot of big projects: IPython, Trac, TurboGears, etc.

From their introduction:

ConfigObj is a simple but powerful config file reader and writer: an ini file round tripper. Its main feature is that it is very easy to use, with a straightforward programmer's interface and a simple syntax for config files. It has lots of other features though :

  • Nested sections (subsections), to any level
  • List values
  • Multiple line values
  • String interpolation (substitution)
  • Integrated with a powerful validation system
    • including automatic type checking/conversion
    • repeated sections
    • and allowing default values
  • When writing out config files, ConfigObj preserves all comments and the order of members and sections
  • Many useful methods and options for working with configuration files (like the 'reload' method)
  • Full Unicode support
sheats
Looks great, thanks.
Ölbaum
+4  A: 
Federico Ramponi
This works great. The syntax is easy to write and validate. The "what if someone puts dangerous code in the config file?" question is crazy. If you have doubts, TALK to the people who have access to the config.
S.Lott
Why bother with the eval? Just make your config file a python script and import it.
davidavr
Yes yes, after reading the thread, now I think I would stick with the "import script" solution...
Federico Ramponi
+4  A: 

I sometimes just write a Python module (i.e. a file) called config.py or something similar, with the following contents:

config = {
    'name': 'hello',
    'see?': 'world'
}

this can then be 'read' like so:

from config import config
config['name']
config['see?']

easy.
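
If importing a module is ruled out because the file could contain arbitrary code (as the question's update says), the same literal syntax can still be parsed without executing it. Here is a minimal sketch, assuming the config holds a list of (pattern, label) tuples and using ast.literal_eval from the standard library; the file content is inlined and the names CONFIG_TEXT, pairs and rules are just for illustration.

```python
import ast
import re

# Hypothetical file content; normally read from a file, e.g.
# with open("patterns.cfg") as f: CONFIG_TEXT = f.read()
CONFIG_TEXT = """
[
    ('file .* does not exist', 'file not found'),
    ('user .* not found', 'authorization error'),
]
"""

# literal_eval accepts only literals (strings, numbers, tuples, lists, dicts, ...),
# so a function call or import in the file raises an error instead of running.
pairs = ast.literal_eval(CONFIG_TEXT.strip())

# Compile once so each message is not re-parsed against raw pattern strings.
rules = [(re.compile(pattern), label) for pattern, label in pairs]
```

The file stays hand-editable Python syntax, but something like __import__('os') in it is rejected with a ValueError rather than executed.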

Daren Thomas
Good, but the word "dictionary" in the question is somewhat misleading. A regexp cannot work as a key; what he needs is a list of pairs...
Federico Ramponi
@Federico: the regex pattern can be a key: patterns = {'file .* does not exist': 'file not found', 'user .* not found': 'authorization error'}
davidavr
Ok davraamides, but how are you supposed to use it? I would prefer something like: for (regexp,m) in messages: if regexp.match(string): print m
Federico Ramponi
@Federico: very similar to your for loop: for pattern, msg in config.patterns.items(): if re.search(pattern, string): print msg
davidavr
+1  A: 

I typically do as Daren suggested and just make the config file a Python script:

patterns = {
    'file .* does not exist': 'file not found',
    'user .* not found': 'authorization error',
}

Then you can use it as:

import re

import config

# Each key is a regular expression; each value is the category to report.
for pattern, category in config.patterns.items():
    if re.search(pattern, log_message):
        print(category)

This is what Django does with their settings file, by the way.
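
To get the per-category counts the question asks about, the loop above can be extended a little. This is only a sketch: it assumes an ordered list of pairs instead of a dict so that "first match wins" is explicit, and the names PATTERNS, count_categories and the 'uncategorized' bucket are made up for the example.

```python
import re
from collections import Counter

# Ordered (pattern, category) pairs; in the setup above these would come
# from the config module instead of being defined inline.
PATTERNS = [
    (r'file .* does not exist', 'file not found'),
    (r'user .* not found', 'authorization error'),
]

# Compile each pattern once up front.
COMPILED = [(re.compile(pattern), category) for pattern, category in PATTERNS]


def count_categories(messages):
    """Count messages per category; unmatched ones go under 'uncategorized'."""
    counts = Counter()
    for message in messages:
        for regex, category in COMPILED:
            if regex.search(message):
                counts[category] += 1
                break  # first matching pattern wins
        else:
            # The inner loop finished without a match.
            counts['uncategorized'] += 1
    return counts
```

Calling count_categories on the two "file ... does not exist" messages from the question gives a count of 2 under 'file not found'.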

davidavr
+10  A: 

You have two decent options:

  1. Python standard config file format using ConfigParser
  2. YAML using a library like PyYAML

The standard Python configuration files look like INI files with [sections] and key : value or key = value pairs. The advantages to this format are:

  • No third-party libraries necessary
  • Simple, familiar file format.

YAML is different in that it is designed to be a human friendly data serialization format rather than specifically designed for configuration. It is very readable and gives you a couple different ways to represent the same data. For your problem, you could create a YAML file that looks like this:

file .* does not exist : file not found
user .* not found : authorization error

Or like this:

{ file .* does not exist: file not found,
  user .* not found: authorization error }

Using PyYAML couldn't be simpler:

import yaml

# safe_load builds only plain Python objects and never runs code from the file.
with open('my.yaml') as f:
    errors = yaml.safe_load(f)

At this point errors is a Python dictionary with the expected format. YAML is capable of representing more than dictionaries: if you prefer a list of pairs, use this format:

-
  - file .* does not exist 
  - file not found
-
  - user .* not found
  - authorization error

Or

[ [file .* does not exist, file not found],
  [user .* not found, authorization error]]

Either form produces a list of lists when the file is loaded.

One advantage of YAML is that you could use it to export your existing, hard-coded data out to a file to create the initial version, rather than cut/paste plus a bunch of find/replace to get the data into the right format.

The YAML format will take a little more time to get familiar with, but using PyYAML is even simpler than using ConfigParser, and YAML gives you more options for how your data is represented.

Either one sounds like it will fit your current needs. ConfigParser will be easier to start with, while YAML gives you more flexibility in the future if your needs expand.

Best of luck!

Aaron Hays
What's wrong with JSON as a notation for configuration files?
S.Lott
JSON should work too, I'm just more familiar with YAML. Of course, 2.6 has the new json module, which is convenient.
Aaron Hays
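
For the JSON variant raised in the comments above, a minimal sketch with the standard json module might look like this. It assumes a list of [pattern, label] pairs; the third entry is a made-up example added only to show that backslashes in a regular expression have to be doubled inside a JSON string.

```python
import json
import re

# Hypothetical file content; normally loaded from a file, e.g.
# with open("patterns.json") as f: pairs = json.load(f)
# The raw string keeps the text exactly as it would appear in the file.
JSON_TEXT = r"""
[
    ["file .* does not exist", "file not found"],
    ["user .* not found", "authorization error"],
    ["error code \\d+", "numeric error"]
]
"""

pairs = json.loads(JSON_TEXT)

# JSON decoding turns "\\d" into the two characters \d, which re then
# interprets as "any digit".
rules = [(re.compile(pattern), label) for pattern, label in pairs]
```

Like the YAML list-of-pairs form, this keeps the patterns in order and cannot carry executable code, though hand-editing the doubled backslashes is a little less pleasant.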