I'm writing my second Python script, which needs to parse the contents of a config file, and would like some beginner advice. I'm not sure whether regex is the best way to parse the file, since each entry spans multiple lines. I've also been reading about dictionaries and wondered whether using one would be good practice here. I'm not necessarily looking for code, just a push in the right direction.

Example: My config file looks like this.

Job {
  Name = "host.domain.com-foo"
  Client = host.domain.com-fd
  JobDefs = "DefaultJob"
  FileSet = "local"
  Write Bootstrap = "/etc/foo/host.domain.com-foo.bsr"
  Pool = storage-disk1
  }

Should I use regex, line splitting, or maybe a module? If I had multiple jobs in my config file, would I use a dictionary to correlate a job to a pool?

+2  A: 

I don't think a regex is adequate for parsing something like this. You could look at a true parser, such as pyparsing. Or if the file format is within your control, you might consider XML. There are standard Python libraries for parsing that.

Fred Larson
+5  A: 

There are numerous existing alternatives for this task: json, pickle and yaml, to name three. Unless you really want to implement this yourself, you should use one of these. Even if you do roll your own, following the format of one of the above is still a good idea.

Also, it's a much better idea to use a parser generator or similar tool to do the parsing; regexes are going to be harder to maintain and less efficient for this type of task.
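
As a rough sketch of the json route, assuming the file format can be changed: the layout below (a top-level "jobs" list) is a made-up rendering of the Job block from the question, not an established format. It also shows one way a dictionary could correlate each job to its pool.

```python
import json

# Hypothetical JSON version of the question's config; the "jobs" key is an assumption.
CONFIG_TEXT = """
{
  "jobs": [
    {
      "Name": "host.domain.com-foo",
      "Client": "host.domain.com-fd",
      "JobDefs": "DefaultJob",
      "FileSet": "local",
      "Write Bootstrap": "/etc/foo/host.domain.com-foo.bsr",
      "Pool": "storage-disk1"
    }
  ]
}
"""

config = json.loads(CONFIG_TEXT)

# Build a dictionary that maps each job name to its pool.
pool_by_job = {job["Name"]: job["Pool"] for job in config["jobs"]}
print(pool_by_job["host.domain.com-foo"])
```

To read from a real file instead of a string, json.load() on an open file object does the same job.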

Dana the Sane
+4  A: 

The ConfigParser module from the standard library is probably the most Pythonic and straightforward way to parse a configuration file that your Python script is using.

If you are restricted to using the particular format you have outlined, then pyparsing is a pretty good option.
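
If pulling in pyparsing is not an option, plain line splitting may be enough for a format this regular. The sketch below is not pyparsing; it is a hand-rolled alternative that assumes one Key = Value pair per line, no nested blocks, and the closing brace on its own line. The function name parse_blocks is made up for illustration.

```python
def parse_blocks(text):
    """Parse 'Type { Key = Value ... }' blocks into a list of (type, dict) pairs.

    Assumes one 'Key = Value' per line and no nested blocks.
    """
    blocks = []
    current_type = None
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith("{"):
            # Start of a block such as 'Job {'.
            current_type = line[:-1].strip()
            current = {}
        elif line == "}":
            # End of the block: store what was collected.
            if current is not None:
                blocks.append((current_type, current))
            current_type, current = None, None
        elif current is not None and "=" in line:
            # Split on the first '=' only, then drop optional double quotes.
            key, _, value = line.partition("=")
            current[key.strip()] = value.strip().strip('"')
    return blocks


SAMPLE = """
Job {
  Name = "host.domain.com-foo"
  Client = host.domain.com-fd
  JobDefs = "DefaultJob"
  FileSet = "local"
  Write Bootstrap = "/etc/foo/host.domain.com-foo.bsr"
  Pool = storage-disk1
  }
"""

blocks = parse_blocks(SAMPLE)

# With several Job blocks, a dictionary can correlate each job to its pool.
pool_by_job = {fields["Name"]: fields["Pool"]
               for kind, fields in blocks if kind == "Job"}
print(pool_by_job)
```

Anything less regular than this (nesting, multi-line values, escaped quotes) is where a real parser starts to pay off.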

John Mulder
+8  A: 

If you can change the configuration file format, you can directly write your file as a Python file.

config.py

job = {
  'Name' : "host.domain.com-foo",
  'Client' : "host.domain.com-fd",
  'JobDefs' : "DefaultJob",
  'FileSet' : "local",
  'Write Bootstrap' : "/etc/foo/host.domain.com-foo.bsr",
  'Pool' : 'storage-disk1'
}

yourscript.py

from config import job

print(job['Name'])
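
If executing the config as code is a worry, one possible middle ground is ast.literal_eval from the standard library, which only accepts Python literals (dicts, strings, numbers and so on) and refuses function calls. This is a sketch that assumes the file holds just the dictionary literal, without the "job =" assignment; CONFIG_TEXT stands in for the file's contents.

```python
import ast

# Stand-in for the contents of the config file: a bare dict literal (assumption).
CONFIG_TEXT = """
{
  'Name' : "host.domain.com-foo",
  'Client' : "host.domain.com-fd",
  'JobDefs' : "DefaultJob",
  'FileSet' : "local",
  'Write Bootstrap' : "/etc/foo/host.domain.com-foo.bsr",
  'Pool' : 'storage-disk1'
}
"""

# literal_eval evaluates literals only; it does not run arbitrary code.
job = ast.literal_eval(CONFIG_TEXT.strip())
print(job['Name'])
```

Anything that is not a plain literal, such as a function call, makes literal_eval raise an exception instead of running it.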
NicDumZ
That's often very dangerous for security because simply reading the config file may run arbitrary actions.
bortzmeyer
+1: Excellent. There are no security problems with this unless someone who's certifiably crazy decides to write insane parameter values.
S.Lott
And if you can't change the format but you trust the file, you can read it, substitute : for =, and eval() the result.
ilya n.
But you cannot always trust the file. Many programs read the file with a relative name ("config/.py") and therefore are vulnerable when executed while in a directory like /tmp where anyone can put such a file.
bortzmeyer
@bortzmeyer: since only job is imported, there is little risk. If you care about someone maliciously changing the values to point to a wrong resource, then it can happen for any other config file. If you're worried about code being executed when a dict is loaded, mmm well... what possibilities do you have? A 'job' that subclasses dict and overrides __getitem__ to execute some code? Pretty tricky. And security is not mentioned by the OP.
NicDumZ
+4  A: 

If your config file can be turned into a Python file, just make it a dictionary and import the module.

Job = { "Name" : "host.domain.com-foo",
        "Client" : "host.domain.com-fd",
        "JobDefs" : "DefaultJob",
        "FileSet" : "local",
        "Write BootStrap" : "/etc/foo/host.domain.com-foo.bsr",
        "Pool" : "storage-disk1" }

You can access the options by simply indexing the dictionary: Job["Name"], Job["Pool"], etc.

The ConfigParser module is easy to use as well. You can create a text file that looks like this:

[Job]
Name=host.domain.com-foo
Client=host.domain.com-fd
JobDefs=DefaultJob
FileSet=local
Write BootStrap=/etc/foo/host.domain.com-foo.bsr
Pool=storage-disk1
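
Reading that file back could look roughly like the sketch below. It uses the Python 3 module name, configparser (the module is called ConfigParser in Python 2), and parses from a string so it runs on its own; INI_TEXT is a stand-in for the file's contents.

```python
import configparser

# Stand-in for the text file shown above.
INI_TEXT = """\
[Job]
Name=host.domain.com-foo
Client=host.domain.com-fd
JobDefs=DefaultJob
FileSet=local
Write BootStrap=/etc/foo/host.domain.com-foo.bsr
Pool=storage-disk1
"""

parser = configparser.ConfigParser()
parser.read_string(INI_TEXT)  # for a real file, parser.read(path) instead

job = parser["Job"]
print(job["Name"])
print(job["Pool"])
# Option names are lowercased internally, so lookups are case-insensitive.
print(job["write bootstrap"])
```

Each [Section] behaves much like a dictionary of strings, so several jobs would simply become several sections.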

Just keep it simple like one of the above.

DoxaLogos
+1: Keep it simple. Reuse existing solutions. Invent as little as possible. Words to live by.
S.Lott