views:

168

answers:

6

I'm new to Python and need help with a problem. Basically, I need to open a file and read it, which I can do without a problem. The problem arises at line 0, where I need to check the header format.

The header needs to be in the format: p wncf nvar nclauses hard, where 'nvar', 'nclauses' and 'hard' are all positive integers.

For example:

p wncf 1563 817439 186191

would be a valid header line.

Here is the code I already have, thanks to answers to an earlier question:

import re 
filename = raw_input('Please enter the name of the WNCF file: ') 
f = open(filename, 'r') 

for line in f: 
    p = re.compile('p wncf \d+ \d+ \d+$') 
    if p.match(line[0]) == None: 
        print "incorrect format"

I still get "incorrect format" even when the file is in the correct format. Also, would it be possible to assign the integers to an object?

Thanks in advance.

+4  A: 

Something like this (lines is a list of all the lines, in order):

import re
if re.match(r'p wncf \d+ \d+ \d+', lines[0]) == None:
    print "Bad format"
RC
With the caveat that this regex assumes all of the fields are separated by exactly one space...
mkClark
@mkClark, that was an assumption, thanks for pointing it out :)
RC
Although it's safe here, comparing for equality with None is not a good idea in general. One should always use "is None" or "is not None" (using identity comparison) instead. (It's also faster.)
Peter Hansen
@Peter, I wasn't aware of the "is None" comparison; I'll look into it in the docs, thx
RC
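A minimal sketch of the two caveats raised in the comments above, written for Python 3 (the thread itself uses Python 2): the pattern only accepts exactly one space between fields, and the result of re.match() is tested with "is None" rather than "== None". The function name header_matches is hypothetical.

```python
import re

# Same pattern as the answer: single spaces only, no end-of-line anchor.
HEADER_RE = re.compile(r'p wncf \d+ \d+ \d+')

def header_matches(line):
    # match() returns a match object or None; test it by identity.
    return HEADER_RE.match(line) is not None

print(header_matches('p wncf 1563 817439 186191'))   # True
print(header_matches('p  wncf 1563 817439 186191'))  # False: two spaces after "p"
```

Because there is no trailing $, this pattern also accepts a valid header followed by extra text on the same line.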
+1  A: 
p, wncf, nvar, nclauses, hard = line.split()
nvar = int(nvar)
nclauses = int(nclauses)
hard = int(hard)
jcdyer
How would this be implemented? I'm assuming the numbers in the header would each be assigned to their corresponding object?
harpalss
This code doesn't protect against the integers being negative, it doesn't catch the exceptions that will get thrown if the format doesn't match and neither does it protect against there being extraneous information at the end of the header line. In short this code fragment only works if the header is well formed.
Andrew O'Reilly
True. It should check that the numbers are positive, but I don't think it should protect against exceptions. I would think it's this code's responsibility to raise them. int() already raises a ValueError if a value isn't an integer, and so does the unpacking if the line doesn't have exactly five fields. If nvar, nclauses, and hard are not positive ints, I think raising a ValueError is the right thing to do. If the OP wants something else, he can catch it in the receiving code.
jcdyer
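A sketch of the split()-based approach along the lines the last comment suggests, written for Python 3: malformed headers and non-positive values both surface as ValueError, which the calling code can catch. The helper name parse_header is hypothetical.

```python
def parse_header(line):
    """Return (nvar, nclauses, hard) as ints, raising ValueError if the header is malformed.

    Hypothetical helper based on the split() answer above.
    """
    fields = line.split()
    if len(fields) != 5 or fields[0] != 'p' or fields[1] != 'wncf':
        raise ValueError('bad header: %r' % line)
    # int() itself raises ValueError for strings such as 'abc'.
    nvar, nclauses, hard = (int(x) for x in fields[2:])
    if nvar <= 0 or nclauses <= 0 or hard <= 0:
        raise ValueError('header values must be positive: %r' % line)
    return nvar, nclauses, hard

nvar, nclauses, hard = parse_header('p wncf 1563 817439 186191\n')
print(nvar, nclauses, hard)  # 1563 817439 186191
```

split() with no arguments also discards the trailing newline, so a line read straight from the file can be passed in unchanged.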
A: 

Using a regular expression is probably the easiest way to check this header:

import re
p = re.compile(r'p wncf \d+ \d+ \d+$')
if p.match(lineToBeChecked) is None:
    print "Header does not have correct format"

Note the use of the trailing $ in the regex to anchor the regex to the end of the line and so protect against additional information being included on the header line (which I've assumed would make it invalid).
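A quick way to see what the $ anchor changes, sketched in Python 3 with made-up sample lines: without it, extra text after the three numbers is accepted; with it, that line is rejected. A line read from a file still matches, because $ also matches just before a trailing newline.

```python
import re

anchored = re.compile(r'p wncf \d+ \d+ \d+$')
unanchored = re.compile(r'p wncf \d+ \d+ \d+')

line_with_extra = 'p wncf 1563 817439 186191 extra'
line_from_file = 'p wncf 1563 817439 186191\n'  # lines read from a file keep their newline

print(anchored.match(line_with_extra) is None)    # True: rejected
print(unanchored.match(line_with_extra) is None)  # False: accepted
print(anchored.match(line_from_file) is None)     # False: $ matches before the final newline
```

Note that the anchored pattern also rejects a header with a single trailing space, which may or may not be what you want.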

If an arbitrary number of spaces is allowed between parameters, the regular expression could be changed to this:

p = re.compile(r'p[ ]+wncf[ ]+\d+[ ]+\d+[ ]+\d+$')
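A small Python 3 check of the relaxed pattern, using hypothetical sample lines: runs of spaces are accepted, but a tab is not, since [ ]+ only matches the space character.

```python
import re

# [ ]+ means "one or more spaces" (equivalent to ' +').
flexible = re.compile(r'p[ ]+wncf[ ]+\d+[ ]+\d+[ ]+\d+$')

print(flexible.match('p   wncf 1563  817439 186191') is not None)  # True
print(flexible.match('p\twncf 1563 817439 186191') is not None)    # False: a tab is not a space
```

If tabs should count as separators too, [ \t]+ would accept both; \s+ is broader still, since it matches any whitespace character.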
Andrew O'Reilly
Here's my script so far, but I'm still getting "incorrect format" when the file format is correct. You were right in assuming one space between the parameters.
import re
filename = raw_input('Please enter the name of the WNCF file: ')
f = open(filename, 'r')
for line in f:
    p = re.compile('p wncf \d+ \d+ \d+$')
    if p.match(line[0]) == None:
        print "incorrect format"
Also, would it be possible to assign objects to the integers? Thanks!
harpalss
Sorry, I assumed the code would print out in the same format I typed it in.
harpalss
With this code you are only passing the first character of line to the regular expression matcher, which will obviously always fail. Change 'if p.match(line[0]) == None:' to 'if p.match(line) == None:'
Andrew O'Reilly
Regexes are not always the easiest way.
+6  A: 

Alright, a few things.

  1. You only need to compile your regular expression once. In the example you gave above, you're recompiling it for every line in the file.

  2. line[0] is just the first character in each line. Replace line[0] with line and your code should work.
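Points 1 and 2 together might look like this minimal Python 3 sketch: the pattern is compiled once before the loop, and the whole line is passed to match(). The helper name check_lines is hypothetical, and io.StringIO stands in for the opened file.

```python
import io
import re

# Compile once, outside the loop.
HEADER_RE = re.compile(r'p wncf \d+ \d+ \d+$')

def check_lines(f):
    """Return a list of booleans: does each line match the header pattern?"""
    results = []
    for line in f:
        # Pass the whole line; line[0] would be only its first character ('p').
        results.append(HEADER_RE.match(line) is not None)
    return results

sample = io.StringIO('p wncf 1563 817439 186191\nsome other line\n')
print(check_lines(sample))  # [True, False]
```

If the file has more lines than just the header, a loop like the one in the question will report "incorrect format" for every later line; reading only the first line, for example with f.readline(), avoids that.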

To get at the integers, you have to surround the parts of the pattern you want to capture in parentheses, which turns them into groups. In your case, use:

p = re.compile(r"p wncf (\d+) (\d+) (\d+)")

And instead of p.match(line), which returns a match object or None, you could use findall. Check out the following as a replacement for what you have.

p = re.compile(r"p wncf (\d+) (\d+) (\d+)") 
for line in f: 
    matches = p.findall(line)
    if len(matches) != 0:
        print matches[0][0], matches[0][1], matches[0][2]
    else:
        print "No matches."
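Since only a single header line is being checked, match() followed by groups() is another option besides findall(), and converting the captured strings with int() gives actual numbers to work with. A Python 3 sketch; the helper name read_header_values is hypothetical.

```python
import re

HEADER_RE = re.compile(r'p wncf (\d+) (\d+) (\d+)$')

def read_header_values(line):
    """Return (nvar, nclauses, hard) as ints, or None if the line doesn't match.

    Hypothetical helper using match() and groups() instead of findall().
    """
    m = HEADER_RE.match(line)
    if m is None:
        return None
    # groups() returns the captured strings, e.g. ('1563', '817439', '186191').
    nvar, nclauses, hard = (int(g) for g in m.groups())
    return nvar, nclauses, hard

print(read_header_values('p wncf 1563 817439 186191\n'))  # (1563, 817439, 186191)
print(read_header_values('p wncf 1563 817439'))           # None
```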

Edit: If your header values can contain negative numbers as well, you should replace r"p wncf (\d+) (\d+) (\d+)" with r"p wncf (-?\d+) (-?\d+) (-?\d+)".
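A Python 3 comparison of the patterns, with made-up sample lines. The unsigned \d+ version already rejects negative values, but it still accepts 0, which is not a positive integer. The third pattern is one possible way to exclude 0; it is an assumption of this sketch, and it also rejects values with leading zeros such as 007.

```python
import re

unsigned = re.compile(r'p wncf (\d+) (\d+) (\d+)$')
signed = re.compile(r'p wncf (-?\d+) (-?\d+) (-?\d+)$')
# One possible pattern for strictly positive values: the first digit may not be 0.
positive = re.compile(r'p wncf ([1-9]\d*) ([1-9]\d*) ([1-9]\d*)$')

print(unsigned.match('p wncf 5 -2 7'))            # None
print(signed.match('p wncf 5 -2 7').groups())      # ('5', '-2', '7')
print(unsigned.match('p wncf 0 2 7') is not None)  # True: \d+ accepts 0
print(positive.match('p wncf 0 2 7'))              # None
```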

Dan Loewenherz
Hey, thanks for the help. I typed the code in exactly as you did and also made the changes you recommended, but I still get the "No matches." output?
harpalss
That's weird. It works for me. Can you post the first few lines of the file you're reading?
Dan Loewenherz
`p wncf 1569 817439 186191` That's all that's in the file for the moment.
harpalss
Which Python version are you using? If you don't know, run `python -V` at the command line.
Dan Loewenherz
Er... it's 2.6.
harpalss
@Dan, note that the re compilation results (first 100 or so) are actually cached, so for most programs it's not really a problem to just call re.match() like that, rather than re.compile().
Peter Hansen
@Peter, that's pretty cool, didn't realize that. Thanks for the heads up!
Dan Loewenherz
@Dan: `if len(matches) != 0:` ... please consider `if matches:`
John Machin
@Peter: the OP isn't "calling `re.match()` like that, rather than `re.compile()`", he is calling `p = re.compile()` inside his loop then `p.match()` -- this is ugly even when `re.compile()` is cached.
John Machin
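The two suggestions from the comments can be combined in a short Python 3 sketch: a module-level function such as re.findall() is called with the pattern string (the re module caches recently compiled patterns), and an empty list is tested by truthiness. The helper name header_values is hypothetical.

```python
import re

def header_values(line):
    """Return the captured header fields as strings, or None if there is no match."""
    # re caches compiled patterns, so passing the pattern string each call is cheap.
    matches = re.findall(r'p wncf (\d+) (\d+) (\d+)', line)
    # An empty list is falsy, so "if matches:" replaces "if len(matches) != 0:".
    if matches:
        return matches[0]  # one tuple of strings per match
    return None

print(header_values('p wncf 1569 817439 186191'))  # ('1569', '817439', '186191')
print(header_values('no header here'))             # None
```

Unlike match(), findall() searches the whole line, so this pattern would also find a header that is not at the start of the line.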
+2  A: 

You might want to use p.match(line) instead. You're passing the first character of the line to the regex, not the whole line.

kprobst
+1  A: 

Hi, you don't need a regex to do this. Here's one way you can check your header:

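fh = open("file")
header = fh.readline().rstrip()
fh.close()
fields = header.split()
# the first two fields must be exactly "p" and "wncf"
if fields[:2] != ["p", "wncf"]:
    print "error"
# there must be exactly five fields: p wncf nvar nclauses hard
elif len(fields) != 5:
    print "error"
# nvar, nclauses and hard must consist of digits only
elif False in map(str.isdigit, fields[2:]):
    print "error"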
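A digit check alone still lets 0 through, and the question asks for positive integers. Here is a Python 3 sketch of the same regex-free approach wrapped in a function that also rejects 0. The name header_is_valid is hypothetical, and the ASCII-digit test is a choice of this sketch: in Python 3, str.isdigit() also accepts characters such as '²' that int() cannot convert.

```python
ASCII_DIGITS = set('0123456789')

def header_is_valid(line):
    """Check 'p wncf nvar nclauses hard' without a regex, requiring positive ints.

    Hypothetical helper extending the answer above.
    """
    fields = line.split()
    if len(fields) != 5 or fields[:2] != ['p', 'wncf']:
        return False
    for field in fields[2:]:
        # ASCII digits only: rules out signs, letters and underscores.
        if not set(field) <= ASCII_DIGITS:
            return False
        # Rules out 0 (and strings like '000').
        if int(field) <= 0:
            return False
    return True

print(header_is_valid('p wncf 1563 817439 186191\n'))  # True
print(header_is_valid('p wncf 0 817439 186191'))       # False
```

To check a real file, pass it the first line, e.g. header_is_valid(fh.readline()).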