ansaurus

Question

Answer 1

+4 A:

Something like this (`lines` is a list of all the lines, in order):

import re
if re.match(r'p wncf \d+ \d+ \d+', lines[0]) == None:
    print("Bad format")

RC 2009-12-14 21:54:31

A warning: this regex assumes that all of the fields are separated by exactly one space...

mkClark 2009-12-14 22:02:52

@mkClark, that was an assumption; thanks for pointing it out :)

RC 2009-12-14 22:08:53

Although it's safe here, comparing for equality with None is not a good idea in general. One should always use "is None" or "is not None" (using identity comparison) instead. (It's also faster.)

Peter Hansen 2009-12-14 22:59:23

@Peter, I wasn't aware of the "is None" comparison; I'll look it up in the docs, thanks.

RC 2009-12-15 06:07:12
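A minimal sketch of why identity comparison is the safer habit, assuming Python 3. The `AlwaysEqual` class and the `header_ok` helper are hypothetical, written only for illustration: a class can override `__eq__` so that `== None` reports True, but `is None` cannot be fooled.

```python
import re


class AlwaysEqual:
    """Hypothetical class whose __eq__ claims equality with everything."""

    def __eq__(self, other):
        return True


obj = AlwaysEqual()
print(obj == None)  # True: __eq__ is called and says "equal"
print(obj is None)  # False: identity comparison cannot be overridden


def header_ok(line):
    """Hypothetical helper: the header check from Answer 1, using 'is not None'."""
    return re.match(r'p wncf \d+ \d+ \d+', line) is not None


print(header_ok("p wncf 1569 817439 186191"))  # True
print(header_ok("x wncf 1 2 3"))               # False
```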

Answer 2

+1 A:

p, wncf, nvar, nclauses, hard = line.split()
nvar = int(nvar)
nclauses = int(nclauses)
hard = int(hard)

jcdyer 2009-12-14 22:02:09

How would this be implemented? I'm assuming each number in the header would be assigned to its corresponding object?

harpalss 2009-12-14 22:22:21

This code doesn't protect against the integers being negative, doesn't catch the exceptions that will be thrown if the format doesn't match, and doesn't protect against extraneous information at the end of the header line. In short, this code fragment only works if the header is well formed.

Andrew O'Reilly 2009-12-14 22:30:59

True. It should check that the numbers aren't negative, but I don't think it should protect against exceptions. I would think it's this code's responsibility to raise them. Maybe it should catch any errors raised if the values aren't ints and re-raise a ValueError (for a non-numeric string, int() already raises ValueError). If nvar, nclauses, and hard are not positive ints, I think raising a ValueError is the right thing to do. If the OP wants something else, he can catch it in the receiving code.

jcdyer 2009-12-15 01:27:53
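A sketch of the split-based approach with the checks discussed above, raising ValueError as jcdyer suggests. The function name `parse_header` is hypothetical, and requiring all three values to be strictly positive is an assumption taken from this thread, not a stated rule of the WNCF format.

```python
def parse_header(line):
    """Parse a 'p wncf <nvar> <nclauses> <hard>' header line (hypothetical helper).

    Returns (nvar, nclauses, hard) as ints; raises ValueError otherwise.
    Assumes all three values must be positive integers.
    """
    fields = line.split()  # also strips the trailing newline
    if len(fields) != 5 or fields[0] != "p" or fields[1] != "wncf":
        raise ValueError("bad header: %r" % line)
    # int() raises ValueError by itself for non-numeric fields such as "abc".
    nvar, nclauses, hard = (int(x) for x in fields[2:])
    if min(nvar, nclauses, hard) <= 0:
        raise ValueError("header values must be positive: %r" % line)
    return nvar, nclauses, hard


print(parse_header("p wncf 1569 817439 186191\n"))  # (1569, 817439, 186191)
```

Extra trailing fields, a wrong prefix, non-numeric values and negative numbers all end up as a ValueError, which the receiving code can catch.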

Answer 3

A:

Using a regular expression is probably the easiest way to check this header:

import re
p = re.compile(r'p wncf \d+ \d+ \d+$')
if p.match(lineToBeChecked) is None:
    print("Header does not have correct format")

Note the use of the trailing $ in the regex to anchor the regex to the end of the line and so protect against additional information being included on the header line (which I've assumed would make it invalid).
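A quick check of what the trailing `$` does, assuming Python 3. One detail worth knowing: `$` also matches just before a final newline, so lines read from a file (which keep their `\n`) still pass.

```python
import re

p = re.compile(r'p wncf \d+ \d+ \d+$')

# '$' matches just before a trailing '\n', so a line read from a file passes.
print(p.match("p wncf 1569 817439 186191\n") is not None)        # True
# Extra information after the last number is rejected because of '$'.
print(p.match("p wncf 1569 817439 186191 extra\n") is not None)  # False
# Without the anchor, the same line would be accepted.
unanchored = re.compile(r'p wncf \d+ \d+ \d+')
print(unanchored.match("p wncf 1569 817439 186191 extra") is not None)  # True
```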

If an arbitrary number of spaces is allowed between parameters, the regular expression could be changed to this:

p = re.compile(r'p[ ]+wncf[ ]+\d+[ ]+\d+[ ]+\d+$')
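A small comparison, assuming Python 3, of the `[ ]+` pattern above with a `\s+` variant. The `\s+` version is a hypothetical alternative, not part of the answer: it also accepts tabs between fields.

```python
import re

# Pattern from the answer: one or more literal spaces between fields.
spaces = re.compile(r'p[ ]+wncf[ ]+\d+[ ]+\d+[ ]+\d+$')
# Hypothetical variant: \s+ also accepts tabs (and other whitespace).
whitespace = re.compile(r'p\s+wncf\s+\d+\s+\d+\s+\d+$')

line = "p   wncf 1569  817439 186191"
tabbed = "p\twncf\t1569\t817439\t186191"
print(bool(spaces.match(line)), bool(whitespace.match(line)))      # True True
print(bool(spaces.match(tabbed)), bool(whitespace.match(tabbed)))  # False True
```

Since `\s` also matches newlines, the `\s+` variant is only appropriate when each line is checked on its own.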

Andrew O'Reilly 2009-12-14 22:10:32

Here's my script so far, but I'm still getting "incorrect format" when the file format is correct. You were right in assuming one space between the parameters.

import re
filename = raw_input('Please enter the name of the WNCF file: ')
f = open(filename, 'r')
for line in f:
    p = re.compile('p wncf \d+ \d+ \d+$')
    if p.match(line[0]) == None:
        print "incorrect format"

Also, would it be possible to assign objects to the integers? Thanks!

harpalss 2009-12-14 22:44:41

Sorry, I assumed the code would display in the same format I typed it in.

harpalss 2009-12-14 22:46:34

With this code you are only passing the first character of line to the regular expression matcher, which will obviously always fail. Change 'if p.match(line[0]) == None:' to 'if p.match(line) == None:'

Andrew O'Reilly 2009-12-15 01:25:06
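A sketch of the OP's script with the fixes from these comments applied, assuming Python 3: the pattern is compiled once, and the whole line is matched rather than `line[0]`. The helper name `header_is_valid` is hypothetical, and it assumes only the first line of the file is the header (the original loop checked every line); `io.StringIO` stands in for `open(filename)`.

```python
import io
import re

# Compile the pattern once, outside any loop.
HEADER = re.compile(r'p wncf \d+ \d+ \d+$')


def header_is_valid(f):
    """Check the first line of an open text file (hypothetical helper)."""
    line = f.readline()
    # Pass the whole line; line[0] would be just its first character.
    return HEADER.match(line) is not None


print(header_is_valid(io.StringIO("p wncf 1569 817439 186191\n")))  # True
# The original bug: only "p" reaches the matcher, so it never matches.
print(HEADER.match("p wncf 1569 817439 186191\n"[0]) is None)      # True
```

With a real file this would be used as `with open(filename) as f: ok = header_is_valid(f)`.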

Regexes are not always the easiest approach.

2009-12-15 02:10:23

Answer 4

+6 A:

Alright, a few things.

You only need to compile your regular expression once. In the example you gave above, you're recompiling it for every line in the file.
line[0] is just the first character in each line. Replace line[0] with line and your code should work.

To assign the integers to an object, surround the groups you want to capture in parentheses. In your case:

p = re.compile(r"p wncf (\d+) (\d+) (\d+)")

And instead of p.match(line), which returns a match object or None, you could use findall. Check out the following as a replacement for what you have.

import re

p = re.compile(r"p wncf (\d+) (\d+) (\d+)")
for line in f:
    matches = p.findall(line)
    if len(matches) != 0:
        print(" ".join(matches[0]))
    else:
        print("No matches.")

Edit: If your header values can contain negative numbers as well, you should replace r"p wncf (\d+) (\d+) (\d+)" with r"p wncf (-?\d+) (-?\d+) (-?\d+)".
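A sketch of pulling the three values out as ints, assuming Python 3. It swaps `findall` for `match` plus `groups()` (a different but equivalent way to read the captured groups) and uses the negative-number pattern from the edit above; the name `header_values` is hypothetical.

```python
import re

# Capture groups, allowing an optional leading minus sign.
HEADER = re.compile(r'p wncf (-?\d+) (-?\d+) (-?\d+)')


def header_values(line):
    """Return the three header values as ints, or None if the line doesn't match."""
    m = HEADER.match(line)
    if m:  # a match object is truthy; None is falsy
        return tuple(int(g) for g in m.groups())
    return None


print(header_values("p wncf 1569 817439 186191"))  # (1569, 817439, 186191)
print(header_values("p wncf 12 -3 4"))             # (12, -3, 4)
print(header_values("c comment line"))             # None
```

The values can then be unpacked directly, e.g. `nvar, nclauses, hard = header_values(line)`, after checking the result isn't None.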

Dan Loewenherz 2009-12-14 23:18:52

Hey, thanks for the help. I typed the code in exactly as you did and also made the changes you recommended, but I still get the "No matches." output?

harpalss 2009-12-14 23:33:56

That's weird. It works for me. Can you post the first few lines of the file you're reading?

Dan Loewenherz 2009-12-14 23:54:38

`p wncf 1569 817439 186191` That's all that's in the file for the moment.

harpalss 2009-12-15 00:17:43

Which Python version are you using? If you don't know, run `python -V` at the command line.

Dan Loewenherz 2009-12-15 01:02:43

Er... it's 2.6.

harpalss 2009-12-15 01:11:05

@Dan, note that the re compilation results (first 100 or so) are actually cached, so for most programs it's not really a problem to just call re.match() like that, rather than re.compile().

Peter Hansen 2009-12-15 01:15:19

@Peter, that's pretty cool, didn't realize that. Thanks for the heads up!

Dan Loewenherz 2009-12-15 01:42:45

@Dan: `if len(matches) != 0:` ... please consider `if matches:`

John Machin 2009-12-15 06:49:29

@Peter: the OP isn't "calling `re.match()` like that, rather than `re.compile()`", he is calling `p = re.compile()` inside his loop then `p.match()` -- this is ugly even when `re.compile()` is cached.

John Machin 2009-12-15 06:54:59

Answer 5

+2 A:

You might want to use p.match(line) instead. You're passing the first character of the line to the regex, not the whole line.

kprobst 2009-12-14 23:21:00

Answer 6

+1 A:

Hi, you don't need a regex to do this. Here's one way you can check your header:

with open("file") as fh:  # "file" stands for the name of the WNCF file
    header = fh.readline().split()
if len(header) != 5 or header[0] != "p" or header[1] != "wncf":
    print("error")
elif not all(map(str.isdigit, header[2:])):
    print("error")

2009-12-15 00:56:13

Python: Checking Header Format