Hi Guys,

How does one check the order of lines in a file?

Example file:

a b c d e f
b c d e f g
1 2 3 4 5 0

Requirements:

  1. All lines beginning a, must precede lines beginning b.
  2. There is no limit on number of lines beginning a.
  3. Lines beginning a, may or may not be present.
  4. Lines containing integers, must follow lines beginning b.
  5. Numeric lines must have at least two integers followed by zero.
  6. Failure to meet conditions must raise error.

I initially thought of a rather long-winded for loop, but that failed because I was unable to index lines beyond line[0]. I also do not know how to define the location of one line relative to the others. There is no limit on the length of these files, so memory may also be an issue.

Any suggestions very welcome! Simple and readable is welcome for this confused novice!

Thanks, Seafoid.

+1  A: 

You can get all the lines into a list with lines = open(thefile).readlines() and then work on the list -- not maximally efficient but maximally simple, as you require.

Again, the simplest approach is to do multiple loops, one per condition (except 2 and 3, which are not conditions that can be violated, and 6, which isn't really a condition ;-). "All lines beginning a, must precede lines beginning b" might be thought of as "the last line beginning with a, if any, must be before the first line beginning with b", so:

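# Index of the last line starting with 'a', or -1 if there is none.
lastwitha = max([i for i, line in enumerate(lines)
                 if line.startswith('a')] or [-1])
# Index of the first line starting with 'b', or len(lines) if there is none.
firstwithb = next((i for i, line in enumerate(lines)
                   if line.startswith('b')), len(lines))
if lastwitha > firstwithb:
    raise ValueError("a line beginning with a follows a line beginning with b")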

then similarly for "lines containing integers":

firstwithint = next((i for i, line in enumerate(lines)
                     if any(c in line for c in '0123456789')), len(lines))
if firstwithint < firstwithb:
    raise ValueError("a line containing integers precedes the first line beginning with b")

This should really be plenty of hints for your homework -- can you now do the last remaining bit, condition 5 (the format of the numeric lines), by yourself?

Of course you can take different tacks from what I'm suggesting here (using next to get the index of the first line satisfying a condition -- this requires Python 2.6, btw -- and any and all to check whether any / all items in a sequence meet a condition), but I'm trying to match your request for maximum simplicity. If you find traditional for loops simpler than next, any and all, let us know and we'll show how to recode these uses of the higher abstraction forms into those lower-layer concepts!
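For anyone who does find explicit loops easier to follow, here is a minimal sketch of the two ordering checks above written as one traditional for loop. The function name check_order and the error messages are made up for illustration, and ValueError is just one reasonable choice of exception. The check on the format of the numeric lines is still left out.

```python
def check_order(lines):
    """Raise ValueError if the ordering rules are broken (illustrative sketch)."""
    last_with_a = -1              # stays -1 if no line starts with 'a'
    first_with_b = len(lines)     # stays len(lines) if no line starts with 'b'
    first_with_int = len(lines)   # stays len(lines) if no line contains a digit

    for i, line in enumerate(lines):
        if line.startswith('a'):
            last_with_a = i       # keep overwriting: we want the last one
        if line.startswith('b') and first_with_b == len(lines):
            first_with_b = i      # record only the first one
        if first_with_int == len(lines):
            for c in line:
                if c in '0123456789':
                    first_with_int = i
                    break

    if last_with_a > first_with_b:
        raise ValueError("a line beginning with a follows a line beginning with b")
    if first_with_int < first_with_b:
        raise ValueError("a line containing integers precedes the first line beginning with b")
```

Like the version above, this expects the whole file as a list, e.g. the result of readlines().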

Alex Martelli
@Alex, I understand this. Lucid, clear and simple! However, what does 'Generator expression must be parenthesized if not sole argument' mean? It is raised when I try to run the code. I will try to sort out the condition you left for me and post it later.
Seafoid
Also, ""All lines beginning a, must precede lines beginning b" might be thought of as "the last line beginning with a, if any, must be before the first line beginning with b"" The way programmers view problems is brilliant!
Seafoid
@seafoid, yep, looks like I omitted parentheses -- editing to fix.
Alex Martelli
+3  A: 

A straightforward iterative method. This defines a function to determine a linetype from 1 to 3. Then we iterate over the lines in the file. An unknown line type or a linetype less than any previous one will raise an exception.

def linetype(line):
    if line.startswith("a"):
        return 1
    if line.startswith("b"):
        return 2
    try:
        parts = [int(x) for x in line.split()]
        if len(parts) >=3 and parts[-1] == 0:
            return 3
    except ValueError:  # some field on the line was not an integer
        pass
    raise Exception("Unknown Line Type")

maxtype = 0

for line in open("filename","r"):  #iterate over each line in the file
    line = line.strip() # strip any whitespace
    if line == "":      # if we're left with a blank line
        continue        # continue to the next iteration

    lt = linetype(line) # get the line type of the line
                        # or raise an exception if unknown type
    if lt >= maxtype:   # as long as our type is increasing
        maxtype = lt    # note the current type
    else:               # otherwise line type decreased
        raise Exception("Out of Order")  # so raise exception

print("Validates")  # if we made it here, we validated
Mark Peters
@Mark - I follow the portion where you define the function; however, the for loop escapes me. Would you mind annotating it a little?
Seafoid
That was quick Mark - thanks!
Seafoid
A: 

You don't need to index the lines. For every line you can check or set some conditions, and if a condition is not met, raise an error. E.g. for rule 1: keep a variable was_b, initially set to False. In each iteration (alongside the other checks and sets), check whether the line starts with "b". If it does, set was_b = True. Another check would be: if the line starts with "a" and was_b is True, raise the error. Another check would be: if the line contains integers and was_b is False, raise the error, etc.
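A rough sketch of that idea, assuming the lines come from any iterable (a list, or an open file, so the whole file never has to sit in memory). The function name check_lines and the messages are invented here for illustration; the rule about the format of numeric lines is not covered.

```python
def check_lines(lines):
    """Single pass with a flag; raises ValueError on the first violation."""
    was_b = False  # becomes True once a line starting with "b" has been seen
    for number, line in enumerate(lines, 1):  # number lines from 1
        if line.startswith('b'):
            was_b = True
        if line.startswith('a') and was_b:
            raise ValueError("line %d: begins with a but follows a b line" % number)
        if any(c.isdigit() for c in line) and not was_b:
            raise ValueError("line %d: contains integers before any b line" % number)
```

With a real file this could be called as check_lines(open(path)), where path is whatever file you want to validate.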

mykhal
A: 

Restrictions on lines:

I. There must be no lines that begin with 'a' after we've encountered a line that begins with 'b'.

II. If we encounter a numeric line, then a previous one must start with 'b'. (Your 4th condition also allows another interpretation: each 'b' line must be followed by a numeric line.)

Restriction on numeric line (as a regular expression): /\d+\s+\d+\s+0\s*$/
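To get a feel for what that pattern accepts, here is a small check against a few sample lines. The sample strings and the names numeric_tail, samples and results are made up for illustration. Note that the pattern only constrains the end of the line, so a separate test is still needed to decide whether a line is numeric at all.

```python
import re

# The restriction quoted above: at least two integers, then a final 0.
numeric_tail = re.compile(r'\d+\s+\d+\s+0\s*$')

samples = ["1 2 3 4 5 0", "7 8 0", "1 0", "1 2 3"]  # made-up sample lines
results = [bool(numeric_tail.search(s)) for s in samples]
# results == [True, True, False, False]
```

"1 0" fails because only one integer precedes the zero, and "1 2 3" fails because the line does not end in zero.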

#!/usr/bin/env python
import re

is_numeric = lambda line: re.match(r'^\s*\d+(?:\s|\d)*$', line)
valid_numeric = lambda line: re.search(r'(?:\d+\s+){2}0\s*$', line)

filename = 'filename'  # placeholder: path to the file being checked

def error(msg):
    # Uses the module-level loop variables i and line set in the loop below.
    raise SyntaxError('%s at %s:%s: "%s"' % (msg, filename, i+1, line))

seen_b, last_is_b = False, False
with open(filename) as f:
    for i, line in enumerate(f):
        if not seen_b:
            seen_b = line.startswith('b')

        if seen_b and line.startswith('a'):
            error('failed I.')
        if not last_is_b and is_numeric(line):
            error('failed II.')
        if is_numeric(line) and not valid_numeric(line):
            error('not a valid numeric line')

        last_is_b = line.startswith('b')
J.F. Sebastian