ansaurus

Question

Answer 1

+5 A:

You need to exclude the line breaks at the end of the separating lines. Try this:

\n(?<!-\n)(?!-)

This regular expression uses a negative look-behind assertion to exclude any \n that’s preceded by a -, and a negative look-ahead assertion to exclude any \n that’s followed by a -.
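As a quick check, the pattern can be run through Python's re.sub. The sample text below is an assumption (the original data isn't reproduced here), with lines of dashes standing in for the separating lines:

```python
import re

# Hypothetical sample: two records, each followed by a line of dashes.
text = "first line\nsecond line\n-----\nthird line\nfourth line\n-----\n"

# Match a newline that is neither preceded nor followed by a hyphen.
pattern = re.compile(r"\n(?<!-\n)(?!-)")

# Replace only those newlines with a space.
joined = pattern.sub(" ", text)
print(repr(joined))
```

On this sample, only the newline inside each record should become a space; the newlines touching a line of dashes are left alone.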

Gumbo 2009-09-14 18:55:20

Thanks, I see now. I failed to define the problem thoroughly before attempting a solution, then confused things further by presuming I was replacing all \n's when actually replacing only half.

fwkb 2009-09-14 19:33:25

Answer 2

+1 A:

re.sub(r'(?<!-)\n(?!-)', ' ', text)

(Hyphen doesn't need escaping outside of a character class.)

chaos 2009-09-14 19:03:04

… and outside of a character range declaration and at the start or end of a character class. `[a-z-0-9]`, `[-a-z]` and `[a-z-]` are all valid character class declarations.
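A small sanity check of those three classes, assuming Python's re module (other regex flavours may differ in edge cases):

```python
import re

# The three character classes from the comment above.
classes = [r"[a-z-0-9]", r"[-a-z]", r"[a-z-]"]

# Each class should compile and treat the unescaped hyphen as a literal.
matches_hyphen = [re.fullmatch(cls, "-") is not None for cls in classes]

# The ranges around the hyphen should still work as ranges.
matches_letter = [re.fullmatch(cls, "q") is not None for cls in classes]
matches_digit = re.fullmatch(classes[0], "5") is not None

# Outside a character class, a hyphen is an ordinary literal.
literal_outside = re.search(r"a-b", "xa-by") is not None

print(matches_hyphen, matches_letter, matches_digit, literal_outside)
```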

Gumbo 2009-09-14 19:41:48

Answer 3

+7 A:

This is a good place to use a generator function to skip the lines of ----'s and yield something that the csv module can read.

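import csv

def readCleanLines( someFile ):
    # Skip lines made up only of '-' characters; pass everything else through.
    for line in someFile:
        if line.strip() == len(line.strip())*'-':
            continue
        yield line

# someFile is an already-open file object.
reader = csv.reader( readCleanLines( someFile ) )
for row in reader:
    print( row )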

This should handle the line breaks inside quotes seamlessly and silently.
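To see the quoted line breaks being handled, here is a minimal sketch on in-memory data. The sample rows are made up for illustration, and the final step of replacing the embedded newline with a space is an assumption about what the question wants:

```python
import csv

def readCleanLines(lines):
    # Same filter as above: drop lines consisting only of '-' characters.
    for line in lines:
        if line.strip() == len(line.strip()) * '-':
            continue
        yield line

# Hypothetical data: a quoted field that spans two physical lines.
sample = (
    'id,comment\n'
    '-----\n'
    '1,"first line\nsecond line"\n'
    '-----\n'
    '2,plain\n'
)

# csv.reader accepts any iterable of lines, so a list of strings works too.
rows = list(csv.reader(readCleanLines(sample.splitlines(keepends=True))))

# Optionally flatten the line break that survived inside the quoted field.
flat = [[field.replace('\n', ' ') for field in row] for row in rows]
print(flat)
```

One caveat of this approach: a line of dashes inside a quoted field would be dropped as well, because the filter runs before the csv module sees the quotes.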

If you want to do other things with this file, for example, save a copy with the ---- lines removed, you can do this.

with open( "source", "r" ) as someFile:
    with open( "destination", "w" ) as anotherFile:
        for line in readCleanLines( someFile ):
            anotherFile.write( line )

Saving a cleaned copy isn't really worth the effort, though, since reading and skipping the lines on the fly is very, very fast and doesn't require any additional storage.

S.Lott 2009-09-14 19:08:25

awesome idea to strip lines with a generator!

orip 2009-09-14 19:33:16

BTW - don't you need len(line.strip()) instead of len(line)?

orip 2009-09-14 19:34:13

@orip: That would be a bug, thank you.

S.Lott 2009-09-14 20:05:00

@S.Lott: Comment using the non-word "resave" deleted. Use case added.

fwkb 2009-09-14 20:44:17

Thanks! I will definitely put that to use!

fwkb 2009-09-14 20:48:51

@fwkb: Stack Overflow maintains its own change history, saving you from having to track changes via extra comments. You can simply make changes and not worry about leaving some kind of audit trail. It's already tracked.

S.Lott 2009-09-15 00:10:30

Answer 4

A:

A regex isn't always the best tool for the job. How about running it through something like "Split" or "Tokenize" first? (I'm sure Python has an equivalent.) Then you have your records and can assume the remaining newlines are just continuations.
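One way this could look in Python is sketched below. It still uses a small regex, but only to find the separator lines; the sample text and the assumption that separators are whole lines of dashes are mine, not taken from the question:

```python
import re

# Hypothetical sample: two records, each followed by a line of dashes.
text = "first line\nsecond line\n-----\nthird line\nfourth line\n-----\n"

# Split on whole lines that consist only of dashes.
chunks = re.split(r"^-+\n", text, flags=re.MULTILINE)

# Within each record, treat the remaining newlines as continuations.
records = [" ".join(chunk.splitlines()) for chunk in chunks if chunk.strip()]
print(records)
```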

Eric Nicholson 2009-09-14 19:29:07


Negative lookahead after newline?
