ansaurus

Question

Answer 1

+3 A:

Here's the correct regex to do something like this:

([^#]*)(#.*)?

Also, why not just iterate over the file directly:

with open('file.txt') as file:
    for line in file:
        ...  # handle one line at a time here
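Putting the two suggestions together might look roughly like the sketch below. The helper name `strip_comment` and the sample lines are made up for illustration (the list stands in for the lines of a real file). Note that `[^#]*` also matches the trailing newline, hence the `rstrip()`.

```python
import re

# Pattern from the answer: group 1 is everything before the first '#',
# group 2 (optional) is the comment itself.
CODE_AND_COMMENT = re.compile(r"([^#]*)(#.*)?")


def strip_comment(line):
    """Return the part of `line` before the first '#' (hypothetical helper)."""
    # Every part of the pattern can match the empty string, so match()
    # always succeeds here.
    return CODE_AND_COMMENT.match(line).group(1)


# Sample lines standing in for the contents of a file.
sample_lines = [
    "x = 1  # set x\n",
    "y = 2\n",
    "# only a comment\n",
]

stripped = [strip_comment(line).rstrip() for line in sample_lines]
print(stripped)
```

A line that is only a comment yields an empty string, so the caller may want to skip empty results.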

Can Berk Güder 2009-05-31 20:27:35

As I understand it, the OP doesn't want to match the comment at all, so you can drop the second part of your regex: (#.*)?

Alan Moore 2009-05-31 22:34:59

Answer 2

+3 A:

The * is greedy (consumes as much of the string as it can) and is thus consuming the entire line (past the # and to the end-of-line). Change ".*" to ".*?" and it will work.

See the Regular Expression HOWTO for more information.
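To see the difference, here is a small sketch. The asker's exact pattern isn't shown here, so the two patterns below are only assumed stand-ins that differ in nothing but the `?` after `.*`.

```python
import re

line = "this is a line   # and this is a comment"

# Hypothetical stand-ins for the asker's pattern, which is not shown here.
greedy = re.match(r"(.*)(#|$)", line)
lazy = re.match(r"(.*?)(#|$)", line)

# Greedy: .* runs to the end of the line, and $ then satisfies the second group.
print(repr(greedy.group(1)))

# Lazy: .*? grows one character at a time and stops at the first '#'.
print(repr(lazy.group(1)))
```

The greedy version captures the whole line, comment included, while the lazy one captures only the text before the `#`.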

Benji York 2009-05-31 20:36:46

I went through the documentation for the re module, but its "greedy" explanation wasn't as clear to me as yours. Thanks for a great answer :)

alfredodeza 2009-05-31 21:23:45

Answer 3

A:

Use this regular expression:

^(.*?)(?:#|$)

With the non-greedy modifier (?), the .* expression stops matching as soon as either a hash sign or the end of the line is reached. By default, .* matches as much as possible, which is why you always got the whole line.
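A quick sketch of how this pattern behaves on single lines, with and without a comment. The sample strings and the name `UP_TO_COMMENT` are invented for illustration.

```python
import re

UP_TO_COMMENT = re.compile(r"^(.*?)(?:#|$)")

samples = [
    "value = 42  # the answer\n",   # code followed by a comment
    "value = 42\n",                 # no comment: $ matches just before the final newline
    "# nothing but a comment\n",    # comment only: group 1 is empty
]

results = [UP_TO_COMMENT.match(text).group(1) for text in samples]
for code in results:
    print(repr(code))
```

Whitespace before the `#` stays in the captured group, so an `rstrip()` on the result may be wanted. The sketch assumes one line per string: since `.` does not match a newline by default, a string with an internal newline and no `#` on its first line would not match at all.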

ΤΖΩΤΖΙΟΥ 2009-05-31 20:56:14

Answer 4

+1 A:

@Can, @Benji and @ΤΖΩΤΖΙΟΥ give three excellent solutions, and it's fun to time them to see how fast they match (that's what timeit is for -- fun meaningless micro-benchmarks;-). E.g.:

$ python -mtimeit -s'import re; r=re.compile(r"([^#]*)(#.*)?"); s="this is a line   # and this is a comment"' 'm=r.match(s); g=m.group(1)'
100000 loops, best of 3: 2.02 usec per loop

vs

$ python -mtimeit -s'import re; r=re.compile(r"^(.*?)(?:#|$)"); s="this is a line   # and this is a comment"' 'm=r.match(s); g=m.group(1)'
100000 loops, best of 3: 4.19 usec per loop

vs

$ python -mtimeit -s'import re; r=re.compile(r"(.*?)(#|$)"); s="this is a line   # and this is a comment"' 'm=r.match(s); g=m.group(1)'
100000 loops, best of 3: 4.37 usec per loop

and the winner is... a mix of the patterns!-)

$ python -mtimeit -s'import re; r=re.compile(r"(.*?)(#.*)?"); s="this is a line   # and this is a comment"' 'm=r.match(s); g=m.group(1)'
1000000 loops, best of 3: 1.73 usec per loop

Disclaimer: of course if this were a real benchmarking exercise and speed did truly matter, one would try on many different and relevant values for s, on tests beyond such a microbenchmark, etc, etc. But, I still find timeit an inexhaustible source of fun!-)
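Before reading much into the timings, it seems worth checking what each pattern actually captures in group 1 for the same `s`. A minimal sketch (the dictionary name `captured` is just for illustration):

```python
import re

s = "this is a line   # and this is a comment"

# The four patterns timed above, in the same order.
patterns = [
    r"([^#]*)(#.*)?",
    r"^(.*?)(?:#|$)",
    r"(.*?)(#|$)",
    r"(.*?)(#.*)?",
]

# Map each pattern to whatever group 1 holds after match().
captured = {p: re.match(p, s).group(1) for p in patterns}

for pattern, group1 in captured.items():
    print(f"{pattern!r:22} -> {group1!r}")
```

The first three capture the text before the `#`. In the mixed pattern, the lazy `.*?` is allowed to match nothing and the optional `(#.*)?` is then allowed to match nothing too, so group 1 comes back empty, which would also explain why it is the fastest.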

Alex Martelli 2009-05-31 21:00:17

Expression up to comment or end of line
