In Python: how do I say:

line = line.partition('#' or 'tab')[0]   ... do something with

I know I can do:

line = line.partition('#')[0]  ... do something

But what is the code for the tab character, and can I say # or tab?

Update: I'm trying to read the first word on each line, and if I hit a # I ignore everything after that character (as it is a comment). But then I found that if a line in the file was first word, tab, #, the tab was read as part of the first word. So I wanted to say: if you read a tab or a hash, treat the rest of the line as a comment. A workaround is to put a space after the first word rather than a tab, but that is not very elegant. I realize now that the if statement I originally posted was incorrect; I was trying to simplify things too much. The code above is now correct. I think Ned Batchelder's way is the way to go, but perhaps there is something better now that you know what I'm trying to do.

+8  A: 

partition accepts only a single separator string, not a choice of separators, so you may need re.split:

re.split("(#|\t)", line, 1)

re.split has the interesting property that if the pattern is enclosed in parentheses (a capturing group), the separator is included in the results, and you can limit the number of splits with maxsplit (set to 1 here). When a separator is found, this returns a three-element list, similar to the three-tuple that partition returns.

But you're testing the return value of partition, which is always truthy (it is a non-empty tuple), so I'm not sure what you're trying to achieve...
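
As a rough illustration of the re.split behaviour described above, here is a minimal sketch. The sample strings are made up for the example; the second call shows the case where neither separator is present, which is where the result differs from partition.

```python
import re

# Capturing group keeps the separator; maxsplit=1 stops after the first match.
found = re.split(r"(#|\t)", "word\t# comment", maxsplit=1)
print(found)    # ['word', '\t', '# comment']

# With no '#' or tab in the line, the list has a single element,
# whereas str.partition would still return a three-tuple.
missing = re.split(r"(#|\t)", "no separator here", maxsplit=1)
print(missing)  # ['no separator here']
```

So code that unpacks the result into three names should check the length first.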

Ned Batchelder
I wasn't aware that surrounding the regex in brackets would return the split separator. Learn something every day. +1
MitMaro
This is pretty good. Behaviour is slightly different to partition if the sep is not found.
gnibbler
+2  A: 

'\t' is the string containing a tab.

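import re

def partition_any(line):  # the function name is only illustrative
    # Find the first '#' or tab character in the line.
    match = re.search('[#\t]', line)
    if match:
        i, j = match.span()
        return (line[:i], line[i:j], line[j:])
    # No separator found: same shape as str.partition's result.
    return (line, '', '')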

This will give results similar to partition: a tuple of (head, sep, tail).

ephemient
+1  A: 

Since the comment is from # to end-of-line, what we usually do is this.

for line in source:  # 'source' is assumed: any iterable of lines, such as an open file
    raw_data, _, _ = line.partition("#")
    data = raw_data.strip()
    if len(data) == 0:
        continue  # or whatever; the data part of the line is empty
    # you have data

The point is to not try and combine the comment processing with the whitespace stripping.

[The raw_data, _, _ = line.partition("#") will save the part before "#" in raw_data, it will save the "#" in a variable named _. It will also save the part after the "#" in the variable named _. We're just going to ignore the variable named _, so we don't care what value it has.]
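
To make the three pieces visible, here is a small sketch that gives the two throwaway values real names. The names sep and tail and the sample line are assumptions for illustration only; in practice they would usually stay as _.

```python
line = "value\t# trailing comment\n"

# partition always returns exactly three strings: before, separator, after.
raw_data, sep, tail = line.partition("#")
print(repr(raw_data))  # 'value\t'
print(repr(sep))       # '#'
print(repr(tail))      # ' trailing comment\n'

# Stripping afterwards removes the tab left in front of the comment.
data = raw_data.strip()
print(repr(data))      # 'value'
```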

You can also do this

data, _, _ = line.strip().partition("#")

This isn't a general solution because sometimes that whitespace in front of the comment is meaningful.
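
For the specific goal in the question (the first word on each line, with comments ignored), one possible sketch follows. It assumes words are separated by whitespace and that # never appears inside a word; first_word is a hypothetical helper name, not something from the question.

```python
def first_word(line):
    """Return the first word before any '#' comment, or None if there is none."""
    data, _, _ = line.partition("#")
    # split() with no argument splits on any run of whitespace, tabs included.
    words = data.split()
    return words[0] if words else None

samples = ["alpha\t# tab before the comment",
           "beta # space before the comment",
           "# comment only",
           ""]
for sample in samples:
    print(repr(first_word(sample)))
```

Because split() treats a tab like any other whitespace, the tab-before-hash case needs no special handling here.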

S.Lott
`len(data) == 0` ?
SilentGhost
@SilentGhost: I had too much trouble explaining `if not data:` and the fact that a zero-length string is equivalent to `False`. It led to too much of the wrong kind of thinking. There was that weird glint in people's eyes as they looked for other ways to finesse peculiar features from Python types. Sigh.
S.Lott
Thanks S.lott, but what does the raw_data, _, _ do?
John