It's been years (and years) since I've done any regex, so turning to experts on here since it's likely a trivial exercise :)

I have a tab-delimited file, and on each line certain fields have values such as:

  • foo
  • bar
  • b"foo's bar"
  • b'bar foo'
  • b'carbar'

A complete line in the file might look something like:

123\t b'bar foo' \tabc\t123\r\n

I want to get rid of all the leading b', b" and trailing ", ' from that field on every line. So given the example line above, after running the regex, I'd get:

123\t bar foo \tabc\t123\r\n

Bonus points if you can give me the Python blurb to run this over the file.

+1  A: 

(^|\t)b["'] should match the leading prefix, and for the trailing quote:

["'](\t|$) should do it

In Python, you do:

import re
r1 = re.compile("(^|\t)b[\"']")
r2 = re.compile("[\"'](\t|$)")

then apply them one after the other, feeding the result of the first into the second:

yourString = r1.sub("\\1", yourString)
yourString = r2.sub("\\1", yourString)
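The sample line in the question has a space between the tab and the b', which the two patterns above would not match as written. Here is a minimal sketch of a variant that tolerates that padding and keeps it in the output. The names LEADING, TRAILING, strip_byte_quotes and clean_lines are made up for illustration, and the optional-space handling is an assumption about the data rather than part of the original patterns. Like the two-pattern approach it extends, it also strips a trailing quote from a field that never had the b prefix.

```python
import re

# Hypothetical names; the padding-tolerant patterns are an assumption
# about the data, not part of the answer's original regexes.
# Leading b' or b" at the start of a field, with optional spaces after the tab.
LEADING = re.compile(r"""(^|\t)( *)b["']""")
# Trailing ' or " at the end of a field, with optional spaces before the
# next tab, the line ending, or the end of the string.
TRAILING = re.compile(r"""["']( *)(?=\t|\r?\n|$)""")

def strip_byte_quotes(line):
    # Second pass runs on the result of the first, not on the original line.
    line = LEADING.sub(r"\1\2", line)
    return TRAILING.sub(r"\1", line)

def clean_lines(lines):
    # Works on any iterable of lines, e.g. an open file object.
    for line in lines:
        yield strip_byte_quotes(line)
```

clean_lines can be fed an open file and its output written to a second file line by line.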
Aaron
+1  A: 

For each line you can use

re.sub(r'''(?<![^\t\n])\W*b(["'])(.*?)\1\W*?(?![^\t\n])''', r'\2', line)

and for bonus points:

import re

pattern = re.compile(r'''(?<![^\t\n])\W*b(["'])(.*?)\1\W*?(?![^\t\n])''')
with open('infile') as infile, open('outfile', 'w') as outfile:
    for line in infile:
        outfile.write(pattern.sub(r'\2', line))
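Because \W* also matches spaces, the pattern above removes any padding around the field, so the sample line comes out as 123\tbar foo\tabc\t123\r\n rather than with the spaces kept. If the padding and a \r\n ending need to survive, one possible variant is sketched below. PAIRED and unwrap are hypothetical names, and restricting the padding to plain spaces is an assumption about the data.

```python
import re

# Hypothetical variant: capture the padding instead of discarding it.
# Group 1: leading spaces, group 2: the opening quote,
# group 3: the field value, group 4: trailing spaces.
PAIRED = re.compile(
    r"""(?<![^\t\n])( *)b(["'])([^\t\r\n]*?)\2( *)(?=[\t\r\n]|$)"""
)

def unwrap(line):
    # The closing quote is only removed when it matches the opening one,
    # so quotes in fields without the b prefix are left alone.
    return PAIRED.sub(r"\1\3\4", line)
```

unwrap can replace pattern.sub(r'\2', line) inside the file loop.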
cobbal
A: 
>>> "b\"foo's bar\"".replace('b"',"").replace("b'","").rstrip("\"'")
"foo's bar"
>>> "b'bar foo'".replace('b"',"").replace("b'","").rstrip("\"'")
'bar foo'
>>>
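The one-liner above works on a single field, and str.replace removes b' or b" wherever it occurs in the string, not just at the start. Below is a rough sketch of how the same no-regex idea might be applied field by field over a whole tab-delimited line. clean_field and clean_line are hypothetical helper names, and keeping the space padding around a value is an assumption based on the sample line in the question.

```python
def clean_field(field):
    # Look at the value without its space padding.
    core = field.strip(" ")
    # Only unwrap when the value starts with b' or b" and ends with the same quote.
    if len(core) >= 3 and core[0] == "b" and core[1] in "\"'" and core[-1] == core[1]:
        lead = field[: len(field) - len(field.lstrip(" "))]
        trail = field[len(field.rstrip(" ")):]
        return lead + core[2:-1] + trail
    return field

def clean_line(line):
    # Split off the line ending so it is not treated as part of the last field.
    body = line.rstrip("\r\n")
    ending = line[len(body):]
    return "\t".join(clean_field(f) for f in body.split("\t")) + ending
```

Calling clean_line on each line of the input file and writing the result out gives the file-level version.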
ghostdog74