ansaurus

Question

Python: use a regular expression to remove the whitespace from each line

Answer 1

+2 A:

You can use strip() to remove whitespace from both the front and the back of a string, or lstrip() to remove it from the front only:

>>> s="  string with front spaces and back   "
>>> s.strip()
'string with front spaces and back'
>>> s.lstrip()
'string with front spaces and back   '

with open("file") as f:
    for line in f:
        # lstrip() keeps the trailing newline, so suppress print's own
        print(line.lstrip(), end="")

If you really want to use a regex:

>>> import re
>>> re.sub(r"^\s+", "", s)  # remove the front
'string with front spaces and back   '
>>> re.sub(r"\s+\Z", "", s)  # remove the back
'  string with front spaces and back'
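Since the question is about every line of a text rather than a single string, here is a minimal sketch of the non-regex route applied per line. The helper name lstrip_lines is made up for illustration, and it assumes lines are separated by "\n".

```python
def lstrip_lines(text):
    """Strip leading whitespace from every line (hypothetical helper)."""
    # split("\n") keeps empty lines and a trailing newline intact,
    # so joining with "\n" restores the original line structure.
    return "\n".join(line.lstrip() for line in text.split("\n"))


cleaned = lstrip_lines("  a\n\n b\n")
print(repr(cleaned))  # 'a\n\nb\n'
```

Blank lines survive here because each line is stripped on its own.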

ghostdog74 2010-10-21 05:38:40

Answer 2

+3 A:

Python's re module does not treat ^ as a start-of-line anchor by default (it only matches at the start of the string), so you need to pass the re.MULTILINE flag explicitly.

import re

r = re.compile(r"^\s+", re.MULTILINE)
r.sub("", "a\n b\n c")  # "a\nb\nc"

# or without compiling (only possible for Python 2.7+ because the flags option
# didn't exist in earlier versions of re.sub)

re.sub(r"^\s+", "", "a\n b\n c", flags=re.MULTILINE)

# but mind that \s also matches newlines, so blank lines are swallowed:
r.sub("", "a\n\n\n\n b\n c")  # "a\nb\nc"
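To make the effect of the flag concrete, here is a small side-by-side sketch. The sample string and the variable names are only illustrative.

```python
import re

text = "  a\n b\n c"

# Without the flag, ^ matches only at the very start of the string.
without_flag = re.sub(r"^\s+", "", text)

# With re.MULTILINE, ^ also matches right after every newline.
with_flag = re.sub(r"^\s+", "", text, flags=re.MULTILINE)

print(repr(without_flag))  # 'a\n b\n c'
print(repr(with_flag))     # 'a\nb\nc'
```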

AndiDog 2010-10-21 05:45:22

Answer 3

A:

nowhite = ''.join(mytext.split())

No whitespace at all will remain, as you asked (everything is run together into one word). It is usually more useful to join the pieces with ' ' or '\n' instead, so that the words stay separate.
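A quick sketch of the three join variants, using a made-up sample string:

```python
mytext = "  some   text\n  on two lines  "

# split() with no arguments splits on any run of whitespace
# (spaces, tabs, newlines) and drops empty pieces.
words = mytext.split()

no_whitespace = "".join(words)    # everything becomes one word
single_spaced = " ".join(words)   # words separated by one space
one_per_line = "\n".join(words)   # one word per line

print(repr(no_whitespace))  # 'sometextontwolines'
print(repr(single_spaced))  # 'some text on two lines'
print(repr(one_per_line))   # 'some\ntext\non\ntwo\nlines'
```

Note that all of these discard the original line breaks, so they do not preserve the per-line layout of the text.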

Tony Veijalainen 2010-10-21 06:20:30

Answer 4

A:

You'll have to use the re.MULTILINE option:

re.sub(r"(?m)^\s+", "", text)

The "(?m)" prefix is the inline form of the re.MULTILINE flag; it enables multiline matching from inside the pattern itself.
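As a sanity check, this sketch compares the inline "(?m)" form with passing the flag as an argument. The sample text is invented for the demonstration.

```python
import re

text = "  a\n   b\n c"

# Inline flag: the pattern itself switches on multiline mode.
inline = re.sub(r"(?m)^\s+", "", text)

# Explicit flag: same pattern without the prefix, flag passed separately.
explicit = re.sub(r"^\s+", "", text, flags=re.MULTILINE)

print(inline == explicit)  # True
print(repr(inline))        # 'a\nb\nc'
```

The inline form is handy when you can only supply a pattern string and have no way to pass flags.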

ΤΖΩΤΖΙΟΥ 2010-10-21 12:24:08

Answer 5

A:

@AndiDog acknowledges in his (currently accepted) answer that it munches consecutive newlines.

Here's how to fix that deficiency, which is caused by the fact that \n is BOTH whitespace and a line separator. What we need is a regex character class that includes only the whitespace characters other than newline.

We want "whitespace and not newline", which can't be expressed directly in a character class. Let's rewrite that as not(not(whitespace and not newline)), i.e. not(not whitespace or not not newline) (thanks, Augustus), i.e. not(not whitespace or newline), i.e. [^\S\n] in re notation.

So:

>>> re.sub(r"(?m)^[^\S\n]+", "", "  a\n\n   \n\n b\n c\nd  e")
'a\n\n\n\nb\nc\nd  e'
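The same character class works for trailing whitespace too. The following is only a sketch extending the idea above: the function name strip_line_whitespace is invented, and it assumes "\n" line endings.

```python
import re

# [^\S\n] = any whitespace character except newline.
# First alternative: leading run at the start of a line.
# Second alternative: trailing run just before the end of a line.
_EDGE_WS = re.compile(r"^[^\S\n]+|[^\S\n]+$", re.MULTILINE)


def strip_line_whitespace(text):
    """Remove leading and trailing whitespace on each line, keeping blank lines."""
    return _EDGE_WS.sub("", text)


result = strip_line_whitespace("  a  \n\n   \n b \nc  d")
print(repr(result))  # 'a\n\n\nb\nc  d'
```

Whitespace in the middle of a line (as in "c  d") is left alone, and no newlines are consumed.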

John Machin 2010-10-21 23:45:59
