Is there a cross-platform library function that would collapse a multiline string into a single-line string with no repeating spaces?

I've come up with the snippet below, but I wonder whether there is a standard function I could just import, perhaps even one that is optimized in C?

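import re

# Patterns compiled once at module level.
RN = re.compile(r'(\r\n)+')  # runs of Windows line endings
R = re.compile(r'\r+')       # runs of bare carriage returns
N = re.compile(r'\n+')       # runs of newlines
S = re.compile(r' +')        # runs of spaces

def collapse(input):
    # Replace each run of line breaks with one space, then squeeze repeated spaces.
    return S.sub(' ', N.sub(' ', R.sub(' ', RN.sub(' ', input))))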

P.S. Thanks for the good observations. ' '.join(input.split()) seems to be the winner: in my case it runs about twice as fast as search-and-replace with a precompiled r'\s+' regex.
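
For anyone who wants to repeat that comparison, here is a rough timing sketch using the standard timeit module. The names collapse_split, collapse_regex and SAMPLE are made up for illustration, and the sample text and repeat count are arbitrary assumptions, so the ratio you see may well differ from the "about twice" reported above.

```python
import re
import timeit

# Precompiled pattern for any run of whitespace.
WHITESPACE = re.compile(r'\s+')

# Arbitrary sample text (an assumption, not the asker's real data).
SAMPLE = ("This\nis        a test\twith a\r\n"
          "  mix of\ttabs,     newlines and repeating\nwhitespace") * 100

def collapse_split(text):
    # split() with no arguments splits on runs of whitespace.
    return ' '.join(text.split())

def collapse_regex(text):
    # Replace every run of whitespace with a single space.
    return WHITESPACE.sub(' ', text)

if __name__ == '__main__':
    for func in (collapse_split, collapse_regex):
        seconds = timeit.timeit(lambda: func(SAMPLE), number=1000)
        print('%s: %.4f s' % (func.__name__, seconds))
```

Both functions return the same string for this sample, because it has no leading or trailing whitespace.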

A: 
multi_line.replace('\n', '')

will do the job. '\n' is the universal end-of-line character in Python.

SilentGhost
Thanks, good to know. But it won't work well, since it will not insert an empty space where needed and will not remove repeated empty spaces.
Evgeny
I'm not sure what @Evgeny means by "empty space" but in any case the proposed solution doesn't address the OP's "no repeating spaces" requirement.
John Machin
+6  A: 

The built-in str.split() method, when called with no arguments, splits on runs of whitespace, so you can use it and then join the resulting list with single spaces, like this:

' '.join(my_string.split())

Here's a complete test script:

TEST = """This
is        a test\twith a
  mix of\ttabs,     newlines and repeating
whitespace"""

print(' '.join(TEST.split()))
# Prints:
# This is a test with a mix of tabs, newlines and repeating whitespace
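
A few edge cases may be worth knowing about. The sketch below wraps the one-liner in a helper (collapse_whitespace is a hypothetical name, not a library function) and checks how it treats leading and trailing whitespace, mixed line endings, and whitespace-only input.

```python
def collapse_whitespace(text):
    # Hypothetical helper: split on runs of whitespace, rejoin with single spaces.
    return ' '.join(text.split())

# Leading and trailing whitespace is dropped entirely.
assert collapse_whitespace('  leading and trailing \n') == 'leading and trailing'

# \r\n, \r and \n all count as whitespace, so no special handling is needed.
assert collapse_whitespace('a\r\nb\rc\nd') == 'a b c d'

# Whitespace-only input collapses to the empty string.
assert collapse_whitespace(' \t\n ') == ''
```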
RichieHindle
+3  A: 

You had the right idea; you just needed to read the Python manual a little more closely:

import re
somewhitespace = re.compile(r'\s+')
TEST = """This
is        a test\twith a
  mix of\ttabs,     newlines and repeating
whitespace"""

somewhitespace.sub(' ', TEST)

'This is a test with a mix of tabs, newlines and repeating whitespace'
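
One difference from the split/join approach: if the input starts or ends with whitespace, the substitution leaves a single space at that end. A small sketch, assuming you want those removed too (collapse_regex and padded are illustrative names only):

```python
import re

WHITESPACE = re.compile(r'\s+')

def collapse_regex(text):
    # Collapse runs of whitespace, then trim the space left at either end.
    return WHITESPACE.sub(' ', text).strip()

padded = '\n  This is\t padded \n'
print(repr(WHITESPACE.sub(' ', padded)))  # ' This is padded '
print(repr(collapse_regex(padded)))       # 'This is padded'
```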
Unknown