If I have a string
"this    is  a      string"
How can I shorten it so that I only have one space between the words rather than multiple? (The number of white spaces is random)
"this is a string"
import re
re.sub(r'\s+', ' ', 'this    is  a      string')
You can pre-compile and store this for potentially better performance:
MULT_SPACES = re.compile(r'\s+')
MULT_SPACES.sub(' ', 'this    is  a      string')
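One thing worth knowing about the regex approach: `\s+` matches tabs and newlines as well as spaces, and a run at the very start or end of the string is replaced by a single space rather than removed. Here is a minimal sketch, assuming you also want the ends trimmed. The helper name `collapse_spaces` is made up for illustration.

```python
import re

MULT_SPACES = re.compile(r'\s+')

def collapse_spaces(text):
    # Hypothetical helper: replace each whitespace run with one space,
    # then trim whatever single space is left at either end.
    return MULT_SPACES.sub(' ', text).strip()

messy = '  this    is  a\tstring \n'
print(repr(MULT_SPACES.sub(' ', messy)))  # ' this is a string '
print(repr(collapse_spaces(messy)))       # 'this is a string'
```

If the leading or trailing space matters to you, skip the `.strip()` call.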
You could use string.split() and " ".join(list) to make this happen in a reasonably Pythonic way. There are probably more efficient algorithms, but they won't look as nice.
Incidentally, this is a lot faster than using a regex, at least on the sample string:
import re
import timeit

s = "this    is  a      string"

def do_regex():
    for x in range(100000):
        a = re.sub(r'\s+', ' ', s)

def do_join():
    for x in range(100000):
        a = " ".join(s.split())

if __name__ == '__main__':
    t1 = timeit.Timer(do_regex).timeit(number=5)
    print("Regex: %s" % t1)
    t2 = timeit.Timer(do_join).timeit(number=5)
    print("Join: %s" % t2)
$ python revsjoin.py
Regex: 2.70868492126
Join: 0.333452224731
Compiling this regex does improve performance, but only if you call sub on the compiled regex, instead of passing the compiled form into re.sub as an argument:
def do_regex_compile():
    pattern = re.compile(r'\s+')
    for x in range(100000):
        # Don't do this:
        # a = re.sub(pattern, ' ', s)
        a = pattern.sub(' ', s)
$ python revsjoin.py
Regex: 2.72924399376
Compiled Regex: 1.5852200985
Join: 0.33763718605
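For anyone rerunning this on a current interpreter, here is a rough Python 3 sketch of the same three-way comparison. The function names and the exact spacing of the sample string are my own choices, not the original script, and absolute timings will vary by machine and Python version, so treat the numbers above as indicative only.

```python
import re
import timeit

s = "this    is  a      string"
PATTERN = re.compile(r'\s+')

def with_regex():
    # Module-level function: looks the pattern up in re's cache on each call.
    return re.sub(r'\s+', ' ', s)

def with_compiled():
    # Calls sub directly on the pre-compiled pattern object.
    return PATTERN.sub(' ', s)

def with_join():
    # No regex at all: split on whitespace runs, rejoin with single spaces.
    return " ".join(s.split())

# All three should agree on this input before we bother timing them.
assert with_regex() == with_compiled() == with_join() == "this is a string"

if __name__ == '__main__':
    for func in (with_regex, with_compiled, with_join):
        elapsed = timeit.timeit(func, number=100000)
        print("%-15s %.3f s" % (func.__name__, elapsed))
```

Checking that the outputs match first is cheap insurance: the approaches differ on strings with leading or trailing whitespace, so a speed comparison only means something on inputs where they give the same result.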
Try this:
s = "this    is  a      string"
tokens = s.split()
neat_s = " ".join(tokens)
The string's split function will return a list of non-empty tokens split by whitespace. So if you try
"this    is  a      string".split()
you will get back
['this', 'is', 'a', 'string']
The string's join function will join a list of tokens together using the string itself as a delimiter. In this case we want a space, so
" ".join("this    is  a      string".split())
will split on runs of whitespace, discard the empties, then join again, separating by single spaces. For more about string operations, check out Python's common string function documentation.
EDIT: I misunderstood what happens when you pass a delimiter to the split function. See markuz's answer for this.
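To make the no-argument behaviour concrete, here is a small sketch. The function name `squeeze` and the sample input are just for illustration. With no delimiter, split() treats any run of whitespace (spaces, tabs, newlines) as one separator and ignores whitespace at the ends.

```python
def squeeze(text):
    # Hypothetical helper: split on any whitespace run, rejoin with one space.
    return " ".join(text.split())

messy = "  this \t is\n\na   string  "
print(messy.split())   # ['this', 'is', 'a', 'string']
print(squeeze(messy))  # this is a string
```

Note that this also trims the ends and turns newlines into spaces, which may or may not be what you want for multi-line text.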
Pretty much the same answer as Ben Gartner's, but this adds the "if this is not an empty string" check.
>>> a = 'this    is  a      string'
>>> ' '.join([k for k in a.split(" ") if k])
'this is a string'
If you don't check for empty strings, you'll get this:
>>> ' '.join([k for k in a.split(" ")])
'this    is  a      string'
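Here is a short sketch of why the check is needed, and of one case it doesn't cover. The sample strings are my own. Splitting on a literal " " produces an empty string for every extra space, and it only knows about spaces, so tabs pass straight through.

```python
a = 'this  is a   string'

# Splitting on a single space leaves an empty string for every extra space.
pieces = a.split(' ')
print(pieces)  # ['this', '', 'is', 'a', '', '', 'string']

# Filtering out the empties fixes runs of plain spaces...
filtered = ' '.join([k for k in pieces if k])
print(repr(filtered))  # 'this is a string'

# ...but a tab is not a space, so it survives the filter.
tabbed = 'this\tis  a string'
print(repr(' '.join([k for k in tabbed.split(' ') if k])))  # 'this\tis a string'
print(repr(' '.join(tabbed.split())))                       # 'this is a string'
```

So if the input can only ever contain spaces, the two approaches give the same result. If tabs or newlines might show up, the no-argument split() is the safer choice.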