Suppose I have this:

My---sun--is------very-big---.

I want to replace all multiple hyphens with just one hyphen.

+10  A: 
import re

astr='My---sun--is------very-big---.'

print(re.sub('-+','-',astr))
# My-sun-is-very-big-.
unutbu
+1, but `-{2,}` would avoid replacing single `-` unnecessarily.
Tim Pietzcker
+1  A: 
import re

re.sub('-+', '-', "My---sun--is------very-big---")  # 'My-sun-is-very-big-'
Jakub Hampl
+2  A: 

How about:

>>> import re
>>> re.sub("-+", "-", "My---sun--is------very-big---.")
'My-sun-is-very-big-.'

The regular expression "-+" matches one or more consecutive "-" characters, so each run is replaced by a single hyphen.
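
To make the quantifier concrete, here is a small sketch comparing "-+" with the "-{2,}" variant suggested in a comment on another answer. The variable names are illustrative only; both patterns give the same result here, the difference being that "-{2,}" does not match lone hyphens and so never substitutes them.

```python
import re

text = 'My---sun--is------very-big---.'

# '-+' matches every run of one or more hyphens, including single ones.
collapsed_plus = re.sub(r'-+', '-', text)

# '-{2,}' matches only runs of two or more, so lone hyphens are left untouched.
collapsed_two_plus = re.sub(r'-{2,}', '-', text)

print(collapsed_plus)      # My-sun-is-very-big-.
print(collapsed_two_plus)  # My-sun-is-very-big-.
```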

Charles Beattie
+5  A: 

If you really only want to coalesce hyphens, use the other suggestions. Otherwise you can write your own function, something like this:

>>> def coalesce(x):
...     n = []
...     for c in x:
...         if not n or c != n[-1]:
...             n.append(c)
...     return ''.join(n)
...
>>> coalesce('My---sun--is------very-big---.')
'My-sun-is-very-big-.'
>>> coalesce('aaabbbccc')
'abc'
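
If collapsing every repeated character is too aggressive, one possible variant restricts the function to a chosen set of characters. This is only a sketch: the name `coalesce_chars` and its `chars` parameter are made up for illustration and are not part of the answer above.

```python
def coalesce_chars(text, chars='-'):
    """Collapse runs of any character in `chars`; leave other runs alone."""
    out = []
    for c in text:
        # Skip c only when it repeats the previous character AND is in the set.
        if out and c == out[-1] and c in chars:
            continue
        out.append(c)
    return ''.join(out)

print(coalesce_chars('My---sun--is------very-big---.'))  # My-sun-is-very-big-.
print(coalesce_chars('letters--and++digits', '-+'))      # letters-and+digits
```

With the default `chars='-'`, doubled letters such as the "tt" in "letters" are preserved.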
FogleBird
+1 for a general solution. Since the OP used English words in their example, specifying a set of characters to coalesce (or not coalesce) would probably be preferable, to avoid mangling words with double letters (e.g. letters -> leters).
tgray
Agreed on the +1 for a general solution
mcpeterson
+5  A: 

As usual, there's a nice itertools solution, using groupby:

>>> from itertools import groupby
>>> s = 'aaaaa----bbb-----cccc----d-d-d'
>>> ''.join(key for key, group in groupby(s))
'a-b-c-d-d-d'
Will McCutchen
This solution doesn't answer the question if you only want to dedupe the hyphens. Is there an itertools solution that would keep 'aaaaa'?
mcpeterson
@McPeterson: Sure, but they're not as nice. For just handling hyphens, you can do `''.join(key if key == '-' else ''.join(group) for key, group in groupby(s))`. For handling any non-alphanumeric character, `''.join(''.join(group) if key.isalnum() else key for key, group in groupby(s))`. But I'd just use one of the regex solutions instead.
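
For reference, the hyphen-only one-liner from this comment can be wrapped in a function and run as-is. The function name `dedupe_hyphens` is just an illustrative choice.

```python
from itertools import groupby

def dedupe_hyphens(s):
    # groupby yields (key, group) for each run of identical characters.
    # Hyphen runs collapse to the key; every other run is kept whole.
    return ''.join(key if key == '-' else ''.join(group)
                   for key, group in groupby(s))

print(dedupe_hyphens('aaaaa----bbb-----cccc----d-d-d'))  # aaaaa-bbb-cccc-d-d-d
```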
Will McCutchen
+11  A: 

If you want to collapse any run of a repeated character into a single occurrence, you can use

>>> import re
>>> a = "AA---BC++++DDDD-EE$$$$FF"
>>> print(re.sub(r"(.)\1+",r"\1",a))
A-BC+D-E$F

If you only want to coalesce non-word-characters, use

>>> print(re.sub(r"(\W)\1+",r"\1",a))
AA-BC+DDDD-EE$FF

If it's really just hyphens, I recommend unutbu's solution.

Tim Pietzcker
+1  A: 

How about an alternate without the re module:

'-'.join(filter(lambda w: len(w) > 0, 'My---sun--is------very-big---.'.split("-")))

Or going with Tim and FogleBird's previous suggestion, here's a more general method:

def coalesce_factory(x):
    return lambda sent: x.join(filter(lambda w: len(w) > 0, sent.split(x)))

hyphen_coalesce = coalesce_factory("-")
hyphen_coalesce('My---sun--is------very-big---.')

Though personally, I would use the re module first :)
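
One caveat with the split-and-join approach: because empty pieces are filtered out, hyphens at the very start or end of the string are removed entirely rather than reduced to one. The sketch below shows the difference against the regex version; `hyphen_coalesce_split` is a hypothetical name for the same split/filter/join idea written as a plain function.

```python
import re

def hyphen_coalesce_split(sent, sep='-'):
    # Split on the separator, drop the empty pieces, and re-join.
    return sep.join(part for part in sent.split(sep) if part)

print(hyphen_coalesce_split('My---sun--is------very-big---.'))  # My-sun-is-very-big-.

# Leading and trailing hyphens disappear with split/join...
print(hyphen_coalesce_split('--My---sun--'))  # My-sun
# ...but are kept (as a single hyphen) by the regex version.
print(re.sub('-+', '-', '--My---sun--'))      # -My-sun-
```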

mcpeterson
A: 

Another simple solution is the str object's replace method, applied repeatedly until no double hyphen remains:

while '--' in astr:
    astr = astr.replace('--','-')
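
As a self-contained sketch, the loop can be wrapped in a function (the name `collapse_with_replace` is illustrative). Each pass shortens the string, so the loop always terminates as long as the character to collapse is non-empty.

```python
def collapse_with_replace(text, ch='-'):
    if not ch:
        # An empty string is "in" every string, so the loop would never end.
        raise ValueError('ch must be a non-empty string')
    double = ch * 2
    # Keep halving doubled occurrences until none are left.
    while double in text:
        text = text.replace(double, ch)
    return text

print(collapse_with_replace('My---sun--is------very-big---.'))  # My-sun-is-very-big-.
```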