tags:

views:

93

answers:

7

I want to replace repeated instances of the "*" character within a string with a single instance of "*". For example if the string is "***abc**de*fg******h", I want it to get converted to "*abc*de*fg*h".

I'm pretty new to python (and programming in general) and tried to use regular expressions and string.replace() like:

import re    
pattern = "***abc**de*fg******h"
pattern.replace("*"\*, "*")

where \* is supposed to replace all instances of the "*" character. But I got: SyntaxError: unexpected character after line continuation character.

I also tried to manipulate it with a for loop like:

def convertString(pattern):
for i in range(len(pattern)-1):
    if(pattern[i] == pattern[i+1]):
        pattern2 = pattern[i]
return pattern2

but this has the error where it only prints "*" because pattern2 = pattern[i] constantly redefines what pattern2 is...

Any help would be appreciated.

+3  A: 

I'd suggest using the re module sub function:

import re

result = re.sub("\*+", "*", "***abc**de*fg******h")

I highly recommend reading through the article about RE and good practices. They can be tricky if you're not familiar with them. In practice, using raw strings is a good idea.

JoshD
Thank you very much, this works well, I'm going to read through the article on RE to figure out what exactly is going on with the "\+" part of the code. I didn't know you could use multiple symbols in conjunction with each other. I thought you could only use "+" or "*" for example.
NSchrading
@NSchrading: In `"\\*+"`, I'm escaping the * character because it's a special re symbol. So I match a literal * character, and the + means one or more.
JoshD
A: 
re.sub('\*+', '*', pattern)

That will do.

Ruel
A: 

Well regular expressions wise I would do exactly as JoshD has suggested. But one improvement here.

Use -

regex  = re.compile('\*+')
result = re.sub(regex, "*", string)

This would essentially cache your regex. So subsequent usage of this in a loop would make your regex operations fast.

MovieYoda
This is a premature optimization. Python caches recently-used compiled regexes anyway.
kindall
A: 

without regexp you can use general repeating element removal with checking of '*':

source = "***abc**dee*fg******h"
target = ''.join(c for c,n in zip(source, source[1:]+' ') if  c+n != '**')
print target
Tony Veijalainen
A: 

how about a non regex way

def squeeze(char,s):
    while char*2 in s:
        s=s.replace(char*2,char)
    return s
print squeeze("*" , "AB***abc**def**AA***k")
ghostdog74
+3  A: 

The naive way to do this kind of thing with re is

re.sub('\*+', '*', text)

That replaces runs of 1 or more asterisks with one asterisk. For runs of exactly one asterisk, that is running very hard just to stay still. Much better is to replace runs of TWO or more asterisks by a single asterisk:

re.sub('\*\*+', '*', text)

This can be well worth doing:

\python27\python -mtimeit -s"t='a*'*100;import re" "re.sub('\*+', '*', t)"
10000 loops, best of 3: 73.2 usec per loop

\python27\python -mtimeit -s"t='a*'*100;import re" "re.sub('\*\*+', '*', t)"
100000 loops, best of 3: 8.9 usec per loop

Note that re.sub will return a reference to the input string if it has found no matches, saving more wear and tear on your computer, instead of a whole new string.

John Machin
A: 

You wrote:

pattern.replace("*"\*, "*")

You meant:

pattern.replace("\**", "*")
#                ^^^^

You really meant:

pattern_after_substitution= re.sub(r"\*+", "*", pattern)

which does what you wanted.

ΤΖΩΤΖΙΟΥ