ansaurus

Question

How to replace repeated instances of a character with a single instance of that character in Python?

Answer 1

+3 A:

I'd suggest using the re module sub function:

import re

# Raw string, so "\*" reaches the regex engine unchanged; result is "*abc*de*fg*h"
result = re.sub(r"\*+", "*", "***abc**de*fg******h")

I highly recommend reading through the article about regular expressions and good practices; they can be tricky if you're not familiar with them. In practice, writing patterns as raw strings (r"...") is a good idea.

JoshD 2010-10-07 04:00:16

Thank you very much, this works well. I'm going to read through the article on RE to figure out exactly what is going on with the "\*+" part of the code. I didn't know you could use multiple symbols in conjunction with each other; I thought you could only use "+" or "*" on its own, for example.

NSchrading 2010-10-07 04:17:11

@NSchrading: In `"\*+"`, I'm escaping the * character because it's a special re symbol. So I match a literal * character, and the + means one or more.

JoshD 2010-10-07 04:20:13
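The same idea generalizes to any character. A minimal sketch, assuming a hypothetical helper named `collapse_runs`; `re.escape` is used so that characters which are special in regular expressions (like `*`) are matched literally:

```python
import re


def collapse_runs(text, char):
    """Replace every run of `char` in `text` with a single `char`."""
    # re.escape makes the character literal, so "*", "." or a backslash are safe to pass in;
    # the non-capturing group lets "+" repeat the whole escaped text.
    pattern = "(?:" + re.escape(char) + ")+"
    # A function replacement is used so no backslash processing is applied to `char`.
    return re.sub(pattern, lambda match: char, text)


print(collapse_runs("***abc**de*fg******h", "*"))  # *abc*de*fg*h
```

Passing a function as the replacement, rather than `char` itself, avoids surprises when `char` is a backslash, which `re.sub` would otherwise try to interpret as an escape.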

Answer 2

A:

re.sub(r'\*+', '*', pattern)

That will do.

Ruel 2010-10-07 04:08:01

Answer 3

A:

Regular-expression-wise, I would do exactly what JoshD suggested, but with one improvement:

import re

regex = re.compile(r'\*+')
result = regex.sub("*", string)

This essentially caches your compiled regex, so reusing it in a loop makes subsequent regex operations faster.

MovieYoda 2010-10-07 04:16:02

This is a premature optimization. Python caches recently-used compiled regexes anyway.

kindall 2010-10-07 04:54:04
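Both points can be seen in a short sketch: the pattern is compiled once and its own `sub` method is reused over several strings. Whether this is measurably faster than calling `re.sub` directly depends on the `re` module's internal cache, so treat it as a readability choice rather than a guaranteed speedup. The sample strings are made up for illustration:

```python
import re

# Compile once; `star_run` is reused for every string below.
star_run = re.compile(r"\*+")

samples = ["***abc**de*fg******h", "no stars here", "**"]

# Compiled pattern objects have their own .sub(repl, string) method.
cleaned = [star_run.sub("*", s) for s in samples]
print(cleaned)  # ['*abc*de*fg*h', 'no stars here', '*']
```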

Answer 4

A:

Without a regexp, you can use a general removal of repeated elements, restricted to '*':

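source = "***abc**dee*fg******h"
# Pair each character with the next one (a space pads the end) and drop a '*'
# whenever the next character is also '*', so only the last '*' of each run survives.
target = ''.join(c for c, n in zip(source, source[1:] + ' ') if c + n != '**')
print(target)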

Tony Veijalainen 2010-10-07 04:24:52
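Another regex-free option, swapped in here rather than taken from the answer above, is `itertools.groupby`, which groups consecutive equal characters: runs of the target character are collapsed to one and every other run is kept as-is. The function name `squeeze_groupby` is hypothetical:

```python
from itertools import groupby


def squeeze_groupby(s, char):
    """Collapse consecutive runs of `char` in `s` to a single `char`."""
    parts = []
    for key, group in groupby(s):
        if key == char:
            # A run of the target character: keep just one.
            parts.append(char)
        else:
            # Any other run (e.g. "ee") is kept unchanged.
            parts.append("".join(group))
    return "".join(parts)


print(squeeze_groupby("***abc**dee*fg******h", "*"))  # *abc*dee*fg*h
```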

Answer 5

A:

How about a non-regex way:

def squeeze(char, s):
    # Replace doubled chars until no pair remains; each pass roughly halves every run.
    while char * 2 in s:
        s = s.replace(char * 2, char)
    return s

print(squeeze("*", "AB***abc**def**AA***k"))

ghostdog74 2010-10-07 04:33:18
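One caveat with the loop above: if `char` is an empty string, `char * 2` is also empty, `"" in s` is always true, and the `while` loop never ends. A minimal sketch of the same approach with a guard added (raising `ValueError` is an assumption about how you'd want bad input handled):

```python
def squeeze(char, s):
    """Collapse runs of `char` in `s` by repeatedly replacing a doubled `char`."""
    if not char:
        # An empty `char` would make the loop below run forever.
        raise ValueError("char must be a non-empty string")
    double = char * 2
    # Each pass replaces non-overlapping pairs, so longer runs need several passes.
    while double in s:
        s = s.replace(double, char)
    return s


print(squeeze("*", "AB***abc**def**AA***k"))  # AB*abc*def*AA*k
```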

Answer 6

+3 A:

The naive way to do this kind of thing with re is:

re.sub(r'\*+', '*', text)

That replaces runs of 1 or more asterisks with one asterisk. For runs of exactly one asterisk, that is running very hard just to stay still. Much better is to replace runs of TWO or more asterisks by a single asterisk:

re.sub(r'\*\*+', '*', text)

This can be well worth doing:

\python27\python -mtimeit -s"t='a*'*100;import re" "re.sub('\*+', '*', t)"
10000 loops, best of 3: 73.2 usec per loop

\python27\python -mtimeit -s"t='a*'*100;import re" "re.sub('\*\*+', '*', t)"
100000 loops, best of 3: 8.9 usec per loop

Note that if re.sub finds no matches, it returns a reference to the input string instead of building a whole new string, saving more wear and tear on your computer.

John Machin 2010-10-07 04:42:55
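A sketch that checks the two patterns agree on a few inputs and times them with Python's own `timeit` module instead of the command line. `r'\*{2,}'` is the same "two or more" pattern written with a counted quantifier. Timings depend on the machine and Python version, so none are claimed here:

```python
import re
import timeit

one_or_more = re.compile(r"\*+")
two_or_more = re.compile(r"\*\*+")
counted = re.compile(r"\*{2,}")  # equivalent to \*\*+

inputs = ["a*" * 100, "***abc**de*fg******h", "", "plain"]

# All three patterns give the same result; only the amount of work differs.
for t in inputs:
    expected = one_or_more.sub("*", t)
    assert two_or_more.sub("*", t) == expected
    assert counted.sub("*", t) == expected

t = "a*" * 100
for name, pat in [(r"\*+", one_or_more), (r"\*\*+", two_or_more)]:
    seconds = timeit.timeit(lambda: pat.sub("*", t), number=10000)
    print(f"{name:8} {seconds:.4f} s for 10000 calls")
```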

Answer 7

A:

You wrote:

pattern.replace("*"\*, "*")

You meant:

pattern.replace("\**", "*")
#                ^^^^

You really meant:

pattern_after_substitution = re.sub(r"\*+", "*", pattern)

which does what you wanted.

ΤΖΩΤΖΙΟΥ 2010-10-07 19:21:07
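The `str.replace` attempts cannot work because `str.replace` matches its first argument as literal text, with no regex meaning, and replaces non-overlapping occurrences in a single pass. A quick sketch showing both effects (the sample string is made up):

```python
import re

s = "a***b"

# str.replace looks for the literal text: a backslash followed by two stars. It is not in s.
print(s.replace("\\**", "*"))  # a***b

# Replacing "**" once only handles non-overlapping pairs, so "***" becomes "**".
print(s.replace("**", "*"))  # a**b

# A regex with the + quantifier collapses the whole run in one call.
print(re.sub(r"\*+", "*", s))  # a*b
```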
