ansaurus

Question

Regular Expression to split on specific character ONLY if that character is not in a pair

Answer 1

+2 A:

You could use a negative lookbehind (assuming the regex engine in question supports it) to only match ampersands that do not follow another ampersand.

/(?<!&)&/
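As a minimal sketch of how this pattern might be used from Python (an assumption, since the answer does not name an engine; Python's `re` module does support fixed-width lookbehind), the helper name `split_unpaired` below is made up for illustration. Note that the first ampersand of a pair still matches, because nothing precedes it; only the second one is skipped, so the piece after a pair starts with a literal `&`.

```python
import re

# Matches an ampersand that is NOT immediately preceded by another ampersand.
SPLIT_RE = re.compile(r'(?<!&)&')

def split_unpaired(text):
    """Split text on every '&' that does not directly follow another '&'."""
    return SPLIT_RE.split(text)

parts = split_unpaired("&yI love A && W &uRootbeer.")
print(parts)
# ['', 'yI love A ', '& W ', 'uRootbeer.']
```

Every piece after the first begins with the character that followed the splitting ampersand, so a dictionary lookup on that first character can handle both the color codes and the `&&` escape.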

Dav on a Plane 2009-12-20 20:20:51

This worked perfectly. I don't know what kind of speed sacrifices I'm making by doing a lookbehind, so if anyone can come up with a more efficient solution (if it even exists), I'll be glad to hear it.

Mike Trpcic 2009-12-20 20:34:01

As noted above in a comment on your question, my solution is actually faster than gnibbler's even before you change it to use a regex split. In any case, I included test code with correct simulated input that should easily let you benchmark the performance change if you stick with this approach.

Peter Hansen 2009-12-20 21:56:37
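The test code mentioned in the comment above is not reproduced here. As a rough sketch of how the cost of the lookbehind could be measured, the snippet below times the same split with and without the lookbehind using `timeit`. The sample string, repeat counts, and the helper name `time_split` are assumptions chosen for illustration, and the numbers will vary by machine and Python version.

```python
import re
import timeit

# Hypothetical sample input, loosely modelled on the strings in this thread.
SAMPLE = "The &yquick &cbrown &bfox & cat &&cat &Yjumps over the &ulazy dog " * 200

WITH_LOOKBEHIND = re.compile(r'(?<!&)&')   # skips the second '&' of a pair
WITHOUT_LOOKBEHIND = re.compile(r'&')      # splits on every '&'

def time_split(pattern, number=200):
    """Return the total seconds taken by `number` calls to pattern.split(SAMPLE)."""
    return timeit.timeit(lambda: pattern.split(SAMPLE), number=number)

t_with = time_split(WITH_LOOKBEHIND)
t_without = time_split(WITHOUT_LOOKBEHIND)
print("lookbehind split: %.4f s" % t_with)
print("plain split:      %.4f s" % t_without)
```

The two patterns do not produce the same pieces for `&&`, so this only isolates the overhead of the lookbehind itself; it says nothing about how the split compares with a `re.sub`-based approach.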

Answer 2

A:

Maybe loop while (q = str.find('&', p)) != -1, then append the left side (p + 2 to q - 1) and the replacement value.
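One possible reading of this loop in Python, offered as a sketch rather than the exact code the suggestion had in mind. The names `CODES` and `colorize` are hypothetical, the table reuses the escape codes from the other answers, and the index bookkeeping is done with slices instead of the `p + 2` to `q - 1` range described above.

```python
# Hypothetical lookup table: the character after "&" maps to an ANSI escape,
# and "&" maps to a literal ampersand so that "&&" acts as an escape.
CODES = {"y": "\033[0;30m",
         "c": "\033[0;31m",
         "b": "\033[0;32m",
         "Y": "\033[0;33m",
         "u": "\033[0;34m",
         "&": "&"}

def colorize(text, codes=CODES):
    """Replace "&x" sequences by scanning with str.find instead of a regex."""
    out = []
    p = 0  # start of the text not yet copied
    while True:
        q = text.find('&', p)
        # No more ampersands, or a lone "&" at the very end: copy the rest.
        if q == -1 or q + 1 >= len(text):
            out.append(text[p:])
            break
        out.append(text[p:q])                  # left side, up to the "&"
        key = text[q + 1]                      # character after the "&"
        out.append(codes.get(key, '&' + key))  # unknown keys are left as-is
        p = q + 2                              # skip past "&" and its key
    return ''.join(out)

print(repr(colorize("A && W &yfoo")))
# 'A & W \x1b[0;30mfoo'
```

Because the loop always consumes the ampersand together with the character after it, a pair `&&` is handled in one step and its second ampersand is never treated as the start of a new code.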

jspcal 2009-12-20 20:21:05

Answer 3

A:

I think this does the trick:

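import re

def fix(text):
    # Map the character after "&" to its ANSI escape; "&" maps to a literal "&".
    colors = {"y": "\033[0;30m",
              "c": "\033[0;31m",
              "b": "\033[0;32m",
              "Y": "\033[0;33m",
              "u": "\033[0;34m",
              "&": "&"}

    # Split on "&" plus any ampersands that immediately follow it; the
    # captured extras are kept in the list as separate items.
    myparts = re.split(r'&(&*)', text)
    # For each non-empty piece after the first, swap its leading character
    # for the mapped value, or restore the "&" if the character is unknown.
    myparts[1:] = [colors.get(x[0], "&" + x[0]) + x[1:] if len(x) > 0 else x
                   for x in myparts[1:]]
    return "".join(myparts)


print(fix("The &yquick &cbrown &bfox &Yjumps over the &ulazy dog"))
print(fix("&yI &creally &blove A && W &uRootbeer."))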

jbochi 2009-12-20 20:33:52

Answer 4

A:

re.sub will do what you want. It takes a regex pattern and can take a function that processes each match and returns the replacement. Below, if the character following the & is not in the dictionary, no replacement is made. && is replaced with &, which allows escaping an & that is followed by a character in the dictionary.

Also, 'str' and 'dict' are bad variable names because they shadow the built-in types of the same name.

In 's' below, '& cat' will not be affected and '&&cat' will become '&cat', suppressing the &c translation.

import re

s = "The &yquick &cbrown &bfox & cat &&cat &Yjumps over the &ulazy dog"

D = {"y":"\033[0;30m",
     "c":"\033[0;31m",
     "b":"\033[0;32m",
     "Y":"\033[0;33m",
     "u":"\033[0;34m",
     "&":"&"}

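def func(m):
    # Return the mapped value for the character after "&";
    # if it is not in D, return the whole match unchanged.
    return D.get(m.group(1), m.group(0))

# Parenthesized print works in both Python 2 and Python 3.
print(repr(re.sub(r'&(.)', func, s)))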

OUTPUT:

'The \x1b[0;30mquick \x1b[0;31mbrown \x1b[0;32mfox & cat &cat \x1b[0;33mjumps over the \x1b[0;34mlazy dog'

-Mark

Mark Tolonen 2009-12-21 00:19:38
