Given your rules, I'd say you really want a simple state machine. Hmm, on second thought, maybe not; you can just look back in the string as you go.
I have a unicode string in Python and basically need to go through it character by character and replace certain characters based on a list of rules. One such rule is that a is changed to ö if it comes directly after n. Also, if there are two vowel characters in a row, they get replaced by one vowel character and :. So if I have the string "natarook", what is the easiest and most efficient way of getting "nötaro:k"? I'm using Python 2.6 and CherryPy 3.1, if that matters.
# -*- coding: utf-8 -*-
vowel_set = frozenset([u'a', u'e', u'i', u'o', u'u', u'ö'])

def fix_the_string(s):
    lst = []
    for ch in s:
        if ch == u'a' and lst and lst[-1] == u'n':
            # 'a' directly after 'n' becomes 'ö'
            lst.append(u'ö')
        elif ch in vowel_set and lst and lst[-1] in vowel_set:
            # "replaced by one vowel character": not sure which one you want,
            # so keep the first vowel and put ':' in place of the second
            lst.append(u':')
        else:
            lst.append(ch)
    return u"".join(lst)

print fix_the_string(u"natarook")  # prints "nötaro:k"
EDIT: Now that I've seen the answer by @Anon., I think regexps are the simplest approach. The loop above might actually be faster once you get a whole bunch of rules in play, as it makes one pass over the string while the regexp version makes one pass per rule; but maybe not, because the regexp stuff in Python is fast C code.
But simpler is better. Here is actual Python code for the regexp approach:
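# -*- coding: utf-8 -*-
import re

pat_na = re.compile(u'na')
# 'ö' is in the class so it matches vowel_set in the loop version
pat_double_vowel = re.compile(u'([aeiouö])[aeiouö]')

def fix_the_string(s):
    s = pat_na.sub(u'nö', s)              # 'a' after 'n' becomes 'ö'
    s = pat_double_vowel.sub(u'\\1:', s)  # second of two vowels becomes ':'
    return s

print fix_the_string(u"natarook")  # prints "nötaro:k"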
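If you want to check the speed question instead of guessing, something like the following rough sketch would do. The names fix_loop and fix_regexp, the sample input, and the repeat count are all made up for the comparison, and the numbers will depend on your machine, your Python version, and how many rules you end up with.

```python
# -*- coding: utf-8 -*-
# Rough timing sketch; written to run on Python 2.6+ as well as Python 3.
import re
import timeit

VOWELS = frozenset(u'aeiouö')
PAT_NA = re.compile(u'na')
PAT_DOUBLE_VOWEL = re.compile(u'([aeiouö])[aeiouö]')

def fix_loop(s):
    # Single pass, looking back at the last character emitted.
    out = []
    for ch in s:
        if ch == u'a' and out and out[-1] == u'n':
            out.append(u'ö')
        elif ch in VOWELS and out and out[-1] in VOWELS:
            out.append(u':')
        else:
            out.append(ch)
    return u''.join(out)

def fix_regexp(s):
    # One substitution pass over the string per rule.
    s = PAT_NA.sub(u'nö', s)
    return PAT_DOUBLE_VOWEL.sub(u'\\1:', s)

sample = u'natarook' * 1000  # hypothetical test input

# The timings only mean something if both versions agree on the input.
assert fix_loop(sample) == fix_regexp(sample)

for func in (fix_loop, fix_regexp):
    seconds = timeit.timeit(lambda: func(sample), number=100)
    print('%s: %.4f s for 100 runs' % (func.__name__, seconds))
```

Rerun it with your real rule list and realistic strings before deciding; with only two rules the result says little about how either approach scales.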