I have some Unicode text that I want to clean up using regular expressions. For example, I have cases where a cell contains u'(2'. This happens because, for formatting reasons, the closing paren ends up in an adjacent HTML cell. My initial solution was to look ahead at the contents of the next cell and use a string function to determine whether it held the closing paren. I knew this was not a great solution, but it worked. Now I want to fix it, but I can't seem to make the regular expression work.
missingParen=re.compile(r"^\(\d[^\)]$")
My understanding of what I am doing:
^ at the beginning of the string I want to find
\( an open paren; the paren has to be backslashed because it is a special character
\d I also want to find a single digit
[ I am creating a special character class
^ I don't want to find what follows
) which is a close paren
$ at the end of the string
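A quick check of what that pattern accepts, as a minimal sketch (the sample strings and the name `missing_paren` are made up for illustration). As far as I can tell, `[^\)]` does not mean "nothing follows"; it consumes exactly one more character that is not a close paren, so the pattern needs three characters in total:

```python
import re

# Pattern from above: "(", one digit, then exactly ONE more character
# that is anything except ")".
missing_paren = re.compile(r"^\(\d[^\)]$")

# Hypothetical sample cell contents.
samples = [u"(2", u"(33", u"(2x", u"(2)"]

# Map each sample to whether the pattern matches it.
results = {s: bool(missing_paren.match(s)) for s in samples}
print(results)
```

With these samples, u'(2' does not match (there is no third character), while u'(33' and u'(2x' do, because the second `3` and the `x` each satisfy `[^\)]`.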
And of course the plot thickens. I made a silly assumption that, because I used a single \d, I would not match (33, but I was wrong. I added a {1} to my regular expression and that did not help; it matched (3333, so my problem is more complicated than I thought. I want the string to be only an open paren and a single digit. Is this the more clever approach?
missingParen=re.compile(r"^\(\d$")
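To try this simpler pattern out, here is a small sketch. The helper name `lacks_closing_paren` and the sample strings are hypothetical, only there to exercise the regex:

```python
import re

# Simpler pattern: start of string, "(", exactly one digit, end of string.
missing_paren = re.compile(r"^\(\d$")


def lacks_closing_paren(cell_text):
    """Return True when the cell text is just an open paren and one digit."""
    return missing_paren.match(cell_text) is not None


# Hypothetical sample cell contents.
samples = [u"(2", u"(33", u"(3333", u"(2)", u"2"]
checks = {s: lacks_closing_paren(s) for s in samples}
print(checks)
```

Only u'(2' matches here. One caveat: `$` also matches just before a trailing newline, so u'(2\n' would match as well; if that matters, `\Z` in place of `$` anchors at the very end of the string.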
And note, S Lott: I already tagged it beginner, so you can't pick up any cheap points. Not that I don't appreciate your insights. I keep meaning to read your book; it probably has the answer.