I have a regular expression that traverses a string and pulls out 40 values. It looks sort of like the pattern below, but much larger and more complicated:

est(.*)/test>test>(.*)<test><test>(.*)test><test>(.*)/test><test>(.*)/test><test>(.*)/test><test(.*)/test><test>(.*)/test><test>(.*)/test><test>(.*)/test><test>(.*)/test><test>(.*)/test><test>(.*)/test><test>(.*)/test><test>(.*)/test><test>(.*)/test>

My question is: how do I use these groups with the replace command when the group number exceeds 9? Whenever I use \10, it seems to return the value for \1 and then append a 0 to the end. Any help would be much appreciated, thanks :)

Also, I am using UltraEdit Studio, but if a different program does it better, then no biggie :)

+2  A: 

Most of the simple Regex engines used by editors aren't equipped to handle more than 10 matching groups; it doesn't seem like UltraEdit can. I just tried Notepad++ and it won't even match a regex with 10 groups.

Your best bet, I think, is to write something fast in a quick language with a decent regex parser. Here's something in Python:

import re

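# Raw string keeps backslashes intact; the pattern has more than nine capture groups
pattern = re.compile(r'(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)')
with open('input.txt', 'r') as f:
    for line in f:
        m = pattern.match(line)
        if m:  # match() returns None for lines that do not fit the pattern
            print(m.groups())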

Note that Python allows multi-digit backreferences such as \20, which refers to group 20. To get a backreference to group 2 followed by a literal 0, you need to use \g<2>0, which is unambiguous.
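To make the difference concrete, here is a minimal sketch of a replacement that goes past group 9. The twelve-group pattern and the sample string are made up for illustration and are not taken from the question:

```python
import re

# Hypothetical sample: twelve one-character fields, one capture group each.
pattern = re.compile(r"(.)" * 12)
text = "abcdefghijkl"

# \10 and \g<10> both refer to group 10 in a Python replacement string.
group_ten_plain = pattern.sub(r"\10", text)
group_ten_named = pattern.sub(r"\g<10>", text)

# \g<1>0 is group 1 followed by a literal "0".
group_one_then_zero = pattern.sub(r"\g<1>0", text)

print(group_ten_plain)      # j
print(group_ten_named)      # j
print(group_one_then_zero)  # a0
```

The \g<n> form is probably the safer habit, since it reads the same whether the group number has one digit or two.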

Chris B.
"Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems." - Jamie Zawinski. The above quote never seemed so true :( Thanks for the help :)
Dustin
+1  A: 

If you cannot handle more than 9 subgroups, why not initially match groups of 9 and then loop and apply regexes to those matches?

i.e. first match (<test.*/test>)+ and then for each subgroup match on <test(.*)/test>.
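A rough sketch of that two-stage idea in Python, assuming the input is a plain run of <test>...</test> elements (the sample string and the non-greedy patterns are assumptions, not the asker's real data):

```python
import re

# Hypothetical input: a run of <test>...</test> elements.
text = "<test>a</test><test>b</test><test>c</test>"

# Stage 1: grab the whole run of elements in one match.
block = re.search(r"(?:<test>.*?</test>)+", text)

# Stage 2: pull each value out one element at a time,
# so no group number ever goes past 1.
values = re.findall(r"<test>(.*?)</test>", block.group(0)) if block else []

print(values)  # ['a', 'b', 'c']
```

This only helps when the fields share one repeating shape; a pattern whose 40 pieces all differ would still need the numbered-group approach.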

Zugwalt
Unfortunately, in my case that would not work; the pattern is pretty big. However, I appreciate the suggestion. Thanks :)
Dustin