I'm trying to turn free-form text into something more structured. I have a complex pattern that matches the great majority (well above the minimum acceptable limit) of the data available, and I'd like to use that to assist in structuring the data, rather than parsing the text character-by-character. The problem that I've just run into is that Oracle does not seem to have any way of handling capture groups (unless I somehow missed it?).

For example, my expression has quite a few named capture groups, such as `((?<runit_ID>\d+)-)` and `(STAT_N|STTN|STAT|STN) ?(?<STAT>\w+)`. The codebase is written entirely in PL/SQL, so I can't use C# or something else to refer to the capture groups by name. How do people work around this in PL/SQL?

+1  A: 

Assuming you are using Oracle 10g or higher, you can use REGEXP_REPLACE with backreferences.

See examples in the Oracle docs for REGEXP_REPLACE and in this article at regular-expressions.info.
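A minimal sketch of the idea, using the sample string and alternation from the question. Oracle's regex functions only support numbered groups, so the constants `c_num` and `c_stat` below are hypothetical stand-ins for the group names; the variable names and the combined pattern are illustrative as well. It also assumes 10g Release 2 or later, where `\d` and `\w` are available (on earlier releases `[[:digit:]]` and `[[:alnum:]_]` would be the substitutes).

```sql
declare
  -- Sample input from the question; the pattern is anchored with ^ and $
  -- so that the whole string is replaced by the chosen group.
  src     varchar2(200) := '123 blah blah, STAT A, blah';
  pattern varchar2(200) := '^(\d+) .*(STAT_N|STTN|STAT|STN) ?(\w+).*$';

  -- Hypothetical "names" for the groups: group 1 is the leading number,
  -- group 3 is the value that follows the STAT keyword.
  c_num  constant pls_integer := 1;
  c_stat constant pls_integer := 3;

  num  varchar2(200);
  stat varchar2(200);
begin
  -- REGEXP_REPLACE returns the source unchanged when nothing matches,
  -- so test for a match before trusting the result.
  if regexp_like(src, pattern, 'i') then
    num  := regexp_replace(src, pattern, '\' || c_num,  1, 1, 'i');
    stat := regexp_replace(src, pattern, '\' || c_stat, 1, 1, 'i');
  end if;

  dbms_output.put_line(num);
  dbms_output.put_line(stat);
end;
/
```

For the sample string this should print `123` and then `A`. If you are on 11g or later, `REGEXP_SUBSTR` accepts a trailing subexpression argument that returns a single group directly, which avoids having to match the whole string.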

shoover
OK, so now I've got something that looks like: `begin stat := REGEXP_REPLACE('123 blah blah, STAT A, blah', '.+(STAT_N|STTN|STAT|STN) ?(\w+).+', '\2', 1, 1, 'i'); num := REGEXP_REPLACE('123 blah blah, STAT A, blah', '(\d+) .+', '\1', 1, 1, 'i'); dbms_output.put_line(stat); dbms_output.put_line(num); end;` Previously, num and stat were named capture groups. I guess I'll have to follow this pattern for all the other groups, or just write a function that rewrites named capture groups into something like this. If I get some code for that, I'll post it.
FrustratedWithFormsDesigner