views:

103

answers:

1

Hello,

I'm trying to write a function that repeatedly matches regexp patterns against an input string. The function should take pattern 1 match it against the input string and split it into parts of matching and non-matching segments. Pattern 2 would subsequently be used on those non-matching segments, until all input patterns are used. The return argument would then be an array of all the substrings.

Simple example:

input string "abcdefgh" against patterns "bc" and "f", would first split it into "a", "bc" and "defgh". Subsequently pattern "f" would be run against the "a" and "defgh" part and splitting the later into "de", "f", and "gh". Return argument {"a", "bc", "de", "f", "gh"}

(I would also keep an associative array with match/nonmatch information along with it)

But my questions are: What data structure would be most suitable to perform this kind of task? And how would this best be solved, It feels like something that would work in a recursive manner.

+2  A: 

A linked list comes to mind where every time you match a regex against a particular node you remove the node in question and insert 3 linked nodes in its place.

The particular "node" structure could be as simple as a struct with 3 fields, a char* for the string, a bool (char in c) for whether it's a match or not and the pointer to the next node.

Blindy
Thank you. I will try this.
I just saw your requirement that the data should be sorted (I'm assuming you mean it should appear in the order it was in the original string, like in your example) and this data structure does it already.
Blindy
Also it won't always be 3 nodes I would need to insert. A pattern could give more than 1 match inside the subject which would need more nodes, but that's not a big problem. I'm implementing your advice and hope it will work :)
Errr yea I knew that!
Blindy