There's one tricky part to this: ZZ is both the start token and the end token.
There's one start case (ZZ not followed by another ZZ, since a following ZZ would mean the first ZZ was actually an end token) and two end cases (ZZ at the end of the string, and ZZ followed by ZZ). The goal is to match the start case and NOT either of the end cases.
To that end, here's what I suggest:
/ZZ(?!ZZ)(.*?)(ZZ(?!(ZZ|$))|$)/
For the string ZZfooZZZZbarZZbazZZ, the intended reading is:
- ZZfooZZ is a legitimate record and is not meant to match: the opening ZZ is not followed by ZZ, the .*? takes "foo", and then comes a ZZ, but that ZZ is followed by another ZZ (the one that opens the next record), so the ZZ(?!(ZZ|$)) branch rejects it.
- The next candidate is the ZZ right after foo. It fails the (?!ZZ) check because it is followed by another ZZ. That is what we want, since the ZZ right after foo closes a record rather than starting a new one.
- The ZZ right before bar is not followed by another ZZ, so it's a legitimate start of record, and "bar" is consumed by the .*?. The next ZZ is followed by neither another ZZ nor the end of the string, so it cannot be an end token, which means the ZZbar record is no good.
- (A human could also read this as a valid ZZbarZZ followed by an invalid bazZZ. Either way something is wrong, so I wrote the regex to place the badly formatted record here.)
- So ZZbar should be caught by the regex as illegitimate.
- The ZZ after bar is not followed by ZZ; it is followed by baz and then by a ZZ that fails the lookahead, because that ZZ sits at the end of the string. So ZZbazZZ is a legitimate record and is not meant to be captured by the regex.
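If you want to check the walkthrough against a real engine, here is a minimal Python sketch. The helper name flagged is just for illustration, and the / / delimiters are dropped because Python's re module does not use them. One caveat from tracing it: a backtracking engine is free to let .*? swallow one Z of the ZZZZ run, so the ZZ(?!(ZZ|$)) branch can line up on the second and third Z, and the trailing |$ lets .*? run to the end of the string. Under Python's re the reported matches come out as ZZfooZZZ and ZZbazZZ rather than ZZbar, so treat the bullets above as the intended reading and test the pattern against your own data.

```python
import re

# Pattern copied from the answer, without the surrounding / / delimiters.
PATTERN = re.compile(r"ZZ(?!ZZ)(.*?)(ZZ(?!(ZZ|$))|$)")

def flagged(text):
    """Return (matched_text, group_1) for every match the engine reports."""
    return [(m.group(0), m.group(1)) for m in PATTERN.finditer(text)]

# Print each reported match together with what the lazy group captured.
for whole, inner in flagged("ZZfooZZZZbarZZbazZZ"):
    print(repr(whole), repr(inner))
```

Group 1 picking up a stray Z ("fooZ") is the visible sign that the match is misaligned by one character inside the ZZZZ run.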
One more case: for ZZfoo, the opening ZZ is okay, the foo is captured, and then the regex reaches the end of the string without any closing ZZ having occurred. Thus, ZZfoo is captured as an illegitimate match.
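The truncated-record case can be sketched the same way in Python (first_flagged is a hypothetical helper name). I've added a well-formed ZZfooZZ for comparison, because the |$ branch also fires there: .*? can absorb the closing ZZ and stop at the end of the string. So if you rely on this branch, you may need to look at what group 1 captured before calling the record bad.

```python
import re

# Same pattern as in the answer, without the / / delimiters.
PATTERN = re.compile(r"ZZ(?!ZZ)(.*?)(ZZ(?!(ZZ|$))|$)")

def first_flagged(text):
    """Return (matched_text, group_1) for the first match, or None."""
    m = PATTERN.search(text)
    return (m.group(0), m.group(1)) if m else None

print(first_flagged("ZZfoo"))    # truncated record: group 1 is 'foo'
print(first_flagged("ZZfooZZ"))  # well-formed record: group 1 is 'fooZZ'
```

A group 1 that ends in ZZ suggests the record was in fact closed, which is one possible way to tell the two cases apart.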
Let me know if this doesn't make sense, and I'll try to make it clearer.