On my OS X 10.5.8 machine, using the regcomp and regexec C functions to match the extended regex "(()|abc)xyz", I find a match for the string "abcxyz" but only from offset 3 to offset 6. My expectation was that the entire string would be matched and that I would see a submatch for the initial "abc" part of the string.
When I try the same pattern and text with awk on the same machine, it shows a match for the entire string as I would expect.
I suspect that my limited experience with regular expressions may be the problem. Can somebody explain what is going on? Is my regular expression valid? If so, why doesn't it match the entire string?
I understand that "((abc){0,1})xyz" could be used as an alternative, but the pattern of interest is being automatically generated from another pattern format and eliminating instances of "()" is extra work I'd like to avoid if possible.
For reference, the flags I'm passing to regcomp consist only of REG_EXTENDED. I pass an empty set of flags (0) to regexec.