Here is the input data:
*** INVOICE ***
THE BIKE SHOP
1 NEW ROAD, TOWNVILLE,
SOMEWHERE, UK, AB1 2CD
TEL 01234-567890
To: COUNTER SALE No: 243529 Page: 1
Date: 04/06/10 12:00
Ref: Aiden
Cust No: 010000
Here is a regex that works (options: singleline, ignorewhitespace, compiled). It matches immediately and the groups are populated correctly:
\W+INVOICE\W+
(?<shopAddr>.*?)\W+
To:\W+(?<custAddr>.*?)\W+
No:\W+(?<invNo>\d+).*?
Date:\W+(?<invDate>[0-9/ :]+)\W+
Ref:\W+(?<ref>[\w ]*?)\W+
Cust
As soon as I add the 'N' from "Cust No" to the regex, parsing the input hangs forever:
\W+INVOICE\W+
(?<shopAddr>.*?)\W+
To:\W+(?<custAddr>.*?)\W+
No:\W+(?<invNo>\d+).*?
Date:\W+(?<invDate>[0-9/ :]+)\W+
Ref:\W+(?<ref>[\w ]*?)\W+
Cust N
If I add a wildcard ("any character") instead:
\W+INVOICE\W+
(?<shopAddr>.*?)\W+
To:\W+(?<custAddr>.*?)\W+
No:\W+(?<invNo>\d+).*?
Date:\W+(?<invDate>[0-9/ :]+)\W+
Ref:\W+(?<ref>[\w ]*?)\W+
Cust .
It works, but as soon as I add a fixed character after the wildcards, the regex hangs again:
\W+INVOICE\W+
(?<shopAddr>.*?)\W+
To:\W+(?<custAddr>.*?)\W+
No:\W+(?<invNo>\d+).*?
Date:\W+(?<invDate>[0-9/ :]+)\W+
Ref:\W+(?<ref>[\w ]*?)\W+
Cust ..:
Can anyone advise why adding something so trivial would cause it to fall over? Can I enable some kind of tracing to watch the matching activity and see whether it is getting stuck in catastrophic backtracking?
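
One thing that may be worth checking is how the ignorewhitespace option treats the literal space in "Cust N". The reduced sketch below is an assumption-laden stand-in, not the original setup: it uses Python's re module instead of .NET, so named groups are written (?P<name>...), re.VERBOSE plays the role of ignorewhitespace, and re.DOTALL the role of singleline. The helper tail_matches and the shortened input are hypothetical, and the input is kept tiny on purpose so that a failed match returns quickly instead of hanging.

```python
import re

# Shortened stand-in for the last two lines of the invoice (assumption:
# the real input is longer; this is only enough to test the pattern tail).
TEXT = "Ref: Aiden\nCust No: 010000"

# Same tail as the original pattern, with Python's (?P<name>...) syntax.
PREFIX = r"Ref:\W+(?P<ref>[\w ]*?)\W+"


def tail_matches(tail: str) -> bool:
    """Return True if PREFIX + tail matches TEXT in verbose mode.

    Hypothetical helper. In verbose mode, unescaped whitespace in the
    pattern is dropped, so "Cust N" is compiled as "CustN".
    """
    pattern = re.compile(PREFIX + tail, re.DOTALL | re.VERBOSE)
    return pattern.search(TEXT) is not None


if __name__ == "__main__":
    for tail in ["Cust", "Cust N", "Cust .", "Cust ..:",
                 r"Cust\ N", "Cust[ ]N", r"Cust\ ..:"]:
        print(f"{tail!r:14} -> {tail_matches(tail)}")
```

In this sketch "Cust" and "Cust ." match, "Cust N" and "Cust ..:" do not, and the variants with an escaped or bracketed space (Cust\ N, Cust[ ]N, Cust\ ..:) match again. That lines up with the working and hanging cases above, which suggests the hanging patterns may simply never match, so the engine ends up trying every alternative on the full input. It does not show how expensive that failure is in .NET; that part would still need to be measured there.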