I am trying to come up with a GNU extended regular expression that detects repeated substrings in a string of ascii-encoded bits. I have an expression that works -- sort of. The problem is that it executes really slowly when given a string that could have many solutions
The expression
([01]+)(\1)+
compiles quickly, but takes about a minute to execute against the string
1010101010101010101010101010101010101010101010101010101010
I am using the regex implementation from glibc 2.5-49 ( comes with CentOS 5.5.)
FWIW, the pcre library executes quickly, as in gregexp or perl directly. So the obvious, but wrong, answer is "use libpcre". I cannot easily introduce a new dependency in my project. I need to work within the std C library that comes with CentOS/RHEL.