I'm working on a fairly specialized search engine implementation in Perl. It searches documents (by regex) for specifically delimited strings from a text file, where the delimiters are a subset of [:punct:]. I'm doing the usual search engine indexing tricks, but there's a problem.
Some of the search regex patterns include, by necessity, delimiters used in the file. "OK," I think to myself, "word proximity, then... easy," and that side of the equation is straightforward enough.
The trick is that, because the search patterns are regular expressions, I haven't found an easy way to determine the specific words I should go looking for in the indexed data (think "split" if we were talking about more ordinary strings).
Trivial example: "square[\s-]*dance" would match directly on "squaredance", but would need a proximity match on "square dance" and "square-dance" (since '-' is a delimiter). I need to know, based on the regex, to look for "square" and "dance" separately but near each other.
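To make that concrete, here's a naive sketch of the kind of decomposition I mean. The helper name literal_words is made up for illustration, and it only understands plain characters, backslash escapes, bracketed classes and simple quantifiers; groups and alternation just break the run, so it's nowhere near a general answer. A character followed by a quantifier is dropped from its run, so something like "colou?r" yields fragments rather than whole words.

```perl
use strict;
use warnings;

# Hypothetical helper (not from any module): return the fixed literal runs
# of a simple pattern. Anything that is not a plain character ends a run.
sub literal_words {
    my ($pattern) = @_;
    my @words;
    my $run   = '';
    my $flush = sub { push @words, $run if length $run; $run = '' };

    while (length $pattern) {
        if ($pattern =~ s/^\[\^?\]?(?:\[:\^?\w+:\]|\\.|[^\]\\])*\]//s) {
            $flush->();              # bracketed class such as [\s-]
        }
        elsif ($pattern =~ s/^\\.//s) {
            $flush->();              # escape such as \s or \d
        }
        elsif ($pattern =~ s/^(?:[?*+]|\{\d+(?:,\d*)?\})//) {
            chop $run;               # quantified character is not fixed
            $flush->();
        }
        elsif ($pattern =~ s/^[.|()^\$]//) {
            $flush->();              # other metacharacters: just break
        }
        else {
            $run .= substr($pattern, 0, 1, '');   # plain literal character
        }
    }
    $flush->();
    return @words;
}

my @words = literal_words('square[\s-]*dance');
print "@words\n";    # prints "square dance"
```

That gets me "square" and "dance" for the trivial case, but it falls over quickly on anything with alternation or nested groups, which is why I'd rather lean on the regex engine itself.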
I'm game for the challenge, but I'd rather use established code. My gut tells me that it'll be an internal hook to the regex engine, but I don't know of anything like that. Any suggestions?