I have a sentence (words delimited by spaces).

I then have two lists of phrases (full or partial words, i.e. containing no spaces): one is an 'include' list and the other is an 'exclude' list.

A matching sentence will contain all phrases in the 'include' list (overlaps are OK, case insensitive) and none of the phrases in the 'exclude' list.

How can I test whether the sentence meets the rules? Thanks.

Example

Sentence = This yammy Flybe catalog is sticky

Include list = cat fly tic

Exclude list = veg pot yam

Test fails because, although all the 'include' phrases are in the sentence, one of the 'exclude' phrases (yam) does appear. Change the word yammy to yummy and the test should pass.
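To make the rule concrete, the example above can be sketched with plain substring tests, no regex involved. This is only a minimal illustration in Python; the function name `matches` and the way the lists are split are assumptions made for the sketch, not part of the original setup.

```python
def matches(sentence, include, exclude):
    """Return True if every include phrase and no exclude phrase occurs in sentence."""
    text = sentence.lower()  # compare case-insensitively
    has_all = all(p.lower() in text for p in include)
    has_none = not any(p.lower() in text for p in exclude)
    return has_all and has_none


include = "cat fly tic".split()
exclude = "veg pot yam".split()

# Fails: all include phrases are present, but "yam" occurs in "yammy".
print(matches("This yammy Flybe catalog is sticky", include, exclude))  # False
# Passes once "yammy" becomes "yummy".
print(matches("This yummy Flybe catalog is sticky", include, exclude))  # True
```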

P.S. I am currently using a relational division implementation in SQL for this, which seems well optimized when the data is already in the SQL database. Now I have a data structure coming from an external source. I suppose I could pass in the delimited strings, split them into table rows, etc., but I want to investigate other options. So if not regex, then what?

+1  A: 
Abel
Excellent! This must be applied to each sentence, yes? Is there any way of extending it to apply to the whole tab-delimited string, or is this madness? Thanks.
metaopoly
You can apply it to any string of any length. If you want to apply it to parts of a string individually (where the full set of rules applies to each part), I suggest splitting the string first, then looping over the elements with `foreach` and applying the regex to each (don't worry, the regex will be cached by .NET). It is possible to adjust it so that it works for the parts between tabs, but it then becomes a very complex regex that even advanced regex users will find hard to understand.
Abel
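The split-then-loop approach described in the comment above might look like the following Python sketch. The pattern here is an assumption: one positive lookahead per 'include' phrase and a single negative lookahead covering the 'exclude' phrases. It is not necessarily the regex from the answer, and `build_pattern` is a hypothetical helper name.

```python
import re


def build_pattern(include, exclude):
    # One positive lookahead per include phrase: each must occur somewhere.
    parts = ["(?=.*%s)" % re.escape(p) for p in include]
    # One negative lookahead: none of the exclude phrases may occur anywhere.
    if exclude:
        parts.append("(?!.*(?:%s))" % "|".join(re.escape(p) for p in exclude))
    return re.compile("^" + "".join(parts), re.IGNORECASE | re.DOTALL)


pattern = build_pattern(["cat", "fly", "tic"], ["veg", "pot", "yam"])

# Split the tab-delimited string first, then test each sentence on its own.
line = "This yammy Flybe catalog is sticky\tThis yummy Flybe catalog is sticky"
results = [bool(pattern.match(part)) for part in line.split("\t")]
print(results)  # [False, True]
```

Compiling the pattern once and reusing it for every element keeps the per-sentence cost low, and `re.escape` guards against phrases that happen to contain regex metacharacters.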
I spotted the error, but by the time I'd posted an example you'd fixed it, good work! This is the regex I was looking for. I'll remove the 'tabs' requirement to see if anything else language-agnostic turns up.
metaopoly
The regex above is "language agnostic" to the extent that any regex-capable language will understand it, provided it supports look-around (Perl, PHP, Python, Java).
Abel