After triple-checking that XML Schema (XSD) regexes really don't support any of the features that would make this task reasonably easy (particularly lookaheads and anchors), I've come up with an approach that seems to work. This is written in free-spacing mode to make it easier to read, but (of course) XSD regexes don't support that either. :-/
/
[^ibs].* |
i(.{0,1} | [^n].* | n[^t].* | nt.+) |
b(.{0,2} | [^y].* | y[^t].* | yt[^e].* | yte.+) |
s(.{0,4} | [^t].* | t[^r].* | tr[^i].* | tri[^n].* | trin[^g].* | tring.+)
/
The first alternative matches any non-empty string that doesn't start with the initial letter of any of the keywords (int, byte, string). Each top-level alternative after that matches all strings that start with the same letter as one of the keywords but:
- are shorter than the keyword,
- have a different second letter, different third letter, etc., or
- are longer than the keyword.
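For a longer keyword list, writing those three kinds of branches by hand gets tedious, so here is a rough sketch of how the pattern could be generated. The function name build_exclusion_pattern is made up for illustration, and the sketch assumes every keyword is at least two plain ASCII letters long and that no two keywords share a first letter (the regex above relies on that too). Python's re module is only used to sanity-check the result; it is not an XSD regex engine.

```python
import re


def build_exclusion_pattern(keywords):
    """Build a pattern without anchors or lookaheads that matches any
    non-empty string except the given keywords.

    Hypothetical helper. Assumes plain ASCII-letter keywords of length >= 2
    with distinct first letters.
    """
    firsts = [kw[0] for kw in keywords]
    assert len(set(firsts)) == len(firsts), "first letters must be distinct"
    assert all(len(kw) >= 2 and kw.isascii() and kw.isalpha() for kw in keywords)

    # Anything that does not start with the initial letter of a keyword.
    alternatives = ["[^" + "".join(firsts) + "].*"]

    for kw in keywords:
        rest = kw[1:]
        # Shorter than the keyword: at most len(kw) - 2 characters after the first.
        branches = [".{0,%d}" % (len(kw) - 2)]
        # Same prefix, but a different letter at position i of the remainder.
        for i, ch in enumerate(rest):
            branches.append(rest[:i] + "[^" + ch + "].*")
        # Longer than the keyword.
        branches.append(rest + ".+")
        alternatives.append(kw[0] + "(" + "|".join(branches) + ")")

    return "|".join(alternatives)


pattern = build_exclusion_pattern(["int", "byte", "string"])
print(pattern)
```

For the three keywords used here, this produces the same regex as above with the whitespace removed.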
Although XSD regexes don't support explicit anchors (i.e., ^, $, \A, \z), all matches are implicitly anchored at both ends. If the list of keywords is long, you might run up against a limit on the sheer length of the regex. Barring that (and much to my surprise), it looks like this job may actually be doable. :-)