I was posed an interesting question from a colleague for an operational pain point we currently have, and am curious if there's anything out there (utility/library/algorithm) that might help automate this.
Say you have a list of literal values (in our cases, they are URLs). What we want to do is, based on this list, come up with a single regex that matches all of those literal items.
So, if my list is:
http://www.abc.com
http://www.abc.com/subdir
http://foo.abc.com
The simplest answer is
^(http://www.abc.com|http://www.abc.com/subdir|http://foo.abc.com)$
but this gets large for lots of data, and we have a length limit we're trying to stay under.
Currently we manually write the regexes but this doesn't scale very well nor is it a great use of anyone's time. Is there a more automated way of decomposing the source data to come up with a length-optimal regex that matches all of the source values?