I was asked today whether there is a library that takes a list of strings and computes the most efficient regex matching only those strings. I suspect the general problem is NP-complete, but I think the scope can be narrowed a bit.
How would I generate and simplify a regex to match a subset of hosts from a larger set of all hosts on my network? (I accept that the result might not be the most efficient regex.)
The first step is easy. Given the following list:
- appserver1.domain.tld
- appserver2.domain.tld
- appserver3.domain.tld
I can escape them and concatenate them into:
appserver1\.domain\.tld|appserver2\.domain\.tld|appserver3\.domain\.tld
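To be concrete, this first step is roughly the following in JavaScript. It is only a minimal sketch: escapeRegex and hostsToPattern are names I made up for illustration, not functions from any library.

```javascript
// Escape regex metacharacters so each hostname is matched literally.
function escapeRegex(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Join the escaped hostnames into a single alternation.
// Hypothetical helper, named here only for illustration.
function hostsToPattern(hosts) {
  return hosts.map(escapeRegex).join("|");
}

const hosts = [
  "appserver1.domain.tld",
  "appserver2.domain.tld",
  "appserver3.domain.tld",
];

console.log(hostsToPattern(hosts));
// prints: appserver1\.domain\.tld|appserver2\.domain\.tld|appserver3\.domain\.tld
```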
I also know how to simplify that regex manually into:
appserver[123]\.domain\.tld
From there I can test that pattern against the full list of hosts and verify that it matches only the three selected hosts. What I don't know is how to automate the simplification step. Are there any libraries (in Perl, JavaScript or C#) or common practices for this?
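For reference, the verification step I have in mind is roughly this. It is a sketch under my own assumptions: checkPattern is a made-up name, the pattern is anchored so it must match the whole hostname, and appserver4 and dbserver1 are hypothetical extra hosts standing in for the rest of the network.

```javascript
// Report where a candidate pattern disagrees with the selected subset:
// selected hosts it misses, and unselected hosts it wrongly matches.
function checkPattern(pattern, allHosts, selected) {
  // Anchor the pattern so it has to match the entire hostname.
  const re = new RegExp("^(?:" + pattern + ")$");
  const wanted = new Set(selected);
  const missed = selected.filter((h) => !re.test(h));
  const extra = allHosts.filter((h) => !wanted.has(h) && re.test(h));
  return { missed, extra, ok: missed.length === 0 && extra.length === 0 };
}

// Hypothetical sample data for illustration only.
const selectedHosts = [
  "appserver1.domain.tld",
  "appserver2.domain.tld",
  "appserver3.domain.tld",
];
const allHosts = selectedHosts.concat([
  "appserver4.domain.tld",
  "dbserver1.domain.tld",
]);

// The hand-simplified pattern matches exactly the three selected hosts.
console.log(checkPattern("appserver[123]\\.domain\\.tld", allHosts, selectedHosts).ok);

// An over-general pattern is rejected because it also matches appserver4.
console.log(checkPattern("appserver\\d\\.domain\\.tld", allHosts, selectedHosts).extra);
```

Any automated simplifier could then be run through the same check before its output is trusted.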
Thanks
Update: I got some awesome Perl modules, but I would love a front-end solution as well, which means JavaScript. I've searched around, but nobody seems to have ported the Perl modules to JS, and I haven't been able to find the right terminology to search for this type of library.