I would like to be able to compute the set of all characters which may be matched as the first character in a string by a given instance of java.util.regex.Pattern
. More formally, given the DFA equivalent to a certain regular expression, I want the set of all outgoing transitions from the start state.
An example:
Pattern p = Pattern.compile("[abc]def|daniel|chris|\\s+");
Set<Character> first = getFirstSet(p);
The set first
should contain the following elements:
{ 'a', 'b', 'c', 'd', ' ', '\n', '\r', '\t' }
Any ideas? I'm well aware that I could construct the DFA myself and determine the relevant states that way, but I'd like to avoid that kind of hassle (read: it's not worth that much to me). Note that my host language is actually Scala, so I have access to all of the core Scala libs (for what it's worth).