If you know about finite automaton you could construct one which represents your regex (there are libraries to do so). Then you run it with your string (1234
) and you would end up in some state. From this state you do a breath-first search until you reach a accept state. While you searching you keep track of which transitions (characters) you run over. And the characters will give you the shortest (or one of them) string which qualify your regex.
Added link: you may have a look on http://www.brics.dk/automaton/ which is a automaton library implemented at Aarhus University (BSD license)
Update: I have build what you seek with the automaton implementation from above. First, the ExtendedOperations class which is in the same package as the other automaton classes because I needed to access some methods.
package dk.brics.automaton;
public class ExtendedOperations {
//Taken from Automaton.run and modified to just return state instead of accept (bool)
static State endState(String s, Automaton a)
{
if (!a.deterministic) a.determinize();
State p = a.initial;
for (int i = 0; i < s.length(); i++) {
p = p.step(s.charAt(i));
if (q == null) return null;
}
return p;
}
public static String getShortestCompletion(Automaton a, String partlyInput)
{
State e = endState(partlyInput, a);
if (e == null) return null;
return BasicOperations.getShortestExample(e, true);
}
}
Second, a little testsample:
package subsetautomaton;
import dk.brics.automaton.*;
public class Main {
public static void main(String[] args) {
RegExp re = new RegExp("[a-zA-Z0-9._%+-]+\\@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,6}");
Automaton a = re.toAutomaton();
System.out.println(ExtendedOperations.getShortestCompletion(a, "a"));
}
}
The example is a naive email address reg. exp. Notice that ^
is implicit in the reg. exp. and the same thing with $
. Second, @
is escaped with \
because it means 'any string' in this implementation.
The result of the example above is: @-.AA