views:

1453

answers:

5

In Javascript I have defined a regular expression and now a user is typing in a string. I want to tell him if his string still could match the RegExp if he continues typing or if he's already on the wrong way. For instance:

var re = /a*b/;

"a".isPrefixOf( re ); // true
"x".isPrefixOf( re ); // false

How could an implementation of isPrefixOf look like?

Update: Thanks for your answers, making the regex prefix-proof, as suggested by brad, seems to be a good workaround. But I'm still trying to find a general solution.

Maybe this way: We create a new regex with the user input followed by .*. This regex describes all words that the user still may enter. If the intersection of this created regex and the original regex is empty then the user is already on the wrong way. If it's not, he's doing fine. For instance:

var re = /a*b/;
var sInput = "a";
var reInput = new RegExp( sInput + ".*" );

reIntersection = re.intersect( reInput );
reIntersection.isEmpty(); // false

intersect() returns a new regex that accepts only word which both re and reInput would accept. The function doesn't exist yet but we can implement it using look-ahead:

RegExp.prototype.intersect = function( pattern2 ) { 
    return new RegExp( '(?=' + this.source  + ')' + pattern2.source );
}

What remains open is the isEmpty() function. How could we check, if a Javascript regex matches any word or if it's empty?

A: 

Hi

First you define your regular expression as: var re = new RegExp(/^(regexp here)$/);

on the onKeypress event, you check the regexp like this:

text.match(regexp) - where the text is the string entered.

Is this clear?

You should read the question more carefully. It does not ask how to call the matcher. It asks how to write the matcher.
A: 

One way of doing this could be to hook to the onKeyUp event of a text box and .test the text against the regular expression. My assumption is of course that you want to do a regular expression matching. I'm not sure if this is exactly what you need, in fact your code:

"a".isPrefixOf( re ); // true

will never match since it's required to also have a subsequent "b" character (you may want to modify the regular expression). For instance, this code will test against any string matching this format:

a-n(n)-b

Here is the code, save it as a page and load it in your browser:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"&gt;
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="it">
<body>
    <input type="text" size="20" id="txtData" onkeyup="showResult()" />
    <div id="dvResult" />
</body>
</html>
<script type="text/javascript">
//<![CDATA[

    theRegExp = /^a\-\d{1,2}\-b$/;

    function isPrefixOf( aText, aRegExp )
    {
     return aRegExp.test( aText );
    }

    function showResult()
    {
     res = document.getElementById( "dvResult" );
     res.innerHTML = isPrefixOf( document.getElementById( "txtData" ).value, theRegExp ) ? "Correct" : "Bad input";
    }

//]]>
</script>
Manuel
You should read the question more carefully. It does not ask how to call the matcher. It asks how to write the matcher.
That's why i highlighted that note in bold text!
Manuel
While the question could have been better phrased, I think it is not difficult to understand. What vote do you want me to remove?
You're right, stefan. I'm removing my comment.
Prestaul
+2  A: 

Very interesting question. In my quick search, I didn't find anything predefined (not even in Perl) that solves this problem.

EDIT: Ouch, it seems Java has something similar called hitEnd() -- see Alan M's answer. What hitEnd() does is say that the result of match() (either true or false) might be modified by additional input. The book 'Mastering Regular Expressions" says it's not very reliable though (not sure why, page 392 not available in google books).

Depending on what features of regular expressions you use, a quick hack like writing some sort of prefixes of your regexp:

e.g. for a+a*b+c your prefixes will be:

a+
a+a*
a+a*b+
a+a*b+c

and try to match any of them with your string might work. This quick hack is made difficult if you use the choice operator, if you use the range operator {n,m} or back-references.

That being said, I think the good solution is to slightly modify the matching algorithm.

The matching algorithm normally employed is a backtracking algorithm (which works well in practice, even if worst case behavior is exponential). This algorithm successfully terminates whenever it has reached the end of the regexp (even if not the entire string was consumed). What you need to do is to modify the termination condition such that it also terminates successfully when it has consumed all of the input.

That being said, you'd probably have to actually implement the algorithm in JavaScript. Hopefully this will become part of libraries such as Jquery.

For more references and theory on the algorithm, check this article out:

http://swtch.com/~rsc/regexp/regexp1.html

(even if it makes a case against the backtracking algorithm and suggests a FA based algorithm (but the FA cannot handle back-references)).

+2  A: 

I think your best bet here is to make your Regex prefix-proof. For the example you gave, /a*b/, I think you could probably use /a*b?/.test(userinput). For more complex patterns this could get increasingly difficult, but I still think it can be done by nesting each subexpression in a series of optional quantifiers (?). For instance:

/a*bcd*e/

The prefix regex could be:

/a*(b(c(d*e?)?)?)?/

Its a little messy, but will solve your problem rather well I think.

brad
This is a great idea (you get an up vote), but breaks down quickly for more complex patterns, I think. How would you create a prefix regex for the following pattern? /a(bc|BC)d/
Prestaul
+3  A: 

People seem to be splitting evenly on how they interpret this question, so I'll demonstrate the concept with a Java example.

import java.util.regex.*;

public class Test
{

  public static void main(String[] args) throws Exception
  {
    tryMatch("^a*b+$", "a", "ab", "abc");
  }

  public static void tryMatch(String regex, String... targets)
  {
    Pattern p = Pattern.compile(regex);
    Matcher m = p.matcher("");
    System.out.printf("%nregex: %s%n", regex);
    System.out.printf("target | matches() | hitEnd()%n");
    for (String str : targets)
    {
      m.reset(str);
      System.out.printf("%-6s | %-9B | %-9B%n",
          str, m.matches(), m.hitEnd());
    }
  }
}

output:

regex: ^a*b+$
target | matches() | hitEnd()
a      | FALSE     | TRUE
ab     | TRUE      | TRUE
abc    | FALSE     | FALSE

Target string "a" doesn't match because the regex requires at least one b, but it could be the prefix of a successful match, so hitEnd() returns true. String "ab" has all that's required for a match, but it would also match if we added more b's to the end, so hitEnd() still returns true. With "abc" the match attempt fails before it reaches the end of the target string, so the regex couldn't match any string that starts with "abc".

As far as I know, Javascript doesn't have anything like Java's hitEnd() method, but it might be possible to fake it. If anyone knows how, it'll be that Flagrant Badass, Steven Levithan.

Alan Moore
upvoted for a good reference to hitEnd