I implemented the Pattern class as shown here: http://www.java2s.com/Code/Java/GWT/ImplementjavautilregexPatternwithJavascriptRegExpobject.htm

And I would like to use the following regex to match urls in my String:

(http|https):\/\/(\w+:{0,1}\w*@)?(\S+)(:[0-9]+)?(\/|\/([\w#!:.?+=&%@!\-\/]))?

Unfortunately, the Java compiler fails to parse that string because it contains invalid escape sequences (the above is technically a URL pattern for JavaScript, not Java).

At the end of the day, I'm looking for a regex pattern that will both compile in Java and execute in JavaScript correctly.

+2  A: 

The pattern itself looks fine; I guess the problem is backslash escaping.

Please take a look at http://www.regular-expressions.info/java.html:

In literal Java strings the backslash is an escape character. The literal string "\\" is a single backslash. In regular expressions, the backslash is also an escape character. The regular expression \\ matches a single backslash. This regular expression as a Java string, becomes "\\\\". That's right: 4 backslashes to match a single one.

So, if you reuse your JavaScript regex in Java, you need to replace each \ with \\, and vice versa.
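As a rough illustration of that rule, here is the pattern from the question with every backslash doubled so that it compiles as a Java string literal. This is only a sketch using plain java.util.regex on a regular JVM (not the GWT Pattern emulation from the question); the class name EscapedUrlRegex and the method containsUrl are made up for the example.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, named only for this example.
class EscapedUrlRegex {
    // The question's JavaScript pattern with each backslash doubled for a Java string literal.
    static final String URL_REGEX =
        "(http|https):\\/\\/(\\w+:{0,1}\\w*@)?(\\S+)(:[0-9]+)?(\\/|\\/([\\w#!:.?+=&%@!\\-\\/]))?";

    static final Pattern URL = Pattern.compile(URL_REGEX);

    // find() behaves like JavaScript's RegExp.test(): true if any part of the text matches.
    static boolean containsUrl(String text) {
        return URL.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(containsUrl("see http://example.com/page for details")); // prints true
        System.out.println(containsUrl("no link here"));                            // prints false
    }
}
```

Note that `\/` is harmless in Java (a backslash before a non-alphabetic character is allowed), so the mechanical doubling works here, although a plain `/` would do as well.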

S.Mark
Igor Klimer
+4  A: 

You will have to use JSNI to do the regex evaluation part in JavaScript. If you write the regex with the escaped backslashes, it will be converted to JavaScript as is and will obviously be invalid. It will still work in Hosted or Dev mode, since that is still running Java bytecode, but not in the compiled application.

A simple JSNI example to test if a given string is a valid URL:

// Java method with a JavaScript body (JSNI); test() returns true if the pattern matches anywhere in url
public native boolean isValidUrl(String url) /*-{
    var pattern = /(http|https):\/\/(\w+:{0,1}\w*@)?(\S+)(:[0-9]+)?(\/|\/([\w#!:.?+=&%@!\-\/]))?/;
    return pattern.test(url);
}-*/;

There may be other irregularities between the Java and Javascript regex engines, so it's better to offload it completely to Javascript at least for moderately complex regexes.

Anurag
After I posted this I realized this would probably be the better route to take.
Kyle Hayes
+1  A: 

Hi

I don't know exactly how this would help but here is the exact function you requested in Javascript. I guess using JSNI like Anurag said will help.

var urlPattern = "(https?|ftp)://(www\\.)?(((([a-zA-Z0-9.-]+\\.){1,}[a-zA-Z]{2,4}|localhost))|((\\d{1,3}\\.){3}(\\d{1,3})))(:(\\d+))?(/([a-zA-Z0-9-._~!$&'()*+,;=:@/]|%[0-9A-F]{2})*)?(\\?([a-zA-Z0-9-._~!$&'()*+,;=:/?@]|%[0-9A-F]{2})*)?(#([a-zA-Z0-9._-]|%[0-9A-F]{2})*)?";

function isValidURL(url) {
    // Anchor a local copy so that repeated calls do not keep modifying the global urlPattern.
    var regex = new RegExp("^" + urlPattern + "$");
    return regex.test(url);
}

As @S.Mark said, I basically took the "Java" way of writing regular expressions and used it in JavaScript: the pattern is a string with doubled backslashes, passed to new RegExp.

In Java, you would do it the following way (note that the expression is the same):

String urlPattern = "(https?|ftp)://(www\\.)?(((([a-zA-Z0-9.-]+\\.){1,}[a-zA-Z]{2,4}|localhost))|((\\d{1,3}\\.){3}(\\d{1,3})))(:(\\d+))?(/([a-zA-Z0-9-._~!$&'()*+,;=:@/]|%[0-9A-F]{2})*)?(\\?([a-zA-Z0-9-._~!$&'()*+,;=:/?@]|%[0-9A-F]{2})*)?(#([a-zA-Z0-9._-]|%[0-9A-F]{2})*)?";

Hope this helps. PS: this regular expression works and even validates sites pointing to localhost:port, where port is any numeric port number.
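For completeness, a minimal sketch of how the Java string above might be used for validation. It assumes plain java.util.regex; the class name UrlValidator and the method isValidUrl are hypothetical, and the pattern is the one above, only split across lines for readability.

```java
import java.util.regex.Pattern;

// Hypothetical wrapper class, named only for this example.
class UrlValidator {
    // Same expression as above, concatenated from smaller pieces.
    static final String URL_PATTERN =
        "(https?|ftp)://(www\\.)?"
        + "(((([a-zA-Z0-9.-]+\\.){1,}[a-zA-Z]{2,4}|localhost))|((\\d{1,3}\\.){3}(\\d{1,3})))"
        + "(:(\\d+))?"
        + "(/([a-zA-Z0-9-._~!$&'()*+,;=:@/]|%[0-9A-F]{2})*)?"
        + "(\\?([a-zA-Z0-9-._~!$&'()*+,;=:/?@]|%[0-9A-F]{2})*)?"
        + "(#([a-zA-Z0-9._-]|%[0-9A-F]{2})*)?";

    // Compile once and reuse; Pattern instances are immutable and thread-safe.
    static final Pattern URL = Pattern.compile(URL_PATTERN);

    // matches() requires the whole input to match, which has the same effect
    // as wrapping the pattern in ^ and $ in the JavaScript version.
    static boolean isValidUrl(String url) {
        return url != null && URL.matcher(url).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidUrl("http://localhost:8080"));
        System.out.println(isValidUrl("http://www.example.com/index.html"));
        System.out.println(isValidUrl("not a url"));
    }
}
```

One thing to keep in mind: the `[a-zA-Z]{2,4}` part limits top-level domains to two to four letters, so longer ones would be rejected unless that quantifier is relaxed.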

The Elite Gentleman
All of these answers are excellent and I appreciate the help a lot. The business rule is to let users include URLs in the free-text notes they type. We then go over that text and convert the links to hyperlinks (simply wrapping them in anchor tags), strip any other HTML, and display the result as HTML. So I actually think we'll even modify these regexes to not require protocols. Thanks again for the help!
Kyle Hayes
I just did the same thing....Check my final solution here: http://stackoverflow.com/questions/2099892/extracting-1-or-more-hyperlinks-from-paragraph-text-in-javascript-using-regular-e
The Elite Gentleman