Hi,

I have this at the moment (I found the code on here):

     var exp = /(\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/ig;
     someText.replace(exp, "<a href='$1'>$1</a>");  

It replaces any http:// URL in someText with a proper <a href> link.

But I also need it to match URLs that start with www. and have no http:// prefix. I found this regex on RegExLib:

((http\://|https\://|ftp\://)|(www.))+(([a-zA-Z0-9\.-]+\.[a-zA-Z]{2,4})|([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}))(/[a-zA-Z0-9%:/-_\?\.'~]*)?

And I tested it on the regex checker site, http://www.nvcc.edu/home/drodgers/ceu/resources/test_regexp.asp

It matches the strings I want, but when I put it into my exp variable, JavaScript throws an error.

I even tried creating it with new RegExp, like so:

var exp = new RegExp(((http\://|https\://|ftp\://)|(www.))+(([a-zA-Z0-9\.-]+\.[a-zA-Z]{2,4})|([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}))(/[a-zA-Z0-9%:/-_\?\.'~]*)?);

But the same thing happens.

Any ideas what I am doing wrong?

Thanks, Kohan

+5  A: 

I believe the RegExp constructor takes a string as argument, see here: https://developer.mozilla.org/en/Core_JavaScript_1.5_Reference/Global_Objects/RegExp

So just put quotes around your regexp and it should work fine.

var exp = new RegExp("((http\\://|https\\://|ftp\\://)|(www.))+(([a-zA-Z0-9\\.-]+\\.[a-zA-Z]{2,4})|([0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}))(/[a-zA-Z0-9%:/-_\\?\\.'~]*)?");
someText.replace(exp, "<a href='$1'>$1</a>");
David
He also needs to "double escape", by escaping the escape character. For example, wherever you see `\.` you actually need `\\.`.
Andy E
Putting my original expression into new RegExp(" "); causes it to no longer work. But my new regex does now do something (not what I was expecting, but I suspect that is a separate problem). What is the difference between my first expression and the second one that I need to put into new RegExp()?
Kohan
@Kohan The first one is a regex literal, where the regex is enclosed in two slashes like `/regex/`. You don't have to escape backslashes in a regex literal, but you should escape forward slashes, since the slash is the delimiter. The second one is the RegExp constructor, which takes the pattern as a string, so no delimiter is required and you shouldn't escape forward slashes. But because it is a string, you have to escape the backslashes in it.
Amarghosh
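To make the literal-versus-constructor difference concrete, here is a minimal sketch. The pattern and the example.com host are made up for illustration and are not the regex from the question.

```javascript
// The same (hypothetical) pattern written two ways.

// Regex literal: backslashes are written once, but "/" must be escaped
// because it is the delimiter.
var literal = /https?:\/\/www\.example\.com/i;

// RegExp constructor: "/" needs no escaping, but every backslash must be
// doubled so that a single backslash survives string parsing.
var constructed = new RegExp("https?://www\\.example\\.com", "i");

console.log(literal.test("http://www.example.com"));     // true
console.log(constructed.test("http://www.example.com")); // true

// Forgetting to double the backslash silently changes the pattern:
// "\." inside a string literal is just ".", which matches any character.
var broken = new RegExp("www\.example");
console.log(broken.test("wwwXexample")); // true, probably not what was intended
```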
Excellent, thanks.
Kohan
Escaped the slashes and removed the downvotes.
Amarghosh
A: 

A regular expression literal in JavaScript must be surrounded by slashes ('/'), so it looks like this:

var expr = /pattern/flags;

For your case, the correct form is:

var exp = /((http\://|https\://|ftp\://)|(www.))+(([a-zA-Z0-9\.-]+\.[a-zA-Z]{2,4})|([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}))(/[a-zA-Z0-9%:/-_\?\.'~]*)?/;

If you use the constructor new RegExp(), call it in the form

var expr = new RegExp(pattern [, flags]);

Here pattern and flags are string parameters:

var exp = new RegExp("((http\://|https\://|ftp\://)|(www.))+(([a-zA-Z0-9\.-]+\.[a-zA-Z]{2,4})|([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}))(/[a-zA-Z0-9%:/-_\?\.'~]*)?");
kalan
You forgot to escape the forward-slashes in the regex literal, and the backslashes in the string version.
Alan Moore
+1  A: 

Okay, you've got the JavaScript syntax straightened out, now let's talk about regex syntax. The colon (:) has no special meaning, so there's no need to escape it. The dot (.) and question mark (?) normally do have special meanings, but not when they appear in a character class (i.e., inside the square brackets).

The hyphen (-) does have special meaning in a character class: it forms ranges, like [a-z] and [0-9]. If you want to include a literal hyphen in a character class, you either escape it with a backslash or place it at the beginning or end of the list. For example, in [a-zA-Z0-9\.-] the final hyphen matches a hyphen, while the other three are used to form ranges. (The backslash in front of the dot is unnecessary, but it doesn't harm anything.)
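A quick way to check these character-class rules is to try them out. This is only a small sketch; the variable names and test strings are invented for illustration.

```javascript
// Inside [...] the dot and question mark are literal, escaped or not.
var escaped = /^[\.\?]$/;
var unescaped = /^[.?]$/;
console.log(escaped.test("."), unescaped.test(".")); // true true
console.log(escaped.test("x"), unescaped.test("x")); // false false

// A hyphen placed last in the class is a literal hyphen; the other three
// hyphens form the ranges a-z, A-Z and 0-9.
var hostChars = /^[a-zA-Z0-9.-]+$/;
console.log(hostChars.test("my-site.example")); // true
console.log(hostChars.test("my_site.example")); // false, "_" is not in the class
```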

Now look at [a-zA-Z0-9%:/-_\?\.'~]. The backslashes in front of ? and . are just clutter, but that fourth hyphen is a real problem. It forms a range starting with / and ending with _; if you look at an ASCII character map, you'll see that it includes the digits 0-9 and uppercase letters A-Z, plus

/, :, ;, <, =, >, ?, @, [, \, ], ^, _

...obviously not what the author intended. There's also a lot of unnecessary grouping and duplicate code in that regex, and do you really need to match IP addresses, too? The moral is: don't trust anything you find on RegExLib.com.
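If you'd rather not consult an ASCII table, a few lines of JavaScript will list what the accidental range accepts. This is a sketch: the two patterns below isolate just the /-_ part of the class, and the variable names are invented for illustration.

```javascript
// "/-_" in the middle of a class is a range from "/" (0x2F) to "_" (0x5F).
var accidental = /^[\/-_]$/;
// With the hyphen moved to the end, all three characters are literal.
var intended = /^[\/_-]$/;

// Collect every printable ASCII character the accidental range matches,
// leaving out the digits and uppercase letters.
var matched = [];
for (var code = 0x20; code <= 0x7e; code++) {
  var ch = String.fromCharCode(code);
  if (accidental.test(ch) && !/[A-Z0-9]/.test(ch)) {
    matched.push(ch);
  }
}
console.log(matched.join(" ")); // / : ; < = > ? @ [ \ ] ^ _

// The range does not even contain the hyphen itself (0x2D comes before 0x2F).
console.log(accidental.test("-")); // false
console.log(intended.test("-"));   // true
console.log(intended.test("<"));   // false
```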

Alan Moore
Thanks for the explanation there. +1. Looks like I should read up on writing my own regex, then.
Kohan
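As a possible starting point for a hand-written version, here is a minimal sketch that extends the regex literal from the question so it also accepts a bare www. prefix. The function name linkify and the choice to prepend http:// to www. links are assumptions made for illustration; this is not a complete URL matcher.

```javascript
// Hypothetical helper: wraps URLs in <a> tags, including ones that start
// with "www." and have no scheme.
function linkify(text) {
  // Group 1 captures the prefix: either a scheme plus "://" or a bare "www.".
  // The two character classes are taken from the literal in the question.
  var pattern = /\b((?:https?|ftp|file):\/\/|www\.)[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|]/ig;
  return text.replace(pattern, function (match, prefix) {
    // Assumption: links without a scheme should point at http://.
    var href = prefix.toLowerCase() === "www." ? "http://" + match : match;
    return "<a href='" + href + "'>" + match + "</a>";
  });
}

console.log(linkify("see www.example.com now"));
// see <a href='http://www.example.com'>www.example.com</a> now
```

Note that replace returns a new string rather than modifying someText in place, so the result has to be assigned or used.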