I have the following regex that isn't working. I want to match the string 'www.example.com' but not the string 'http://www.example.com' (or 'anythingwww.example.com' for that matter):

/\bwww\.\w.\w/ig

This is used in JavaScript like this:

text = text.replace(/\bwww\.\w.\w/ig, 'http://$&');

I know the second part of the regex doesn't work correctly either, but it is the http:// part that is confusing me. It currently matches inside 'http://www.example.com' as well, producing the output 'http://http://www.example.com'.

+3  A: 

Does this do what you want? The anchor ensures the text starts with www. But obviously this will fail with other subdomains.

text = text.replace(/^www\.\w+\.\w+$/ig, "http://$&");

EDIT: Fixed thanks to Chris Lutz's comment. I did test earlier, but a strange combo of bugs (missing anchor, unescaped dot, etc.) made it seemingly work. I should reiterate that this is fragile anyway.
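A quick way to check what the anchored version does and does not touch is sketched below. The helper name prefixIfBareHost and the sample strings are made up for illustration; only the regex comes from the answer.

```javascript
// Hypothetical helper wrapping the anchored replace shown above.
function prefixIfBareHost(text) {
  // ^ and $ pin the match to the entire string, so nothing may come
  // before "www" or after the last run of word characters.
  return text.replace(/^www\.\w+\.\w+$/ig, 'http://$&');
}

const samples = [
  'www.example.com',             // bare host: gets the prefix
  'http://www.example.com',      // already prefixed: left alone
  'anythingwww.example.com',     // glued to other text: left alone
  'visit www.example.com today', // inside a sentence: left alone
];

for (const s of samples) {
  console.log(JSON.stringify(s), '->', JSON.stringify(prefixIfBareHost(s)));
}
```

The last sample shows the fragility mentioned above: because of the anchors, a URL embedded in longer text is never matched.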

Matthew Flaschen
No. Depending on your regex implementation, you probably need to escape the '.'s and add a '+' after the '\w's.
Chris Lutz
A: 

You can use the ^ anchor to require that the match starts at the beginning of the line, i.e. with www:

echo -e "http://www.example.com\nanythingwww.example.com\nwww.example.com" | grep '^www\.example\.com'
www.example.com
lothar
+4  A: 

Are you searching for the occurrence of www.example.com in a larger string? Maybe you can be more specific about what you want to match exactly, but something like this may work for you:

text = text.replace(/(\s)(www\.\w+\.\w+)/ig, "$1http://$2");

The problem with \b (which matches word boundaries) is that it also matches between http:// and www, because / is not a word character.
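Here is a small sketch of that difference. The variable names and the sample strings are only for illustration; the two patterns are the one from the question and the one above.

```javascript
// The question's pattern: \b only needs a word/non-word transition,
// and "/" followed by "w" is exactly such a transition.
const withBoundary = /\bwww\.\w.\w/ig;
// The pattern above: a whitespace character must come right before "www".
const withSpace = /(\s)(www\.\w+\.\w+)/ig;

const doubled = 'http://www.example.com'.replace(withBoundary, 'http://$&');
const glued = 'anythingwww.example.com'.replace(withBoundary, 'http://$&');
const inSentence = 'see www.example.com'.replace(withSpace, '$1http://$2');
const alreadyPrefixed = 'see http://www.example.com'.replace(withSpace, '$1http://$2');
const atStart = 'www.example.com'.replace(withSpace, '$1http://$2');

console.log(doubled);         // http://http://www.example.com
console.log(glued);           // anythingwww.example.com
console.log(inSentence);      // see http://www.example.com
console.log(alreadyPrefixed); // see http://www.example.com
console.log(atStart);         // www.example.com
```

Note the last case: since (\s) has to consume a real whitespace character, a URL at the very start of the string is not rewritten.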

molf
To clarify a bit: \b matches if and only if the character on one side matches \w and the character on the other side matches \W (the imaginary characters before the beginning and after the end of the string match \W.)
Cebjyre
+2  A: 

Perhaps something like this?

text = text.replace(/(^|\s)(www(?:\.\w+){2,})/ig, "$1http://$2");

This will match the URLs in:

But not:

  • "http://www.example.com"
  • "ftp.example.com"
  • "www.com"
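A minimal sketch for trying the pattern out. The names bareWww and addScheme are hypothetical, and the first two inputs are my own examples; the last three are the strings listed above.

```javascript
// (^|\s) allows either the start of the string or a whitespace character
// before "www"; (?:\.\w+){2,} requires at least two dot-separated labels.
const bareWww = /(^|\s)(www(?:\.\w+){2,})/ig;

// Hypothetical helper: prefix every bare www host found in the text.
function addScheme(text) {
  return text.replace(bareWww, '$1http://$2');
}

console.log(addScheme('www.example.com'));             // http://www.example.com
console.log(addScheme('go to www.example.co.uk now')); // go to http://www.example.co.uk now
console.log(addScheme('http://www.example.com'));      // http://www.example.com
console.log(addScheme('ftp.example.com'));             // ftp.example.com
console.log(addScheme('www.com'));                     // www.com
```

Capturing the leading whitespace in $1 and writing it back keeps the spacing of the surrounding text intact.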
Ben Blank
Note to anyone wanting to use the above regex for URLs with a path - the path won't be selected. Try this: /(^|\s)(www(?:\.[-A-Z0-9+
micahwittman
A: 
JP Alioto
JavaScript regexes don't support lookbehinds.
Alan Moore
Unfortunately, JavaScript does not support lookbehind expressions.
molf
Blah! Did not know this ... sorry!
JP Alioto