I am trying to take a string of text and create an array from it so that the string:

var someText='I am some text and check this out!  http://blah.tld/foo/bar  Oh yeah! look at this too: http://foobar.baz';

insert magical regex here and

the array would look like this:

theArray[0]='I am some text and check this out!  '
theArray[1]='http://blah.tld/foo/bar'
theArray[2]='  Oh yeah! look at this too: '
theArray[3]='http://foobar.baz'

I'm at a loss; any help would be greatly appreciated.

--Eric

+2  A: 

Split by URL regex (thanks to @Pullet for pointing out a flaw here):

var urlPattern = /(https?\:\/\/\S+[^\.\s+])/;
someText.split(urlPattern);

Let's break down the regex :)

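(https?    -> "http", followed by an optional "s"
\:\/\/     -> followed by ://
\S+        -> followed by one or more contiguous non-whitespace characters
[^\.\s+])  -> ending in a single character that is *not* ".", whitespace, or "+" (inside [...] the "+" is a literal, not a quantifier), so a "." right after the URL is left out of the match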
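
To see what the final character class does in practice, here is a small sketch. The sample sentence is made up for illustration; only the pattern comes from the answer.

```javascript
var urlPattern = /(https?\:\/\/\S+[^\.\s+])/;

// Hypothetical sample: a URL that ends a sentence, so it is followed by "."
var sample = 'See http://foobar.baz. Thanks';

// \S+ first grabs "foobar.baz." and then backtracks until the final
// character class can match something that is not ".", whitespace, or "+".
var match = sample.match(urlPattern);
console.log(match[1]); // "http://foobar.baz"
```

The sentence-ending period stays out of the captured URL, while the dots inside the host name are still matched by \S+.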

Running through your sample text gives,

["I am some text and check this out!  ",
"http://blah.tld/foo/bar",
"  Oh yeah! look at this too: ",
"http://foobar.baz",
""]
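
The trailing "" shows up because the text ends with a URL, so split() reports an empty piece after the last match. If that entry is unwanted, one possible way to drop it (a sketch, not the only option) is to filter empty strings out:

```javascript
var urlPattern = /(https?\:\/\/\S+[^\.\s+])/;
var someText = 'I am some text and check this out!  http://blah.tld/foo/bar  Oh yeah! look at this too: http://foobar.baz';

// The capturing group makes split() keep the matched URLs in the result;
// filter() then removes the empty string left after the final URL.
var parts = someText.split(urlPattern).filter(function (part) {
    return part !== '';
});

console.log(parts.length); // 4
console.log(parts);
```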
Anurag
Anurag, thank you so much - that did the trick! Although I'm still struggling to read it!
Eric
@Eric - regexes are fun :), updated with explanation
Anurag
/(https*\:\/\/\S+[^\.\s+])/ would also match httpssss://test.com, which is not a valid URL. I think what is wanted is /(https?\:\/\/\S+[^\.\s+])/. The ? means that the preceding character is optional. Though if you wanted to support other protocols, something like the following would also work (depending on the number of protocols you need to support): /((https?|s?ftp|gopher)\:\/\/\S+[^\.\s+])/
Pullets Forever
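
A quick sketch of the difference between the two quantifiers, plus one caveat about the multi-protocol pattern when it is used with split(). The ftp URL and the non-capturing variant are illustrative assumptions, not something from the comment itself.

```javascript
var starPattern = /(https*\:\/\/\S+[^\.\s+])/;
var optionalPattern = /(https?\:\/\/\S+[^\.\s+])/;

console.log(starPattern.test('httpssss://test.com'));     // true
console.log(optionalPattern.test('httpssss://test.com')); // false

// With split(), every capturing group is added to the result, so the
// inner protocol group shows up as an extra array entry.
var multiProtocol = /((https?|s?ftp|gopher)\:\/\/\S+[^\.\s+])/;
var withInnerGroup = 'get ftp://files.example/pub now'.split(multiProtocol);
console.log(withInnerGroup); // ["get ", "ftp://files.example/pub", "ftp", " now"]

// Assumed tweak: a non-capturing group (?:...) keeps only the full URL.
var nonCapturing = /((?:https?|s?ftp|gopher)\:\/\/\S+[^\.\s+])/;
var withoutInnerGroup = 'get ftp://files.example/pub now'.split(nonCapturing);
console.log(withoutInnerGroup); // ["get ", "ftp://files.example/pub", " now"]
```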
thanks @Pullets, good catch, made the change. I'll let gopher pass, not sure how many people still use it :0)
Anurag
A: 

Try this:

<script type="text/javascript">
    var url_regex = /((?:ftp|http|https):\/\/(?:\w+:{0,1}\w*@)?(?:\S+)(?::[0-9]+)?(?:\/|\/(?:[\w#!:.?+=&%@!\-\/]))?)+/g;
    var input = "I am some text and check this out!  http://blah.tld/foo/bar  Oh yeah! look at this too: http://foobar.baz";

    var results = input.split(url_regex);
    console.log(results);
</script>

results =

["I am some text and check this out!  ",
"http://blah.tld/foo/bar",
"  Oh yeah! look at this too: ",
"http://foobar.baz", ""]

You could trim the individual results too, to not have leading and trailing whitespace on the non-url entries.
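
A rough sketch of that trimming step, reusing the same url_regex. The helper name splitAndTrim is made up for this example, and dropping the entries that end up empty is an extra assumption on top of the suggestion.

```javascript
var url_regex = /((?:ftp|http|https):\/\/(?:\w+:{0,1}\w*@)?(?:\S+)(?::[0-9]+)?(?:\/|\/(?:[\w#!:.?+=&%@!\-\/]))?)+/g;

// Hypothetical helper: split on URLs, trim each piece, and drop pieces
// that are empty after trimming (such as the trailing "").
function splitAndTrim(text) {
    return text.split(url_regex)
        .map(function (part) { return part.trim(); })
        .filter(function (part) { return part.length > 0; });
}

var input = "I am some text and check this out!  http://blah.tld/foo/bar  Oh yeah! look at this too: http://foobar.baz";
console.log(splitAndTrim(input));
// ["I am some text and check this out!", "http://blah.tld/foo/bar",
//  "Oh yeah! look at this too:", "http://foobar.baz"]
```

Trimming the URL entries as well is harmless here, since the pattern never captures surrounding whitespace.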

Samuel Meacham