
I have this regular expression to get urls:

(((ht|f)tp(s?))\://)?(www.|[a-zA-Z].)[a-zA-Z0-9-.]+.(com|edu|gov|mil|net|org|biz|info|name|museum|us|ca|uk)(\:[0-9]+)*(/($|[a-zA-Z0-9.\,\;\?\'\+&%\$#\=~_-]+))*

I want to modify it so that, when I build an array of the matched strings, each match also includes everything that comes before the URL. How can I do this?

A: 

Prepend ^(.*?) to the regular expression. That sets up a non-greedy match of all characters between the start of the input string and the characters matched by the rest of your expression, so the first capturing group holds the text that precedes the URL.

Ryan M
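A minimal sketch of the suggestion above, assuming Python's `re` module (the question doesn't name a language). The URL pattern is copied verbatim from the question. The helper name `split_prefix` and the sample sentence are hypothetical, chosen only for illustration.

```python
import re

# The asker's URL pattern, copied verbatim from the question
# (split across adjacent raw strings only for readability).
URL_PATTERN = (
    r"(((ht|f)tp(s?))\://)?(www.|[a-zA-Z].)[a-zA-Z0-9-.]+."
    r"(com|edu|gov|mil|net|org|biz|info|name|museum|us|ca|uk)"
    r"(\:[0-9]+)*(/($|[a-zA-Z0-9.\,\;\?\'\+&%\$#\=~_-]+))*"
)

# Prepending ^(.*?) anchors the match at the start of the input and lazily
# captures everything before the first URL-like match in group 1.
WITH_PREFIX = re.compile(r"^(.*?)" + URL_PATTERN)


def split_prefix(text):
    """Hypothetical helper: return (prefix, whole_match) or None if nothing matches."""
    m = WITH_PREFIX.search(text)
    if m is None:
        return None
    # group(1) is the text before the URL; group(0) is prefix + URL.
    return m.group(1), m.group(0)


if __name__ == "__main__":
    print(split_prefix("Go to http://www.example.com now"))
    print(split_prefix("hello"))
```

Because of the ^ anchor (and without re.MULTILINE), this yields at most one match per input string, and `.` will not cross a newline unless re.DOTALL is set. Also note that the question's pattern leaves its dots unescaped, so it can match ordinary words followed by a short TLD-like string (for example "Visit us"); the non-greedy prefix then stops at that earlier false match.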
Correct answer! But I don't know if he means what he asked...
youllknow
Worked perfectly, thank you.
Patrick Gates
It's quite possible that the pattern won't do what he wants, in the larger scheme of things, per Cory Petosky's comment above. As a rule, though, I try to give answers to the question asked. I hope that the answer will help Patrick and others learn more about regexes no matter the context. By all means, let's also discuss whether the question, or the implied approach to URL validation, is a worthwhile one. I agree that it's a tricky problem, and this particular regex doesn't handle international TLDs in a general fashion. However, that's easily fixed, and the pattern might be "good enough".
Ryan M