views:

201

answers:

5

Strings like 'www.test.com', 'www.888.com', 'stackoverflow.com' and 'GOoGle.Com' are good.

Why? Because those are valid URLs. It does not necessarily matter whether they have been registered or not.

Bad strings are:

'goog*d\x' 'manydots...com'

Why? Because you can't register those URLs.

If I have a string in Java which is supposed to be a good URL, what's the best way to validate it?

Thanks a lot.

+6  A: 

Hi, use UrlValidator from the Apache Commons Validator library. Binary package: http://www.mirrorservice.org/sites/ftp.apache.org/commons/validator/binaries/commons-validator-1.3.1.zip (the zip contains the .jar files).

Example of usage (Construct a UrlValidator with valid schemes of "http", and "https"):

String[] schemes = {"http","https"};
UrlValidator urlValidator = new UrlValidator(schemes);
if (urlValidator.isValid("ftp://foo.bar.com/")) {
   System.out.println("url is valid");
} else {
   System.out.println("url is invalid");
}

prints "url is invalid"

If the default constructor is used instead:

UrlValidator urlValidator = new UrlValidator();
if (urlValidator.isValid("ftp://foo.bar.com/")) {
   System.out.println("url is valid");
} else {
   System.out.println("url is invalid");
}

prints out "url is valid"

Chris Dennett
+1  A: 

I think that new URL(yourString) will do the trick: it is supposed to throw MalformedURLException if the URL is not compliant (actually the Java API says "If the string specifies an unknown protocol", but you can try it anyway):

try {
   new URL(string);
} catch (MalformedURLException e) {
   // do whatever
}
Jack
The problem with URL is, it will attempt to perform a lookup each time :(
OscarRyz
Yes, that is true, but how else (except using regex) can you validate a URL?
Shervin
@Shervin: With Commons UrlValidator.
BalusC
@Shervin: with: http://stackoverflow.com/questions/2601780/url-valid-characters-java-to-validate/2601792#2601792
OscarRyz
Again, like I said: those examples must be using regex to check whether a URL is valid. That is the only way if you don't want to make a connection.
Shervin
A: 

You can do this kind of "URL validation" with regular expressions.

And here is where you can get some good URL regexes (so you don't have to write your own).

rlb.usa
+1  A: 

I also believe you can use the URL class in java.net:

URL url = new URL("http://www.google.com"); // a protocol is required; "www.google.com" alone throws MalformedURLException

The API says: public URL(String spec) throws MalformedURLException. Parameters: spec - the String to parse as a URL. Throws: MalformedURLException - if the string specifies an unknown protocol.

So an exception is thrown if the URL is invalid.
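
As a rough illustration of that behaviour, here is a minimal sketch. The class name UrlParseCheck and the helper parses are made up for this example. It only shows that the constructor rejects a string with no protocol and accepts one with a known protocol. Since the documented failure case is an unknown protocol, a successful parse should not be read as proof that the host part is well-formed.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical helper class, not part of any library.
public class UrlParseCheck {

    // Returns true if java.net.URL accepts the string, false if it throws.
    public static boolean parses(String s) {
        try {
            new URL(s);
            return true;
        } catch (MalformedURLException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // A bare hostname has no protocol, so the constructor rejects it.
        System.out.println("www.google.com -> " + parses("www.google.com"));
        // With a known protocol in front, the same host is accepted.
        System.out.println("http://www.google.com -> " + parses("http://www.google.com"));
    }
}
```

So for the strings in the question, a protocol such as "http://" would have to be prepended before calling the constructor.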

Shervin
+2  A: 

Those examples are hostnames. They're not valid URLs in themselves.

Hostnames are made of .-separated ‘labels’. Each label must be up to 63 characters of letters, digits and hyphens, but a hyphen must not be the first or last character. It is optional to follow the whole hostname with another dot.

You can match this with a pattern like (assuming case-insensitive):

([a-z0-9]|[a-z0-9][a-z0-9\-]{0,61}[a-z0-9])(\.([a-z0-9]|[a-z0-9][a-z0-9\-]{0,61}[a-z0-9]))*\.?

However this matches strings like 1.2.3.4 as well, which although they technically could be host/domain names will actually act as direct IP addresses. You may want to allow that. If you do, you may also want to allow IPv6 addresses, which are colon-separated hex; when embedded in a URL, they also have square brackets around them.
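
A minimal Java sketch of the label rules described above, assuming you only care about plain ASCII hostnames. The class name HostnameCheck and the method isValidHostname are invented for this example. The label pattern is written so that the dot applies to the whole following label. It does not check the overall hostname length or treat IP addresses specially.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
public class HostnameCheck {

    // One label: 1 to 63 letters, digits or hyphens, no hyphen first or last.
    private static final String LABEL = "[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?";

    // Dot-separated labels, optionally followed by one trailing dot.
    private static final Pattern HOSTNAME = Pattern.compile(
            LABEL + "(?:\\." + LABEL + ")*\\.?", Pattern.CASE_INSENSITIVE);

    public static boolean isValidHostname(String s) {
        return s != null && HOSTNAME.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"www.test.com", "www.888.com", "stackoverflow.com",
                            "GOoGle.Com", "goog*d\\x", "manydots...com", "1.2.3.4"};
        for (String s : samples) {
            System.out.println(s + " -> " + isValidHostname(s));
        }
    }
}
```

With this sketch the four "good" strings from the question are accepted and the two "bad" ones are rejected, while 1.2.3.4 is accepted, as noted above.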

And then of course there's IDNA. Nowadays, 例え.テスト is a valid IDNA domain name, corresponding to xn--r8jz45g.xn--zckzah. If you want to allow those you'll need some Unicode support.
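
For the IDNA case, the JDK has java.net.IDN, which converts a Unicode domain name to its ASCII (Punycode) form. A small sketch, with a made-up class name IdnExample and helper toAsciiHostname. The Japanese name is written with Unicode escapes only so the source file stays ASCII. The result could then be fed into whatever ASCII hostname check you use.

```java
import java.net.IDN;

// Hypothetical helper class, not part of any library.
public class IdnExample {

    // Converts a possibly non-ASCII hostname to its ASCII-compatible form.
    // IDN.toASCII throws IllegalArgumentException if the conversion fails.
    public static String toAsciiHostname(String hostname) {
        return IDN.toASCII(hostname);
    }

    public static void main(String[] args) {
        // The example name from the text, written as Unicode escapes.
        String japanese = "\u4f8b\u3048.\u30c6\u30b9\u30c8";
        System.out.println(toAsciiHostname(japanese));
        // A plain ASCII name passes through unchanged.
        System.out.println(toAsciiHostname("stackoverflow.com"));
    }
}
```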

Summary: it's quite a bit more difficult than you might think. And that's just hostnames. ‘Validating’ a whole URL is even more work. A simple regex isn't going to hack it. Use a pre-existing library.

bobince
Thanks, bobince. That was really what I was after.
Chez