
Hi!

I'm building an HTML screen scraper that parses URLs and then compares them with a set of other URLs.

The comparison is done with Uri.AbsoluteUri or Uri.Host.

My problem is that when I create a new Uri (new Uri(url)), a UriFormatException is thrown if the URL is too long or contains too many slashes.

Since my predefined set of URLs contains several URLs that are this long, I cannot just use Substring to fetch only part of the URL.

What would be the best way to handle this?

Thanks

A:

You can use Uri.TryCreate to check whether the URI is valid before you construct it.
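A minimal sketch of the Uri.TryCreate approach. The names SafeUri and TryParseAbsolute are hypothetical; the only framework call is Uri.TryCreate(string, UriKind, out Uri). Instead of letting a UriFormatException escape, the helper returns null, so the scraper can skip or log links it cannot parse.

```csharp
using System;

// Hypothetical helper: parse a scraped URL without throwing.
public static class SafeUri
{
    // Returns the parsed absolute Uri, or null if the string is not a valid absolute URI.
    public static Uri TryParseAbsolute(string url)
    {
        Uri result;
        if (Uri.TryCreate(url, UriKind.Absolute, out result))
        {
            return result;
        }
        return null;
    }
}
```

A caller would check for null before reading AbsoluteUri or Host, e.g. `Uri u = SafeUri.TryParseAbsolute(link); if (u != null) { /* compare u.Host */ }`.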

You should not get an exception on a URL this short. The following program runs fine in VS2008:

using System;
class Program
{
    static void Main(string[] args)
    {
        // Neither constructor call throws a UriFormatException.
        Uri uri = new Uri("http://stackoverflow.com/questions/1298985/c-screen-scraper-handle-long-uris/c-screen-scraper-handle-long-uris/c-screen-scraper-handle-long-uris/c-screen-scraper-handle-long-uris/c-screen-scraper-handle-long-uris/c-screen-scraper-handle-long-uris/c-screen-scraper-handle-long-uris/c-screen-scraper-handle-long-uris/");
        Uri uri2 = new Uri("http://stackoverflow.com/questions/1298985/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/1/");
        Console.ReadLine();
    }
}
Espo
Yes, I know, but if that fails, I cannot compare that URI to my list. What I would like is to just "disable" those checks when creating a new Uri.
alexn
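One possible workaround, sketched under the assumption that an exact string match is an acceptable fallback: compare AbsoluteUri when both strings parse, and otherwise compare the raw strings ordinally. This does not disable System.Uri's checks; it only avoids depending on them for URLs that fail to parse. UrlComparer and SameUrl are hypothetical names.

```csharp
using System;

public static class UrlComparer
{
    // Hypothetical policy: compare via Uri.AbsoluteUri when both strings parse,
    // otherwise fall back to an exact (ordinal) comparison of the raw strings.
    public static bool SameUrl(string a, string b)
    {
        Uri ua, ub;
        if (Uri.TryCreate(a, UriKind.Absolute, out ua) &&
            Uri.TryCreate(b, UriKind.Absolute, out ub))
        {
            // AbsoluteUri is canonicalized (e.g. the host is lower-cased).
            return ua.AbsoluteUri == ub.AbsoluteUri;
        }
        // Fallback for strings System.Uri rejects: exact match only.
        return string.Equals(a, b, StringComparison.Ordinal);
    }
}
```

The fallback is deliberately strict: two unparseable URLs that differ only in host casing will not match, which is the trade-off for never throwing.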
How long are your "too long" URIs? Are you sure they are actually valid?
Espo
The URI I'm testing is completely valid and contains only allowed characters. It is 277 characters long.
alexn
See my sample. It's 315 chars long and does not throw an exception. How many slashes does your url have that throws an exception?
Espo
I think I found the problem: the hostname contained about 200 characters.
alexn
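If an overlong host name is the trigger, a scraper could pre-check host lengths before calling new Uri(url). The sketch below assumes the DNS size limits from RFC 1035 (at most 63 octets per dot-separated label, and 253 characters for the full name in dotted text form); whether a given .NET version rejects exactly these cases is not established here. HostCheck, ExtractHost and IsHostLengthValid are hypothetical names, and ExtractHost is a naive parser that does not handle user info or IPv6 literals.

```csharp
using System;

public static class HostCheck
{
    // RFC 1035 limits: each label at most 63 octets,
    // the full name at most 253 characters in dotted text form.
    private const int MaxLabelLength = 63;
    private const int MaxNameLength = 253;

    // Naively extracts the host from "scheme://host[:port][/path]" without
    // using System.Uri, so it works even when new Uri(url) would throw.
    public static string ExtractHost(string url)
    {
        int start = url.IndexOf("://", StringComparison.Ordinal);
        start = start < 0 ? 0 : start + 3;
        int end = url.IndexOfAny(new[] { '/', '?', '#', ':' }, start);
        return end < 0 ? url.Substring(start) : url.Substring(start, end - start);
    }

    // True if the whole name and every label are within the RFC 1035 limits.
    // Empty labels (e.g. "a..b") are rejected.
    public static bool IsHostLengthValid(string host)
    {
        if (host.Length == 0 || host.Length > MaxNameLength) return false;
        foreach (string label in host.Split('.'))
        {
            if (label.Length == 0 || label.Length > MaxLabelLength) return false;
        }
        return true;
    }
}
```

Usage: `if (HostCheck.IsHostLengthValid(HostCheck.ExtractHost(url))) { /* safe to try new Uri(url) */ }`. URLs that fail the check could be routed to a plain string comparison instead.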