I'm working on a link checker/broken link finder and I am getting many false positives. After double-checking, I noticed that many requests throw a WebException even though the page is actually downloadable, and in other cases the status code is 404 but I can access the page from the browser.

So here is the code. It's pretty ugly, and I'd like to have something more practical. All the status codes in that big `if` are used to filter out the ones I don't want to add to `_brokenlinks`, because they are valid links (I tested them all). What I need to fix is the structure (if possible) and how to avoid false 404s.

Thank you!

try
{
   HttpWebRequest request = ( HttpWebRequest ) WebRequest.Create ( uri );
   request.Method = "Head";
   request.MaximumResponseHeadersLength = 32; // FOR IE SLOW SPEED
   request.AllowAutoRedirect = true;
   using ( HttpWebResponse response = ( HttpWebResponse ) request.GetResponse() )
   {
      request.Abort();
   }
   /* WebClient wc = new WebClient();
     wc.DownloadString( uri ); */

   _validlinks.Add ( strUri );
}
catch ( WebException wex )
{
   if (    !wex.Message.Contains ( "The remote name could not be resolved:" ) &&
           wex.Status != WebExceptionStatus.ServerProtocolViolation )
   {
      if ( wex.Status != WebExceptionStatus.Timeout )
      {
         HttpStatusCode code = ( ( HttpWebResponse ) wex.Response ).StatusCode;
         if (
            code != HttpStatusCode.OK &&
            code != HttpStatusCode.BadRequest &&
            code != HttpStatusCode.Accepted &&
            code != HttpStatusCode.InternalServerError &&
            code != HttpStatusCode.Forbidden &&
            code != HttpStatusCode.Redirect &&
            code != HttpStatusCode.Found
         )
         {
            _brokenlinks.Add ( new Href ( new Uri ( strUri , UriKind.RelativeOrAbsolute ) , UrlType.External ) );
         }
         else _validlinks.Add ( strUri );
      }
      else _brokenlinks.Add ( new Href ( new Uri ( strUri , UriKind.RelativeOrAbsolute ) , UrlType.External ) );
   }
   else _validlinks.Add ( strUri );
}
A: 

You should add a UserAgent header, since many websites require them.
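As a rough illustration of this suggestion, the sketch below sets `UserAgent` on a HEAD request and reads the status code even when `GetResponse()` throws. The `LinkProbe` class and its two methods are hypothetical names introduced for this example, not part of the question's code, and the approach assumes the classic `HttpWebRequest` API used in the question.

```csharp
using System;
using System.Net;

// Hypothetical helper class, introduced only for this sketch.
static class LinkProbe
{
    // Builds a HEAD request that carries an explicit User-Agent header.
    public static HttpWebRequest CreateHeadRequest(Uri uri, string userAgent)
    {
        var request = (HttpWebRequest)WebRequest.Create(uri);
        request.Method = "HEAD";          // upper case, as HTTP method names are case-sensitive
        request.UserAgent = userAgent;    // sent as the User-Agent header
        request.AllowAutoRedirect = true;
        return request;
    }

    // Returns the status code the server sent, or null when no HTTP
    // response arrived at all (DNS failure, timeout, connection refused).
    public static HttpStatusCode? Probe(Uri uri, string userAgent)
    {
        try
        {
            using (var response = (HttpWebResponse)CreateHeadRequest(uri, userAgent).GetResponse())
            {
                return response.StatusCode;
            }
        }
        catch (WebException wex)
        {
            // wex.Response is null when the failure happened below the HTTP layer.
            var response = wex.Response as HttpWebResponse;
            if (response == null)
            {
                return null;
            }
            using (response)
            {
                return response.StatusCode;
            }
        }
    }
}
```

A call such as `LinkProbe.Probe(new Uri("http://www.sisweb.com/"), "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.2; Trident/4.0)")` would then yield either a status code to classify or `null` for a network-level failure. Whether a given site accepts a HEAD request with this header is something only a test against that site can tell.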

SLaks
What kind of UserAgent am I supposed to add?
Burnzy
That's up to you. It should probably contain your contact information.
SLaks
That did not fix it. This is one of the pages I'm getting an error on: http://www.sisweb.com/
Burnzy
Open the page in IE and compare IE's request to yours using [Fiddler](http://fiddler2.com). What UserAgent are you using? Try IE's UserAgent and see if that helps.
SLaks
I used the following: `MSIE 7.0; Windows NT 6.0`. I'm not quite sure how to use Fiddler.
Burnzy
That's completely wrong. Check Fiddler's documentation; it's an invaluable tool for this. Try the following UserAgent: `Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.2; Trident/4.0)`
SLaks
I copied the string and it still does not work.
Burnzy
Does anyone have an idea?
Burnzy
Did you compare the requests using Fiddler?
SLaks