We are trying to crawl the content of a small, single-node SharePoint 2010 installation. The crawl is only partially successful: we can retrieve the data delivered by the web services (_vti_bin/sitedata.asmx), but when the crawler tries to fetch the full page contents, it fails. The error message shown in the crawl log is:
The crawler could not communicate with the server. Check that the server is available and that the firewall access is configured correctly.
The errors logged in the ULS log are:
08/27/2010 01:52:02.92 mssdmn.exe (0x0A7C) 0x03E4 SharePoint Server Search HTTP Protocol Handler du54 High CHttpAccessorHelper::InitRequestInternal - unexpected status (500) on request for 'http://staging.dsr.dk/_layouts/error.aspx' Authentication 1. [httpacchelper.cxx:657] d:\office\source\search\native\gather\protocols\http\httpacchelper.cxx
08/27/2010 01:52:02.92 mssdmn.exe (0x0A7C) 0x03E4 SharePoint Server Search PHSts dv44 High CSTS3Accessor::Init: InitRequest failed for URL http://staging.dsr.dk/Pages/Forside.aspx Return error to caller, hr=80041206 [sts3acc.cxx:546] d:\office\source\search\native\gather\protocols\sts3\sts3acc.cxx
08/27/2010 01:52:02.92 mssdmn.exe (0x0A7C) 0x03E4 SharePoint Server Search PHSts dvb1 High CSTS3Accessor::Init fails, Url sts4://staging.dsr.dk/siteurl=/siteid={a78b7d4f-059f-4484-8564-449cd12a97cf}/weburl=/webid={1189e380-76fd-44b7-99a2-ebd4f7245c3d}, hr=80041206 [sts3handler.cxx:312] d:\office\source\search\native\gather\protocols\sts3\sts3handler.cxx
08/27/2010 01:52:02.92 mssdmn.exe (0x0A7C) 0x03E4 SharePoint Server Search PHSts dvb2 High CSTS3Handler::CreateAccessorExD: Return error to caller, hr=80041206 [sts3handler.cxx:330] d:\office\source\search\native\gather\protocols\sts3\sts3handler.cxx
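To make the sequence easier to read, the ULS lines above can be reduced to the HTTP status, the URL, and the HRESULT they carry. This is only a minimal Python sketch: the function name and the regular expressions are ours, and they assume the message layout seen in the excerpt above rather than any documented ULS format.

```python
import re

# Patterns assume the message layout seen in the ULS excerpt above.
STATUS_RE = re.compile(r"unexpected status \((\d{3})\) on request for '([^']+)'")
HRESULT_RE = re.compile(r"\bhr=([0-9A-Fa-f]{8})\b")
URL_RE = re.compile(r"\b(?:https?|sts4)://\S+")


def summarize_uls_line(line):
    """Return the status, URL and HRESULT found in one ULS line (None when absent)."""
    status = url = hresult = None

    match = STATUS_RE.search(line)
    if match:
        # Lines from the HTTP protocol handler quote the URL and give the status.
        status = int(match.group(1))
        url = match.group(2)
    else:
        # Other lines carry a bare http:// or sts4:// URL, sometimes followed by a comma.
        match = URL_RE.search(line)
        if match:
            url = match.group(0).rstrip(",")

    match = HRESULT_RE.search(line)
    if match:
        hresult = "0x" + match.group(1).upper()

    return {"status": status, "url": url, "hresult": hresult}
```

Run over the four lines above, this shows a 500 on _layouts/error.aspx first, followed by hr=80041206 being passed up through the STS handlers for the page URL and the sts4 URL.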
We have configured the system according to http://support.microsoft.com/kb/896861 (method 1).
We have used Fiddler2 to look at the HTTP traffic, which seems normal, i.e., we can see all the requests to _vti_bin/... However, the request to the sts4:// URL shown above is not captured by Fiddler2. Hints on how to debug the STS4 traffic would be welcome.
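One check we could run outside the crawler is to request the failing page directly, with a crawler-like User-Agent and without following redirects, to see whether the redirect to _layouts/error.aspx and the 500 also appear there. The Python sketch below is only an illustration under assumptions: the User-Agent string is what we believe the search crawler sends and has not been verified on this farm, the helper names are ours, and no Windows authentication is performed, so a site that requires NTLM would simply answer 401.

```python
import urllib.error
import urllib.request

# Assumed crawler-like User-Agent; verify the value your farm actually sends.
ASSUMED_CRAWLER_UA = "Mozilla/4.0 (compatible; MSIE 4.01; Windows NT; MS Search 6.0 Robot)"


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Surface 3xx responses instead of silently following them."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None makes urllib raise HTTPError for the redirect.
        return None


def build_request(url, user_agent=ASSUMED_CRAWLER_UA):
    """Build a GET request carrying the crawler-like User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def probe(url, timeout=10):
    """Return (status, Location header or None) for a single request."""
    opener = urllib.request.build_opener(NoRedirect)
    try:
        with opener.open(build_request(url), timeout=timeout) as resp:
            return resp.status, resp.headers.get("Location")
    except urllib.error.HTTPError as err:
        # 3xx, 4xx and 5xx responses end up here; the status is still informative.
        return err.code, err.headers.get("Location")
```

Calling probe("http://staging.dsr.dk/Pages/Forside.aspx") from the crawl server would show whether the page itself answers with a redirect or an error before the crawler is involved.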
Any suggestions on how to make the crawler successfully crawl the full page contents?
Thank you!
Thomas