We are trying to crawl the content of a small, single-node SharePoint 2010 installation. The crawl is only partially successful: we can retrieve data delivered by the web services (_vti_bin/sitedata.asmx), but the crawler fails when it tries to access the full page contents. The error message shown in the Crawl Log is:

The crawler could not communicate with the server. Check that the server is available and that the firewall access is configured correctly.

The errors logged in the ULS are:

08/27/2010 01:52:02.92 mssdmn.exe (0x0A7C) 0x03E4 SharePoint Server Search HTTP Protocol Handler du54 High CHttpAccessorHelper::InitRequestInternal - unexpected status (500) on request for 'http://staging.dsr.dk/_layouts/error.aspx' Authentication 1. [httpacchelper.cxx:657] d:\office\source\search\native\gather\protocols\http\httpacchelper.cxx
08/27/2010 01:52:02.92 mssdmn.exe (0x0A7C) 0x03E4 SharePoint Server Search PHSts dv44 High CSTS3Accessor::Init: InitRequest failed for URL http://staging.dsr.dk/Pages/Forside.aspx Return error to caller, hr=80041206 [sts3acc.cxx:546] d:\office\source\search\native\gather\protocols\sts3\sts3acc.cxx
08/27/2010 01:52:02.92 mssdmn.exe (0x0A7C) 0x03E4 SharePoint Server Search PHSts dvb1 High CSTS3Accessor::Init fails, Url sts4://staging.dsr.dk/siteurl=/siteid={a78b7d4f-059f-4484-8564-449cd12a97cf}/weburl=/webid={1189e380-76fd-44b7-99a2-ebd4f7245c3d}, hr=80041206 [sts3handler.cxx:312] d:\office\source\search\native\gather\protocols\sts3\sts3handler.cxx
08/27/2010 01:52:02.92 mssdmn.exe (0x0A7C) 0x03E4 SharePoint Server Search PHSts dvb2 High CSTS3Handler::CreateAccessorExD: Return error to caller, hr=80041206 [sts3handler.cxx:330] d:\office\source\search\native\gather\protocols\sts3\sts3handler.cxx
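
When there are many entries like these, it can help to pull out just the failing URLs, HTTP status codes and HRESULT values. The following is a minimal Python sketch, not part of the original setup. The two sample strings are shortened copies of the messages quoted above, and the function name and regular expressions are only illustrative assumptions about the message wording.

```python
import re

# Shortened copies of two ULS messages quoted above (message part only).
ULS_LINES = [
    "CHttpAccessorHelper::InitRequestInternal - unexpected status (500) on request for "
    "'http://staging.dsr.dk/_layouts/error.aspx' Authentication 1. [httpacchelper.cxx:657]",
    "CSTS3Accessor::Init: InitRequest failed for URL http://staging.dsr.dk/Pages/Forside.aspx "
    "Return error to caller, hr=80041206 [sts3acc.cxx:546]",
]

# Assumed patterns, derived only from the wording of the entries above.
STATUS_RE = re.compile(r"unexpected status \((\d{3})\) on request for '([^']+)'")
HRESULT_RE = re.compile(r"\bhr=([0-9A-Fa-f]{8})\b")


def summarize(lines):
    """Collect (status, url) pairs and distinct HRESULT codes from ULS messages."""
    http_failures = []
    hresults = set()
    for line in lines:
        match = STATUS_RE.search(line)
        if match:
            http_failures.append((int(match.group(1)), match.group(2)))
        hresults.update(code.upper() for code in HRESULT_RE.findall(line))
    return http_failures, sorted(hresults)


if __name__ == "__main__":
    failures, codes = summarize(ULS_LINES)
    for status, url in failures:
        print(f"HTTP {status} for {url}")
    print("HRESULT codes:", ", ".join(codes))
```

A status of 500 means the web application itself raised an error while serving the request, so the entries the web application logged around the same time are a good place to look next.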

We have configured the system according to http://support.microsoft.com/kb/896861 (method 1).

We have used Fiddler2 to look at the HTTP traffic, which seems normal, i.e., we can see all the requests to _vti_bin/... However, the request shown above, which uses the sts4 protocol, is not captured by Fiddler2. Hints on how to debug the STS4 traffic would be welcome.

Any suggestions on how to make the crawler successfully crawl the full page contents?

Thank you!

Thomas

A: 

It turned out that the hint was a little higher up in the ULS log:

Unexpected System.FormatException: Input string was not in a correct format.
   at System.Number.StringToNumber(String str, NumberStyles options, NumberBuffer& number, NumberFormatInfo info, Boolean parseDecimal)
   at System.Number.ParseInt32(String s, NumberStyles style, NumberFormatInfo info)
   at System.Convert.ToInt32(String value)
   at DSR.Portal.Core.Service.Identity.IdentityUtility.GetMember(String memberNumberOrCPR)
   at DSR.Portal.Core.Service.Identity.DSRMembershipProvider.GetUser(String username, Boolean userIsOnline)
   at DSR.Portal.Core.Service.Identity.DSRMembershipUser.get_Current()

We had implemented a custom MembershipProvider which expected user IDs to be numbers. It failed for Windows-authenticated users, whose account names are not numeric, and threw the above exception. As a result, the crawler account was not able to retrieve pages, and this caused the problem for the “gatherer”.
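
The failure mode can be illustrated with a small sketch. It is written in Python rather than in the .NET code the provider actually uses, and every name in it (`parse_member_number`, `get_user`, the `members` dictionary, the account name `DOMAIN\crawler`) is hypothetical. The idea is simply to check whether the user name is numeric before converting it, and to treat anything else as "not one of our members" instead of letting the conversion throw.

```python
import re

# Hypothetical rule: a member number consists of ASCII digits only.
_MEMBER_NUMBER = re.compile(r"[0-9]+")


def parse_member_number(username):
    """Return the member number as an int, or None if the name is not numeric."""
    if isinstance(username, str) and _MEMBER_NUMBER.fullmatch(username):
        return int(username)
    return None


def get_user(username, members):
    """Look up a member; non-numeric names (e.g. Windows accounts) yield None."""
    number = parse_member_number(username)
    if number is None:
        # Not a member number, so do not attempt the integer conversion at all.
        return None
    return members.get(number)


if __name__ == "__main__":
    members = {1234: "Member 1234"}  # hypothetical data store
    print(get_user("1234", members))
    print(get_user("DOMAIN\\crawler", members))
```

With a guard like this, a request made by a Windows account such as the crawler's falls through to "no such member" rather than ending in an unhandled exception and an error page.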

So the moral of the story: ALWAYS make sure Windows Authentication works.

Regards

Thomas

Thomas Svensen