Hi

Can anyone tell me what permissions I need to give to the Content Crawl Account in MOSS 2007?

When I run a crawl of the content, I get an error saying that the account does not have permission and that I should give it Full Read on the web application. I tried that, to no avail.

All the best

+1  A: 

It depends on your environment. See this TechNet article for reference; the problem may be related to which groups the account you are using belongs to.

curtisk
A: 

This might be the same issue I ran into. Check out this MS support article:

"You receive error 401.1 (access denied) when you browse a Web site that uses Integrated Authentication and is hosted on IIS 5.1 or IIS 6"

article

Colin
A: 

The loopback bug referenced by Colin's article link is definitely a great first place to start. One quick way to determine whether the loopback bug is in play is to attempt to hit your site directly from the server hosting it. If you open IE (or your browser of choice) on your MOSS WFE and can access the site, then the loopback bug isn't an issue. Note, too, that the bug only affects sites running on port 80.

Are you seeing any additional exceptions? There is nothing special about the search crawler account. It should be a standard user account with no special permissions, except that a Full Read web application policy should be established for it on each of the web applications within the farm. MOSS normally takes care of this by itself when you assign the account as the default content crawling account within the SSP(s).

Another obscure crawler problem arises if you have one or more site collections below the root of the URL you are trying to crawl but don't actually have a site collection at the root URL itself (i.e., a top-level site collection).

For example, MOSS will normally fail to crawl and will report problems if you have sites here:

http://www.testurl.com/sites/samplesite

http://testserver:8000/randomsite

... but don't have the respective top-level site collections in place here:

http://www.testurl.com

http://testserver:8000
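As a rough illustration of the check described above, here is a minimal Python sketch. It assumes you already have a list of your site collection URLs (however you gather it), and it reports which web application roots have no site collection at the root URL. The function names are hypothetical and this is plain URL handling, not a SharePoint API:

```python
from urllib.parse import urlsplit


def web_application_root(url):
    """Return scheme://host[:port] for a URL, lower-cased."""
    parts = urlsplit(url)
    return f"{parts.scheme.lower()}://{parts.netloc.lower()}"


def is_root_url(url):
    """True if the URL points at the root of the web application."""
    return urlsplit(url).path in ("", "/")


def roots_missing_top_level_site(site_collection_urls):
    """List web application roots that have site collections below them
    but no site collection at the root URL itself.

    The input list is an assumption: it stands in for whatever
    inventory of site collections you have for the farm.
    """
    roots_seen = []              # keeps first-seen order
    roots_with_top_level = set()
    for url in site_collection_urls:
        root = web_application_root(url)
        if root not in roots_seen:
            roots_seen.append(root)
        if is_root_url(url):
            roots_with_top_level.add(root)
    return [r for r in roots_seen if r not in roots_with_top_level]


if __name__ == "__main__":
    sites = [
        "http://www.testurl.com/sites/samplesite",
        "http://testserver:8000/randomsite",
    ]
    for root in roots_missing_top_level_site(sites):
        print(f"No top-level site collection at {root}")
```

Any root this reports is a candidate for the crawl problem described here; it does not prove that a missing top-level site is the cause of a particular crawl error.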

If you are attempting to crawl a web application that doesn't have a top-level site collection present, my suggestion is to create one there. Without a top-level site in a web application, a number of things fail to work properly: InfoPath forms publishing, MetaWeblog API publishing (to publish to a blog), etc. Each of these things attempts to start with the root URL, and they fail when a site collection isn't present.

If creating a top-level site collection isn't an option, you can work around the issue with managed paths. Changing the web application's (root) managed path from an Explicit inclusion to a Wildcard inclusion should also work.

I hope this helps!

Sean McDonough