I have pages in a site that contain a control which uses a query string to serve dynamic data to the user (for example, http://site/pages/example.aspx?id=1).

I can get my content source to index these dynamic pages only if I create a crawl rule that sets the root site (http://site/*) to 'include complex urls' and 'crawl sharepoint content as http content'. This is NOT acceptable: changing the crawling protocol from SharePoint's to HTTP prevents any metadata from being collected on the indexed items, and the managed metadata feature is a critical component of our SharePoint applications.

In case you wonder whether this is simply a configuration error on my part, see http://social.technet.microsoft.com/Forums/en-US/sharepointsearch/thread/4ff26b26-84ab-4f5f-a14a-48ab7ec121d5 . The issue described there is exactly my problem, but the solution is unusable for the reason given above.

Keep in mind this is for an external publishing site, and my search scope is trimmed using content classes to include only documents and pages (STS_List_850 and STS_ListItem_DocumentLibrary). Creating a new web site content source and adding it to my scope presents two problems: duplicate content in the scope, and no content class that I know of to identify it.

What options do I have?

+1  A: 

Just a thought: maybe you should create two content sources, one SharePoint source for metadata and items and one HTTP source for the pages. Set rules on each one to exclude the other's content. Would that solve your problem?

Vladi Gubler
Thank you for your response! I see two problems with this approach: 1) I cannot set two content sources to crawl the same site; 2) even if I could, I don't know of any way to have the content sources exclude each other's content, as the rules are global for all content sources. Also, due to requirements, I can't fragment the data into multiple scopes and leave it up to the user to figure out which one to search. I need a single scope, and I don't know of a content class for 'http content'. Content classes are required to filter unnecessary results out for external users.
Steve Ruiz
A: 

I have decided to take a different approach to this problem, as combining dynamic HTTP content and SharePoint content into one scope is a nontrivial problem and is better suited to an entirely new project than to the retrofit I was attempting.

If you have dynamic content from a separate system that you want to crawl without sacrificing SharePoint metadata for the rest of your site, it seems the only option is to write a BCS application/search connector, crawl the two content sources separately, and combine them with a scope and possibly an extended core results web part. Good luck!
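
For anyone weighing this route, here is a rough, language-neutral sketch (written in Python, not SharePoint code) of the merging step an extended results web part would need to perform: combine hits from the two separately crawled sources and drop duplicates by URL, preferring the SharePoint hit because it carries the metadata. The `url` and `source` fields and the function names are hypothetical, chosen only for illustration; they are not part of any SharePoint or BCS API.

```python
from urllib.parse import urlsplit


def normalize_url(url):
    """Build a comparison key: lower-case scheme and host, drop a trailing slash.

    The query string is kept, because example.aspx?id=1 and example.aspx?id=2
    are different pages.
    """
    parts = urlsplit(url.strip())
    key = f"{parts.scheme.lower()}://{parts.netloc.lower()}{parts.path.rstrip('/')}"
    if parts.query:
        key += "?" + parts.query
    return key


def merge_results(sharepoint_hits, connector_hits):
    """Merge two hit lists, keeping the SharePoint hit when a URL appears in both.

    Each hit is assumed to be a dict with at least a "url" key (hypothetical shape).
    """
    merged = {}
    # SharePoint hits go in first so they win over duplicates from the connector.
    for hit in list(sharepoint_hits) + list(connector_hits):
        merged.setdefault(normalize_url(hit["url"]), hit)
    return list(merged.values())
```

Used on two small hand-made lists, the duplicate `?id=1` page would be kept once (the SharePoint copy) and the `?id=2` page would come from the connector. The real work, of course, is in the connector and the web part, not in this merge.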

Steve Ruiz