I'm working on implementing a search engine for our company's various content types, and am trying to wrap my head around Lucene (specifically the .NET flavor).

For the moment, my primary question is whether the documents one indexes all have to contain the same fields.

For instance:

Document1:

  • Title: "I'm a document, baby"
  • Body: "Here are some important things"
  • Latitude: 26.12224
  • Longitude: -65.23124
  • Brand: Toshiba

Document2:

  • Title: "Another Document by Me"
  • Body: "Lorem ipsum and all that jazz"
  • Category: Articles
  • Author: Sir Loin

...and so forth

A: 

If you wish to index on a specific field, I guess all documents must have the same fields.

Sands
That's what my intuition has told me thus far, but I have not been able to find anything that concretely states it one way or another.
Matt
+6  A: 

Nothing in Lucene forces uniformity.

If you search on a field named 'fred', and not all docs have 'fred,' that search will not find the fredless docs.
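
To make that concrete, here is a minimal sketch in plain Java. It is a toy field-scoped inverted index, not Lucene's actual API: the class SparseFieldIndex, its add and search methods, and the crude tokenizer are all hypothetical names and choices made up for illustration. It only models the idea that each field has its own postings, so a document that lacks a field simply never shows up in searches on that field. The numeric Latitude/Longitude fields from the question are left out for brevity.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy field-scoped inverted index (hypothetical; NOT Lucene's API). Needs Java 9+. */
public class SparseFieldIndex {
    // field name -> (lowercased term -> ids of documents with that term in that field)
    private final Map<String, Map<String, List<Integer>>> postings = new HashMap<>();
    private int nextDocId = 0;

    /** Adds a document given as field -> text. Any document may carry any set of fields. */
    public int add(Map<String, String> fields) {
        int docId = nextDocId++;
        for (Map.Entry<String, String> field : fields.entrySet()) {
            Map<String, List<Integer>> terms =
                postings.computeIfAbsent(field.getKey(), k -> new HashMap<>());
            // Crude tokenizer (an assumption): lowercase, split on non-word characters.
            for (String term : field.getValue().toLowerCase().split("\\W+")) {
                if (term.isEmpty()) continue;
                List<Integer> ids = terms.computeIfAbsent(term, k -> new ArrayList<>());
                // Record each document at most once per term.
                if (ids.isEmpty() || ids.get(ids.size() - 1) != docId) ids.add(docId);
            }
        }
        return docId;
    }

    /** Returns ids of documents whose given field contains the term; empty if none. */
    public List<Integer> search(String field, String term) {
        return postings.getOrDefault(field, Map.of())
                       .getOrDefault(term.toLowerCase(), List.of());
    }

    public static void main(String[] args) {
        SparseFieldIndex index = new SparseFieldIndex();
        index.add(Map.of(                       // document 0: has "brand"
            "title", "I'm a document, baby",
            "body", "Here are some important things",
            "brand", "Toshiba"));
        index.add(Map.of(                       // document 1: no "brand" field
            "title", "Another Document by Me",
            "body", "Lorem ipsum and all that jazz",
            "category", "Articles",
            "author", "Sir Loin"));

        System.out.println(index.search("title", "document")); // [0, 1]
        System.out.println(index.search("brand", "toshiba"));  // [0]
        System.out.println(index.search("author", "toshiba")); // []
    }
}
```

Both documents are indexed without complaint even though their field sets differ; the second one is just invisible to any search on brand.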

bmargulies
You are my new hero.
Matt
A: 

It all depends on how you have indexed your documents in Lucene. Every document must be added to the index. You can use IndexWriter or write your own class to do that. Before adding a document to the index, you should break it up into name/value pairs (fields). You can then query Lucene for those values using QueryParser. For example, the following query matches documents with the phrase "I'm a document, baby" in the title or the phrase "Here are some important things" in the body. QueryParser combines clauses with OR by default, so put AND between the two clauses (or prefix each with +) if both must match.

title:("I'm a document, baby") body:("Here are some important things")

This is only a simple example; you can build far more powerful queries in many different ways.

The classes I have mentioned are from the Java version, but the .NET port should be similar.