Let's say I want to index my shop using Solr/Lucene.

I have many types of entities: Products, Product Reviews, Articles.

How do I get Lucene to index those types, each with a different schema?

+1  A: 

With Lucene/Solr, a document does not need to set a value for every field. Within the same schema, you can have one set of fields for entity A and another set of fields for entity B, and populate only the appropriate fields depending on the entity.
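To illustrate, here is a minimal sketch of three entity types sharing one schema while each fills in only its own fields. The field names (`product_name`, `review_rating`, and so on) are hypothetical and would have to be declared in your own schema; the resulting JSON list is the kind of payload you could send to Solr's update handler.

```python
import json

# Hypothetical field names; each entity type populates only its own subset.
def product_doc(pid, name, price):
    return {"id": f"product-{pid}", "product_name": name, "product_price": price}

def review_doc(rid, product_id, rating, text):
    return {
        "id": f"review-{rid}",
        "review_product_id": f"product-{product_id}",
        "review_rating": rating,
        "review_text": text,
    }

def article_doc(aid, title, body):
    return {"id": f"article-{aid}", "article_title": title, "article_body": body}

docs = [
    product_doc(1, "Red running shoes", 59.9),
    review_doc(7, 1, 5, "Very comfortable."),
    article_doc(3, "How to clean shoes", "Use a soft brush."),
]

# One JSON array holding documents with different field sets.
payload = json.dumps(docs)
```

Prefixing the `id` with the entity name is just one way to keep keys unique across entity types in a shared index.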

With Solr, you also have the option to go multi-core. Each core has its own schema, so you could define one core per entity.
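As a rough sketch of the multi-core option, the client only has to route each document to the core that matches its entity. The core names and the base URL below are assumptions for a local setup, not something Solr defines for you.

```python
# Hypothetical core names, one per entity type.
CORE_BY_ENTITY = {
    "product": "products",
    "product_review": "product_reviews",
    "article": "articles",
}

def update_url(entity_type, base_url="http://localhost:8983/solr"):
    """Return the update endpoint of the core that holds this entity type."""
    core = CORE_BY_ENTITY[entity_type]
    return f"{base_url}/{core}/update"

def group_by_core(typed_docs):
    """Group (entity_type, doc) pairs by target update URL, one batch per core."""
    batches = {}
    for entity_type, doc in typed_docs:
        batches.setdefault(update_url(entity_type), []).append(doc)
    return batches

batches = group_by_core([
    ("product", {"id": "1", "name": "Red running shoes"}),
    ("article", {"id": "3", "title": "How to clean shoes"}),
    ("product", {"id": "2", "name": "Blue hat"}),
])
```

Note that a search across all entities then means querying each core separately and merging the results yourself.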

Pascal Dimassimo
+1  A: 

You might want to have three indexes called Products, ProductReviews and Articles. Each index can have its own schema. Coming from a relational database, a row in a table roughly translates to a document in Lucene. Note: each document can have its own schema, i.e. its own set of fields (which is another difference from a relational database).

Mikos
+2  A: 

I recommend building your index so that all of your entities share more or less the same basic fields: title, content, url, uuid, entity_type, entity_sourcename, etc. If each of your entities has its own unique set of index fields, you'll have a hard time constructing a query that searches all entities simultaneously, and your results view may become a huge mess. If you need specific fields for a particular entity, add them and apply special logic for that entity based on its entity_type.

I'm speaking from experience: we manage an index with over 10 different entities, and this approach works like a charm.
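A minimal sketch of this layout: every entity is mapped onto the same basic fields plus `entity_type`, and entity-specific fields go in as extras. The helper names are hypothetical, the query parameters follow Solr's `q` and `fq` (filter query) conventions, and a default search field is assumed to be configured.

```python
from urllib.parse import urlencode

def to_index_doc(entity_type, uuid, title, content, url, **extra):
    """Map any entity onto the shared basic fields; extras are entity-specific."""
    doc = {
        "uuid": uuid,
        "title": title,
        "content": content,
        "url": url,
        "entity_type": entity_type,
    }
    doc.update(extra)
    return doc

def search_params(text, entity_type=None):
    """Build a query string; without entity_type the search spans all entities."""
    params = [("q", text)]
    if entity_type is not None:
        # Restrict results to one entity type with a filter query.
        params.append(("fq", f"entity_type:{entity_type}"))
    return urlencode(params)

product = to_index_doc("product", "p1", "Red running shoes",
                       "Lightweight shoes for road running.", "/products/1",
                       price=59.9)
article = to_index_doc("article", "a1", "How to clean shoes",
                       "Use a soft brush.", "/articles/1")

all_entities = search_params("shoes")
products_only = search_params("red shoes", "product")
```

Because every document carries `title`, `content` and `url`, a single results template can render hits of any type and branch on `entity_type` only where needed.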

P.S. A few other simple tips:

  1. Make sure your Lucene document contains all of the data needed to construct the result and show it to the user (so that you don't need to go to the database to build the result). Lucene queries are generally much faster than database queries.
  2. If you absolutely need the database to construct your result set (e.g. to apply permissions), use the Lucene query first to narrow the results and the database query second to filter them.
  3. Don't be afraid to add custom fields to some of your documents if you need them: think of a Lucene document as a key-value datastore.
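
Tip 2 can be sketched as follows. This is only an illustration under stated assumptions: `search_index` is a plain in-memory stand-in for the Lucene query, and the `permissions` table is a hypothetical schema held in an in-memory SQLite database.

```python
import sqlite3

def search_index(index_docs, term):
    """Stand-in for a Lucene query: narrow the candidates by search term."""
    term = term.lower()
    return [d for d in index_docs if term in d["content"].lower()]

def filter_by_permission(conn, user_id, hits):
    """Second step: ask the database only about the hits already found."""
    if not hits:
        return []
    ids = [h["uuid"] for h in hits]
    placeholders = ",".join("?" for _ in ids)
    rows = conn.execute(
        "SELECT doc_uuid FROM permissions "
        f"WHERE user_id = ? AND doc_uuid IN ({placeholders})",
        [user_id, *ids],
    ).fetchall()
    allowed = {row[0] for row in rows}
    # Keep the ranking order produced by the index search.
    return [h for h in hits if h["uuid"] in allowed]

index_docs = [
    {"uuid": "p1", "entity_type": "product", "content": "Red running shoes"},
    {"uuid": "a1", "entity_type": "article", "content": "How to clean shoes"},
    {"uuid": "p2", "entity_type": "product", "content": "Blue hat"},
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE permissions (user_id TEXT, doc_uuid TEXT)")
conn.executemany("INSERT INTO permissions VALUES (?, ?)",
                 [("alice", "p1"), ("alice", "p2")])

hits = search_index(index_docs, "shoes")             # narrowed by the index
visible = filter_by_permission(conn, "alice", hits)  # filtered by the database
```

The database query touches only the handful of ids returned by the search, rather than scanning every row the user may access.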
buru
+2  A: 

Multi-core is an approach to use with care. With a simple schema like yours, it is better to do as buru recommends: find the fields common to your different entities, then the fields used by only one or a few of them. You can then add a "type" or "type_id" field that says whether the document is a product, a product review, and so on.

Doing so lets you keep a single index and process queries quickly.

Guillaume Lebourgeois