Background:
I have a classifieds website where users can, for example, search for cars and specify a price range, mileage, fuel type, gearbox, and a manually entered query string if they want to put something specific into the search, e.g. "bmw m3".

Questions:
I am about to move this information to Solr for faster lookup, and wonder if I have to index or store the fields.

The only fields users can actually search in are the "headline" and the "description" of the classified. They can, however, as I mentioned above, specify price ranges, for example.

So I guess the "headline" and "description" fields should be indexed right? But should the price field, and any other sub-option fields also be indexed?

In MySQL the query syntax would be something like this, so you can compare it to what I am talking about (may contain errors, I forgot exactly how they were written):

  $query="SELECT * FROM cars_category WHERE (headline='bmw m3' OR description='bmw m3') AND price BETWEEN 10000 AND 500000 AND fuel='petrol' AND etc etc";

So what do you think, index/store all fields or what?

Is there a method for determining what to store and what to index, or both?

Thanks

PS: Descriptive answers are appreciated

A: 

Index anything except for description. Most databases do NOT use the index when you query a field using "LIKE '%xxx%'".

DVK
but it's a full-text field, so I think it should be indexed... Not sure though!
Camran
+2  A: 

I agree with: anything you are going to search or sort on should be indexed.

However, searching and sorting typically work better on fields with distinct values (e.g. a Make field containing "Acura", "BMW", "Chevy", etc.) than on large free-text fields like Description. You might consider this for better search results and better performance.

In your situation, I would recommend indexing Price, Fuel, Headline and any other distinct fields you are searching on.

An index on Description will only be useful if you search for Description = "BMW M3". However that search logic will omit results such as "Red BMW M3 with Pirelli tyres". A search for Description LIKE "%BMW M3%" will have to scan the entire table anyway, so an index won't be very useful.
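To illustrate the difference, here is a minimal Python sketch (not Solr or MySQL code) that compares an exact equality match with a lookup in a word-level inverted index, which is the kind of structure full-text engines build. The sample rows, ids, and function names are made up for illustration.

```python
from collections import defaultdict

# Hypothetical sample rows; the ids and texts are invented for this sketch.
descriptions = {
    1: "Red BMW M3 with Pirelli tyres",
    2: "BMW M3",
    3: "Blue Audi A4, low mileage",
}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return "".join(c.lower() if c.isalnum() else " " for c in text).split()

def build_inverted_index(docs):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index

def search_all_words(index, query):
    """Return the ids of documents that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

index = build_inverted_index(descriptions)

# Equality only finds the row whose whole description is "BMW M3".
exact_matches = {i for i, d in descriptions.items() if d == "BMW M3"}

# The word index also finds the longer description, without scanning every row.
word_matches = search_all_words(index, "bmw m3")
```

Here `exact_matches` contains only row 2, while `word_matches` contains rows 1 and 2. Note that this sketch matches words in any order; a real phrase search would also need word positions.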

DWong
When do I store, then? What's a typical situation where "store" only would be used?
Camran
I may be confused here. By "store" do you mean "cache"?
DWong
Actually, I mean store. There are stored fields and indexed fields in a db.
Camran
I edited my original answer, hopefully that provides more help.
DWong
+1  A: 

and wonder if I have to index or store the fields.

My understanding of Solr is very limited, but what confused me when I started was the indexing terminology. In a database, storing the data and creating an index are two separate things, and (generally speaking, at least) the data and the index are stored in two separate places; in Solr, anything you upload to Solr is indexed. So you decide which fields you want Solr to be able to search, you assign field types, and - hey presto - Solr can find data in those fields with impressive speed. You can determine how different fields are searched (case sensitive or not, for example) and you can run range searches and the like. For a comprehensive treatment of these, check out the wiki at http://wiki.apache.org/solr/FrontPage#Search_and_Indexing and the query syntax at http://wiki.apache.org/solr/SolrQuerySyntax.
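As a rough sketch of how the MySQL query from the question might map onto Solr request parameters, the Python snippet below builds a `q` parameter for the free-text part and `fq` filter queries for the price range and fuel type. It assumes the Solr schema has fields named `headline`, `description`, `price` and `fuel` (names taken from the question, not from any real schema), and `build_solr_params` is a hypothetical helper, not part of any Solr client library.

```python
from urllib.parse import urlencode

def build_solr_params(text, price_min, price_max, fuel):
    """Build Solr request parameters for a classifieds search.

    Field names (headline, description, price, fuel) are assumptions
    based on the question; adjust them to the actual schema.
    """
    # Quote the user text as a phrase, escaping embedded double quotes.
    phrase = '"' + text.replace('"', '\\"') + '"'
    return [
        # Free-text part: match the phrase in either text field.
        ("q", f"headline:{phrase} OR description:{phrase}"),
        # Filter queries narrow the result set without affecting scoring.
        ("fq", f"price:[{price_min} TO {price_max}]"),
        ("fq", f"fuel:{fuel}"),
    ]

params = build_solr_params("bmw m3", 10000, 500000, "petrol")

# URL-encoded form, ready to append to a Solr select request.
query_string = urlencode(params)
```

For this to work, every field used in `q` or `fq` would need to be searchable in the schema, and the range filter only behaves numerically if `price` uses a numeric field type.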

davek