I'm developing a web app using Django, and I'll need to add search functionality soon. Search will be implemented for two models: one extends the auth User class, and the other has the fields name, tags, and description. So I guess there's nothing too scary here in the context of searching text.

For development I am using SQLite, and since no database-specific work has been done, I am free to use any database in production. I'm choosing between PostgreSQL and MySQL.

I have gone through several posts on the Internet about search solutions; nevertheless, I'd like to get opinions on my simple case. Here are my questions:

  1. Is full-text search overkill in my case?

  2. Is it better to rely on the database's full-text search support? If so, which database should I use?

  3. Should I use an external search library, such as Whoosh, Sphinx, or Xapian? If so, which one?

EDIT: tags is a TagField (from the django-tagging app) that sits on an m2m relationship. description is a field that holds HTML and has a max_length of 1024 bytes.

A:

If that field tags means what I think it means, i.e. you plan to store a string that concatenates multiple tags for an item, then you might need full-text search on it... but that's a bad design. Instead, you should have a many-to-many relationship between items and a tags table (via another table, ItemTag or something, with two foreign keys referencing the primary keys of the items table and the tags table).
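In SQL terms, the normalized design might look like the following minimal sketch, written with Python's built-in sqlite3 module (SQLite being the development database here). The table and column names (item, tag, item_tag) are hypothetical; in Django you would normally get the junction table automatically from a ManyToManyField rather than writing it by hand.

```python
import sqlite3


def build_schema(conn):
    # One row per item, one row per distinct tag, and a junction table
    # holding (item_id, tag_id) pairs: a many-to-many relationship.
    conn.executescript(
        """
        CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT, description TEXT);
        CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
        CREATE TABLE item_tag (
            item_id INTEGER REFERENCES item(id),
            tag_id INTEGER REFERENCES tag(id),
            PRIMARY KEY (item_id, tag_id)
        );
        """
    )


def items_with_tag(conn, tag_name):
    # Looking up items by tag becomes a plain join on keys,
    # not a text search over a concatenated tag string.
    rows = conn.execute(
        """
        SELECT item.name FROM item
        JOIN item_tag ON item_tag.item_id = item.id
        JOIN tag ON tag.id = item_tag.tag_id
        WHERE tag.name = ?
        ORDER BY item.name
        """,
        (tag_name,),
    )
    return [name for (name,) in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_schema(conn)
    conn.executemany("INSERT INTO item (id, name) VALUES (?, ?)",
                     [(1, "widget"), (2, "gadget")])
    conn.executemany("INSERT INTO tag (id, name) VALUES (?, ?)",
                     [(1, "python"), (2, "django")])
    conn.executemany("INSERT INTO item_tag VALUES (?, ?)",
                     [(1, 1), (1, 2), (2, 1)])
    print(items_with_tag(conn, "django"))
```

Because each tag is its own row, renaming a tag or counting its uses is a single-table operation instead of string surgery on every item.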

I can't tell whether you need full-text search on description, as I have no indication of what it holds, nor whether you need the reasonable but somewhat rudimentary full-text search that MySQL 5.1 and PostgreSQL 8.3 provide, or the more powerful one in e.g. Sphinx... Maybe tell us a bit more about the context of your app and why you're considering full-text search?

Edit: so it seems the only possible need for full-text search might be on description, and that looks limited enough that either MySQL 5.1 or PostgreSQL 8.3 will serve it well. Me, I have a soft spot for PostgreSQL (even though I'm reasonably expert at MySQL too), but that's a general preference, not specifically connected to full-text search issues. This blog post does provide one reason to prefer PostgreSQL: you can have full-text search and still be transactional, while in MySQL full-text indexing only works on MyISAM tables, not InnoDB (unless you add Sphinx, of course). Also see this follow-on post for a bit more on full-text search in PostgreSQL and Lucene. Still, there are of course other considerations involved in picking a DB, and I don't think you'll do terribly with either (unless having to add Sphinx to get full-text search plus transactions is a big problem).

Alex Martelli
Alex, I made an edit and gave the information you asked for. Thanks.
shanyu
Alex, thank you for being so helpful. I cannot do without transactions, so MySQL with MyISAM is out of the question. That leaves me these options: PostgreSQL with its own search functionality, MySQL + a 3rd-party full-text library, or PostgreSQL + a 3rd-party full-text library. I also favor PostgreSQL over MySQL, so it comes down to PostgreSQL alone versus PostgreSQL + a library. What do you suggest?
shanyu
I doubt you need the hassle of installing and maintaining a 3rd-party add-on (modest as that hassle may be), given your likely also modest need for "fancy" full-text features (what a 3rd-party add-on would supply beyond PostgreSQL 8.3's native features). So I'd go with "bare" PostgreSQL 8.3.
Alex Martelli
This settles the topic. Thanks again!
shanyu
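For reference, PostgreSQL 8.3's built-in full-text search is driven by to_tsvector and plainto_tsquery, matched with the @@ operator and optionally ordered with ts_rank. A minimal sketch, assuming a table named myapp_item with name and description columns (hypothetical names) and any DB-API cursor using the %s parameter style, such as psycopg2's:

```python
# Hypothetical table/column names; adjust to your schema.

# Optional GIN index so the @@ match does not scan every row.
# Its expression must match the one used in the WHERE clause below.
CREATE_INDEX_SQL = """
CREATE INDEX myapp_item_description_fts
ON myapp_item USING gin(to_tsvector('english', description))
"""

SEARCH_SQL = """
SELECT id, name
FROM myapp_item
WHERE to_tsvector('english', description) @@ plainto_tsquery('english', %s)
ORDER BY ts_rank(to_tsvector('english', description),
                 plainto_tsquery('english', %s)) DESC
"""


def search_items(cursor, terms):
    """Run a native full-text query and return (id, name) rows.

    The user's terms are passed as query parameters, never
    interpolated into the SQL string.
    """
    cursor.execute(SEARCH_SQL, (terms, terms))
    return cursor.fetchall()
```

plainto_tsquery ANDs the words of plain user input together, which is usually a friendlier default for a search box than to_tsquery's operator syntax.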
A: 

Whether you need an external library depends on your needs. How much traffic are we talking about? External libraries generally perform better, but as always there are trade-offs. I am using Sphinx with the django-sphinx plugin, and I would recommend it if you will be doing a lot of searching.

I don't think search will be the main thing, so I'm more interested in ease of development/deployment than in performance.
shanyu
A: 

Haystack looks promising, and it supports Whoosh as a backend.
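As a rough idea of what wiring Haystack to Whoosh involves, here is a Haystack 2.x-style settings.py fragment. The project root path and the index directory name are hypothetical choices; older Haystack 1.x releases used different setting names.

```python
import os

# Hypothetical project root; Django project templates usually derive
# a similar BASE_DIR from the location of settings.py.
BASE_DIR = "/srv/myproject"

INSTALLED_APPS = [
    # ... your existing apps ...
    "haystack",
]

HAYSTACK_CONNECTIONS = {
    "default": {
        # Pure-Python Whoosh backend: no separate search server to run.
        "ENGINE": "haystack.backends.whoosh_backend.WhooshEngine",
        # Directory where Whoosh stores its index files.
        "PATH": os.path.join(BASE_DIR, "whoosh_index"),
    },
}
```

You would then define a SearchIndex for each model you want searchable and build the index with Haystack's rebuild_index management command; switching to a heavier backend later is mostly a matter of changing ENGINE.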

Harold
A:

Django has full-text search support in its QuerySet filters. For now, if you only have two models that need searching, just make a view that searches the fields on both:

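# FirstModel, SecondModel and "headline" are placeholders for your own models/fields.
# __search is MySQL-only (MATCH ... AGAINST in boolean mode: "+" requires a word,
# "-" excludes it), needs a FULLTEXT index, and was removed in Django 2.0.
search_string = "+Django -jazz Python"
first_models = FirstModel.objects.filter(headline__search=search_string)
second_models = SecondModel.objects.filter(headline__search=search_string)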

You could further filter them to make sure the results are unique, if necessary.

Additionally, there is a regex filter that may be even better for dealing with your HTML fields and tags, since the regex can specify exactly how to handle any delimiters or markup.
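Since django-tagging's TagField stores the tags as a delimited string on the model, a regex can insist on whole-tag matches. A minimal sketch with a hypothetical helper, whole_tag_pattern, checked locally with Python's re module. The pattern sticks to groups, alternation, bracket expressions and anchors, but database regex dialects differ, so test it against your database before passing it to a lookup such as tags__iregex.

```python
import re


def whole_tag_pattern(tag):
    """Hypothetical helper: build a regex that matches `tag` only as a
    complete entry in a space- or comma-separated tag string, so that
    searching for "py" does not match "python".
    """
    return r"(^|[ ,])" + re.escape(tag) + r"([ ,]|$)"


# Local check with Python's re module:
pattern = whole_tag_pattern("py")
print(bool(re.search(pattern, "django py")))      # True
print(bool(re.search(pattern, "python django")))  # False

# In Django this might be used as (Item is a hypothetical model):
# Item.objects.filter(tags__iregex=whole_tag_pattern("py"))
```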

Soviut
Thanks for the information. Regarding the full-text search support, the docs say: "Note this is only available in MySQL and requires direct manipulation of the database to add the full-text index". Maybe it is not a very good option. On the other hand, the regex filter is interesting and certainly worth checking out.
shanyu
If you need true full-text searching, adding it to MySQL is a matter of configuration.
Soviut