views:

820

answers:

4

I've been a MySQL user and have never tried Postgres.

But MySQL hits a bottleneck on full-text search when the dataset is huge.

A: 

There is really only one answer to this: you're just going to have to try it with your own dataset.

Personally, I'd be surprised if there were a significant difference; I'd suspect the performance of this sort of thing is limited by I/O bandwidth.

Gareth Davis
Actually I/O bandwidth isn't really involved; the whole point of full text search acceleration in a database is not to scan all the data just to find pieces of it. The MySQL full text search does the basic sort of stuff many people want with a standard configuration having minimal tunables. The PostgreSQL implementation is much more complicated, and allows all sorts of tricks to partition the data in order to speed up queries. What you're searching through to satisfy queries and the size of underlying data can get quite disconnected from one another if you put enough work into it.
Greg Smith
+3  A: 

While it's unlikely that you'll find a significant benefit in Postgres over MySQL, it can't hurt to test. However, your main issue, full-text search, is better resolved with something like Sphinx or Lucene. I have used Sphinx at work and found it vastly superior to MySQL's built-in full-text search. It is also quite easy to integrate into existing systems.
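
To give a sense of how easy that integration can be: Sphinx's searchd speaks a MySQL-compatible protocol (SphinxQL), so querying it looks like ordinary SQL from any MySQL client. A rough sketch, where `posts_idx` is a hypothetical index defined in sphinx.conf (exact functions vary by Sphinx version):

```sql
-- Connect with a normal MySQL client to searchd's SphinxQL port
-- (9306 by default):  mysql -h 127.0.0.1 -P 9306
SELECT id, WEIGHT() AS relevance
FROM posts_idx
WHERE MATCH('database fulltext performance')
ORDER BY relevance DESC
LIMIT 10;
```

The returned ids are then used to fetch the full rows from the real database.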

Also see http://stackoverflow.com/questions/942810/php-mysql-fulltext-search-lucene-sphinx-or, my original question (including refs) about the different full-text search options.

Jonathan Fingland
+1. If your dataset is "huge", and search is important, don't try to do it in the database - use a search engine.
nathan
A search engine is not as fast as a database search.
Shore
How much more performant is Sphinx than the default fulltext search of MySQL?
Shore
@Shore, look at the links provided in my question, especially http://pagetracer.com/2008/02/15/sphinx-and-lucene-search-engines-first-impressions/ and http://whatstheplot.com/blog/tag/lucene/
Jonathan Fingland
@Shore Sorry to break it to you, but if you are going to do an SQL query with WHERE id=1, then yes, the database will be faster. If you're doing a full-text search, then Sphinx will almost certainly be faster.
Jonathan Fingland
+2  A: 

As has been mentioned before, it varies a lot between datasets, workloads, and how you set things up.

For example, GIN-based full-text indexes are very fast for searching, but very slow for insert/update. GiST-based indexes are slower for searching (but still pretty fast), but much faster for insert/update.
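
To make the trade-off concrete, here is a sketch of what the two index types look like on a tsvector column in PostgreSQL (the table and column names are made up for illustration):

```sql
-- Hypothetical table with a precomputed tsvector column.
CREATE TABLE docs (
    id   serial PRIMARY KEY,
    body text,
    tsv  tsvector
);

-- GIN: fast lookups, slower to build and update.
CREATE INDEX docs_tsv_gin ON docs USING gin (tsv);

-- GiST: faster insert/update, somewhat slower lookups.
-- (Normally you'd pick one or the other for a given column.)
-- CREATE INDEX docs_tsv_gist ON docs USING gist (tsv);

-- A search that can use either index:
SELECT id FROM docs WHERE tsv @@ to_tsquery('english', 'full & text');
```

The query syntax is identical either way; only the insert/search cost profile changes.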

If you don't have the need for database functionality, I would also probably look at Sphinx or Lucene for raw performance. The largest advantage of the integrated full-text search in PostgreSQL is that it is just that - integrated. It has transaction support. Recovery support. Snapshot support. All those things that are of vital importance to a database. But if you don't need the db functionality, a solution that drops those requirements is likely faster.

Magnus Hagander
+3  A: 

I ran benchmarks a few years ago on large datasets and found that:

  • MySQL FULLTEXT

Is pretty slow. Another drawback is that it forces MyISAM on you, which brings a lot of problems. Also, index updates are quite slow once the index reaches a certain size: when you insert a new row, a substantial part of the index is regenerated, sometimes a few hundred megabytes of index rewritten just to insert a forum post. In other words, it's OK for a small forum with a few MBytes of posts, but there is a reason Wikipedia doesn't use it...
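
For reference, the setup being described looks roughly like this (table and column names are illustrative; at the time this was written, FULLTEXT indexes required the MyISAM engine, a restriction InnoDB only lifted in MySQL 5.6):

```sql
-- FULLTEXT forced ENGINE=MyISAM at the time this answer was written.
CREATE TABLE posts (
    id    INT AUTO_INCREMENT PRIMARY KEY,
    title VARCHAR(255),
    body  TEXT,
    FULLTEXT KEY ft_post (title, body)
) ENGINE=MyISAM;

-- Ranked search; every INSERT into posts also updates ft_post,
-- which is the slow part once the index grows large.
SELECT id, MATCH(title, body) AGAINST('postgres fulltext') AS score
FROM posts
WHERE MATCH(title, body) AGAINST('postgres fulltext')
ORDER BY score DESC
LIMIT 10;
```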

  • PostgreSQL fulltext

Is about 10-100x faster than MySQL fulltext and a lot more powerful; GiST is fast on inserts/updates, and there are no problems with locks. In other words, it's a totally decent solution.

However, searches get slow when the dataset is larger than RAM because of MVCC: Postgres needs to check the visibility of rows by hitting the heap. Note this may change in a future version. If your query returns 10 rows, no problem. However, if you want to SELECT WHERE (fulltext query) ORDER BY date LIMIT 10 and the fulltext matches 10,000 rows, it can get pretty slow. Still faster than MySQL, but not the performance you'd want.
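
The problematic query shape described above would look something like this in Postgres (table and column names are illustrative):

```sql
-- If the tsquery matches 10,000 rows, all of them must be fetched,
-- heap-checked for MVCC visibility, and sorted by date before the
-- LIMIT can discard all but 10:
SELECT id, created_at
FROM posts
WHERE tsv @@ to_tsquery('english', 'common & term')
ORDER BY created_at DESC
LIMIT 10;
```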

  • Xapian: I tested this; there are also Lucene and Sphinx, which have good reputations.

Xapian does not have to conform to the same restrictions as a database, so it can make a lot more optimizations. For instance, it's a single-writer, multiple-reader concurrency model, so you'll need some sort of update queue to update your index in the background. It also has its own on-disk format. The result is that it is incredibly fast, even when the dataset is much larger than RAM, and especially on complicated queries matching lots of rows, with sorts, returning only the most relevant ones.

The index is huge, too; it probably contains a lot of duplicated data. The upside is that it doesn't need to seek much to retrieve results.

Basically, once Postgres started to hit the I/O-seek wall, MySQL was long dead, and Xapian stayed blazing fast.

But it is not as nicely integrated into the database, so it is more work to use. It is only worth it if you have a huge dataset; if that's your case, try it, it's amazing. If your dataset fits in RAM, Postgres will just work with a lot less hassle. Also, if you want to combine full-text and database queries, the integration becomes important.

peufeu