Hello!

I am building a forum from scratch in PHP. I have reused most of phpBB's database structure.

Now I am thinking about the search functionality. What is a good design for searching all posts really fast? I guess there must be a better way than just LIKE '%query_string%' in MySQL :)

Maybe explode all sentences into words, let the words be keys in a hash table, and let each value be a comma-separated list of all the posts the word appears in? Deleting a post then takes a little more work, but I think that approach is better.
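What is described here is usually called an inverted index. As a rough illustration only (sketched in Python for brevity; the class and method names are hypothetical, and a real forum would keep this in database tables rather than in memory), it could look like this:

```python
import re
from collections import defaultdict


class InvertedIndex:
    """Toy in-memory inverted index: word -> set of post ids."""

    def __init__(self):
        self.postings = defaultdict(set)  # word -> ids of posts containing it
        self.words_by_post = {}           # post id -> its words, kept for deletion

    @staticmethod
    def tokenize(text):
        # Lowercase and split on non-word characters; a set drops duplicates.
        return set(re.findall(r"\w+", text.lower()))

    def add_post(self, post_id, text):
        # Re-adding an existing post replaces its old entries.
        self.remove_post(post_id)
        words = self.tokenize(text)
        self.words_by_post[post_id] = words
        for word in words:
            self.postings[word].add(post_id)

    def remove_post(self, post_id):
        # Only the words of this post are touched, so deletion stays cheap.
        for word in self.words_by_post.pop(post_id, set()):
            self.postings[word].discard(post_id)
            if not self.postings[word]:
                del self.postings[word]

    def lookup(self, word):
        # Return a copy so callers cannot modify the index by accident.
        return set(self.postings.get(word.lower(), set()))
```

Storing the words of each post alongside the postings is one way to handle the deletion problem mentioned above: removing a post only visits the words it actually contained.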

To start with I guess I can use the simple solution, but I don't want to have to change the code when the forum grows bigger.

Thanks for any ideas, or if you can point me in the right direction!

+1  A: 

The best option for me today is Sphinx. It can be used with PHP, Rails, and Perl, and so far it has worked like a charm for me. You can check a PHP solution. Craigslist, for example, uses it.

VP
Thanks for the tip, I will try this one, since Lucene does not support UTF-8 as far as I can see.
Jhonte
Actually, Zend Lucene uses UTF-8 as the default: http://framework.zend.com/manual/en/zend.search.lucene.best-practice.html#zend.search.lucene.best-practice.encoding
Yuval F
+4  A: 

Zend Lucene is a powerful way to add search to a PHP site.

Here's an article about how to do exactly that: Roll Your Own Search Engine with Zend_Search_Lucene

RichieHindle
+1  A: 

Don't reinvent the wheel. Have a look at Lucene. There is also a port for PHP:

Zend Lucene

Lucene does the parsing and indexing for you and the queries are fast as lightning.

Louis Haußknecht
A: 

Most forum users will want more than just a string search. They might not know the exact phrase they need, and when they search for "forum search" they would be delighted to find a result for "How to search a forum", which contains the relevant terms but in a different order and separated by other words.
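To make the difference concrete, here is a minimal sketch (in Python, with hypothetical function names) of matching on terms instead of on the exact phrase. It is only an illustration of the idea, not how any particular search engine does it:

```python
import re


def tokenize(text):
    # Lowercase and split on non-word characters.
    return re.findall(r"\w+", text.lower())


def matches_all_terms(query, post_text):
    """True if every query term occurs in the post, in any order."""
    post_words = set(tokenize(post_text))
    return all(term in post_words for term in tokenize(query))


def matches_substring(query, post_text):
    """What a plain LIKE '%query%' style comparison effectively does."""
    return query.lower() in post_text.lower()
```

With the example above, `matches_all_terms("forum search", "How to search a forum")` is true, while `matches_substring` with the same arguments is false because the words are not adjacent.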

They may also need some fuzzy searching if they don't know the spelling of what they need. They might search for "sequal" and want "sql".
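As a very small sketch of the idea, Python's standard `difflib.get_close_matches` can suggest known words that are similar to a misspelled term. The vocabulary list and the `suggest` helper below are made up for illustration, and character similarity is only one of several possible approaches (dedicated search engines typically offer their own fuzzy or phonetic matching):

```python
import difflib


def suggest(term, vocabulary, limit=3, cutoff=0.6):
    """Return up to `limit` vocabulary words that look similar to `term`.

    `cutoff` is difflib's similarity threshold between 0 and 1; lower
    values accept looser matches.
    """
    return difflib.get_close_matches(term.lower(), vocabulary, n=limit, cutoff=cutoff)


# Hypothetical vocabulary, e.g. the distinct words found in all posts.
vocabulary = ["sql", "search", "forum", "index"]
suggestions = suggest("sequal", vocabulary)
```

The suggestions could then be searched for in addition to, or instead of, the term the user typed.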

All of this points towards a more complex solution than a simple LIKE search.

The most important pointer for now is that, whatever you implement, you should make it easy to swap out for something better later. Keep your search hot-swappable, since you already know you will want to change it.
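One way to keep the search swappable is to hide it behind a small interface, so the rest of the forum only ever calls one method. The sketch below is in Python and every name in it (`SearchBackend`, `SubstringBackend`, `AllTermsBackend`, `search_forum`) is hypothetical; the same shape works as a PHP interface with one class per backend:

```python
from typing import Iterable, List, Protocol, Tuple

Post = Tuple[int, str]  # (post id, post text)


class SearchBackend(Protocol):
    def search(self, query: str) -> List[int]:
        """Return the ids of matching posts."""
        ...


class SubstringBackend:
    """The simple starting point: plain substring matching."""

    def __init__(self, posts: Iterable[Post]):
        self.posts = list(posts)

    def search(self, query: str) -> List[int]:
        needle = query.lower()
        return [pid for pid, text in self.posts if needle in text.lower()]


class AllTermsBackend:
    """A later replacement: every query term must appear, in any order."""

    def __init__(self, posts: Iterable[Post]):
        self.posts = [(pid, set(text.lower().split())) for pid, text in posts]

    def search(self, query: str) -> List[int]:
        terms = query.lower().split()
        return [pid for pid, words in self.posts if all(t in words for t in terms)]


def search_forum(backend: SearchBackend, query: str) -> List[int]:
    # Forum code depends only on the interface, never on a concrete backend.
    return backend.search(query)
```

Moving to Sphinx or Lucene later would then mean writing one more backend class, with no changes to the code that calls `search_forum`.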

Sohnee