I have a database which holds URLs in a table (along with many other details about each URL). I have another table which stores strings that I'm going to use to perform searches on each and every link. My database will be big; I'm expecting at least 5 million entries in the links table.

The application which communicates with the user is written in PHP. I need some suggestions about how I can search over all the links with all the patterns (n x m searches) while at the same time not causing a high load on the server and not losing speed. I want it to operate at high speed with low resource usage. Any hints or suggestions, in pseudocode or otherwise, are welcome.

Right now I don't know whether to perform these searches with SQL commands (with some help from PHP) or to do it completely in PHP.

A: 

First I'd suggest that you rethink the layout. It seems unnecessary to run this query for every user. Instead, create a result table: insert the results of the query into it once, and refresh it every time the patterns change.
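One way the precomputed result table could look, sketched here in Python with SQLite standing in for the real database (the table and column names `links`, `patterns`, and `link_matches` are assumptions, not from the original question):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE links (id INTEGER PRIMARY KEY, url TEXT);
    CREATE TABLE patterns (id INTEGER PRIMARY KEY, pattern TEXT);
    CREATE TABLE link_matches (link_id INTEGER, pattern_id INTEGER);
""")
con.executemany("INSERT INTO links VALUES (?, ?)",
                [(1, "http://example.com/shop"), (2, "http://example.org/blog")])
con.executemany("INSERT INTO patterns VALUES (?, ?)", [(1, "shop"), (2, "news")])

def rebuild_matches(con):
    """Run the expensive n x m search once, e.g. whenever the patterns change."""
    con.execute("DELETE FROM link_matches")
    con.execute("""
        INSERT INTO link_matches (link_id, pattern_id)
        SELECT l.id, p.id
        FROM links l JOIN patterns p
          ON l.url LIKE '%' || p.pattern || '%'
    """)
    con.commit()

rebuild_matches(con)
# User-facing queries now read the cheap precomputed table instead of
# re-running the join:
rows = con.execute("SELECT link_id, pattern_id FROM link_matches").fetchall()
```

The point of the design is that the n x m `LIKE` join runs on a schedule (or on pattern changes), while per-user requests only touch the small, already-filtered `link_matches` table.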

Otherwise, make sure you have (full-text) indexes set on the fields you need. For the query itself, you could join the tables:

SELECT
    yourFieldsHere
FROM
    theUrlTable AS tu
JOIN
    thePatternTable AS tp ON tu.link LIKE CONCAT('%', tp.pattern, '%');
Bobby
A: 

Hi,

I would say that you pretty definitely want to do that in the SQL code, not the PHP code. Also, searching on the URL strings is going to be a slow operation, so perhaps some form of hashing would be good. I have seen someone use a variant of a Zobrist hash for this before (Google will turn up plenty of results).
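As a rough illustration of the hashing idea (a simplified Rabin-Karp-style prefilter, not the Zobrist variant mentioned above, and using Python's built-in `hash` rather than a true rolling hash): hash every pattern up front, then compare window hashes before doing any real string comparison.

```python
from collections import defaultdict

def find_matches(text, patterns):
    """Return the subset of `patterns` occurring as substrings of `text`.

    Patterns are grouped by length; for each length, every window of
    `text` is hashed and compared against the pattern hashes. A hash hit
    is verified with an actual comparison to rule out collisions.
    """
    by_len = defaultdict(dict)  # length -> {hash: set of patterns}
    for p in patterns:
        by_len[len(p)].setdefault(hash(p), set()).add(p)

    found = set()
    for length, table in by_len.items():
        for i in range(len(text) - length + 1):
            window = text[i:i + length]
            candidates = table.get(hash(window))
            if candidates and window in candidates:
                found.add(window)
    return found
```

With a proper rolling hash, each window's hash is derived from the previous one in O(1), which is what makes this approach attractive when many patterns are checked against millions of URLs.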

Hope this helps,

Dan.

Dan Hedges
A: 

Do as much searching as you practically can within the database. If you're ending up with an n x m result set, and start with at least 5 million rows, that's a LOT of data to be repeatedly slurping across the wire (or socket, however you're connecting to the db) just to throw most of it away each time. Even if the DB's native search capabilities ('like' matches, regexp, full-text, etc.) aren't fully up to the task, culling unwanted rows BEFORE they get sent to the client (your code) will still be useful.
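The difference can be sketched in a few lines, with SQLite standing in for the real database (the `links` table and sample URLs here are made up for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE links (url TEXT)")
con.executemany("INSERT INTO links VALUES (?)",
                [("http://a.example/shop",), ("http://b.example/blog",)])

# Wasteful: ship every row to the client, then discard most of them there.
all_rows = con.execute("SELECT url FROM links").fetchall()
client_side = [r for r in all_rows if "shop" in r[0]]

# Better: let the database cull unwanted rows before they cross the wire.
server_side = con.execute(
    "SELECT url FROM links WHERE url LIKE '%' || ? || '%'", ("shop",)
).fetchall()
```

Both produce the same result, but with 5 million rows the first version transfers the whole table on every search while the second transfers only the matches.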

Marc B
A: 

Hello!

You should optimize your tables in the database. One option is an MD5 hash: add a new column holding the MD5 of each URL and put an index on it, and exact lookups become fast.

But that won't help if you use LIKE '%text%'.
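A minimal sketch of the MD5-column idea, again using Python with SQLite in place of the real database (the `url_md5` column and index names are assumptions). Note it only accelerates exact lookups, exactly as the caveat above says:

```python
import hashlib
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE links (url TEXT, url_md5 TEXT)")
con.execute("CREATE INDEX idx_links_md5 ON links (url_md5)")

def md5_hex(s):
    """Fixed-length hex digest of a URL, cheap to index and compare."""
    return hashlib.md5(s.encode("utf-8")).hexdigest()

url = "http://example.com/shop"
con.execute("INSERT INTO links VALUES (?, ?)", (url, md5_hex(url)))

# Exact lookup goes through the short, indexed hash column instead of
# comparing long URL strings:
row = con.execute("SELECT url FROM links WHERE url_md5 = ?",
                  (md5_hex("http://example.com/shop"),)).fetchone()
```

For substring searches (the LIKE '%text%' case), the hash column is useless, which is why a dedicated search engine such as Sphinx or Lucene is the better fit there.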

You can use Sphinx or Lucene.

Minor