views:

136

answers:

6

I was wondering if it's faster to process data in MySQL or a server language like PHP or Python. I'm sure native functions like ORDER will be faster in MySQL due to indexing, caching, etc, but actually calculating the rank (including ties returning multiple entries as having the same rank):

Sample SQL

SELECT TORCH_ID,
    distance AS thisscore,
    (SELECT COUNT(distinct(distance))+1 FROM torch_info WHERE distance > thisscore) AS rank
    FROM torch_info ORDER BY rank

Server

...as opposed to just doing a SELECT TORCH_ID FROM torch_info ORDER BY score DESC and then figure out rank in PHP on the web server.

+1  A: 

The speed of the count is going to depend on which DB storage engine you are using and the size of the table. Though I suspect that nearly every count and rank done in mySQL would be faster than pulling that same data into PHP memory and doing the same operation.

Pro777
As I said in my original post, COUNT is definitely going to be faster because it's a native SQL function and I guess the DB will be optimized to perform it. However, rank isn't a native function. My question is asking if those non-native functions are faster in MySQL or a language like say python that is specifically built to work with lists.
Karan
A: 

Ranking is based on count, order. So if you can do those functions faster, then rank will obviously be faster.

+1  A: 

MySQL will probably be faster with most non-complex calculations. However, 90% of the time database server is the bottleneck, so do you really want to add to that by bogging down your database with these calculations? I myself would rather put them on the web/application server to even out the load, but that's your decision.

musicfreak
A: 

A large part of your question is dependent on the primary keys and indexes you have set up. Assuming that torchID is indexed properly...

You will find that mySQL is faster than server side code.

Another consideration you might want to make is how often this SQL will be called. You may find it easier to create a rank column and update that as each track record comes in. This will result in a lot of minor hits to your database, versus a number of "heavier" hits to your database.

So let's say you have 10,000 records, 1000 users who hit this query once a day, and 100 users who put in a new track record each day. I'd rather have the DB doing 100 updates in which 10% of them hit every record (9,999) then have the ranking query get hit 1,000 times a day.

My two cents.

Markus
A: 

If your test is running individual queries instead of posting transactions then I would recommend using a JDBC driver over the ODBC dsn because youll get 2-3 times faster performance. (im assuming your using an odbc dsn here in your tests)

djangofan
+1  A: 

In general, the answer to the "Should I process data in the database, or on the web server question" is, "It depends".

  1. It's easy to add another web server. It's harder to add another database server. If you can take load off the database, that can be good.
  2. If the output of your data processing is much smaller than the required input, you may be able to avoid a lot of data transfer overhead by doing the processing in the database. As a simple example, it'd be foolish to SELECT *, retrieve every row in the table, and iterate through them on the web server to pick the one where x = 3, when you can just SELECT * WHERE x = 3
  3. As you pointed out, the database is optimized for operation on its data, using indexes, etc.
Frank Farmer