Using PHP / MySQL

Hi guys. I am building an application that allows people to download rather large amounts of data as rows from a MySQL database. The data exceeded 2 million rows, so I sharded it (this all went fine). The data is collated by geocoded latitude/longitude pairing and gathered with a very complex SQL query. To spare the server repeated queries I save all the rows into an array and display the results X per page at a time (again, all going well and speedily so far).
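For context, the paging is nothing more than slicing that cached array, roughly like this (just a sketch, the $allRows variable and the per-page count are made-up placeholders, not my real code):

    // Paging sketch: $allRows holds the full result set already fetched from MySQL.
    $perPage = 50;                                                   // example page size
    $page    = isset($_GET['page']) ? max(1, (int) $_GET['page']) : 1;
    $offset  = ($page - 1) * $perPage;

    $pageRows = array_slice($allRows, $offset, $perPage);
    foreach ($pageRows as $row) {
        // render the row in the results table
    }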

The problem comes when somebody wants to download data, or even query data they have already downloaded (as they pay for the data this cannot happen), so I have come up with two solutions and simply wondered which one would put the least load on my server.

Quick note - the rows are looped over and written to an XML or CSV file, which is then presented for download.
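The export loop itself looks roughly like this (a sketch only, the column names and the $rows variable are made up):

    // CSV export sketch: stream the selected rows straight to the client.
    header('Content-Type: text/csv');
    header('Content-Disposition: attachment; filename="export.csv"');

    $out = fopen('php://output', 'w');
    fputcsv($out, array('id', 'latitude', 'longitude', 'value'));   // header row

    foreach ($rows as $row) {        // $rows = the result set gathered earlier
        fputcsv($out, $row);
    }
    fclose($out);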

Solution 1. Any row that is added to the above-mentioned file and downloaded (could be as many as 100,000 at a time) is added to a table with its ID (primary key of the row), its shard index or identifier, and an identifier of the user that downloaded it (for logging and other purposes). When the large query (mentioned above) is performed, it uses SELECT ..... WHERE ... AND id NOT IN (1,2,3,5....100000) [the list built by looping over that downloads table for the user] - the problem is I have no clue how hard that will hit MySQL.
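To make Solution 1 concrete, this is roughly what I mean (a sketch only; the downloads table, its column names, and the mysqli connection are placeholders, not my real schema):

    // Solution 1 sketch: collect the IDs this user has already downloaded,
    // then splice them into the big query as a NOT IN (...) list.
    $userId = 123;                    // example user
    $shard  = 'data_shard_07';        // example shard table
    $ids    = array();

    $res = $mysqli->query(
        "SELECT row_id FROM downloads
          WHERE user_id = " . (int) $userId . "
            AND shard_name = '" . $mysqli->real_escape_string($shard) . "'"
    );
    while ($r = $res->fetch_assoc()) {
        $ids[] = (int) $r['row_id'];
    }

    // Only add the exclusion clause if the user has downloaded anything.
    $exclude = $ids ? ' AND id NOT IN (' . implode(',', $ids) . ')' : '';
    $sql = "SELECT ... FROM $shard WHERE ... $exclude";   // ... = the existing geo query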

Solution 2. The same, but using a join against that downloads table to exclude the rows instead of building the NOT IN list (again, I have no idea what kind of processing that will require).
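Again just a sketch of what I have in mind for Solution 2, using the same made-up table and column names as above, with MySQL doing the exclusion via an anti-join:

    // Solution 2 sketch: exclude already-downloaded rows with a LEFT JOIN
    // instead of passing a huge ID list in from PHP.
    $sql = "SELECT d.*
              FROM $shard AS d
              LEFT JOIN downloads AS dl
                     ON dl.row_id     = d.id
                    AND dl.shard_name = '" . $mysqli->real_escape_string($shard) . "'
                    AND dl.user_id    = " . (int) $userId . "
             WHERE dl.row_id IS NULL
               AND ...";   // ... = the existing geo conditions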

All I would like to know, if any MySQL experts would be so kind, is: which of the two solutions above would put the least load on the server?

Thanks in advance

Alex