views:

282

answers:

7

My application needs to poll a MySQL database for new rows. Every time new rows are added, they should be retrieved. I was thinking of creating a trigger to place references to new rows on a separate table. The original table has over 300,000 rows.

The application is built in PHP.

Some good answers, i think the question deserves a bounty.

+3  A: 

You can use the following statement to find out if a new record was inserted in the table:

select max(id) from table_name

replacing the name of primary key and table name in the above statement. Keep the max(id) value in a temporary variable, and retrieve all new records between this and the last saved max(id) value. After fetching the new records, set max(id) value to the one you got from the query.

Samnan
Why not select * from table_name where id > :max
Esben Skov Pedersen
A: 

assuming you have an identify or some other data that always grow, you should keep track on your php application of the last id retrieved.

that'd work for most scenarios. Unless you are into the real time camp, I don't think you'd need any more than that.

eglasius
A: 

I would do something like this. Of course, this is assuming that ID is an incrementing numerical ID. And how you store your "current location" in the database is upto you.

<?
$idFile = 'lastID.dat';

if(is_file($idFile)){
    $lastSelectedId = (int)file_get_contents($idFile);
} else {
    $lastSelectedId = 0;
}

$res = mysql_query("select * from table_name where id > {$lastSelectedId}");

while($row = mysql_fetch_assoc($res)){
    // Do something with the new rows

    if($row['id']>$lastSelectedId){
        $lastSelectedId = $row['id'];
    }
}

file_put_contents($idFile,$lastSelectedId);

?>
Ollie
+3  A: 

For external applications I find using a TimeStamp column is a more robust method that is independent of auto id and other primary key issues

Add columns to the tables such as:

insertedOn TIMESTAMP DEFAULT CURRENT_TIMESTAMP

or to track inserts and updates

updatedOn TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP

In the external application all you need to do is track the last timestamp when you did a poll. Then select from that timestamp forward on all the relevant tables. In large tables you may need to index the timestamp column

TFD
Indexing such field will *always* be beneficial, not only in case of large tables. +1 anyway.
Adam Byrtek
Indexing is usually beneficial. There are plenty of use cases when the index overhead is not worth it. Typically a table that has many inserts and deletes between each TIMESTAMP based select, and the TIMESTAMP based select is performed infrequently
TFD
A: 

I would concurr with TFD's answer about keeping track of a timestamp in an separate file/table and then fetching all rows newer than that. That's how I do it for a similar application.

Your application querying a single row table (or file) to see if a timestamp has changed from the local storage should not be much of a performance hit. Then, fetching new rows from the 300k row table based on timestamp should again be fine, assuming timestamp is properly indexed.

However, reading your question I was curious if Mysql triggers can do system calls, say a php script that would do some heavy lifting. Turns out they can by using the sys_exec() User-Defined Function. You could use this to do all sorts of processing by passing into it the inserted row data, essentially having an instant notification of inserts.

Finally, a word of caution about using triggers to call external applications.

Fanis
A: 

One option might be to use an INSERT INTO SELECT statement. Taking from the suggestions using timestamps to pull the latest rows, you could do something like...

INSERT INTO t2 (
    SELECT * 
    FROM t1 
    WHERE createdts > DATE_SUB(NOW(), INTERVAL 1 HOUR)
);

This would take all of the rows inserted in the previous hour and insert them in to table 2. You could have a script run this query and have it run every hour (or whatever interval you need).

This would drastically simplify your PHP script for pulling rows as you wouldn't need to iterate over any rows. It also gets rid of having to keep track of the last insert id.

The solution Fanis purposed also sounds like it could be interesting as well.

As a note, the select query in the above insert can but adjusted to only insert certain fields. If you only need certain fields, you would need to specify them in the insert like so...

INSERT INTO t2 (field1, field2) (
    SELECT field1, field2 
    FROM t1 
    WHERE createdts > DATE_SUB(NOW(), INTERVAL 1 HOUR)
);
Chris Gutierrez
+1  A: 

Create a PHP Daemon to monitor the MySQL Table File size, if size changes query for new records, if new records found run next process.

I think there is an active PEAR daemon you can easily configure to monitor the MySQL Table file size and kick off your script.

Phill Pafford
I'm not sure for MySQL, but usually table space is allocated in chunks, so that once an allocation has been done, several rows could be added before the need for another allocation arises.
pascal