Is there a way to keep a timestamped record of every change to every column of every row in a MySQL table? That way I would never lose any data and would keep a history of the transitions. Row deletion could just set a "deleted" column to true, so the row would still be recoverable.

I was looking at HyperTable, an open source implementation of Google's BigTable, and this feature really whetted my appetite. It would be great if I could have it in MySQL, because my apps don't handle the huge amounts of data that would justify deploying HyperTable. More details about how this works can be seen here.

Is there any configuration, plugin, fork or whatever that would add just this one functionality to MySQL?

+2  A: 

I do this in a custom framework. Each table definition also generates a Log table related many-to-one with the main table, and when the framework does any update to a row in the main table, it inserts the current state of the row into the Log table. So I have a full audit trail on the state of the table. (I have time records because all my tables have LoggedAt columns.)

No plugin, I'm afraid, more a method of doing things that needs to be baked into your whole database interaction methodology.
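
A minimal sketch of that pattern, not the actual framework: it uses Python's built-in sqlite3 module as a stand-in for MySQL so it runs anywhere, and the table names (item, item_log), the price column, and the update_item helper are all hypothetical. The only part taken from the description above is the idea: before each update, copy the row's current state into the log table.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption for the sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (
        id INTEGER PRIMARY KEY,
        name TEXT,
        price REAL,
        logged_at TEXT DEFAULT CURRENT_TIMESTAMP
    );
    -- Same columns as item plus its own key: many log rows per item row.
    CREATE TABLE item_log (
        log_id INTEGER PRIMARY KEY,
        item_id INTEGER REFERENCES item(id),
        name TEXT,
        price REAL,
        logged_at TEXT
    );
""")

def update_item(conn, item_id, name, price):
    """Copy the current row into item_log, then apply the update."""
    with conn:  # one transaction: the log row and the update commit together
        conn.execute(
            "INSERT INTO item_log (item_id, name, price, logged_at) "
            "SELECT id, name, price, logged_at FROM item WHERE id = ?",
            (item_id,),
        )
        conn.execute(
            "UPDATE item SET name = ?, price = ?, "
            "logged_at = CURRENT_TIMESTAMP WHERE id = ?",
            (name, price, item_id),
        )

conn.execute("INSERT INTO item (id, name, price) VALUES (1, 'widget', 9.99)")
update_item(conn, 1, "widget", 12.50)  # only price changes, whole row is logged
update_item(conn, 1, "gadget", 12.50)

# Earlier states of the row, oldest first, and the live row.
history = conn.execute(
    "SELECT name, price FROM item_log WHERE item_id = 1 ORDER BY log_id"
).fetchall()
current = conn.execute("SELECT name, price FROM item WHERE id = 1").fetchone()
```

Note that every update stores a full copy of the previous row, even when only one column changed.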

chaos
This is an interesting approach, but it would create a copy of every field in a row every time a single field got updated. I don't know if MySQL is smart enough to see the repetition and stop the database from growing so fast, but if it isn't, then I would need a different approach.
obvio171
Yeah, MySQL is not that smart, so this isn't really suitable for heavily updated tables or ones with large amounts of data per row.
chaos
+2  A: 

Create a table that stores the following info...

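CREATE TABLE MyData (
    ID INT AUTO_INCREMENT PRIMARY KEY,
    DataID INT  -- ID of the most recent row in Data
);

CREATE TABLE Data (
    ID INT AUTO_INCREMENT PRIMARY KEY,
    MyID INT,   -- the MyData row this version belongs to
    Name VARCHAR(50),
    Timestamp DATETIME DEFAULT CURRENT_TIMESTAMP
);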

Now create a sproc that does this...

-- Store the new version, then repoint MyData at it.
INSERT INTO Data (MyID, Name)
VALUES (@MyID, @Name);

UPDATE MyData SET DataID = LAST_INSERT_ID()
WHERE ID = @MyID;

In general, the MyData table is just a key table. You then point it to the record in the Data table that is the most current. Whenever you need to change data, you simply call the sproc, which inserts the new data into the Data table and then updates MyData to point to the most recent record. All of the other tables in the system would key themselves off of MyData.ID for foreign key purposes.

This arrangement sidesteps the need for a second log table (and for keeping the two in sync when the schema changes), but at the cost of an extra join and some overhead when creating new records.
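
As a rough, runnable illustration of the flow described above, here is a sketch that uses Python's sqlite3 module in place of MySQL. The set_name function is a hypothetical stand-in for the sproc, and cur.lastrowid plays the role of the identity value of the newly inserted Data row.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption for the sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MyData (ID INTEGER PRIMARY KEY, DataID INTEGER);
    CREATE TABLE Data (
        ID INTEGER PRIMARY KEY,
        MyID INTEGER,
        Name TEXT,
        Timestamp TEXT DEFAULT CURRENT_TIMESTAMP
    );
""")

def set_name(conn, my_id, name):
    """Hypothetical equivalent of the sproc: add a version, repoint the key row."""
    with conn:  # insert and pointer update commit together
        cur = conn.execute(
            "INSERT INTO Data (MyID, Name) VALUES (?, ?)", (my_id, name)
        )
        conn.execute(
            "UPDATE MyData SET DataID = ? WHERE ID = ?", (cur.lastrowid, my_id)
        )

conn.execute("INSERT INTO MyData (ID) VALUES (1)")
set_name(conn, 1, "first")
set_name(conn, 1, "second")

# The current value needs the extra join mentioned above.
current_name = conn.execute(
    "SELECT d.Name FROM MyData m JOIN Data d ON d.ID = m.DataID WHERE m.ID = 1"
).fetchone()[0]

# Every earlier version stays queryable in Data.
versions = [
    row[0]
    for row in conn.execute("SELECT Name FROM Data WHERE MyID = 1 ORDER BY ID")
]
```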

William Crim
I don't know how Views work, but would there be a way to make this transparent to the application by making MyData look like (ID, Name), where Name would be taken from the latest Data record pointed to by MyData's DataID field (which would be invisible to the application)?
obvio171
This solution still runs into the same size problem as chaos's, though, because unchanged fields are still copied over and over every time a field gets updated.
obvio171
This solution allows you to combine multiple tables into a single composite record using a view. There is no reason to limit yourself to 2 tables. You could have one table for just the fields that change frequently, and another for fields that change rarely. Since the application would only access the data through a view, you could refactor your tables after the fact without impacting your app. Essentially this is the quickest way to replicate the space-saving benefits of a column-store database inside a row-store framework.
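
One way this split could look, sketched with Python's sqlite3 module standing in for MySQL. The NameData and StatusData tables, the CurrentData view, and the two helper functions are hypothetical names invented for the example; the point is only that the application reads one composite row from the view while each field keeps its own version table.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption for the sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Key table with one pointer per versioned field group.
    CREATE TABLE MyData (ID INTEGER PRIMARY KEY, NameID INTEGER, StatusID INTEGER);
    -- Rarely changing field.
    CREATE TABLE NameData (
        ID INTEGER PRIMARY KEY, MyID INTEGER, Name TEXT,
        Timestamp TEXT DEFAULT CURRENT_TIMESTAMP
    );
    -- Frequently changing field.
    CREATE TABLE StatusData (
        ID INTEGER PRIMARY KEY, MyID INTEGER, Status TEXT,
        Timestamp TEXT DEFAULT CURRENT_TIMESTAMP
    );
    -- What the application sees: one row per record, current values only.
    CREATE VIEW CurrentData AS
        SELECT m.ID AS ID, n.Name AS Name, s.Status AS Status
        FROM MyData m
        LEFT JOIN NameData n ON n.ID = m.NameID
        LEFT JOIN StatusData s ON s.ID = m.StatusID;
""")

def set_name(conn, my_id, name):
    with conn:
        cur = conn.execute(
            "INSERT INTO NameData (MyID, Name) VALUES (?, ?)", (my_id, name)
        )
        conn.execute(
            "UPDATE MyData SET NameID = ? WHERE ID = ?", (cur.lastrowid, my_id)
        )

def set_status(conn, my_id, status):
    with conn:
        cur = conn.execute(
            "INSERT INTO StatusData (MyID, Status) VALUES (?, ?)", (my_id, status)
        )
        conn.execute(
            "UPDATE MyData SET StatusID = ? WHERE ID = ?", (cur.lastrowid, my_id)
        )

conn.execute("INSERT INTO MyData (ID) VALUES (1)")
set_name(conn, 1, "Alice")
for status in ("new", "active", "idle"):
    set_status(conn, 1, status)

view_row = conn.execute("SELECT ID, Name, Status FROM CurrentData").fetchone()
name_versions = conn.execute("SELECT COUNT(*) FROM NameData").fetchone()[0]
status_versions = conn.execute("SELECT COUNT(*) FROM StatusData").fetchone()[0]
```

Three status changes produce three StatusData rows, while the unchanged name is stored only once.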
William Crim
+2  A: 

I've implemented this in the past in a PHP model similar to what chaos described.

If you're using MySQL 5, you could also accomplish this with triggers: stored code that fires on the UPDATE and DELETE events of your table.

http://dev.mysql.com/doc/refman/5.0/en/stored-routines.html
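
Hooking into update and delete events is what databases usually call a trigger. The sketch below shows the idea with Python's sqlite3 module rather than MySQL, so the trigger syntax is SQLite's; MySQL's CREATE TRIGGER also has FOR EACH ROW and the OLD row alias, but the details differ. The table names, trigger names, and the action column are hypothetical.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption for the sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE item_log (
        log_id INTEGER PRIMARY KEY,
        item_id INTEGER,
        name TEXT,
        action TEXT,
        logged_at TEXT DEFAULT CURRENT_TIMESTAMP
    );

    -- Save the pre-update state of the row.
    CREATE TRIGGER item_log_update AFTER UPDATE ON item
    FOR EACH ROW BEGIN
        INSERT INTO item_log (item_id, name, action)
        VALUES (OLD.id, OLD.name, 'update');
    END;

    -- Save the row as it was just before deletion.
    CREATE TRIGGER item_log_delete AFTER DELETE ON item
    FOR EACH ROW BEGIN
        INSERT INTO item_log (item_id, name, action)
        VALUES (OLD.id, OLD.name, 'delete');
    END;
""")

# Plain statements; the application does nothing special to get the log.
conn.execute("INSERT INTO item (id, name) VALUES (1, 'a')")
conn.execute("UPDATE item SET name = 'b' WHERE id = 1")
conn.execute("DELETE FROM item WHERE id = 1")

log = conn.execute(
    "SELECT item_id, name, action FROM item_log ORDER BY log_id"
).fetchall()
```

Compared with logging in application code, the history is recorded no matter which client issues the statement.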

txyoji
A: 

Do you need it to remain queryable, or will this just be for recovering from bad edits? If the latter, you could just set up a cron job to back up the actual files where MySQL stores the data and send it to a version control server.

Shadow
Queryable, yes. And I want the history of each edit to each field in a row. Setting up a cron job would only give me a snapshot of the whole database from time to time, with many changes between each.
obvio171
I see. I wasn't sure if there was any pre-made way to trigger the file copy. (like with svn and post-commit hooks)
Shadow
I guess I didn't explain it very well. I was envisioning an update triggering a file copy, and then using cron to send the copies to the source control server in sequence at regular intervals.
Shadow
Hm, I see. But I guess triggering a file copy of the whole database every time there is an update would be prohibitively expensive save for the smallest of datasets, wouldn't it?
obvio171
Yeah, that's right. Also, it's not queryable, so my idea is out the window anyway :)
Shadow