tags:

views:

93

answers:

4

I have a long running application written in a mix of C and C++ that stores data in sqlite.

While I am confident that committed data will remain available (barring mechanical failure) and uncommitted data will not be, it's not clear to me what I can do with this sort of middle state.

I do a large number of inserts in a transaction and then commit it. When an error occurs on a given statement, I can schedule it to be attempted at some point in the future. It sounds like some errors might implicitly rollback my transaction (which would be undesirable if true).

A larger problem is what happens when my commit itself fails. Currently, I'm just going to continue to retry it until it works. I would expect that whatever would cause my commit to fail may very well also cause a rollback to fail.

What is the recommended mechanism for error handling in such a situation?

A: 

Inserting null value in a not nullable column is quite unlikely to succeed, no matter how many times you try.

There is no single solution to your problem. For instance if you are violating database constraints you should change your data before making another attempt. If you have no memory/disk space left - free some and retry.

I guess you need to learn more about transactions and ACID properties.

Anton
A: 

Databases are designed explicitly so that data is always either committed or uncommitted. When data is committed, it doesn't get lost arbitrarily because it is on permanent storage (well, in a disk file, which is a good approximation). If you're doing manual transaction management (sounds like it) then the database isn't going to either COMMIT or ROLLBACK until asked to. Errors in statements can be dealt with without losing anything already done.

If you're in a state where you can't either COMMIT or ROLLBACK, then you're in deep trouble. If it's because you're short of memory or disk space (or quota) then you have provisioned the application wrong and you should fix that first.

You also might want to think in terms of limiting the length of transactions so that it is less catastrophic (in terms of work to recover) when you have some failure.

Donal Fellows
Adjusting the size of the transactions is easy enough -- I'm only batching them for performance. At some point, it's possible for me to get into the state where I've sent a bunch of successful statements in, but the final commit may error. This question is mostly sqlite-specific. It's unclear from the docs when and whether repeating the commit is correct.
Dustin
Well, the documentation is made a bit more complex by the fact that things can run in autocommit mode, and also by the fact that some errors are catastrophic. (Running out of disk is an example of that!)
Donal Fellows
I've actually tested the running out of disk space somewhat -- at least for inserts within a transaction. What I've got is more sporadic disk failures. EC2 is a harsh environment. I've got faith that the robustness of sqlite can help me traverse it safely.
Dustin
Most of the possible error codes are catastrophic BTW, yet most are also unlikely to happen (assuming you're using the API correctly and have your provisioning right). You'll have to provide more info about what is *actually* going wrong for you to get much more help than this. And if you're just being ultra-careful when writing the code, trying to cover all bases ahead of time… stop that! You're wasting effort on hypotheticals! The easiest mode of operation is to just ROLLBACK on any error, and let SQLite worry about the rest.
Donal Fellows
Running out of disk is an awkward one. ROLLBACK is your best friend, and I think it's one that can be repeated until it succeeds (assuming you're in a transaction at all). AIUI, it doesn't increase disk usage.
Donal Fellows
I actually had someone tell me they're using my software on a place with intermittently unreliable disk (amazon EBS) that successfully performed an insert statement, but seemingly failed the commit. I didn't error check the commit, so it became a question of what I could actually do when I have IO errors that lead to failed operations such as commit or rollback. The disk can fail, but come back and work later.
Dustin
A: 

Although I disagree with earlier answers in that SQLite most definitely can experience transient errors which will allow you to successfully COMMIT just by retrying a few times, they do have a good point about this not applying to all errors. Don't forget to implement retry limits and perhaps also check the cause of the failure instead of blindly retrying forever.

Dave Sherohman
There is no logical point to give up in my application. The testers use artificial IO subsystem error generators to verify that I retry around errors. Operations has to either attempt to repair the IO or accept that it's not going to happen and take down the whole server.
Dustin
+1  A: 

On COMMIT, if you see an SQLITE_BUSY error, you should reattempt the COMMIT. It might work. Better yet, install a busy-handler callback to handle SQLITE_BUSY.

Another reason a COMMIT might fail is a deferred foreign key violation. If this happens, you might fix the FK violation and then COMMIT. Hard to see an unmanned application doing this though.

Other errors you should probably just give up and ROLLBACK the transaction.

If an IO or OOM error occurs, the current transaction might be rolled back. This is because some IO or OOM errors leave SQLite unsure as to whether or not its internal data structures match what is actually on disk. If we were to proceed at this point the database might become corrupt.

You can test whether or not a transaction has been rolled back by SQLite using the sqlite3_get_autocommit() API.

If an IO or OOM error occurs during COMMIT, the transaction may still have been committed. This could happen, for example, if the user happens to rip the memory card out of a camera just as the transaction is committed. It is not generally possible to know if the data made it to the persistent media without reading the db and checking at the application level.

Dan
This sounds very close to what I'm looking for. What do I do if the rollback itself fails?
Dustin