Hello,

I'm part of a team writing an application for embedded systems. The application often suffers from data corruption caused by sudden power loss. I thought that implementing some kind of transactions would stop this from happening. One approach would be to copy the affected area of a file to some additional storage (a transaction log) before writing to it. What other possibilities are there?
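
For illustration, a minimal sketch of the before-image idea I have in mind, assuming a POSIX-like file API is available on the target; the function and variable names are invented for this example:

    #include <stdint.h>
    #include <stdio.h>

    /* Save the region that is about to be overwritten, then flush the
     * journal before touching the data file.  After a power loss the
     * journal, if present and complete, is replayed to undo the write. */
    static int save_before_image(FILE *data, FILE *journal,
                                 long offset, size_t len)
    {
        uint8_t buf[512];              /* assume len <= sizeof(buf) here */
        if (len > sizeof(buf)) return -1;

        if (fseek(data, offset, SEEK_SET) != 0) return -1;
        if (fread(buf, 1, len, data) != len) return -1;

        /* Record where the bytes came from, then the bytes themselves. */
        if (fwrite(&offset, sizeof(offset), 1, journal) != 1) return -1;
        if (fwrite(&len, sizeof(len), 1, journal) != 1) return -1;
        if (fwrite(buf, 1, len, journal) != len) return -1;

        /* Force the journal to stable storage before modifying the data
         * file (plus fsync(fileno(journal)) on POSIX). */
        return fflush(journal);
    }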

+4  A: 

Databases use a variety of techniques to ensure that their state is properly persisted.

  1. The DBMS often retains a replicated control file -- several synchronized copies on several devices. Two are enough; more if you're paranoid. The control file provides a few key parameters used to locate the other files and their expected states. The control file can include a "database version number".

  2. Each file carries a "version number" in several forms, often stored once in plain form and once as an XOR complement, so that the two copies can be trivially checked to have the correct relationship and to match the control file's version number (a sketch of this check follows the list).

  3. All transactions are written to a transaction journal. The transaction journal is then written to the database files.

  4. Before writing to database files, the original data block is copied to a "before image journal", or rollback segment, or some such.

  5. When the block is written to the file, the sequence numbers are updated, and the block is removed from the transaction journal.
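
For point 2, a rough illustration of how the two forms of the version number might be validated; the header layout, mask, and names here are invented for the example:

    #include <stdint.h>

    struct file_header {
        uint32_t version;        /* version number in plain form        */
        uint32_t version_xor;    /* same number XORed with a fixed mask */
    };

    #define VERSION_MASK 0xFFFFFFFFu

    /* Trust a header only if its two version copies agree with each other
     * and with the version recorded in the control file. */
    static int header_is_consistent(const struct file_header *h,
                                    uint32_t control_file_version)
    {
        if ((h->version ^ VERSION_MASK) != h->version_xor)
            return 0;              /* torn or corrupted header */
        return h->version == control_file_version;
    }

A torn write that corrupts either copy makes the pair inconsistent, so the file is known to need recovery from the journal.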

You can read up on RDBMS techniques for reliability.

S.Lott
+1  A: 

There are a number of ways to do this; generally the only assumption required is that small writes (<4k) are atomic. For example, here's how CouchDB does it:

  • A 4k header contains, amongst other things, the file offset of the root of the BTree containing all the data.
  • The file is append-only. When updates are required, write the update to the end of the file, followed by any modified BTree nodes, up to and including the root. Then, flush the data, and write the new address of the root node to the header.

If the program dies while writing an update but before writing the header, the extra data at the end of the file is discarded. If it fails after writing the header, the write is complete and all is well. Because the file is append-only, these are the only failure scenarios. This also has the advantage of providing multi-version concurrency control with no read locks.
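
A rough sketch of that update protocol, assuming POSIX file descriptors and taking the premise above that a single small header write is atomic; the structure layout and names are illustrative only:

    #include <stdint.h>
    #include <unistd.h>

    #define HEADER_SIZE 4096

    struct db_header {
        uint64_t root_offset;              /* file offset of the BTree root */
        uint8_t  padding[HEADER_SIZE - sizeof(uint64_t)];
    };

    /* Append the new data and modified nodes, flush them, then publish the
     * update by rewriting the header.  A crash before the header write
     * leaves only unreferenced bytes at the end of the file. */
    static int commit_update(int fd, const void *nodes, size_t nodes_len,
                             uint64_t new_root_offset)
    {
        /* 1. Append the update and the modified BTree nodes. */
        if (lseek(fd, 0, SEEK_END) < 0) return -1;
        if (write(fd, nodes, nodes_len) != (ssize_t)nodes_len) return -1;

        /* 2. Make the appended data durable before it becomes reachable. */
        if (fsync(fd) != 0) return -1;

        /* 3. Point the header at the new root; this write is the commit. */
        struct db_header h = { .root_offset = new_root_offset };
        if (pwrite(fd, &h, sizeof(h), 0) != (ssize_t)sizeof(h)) return -1;
        return fsync(fd);
    }

Until the final header write lands, readers still see the previous root, which is what gives the multi-version behaviour mentioned above.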

When the file grows too long, simply read out all the 'live' data and write it to a new file, then delete the original.
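
A compaction sketch along those lines; the callback that yields live records is hypothetical, and rename() is used here as one way to put the new file in place of the original:

    #include <stdio.h>
    #include <stddef.h>

    /* Caller-supplied callback that copies the next live record into buf,
     * returning 0 on success and nonzero when no records remain.
     * This interface is invented for the example. */
    typedef int (*read_live_fn)(void *ctx, void *buf, size_t bufsize,
                                size_t *len);

    static int compact(const char *db_path, const char *tmp_path,
                       read_live_fn next_live, void *ctx)
    {
        FILE *out = fopen(tmp_path, "wb");
        if (!out) return -1;

        char buf[4096];
        size_t len;
        /* Copy only the records still reachable from the current root. */
        while (next_live(ctx, buf, sizeof(buf), &len) == 0) {
            if (fwrite(buf, 1, len, out) != len) { fclose(out); return -1; }
        }
        if (fflush(out) != 0 || fclose(out) != 0) return -1;

        /* Swap the compacted file into place of the original. */
        return rename(tmp_path, db_path);
    }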

Nick Johnson
How is writing the header made atomic? What happens if a failure occurs during writing the header?
Pete Kirkham