Suppose you have a large file made up of a number of fixed-size blocks. Each of these blocks contains some number of variable-sized records. Each record must fit completely within a single block, so by definition no record is larger than a full block. Over time, records are added to and deleted from these blocks as records come and go from this "database".
At some point, after many records have been added to the database and a fair number removed, many of the blocks may end up only partially filled.
What is a good algorithm to shuffle the records around in this database to compact out unnecessary blocks at the end of the file by better filling up the partially filled blocks?
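For concreteness, here is roughly the kind of on-disk layout I am assuming (the header fields, struct formats, and helper names below are illustrative, not part of an existing format):

```python
import struct

BLOCK_SIZE = 4096                      # fixed block size (could also be 8 KiB)
BLOCK_HEADER = struct.Struct("<HH")    # hypothetical header: record_count, used_bytes
RECORD_HEADER = struct.Struct("<H")    # hypothetical per-record prefix: payload length

def read_block(f, block_index):
    """Read one whole block from the file (blocks are always read in full)."""
    f.seek(block_index * BLOCK_SIZE)
    return f.read(BLOCK_SIZE)

def parse_records(block):
    """Walk the records packed after the block header, yielding their payloads."""
    count, used = BLOCK_HEADER.unpack_from(block, 0)
    offset = BLOCK_HEADER.size
    for _ in range(count):
        (length,) = RECORD_HEADER.unpack_from(block, offset)
        offset += RECORD_HEADER.size
        yield block[offset:offset + length]
        offset += length

def free_space(block):
    """Bytes still available in this block for new records (and their prefixes)."""
    _, used = BLOCK_HEADER.unpack_from(block, 0)
    return BLOCK_SIZE - BLOCK_HEADER.size - used
```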
Requirements of the algorithm:
- The compaction must happen in place within the original file, without temporarily extending the file by more than a few blocks beyond its starting size
- The algorithm should not unnecessarily disturb blocks that are already mainly full
- Entire blocks must be read from or written to the file at one time, and one should assume writes are relatively expensive
- If records are moved from one block to another, they must be added at their new location before being removed from their original position, so that no records are lost if the operation is interrupted mid-compaction. (Assume that the resulting temporary duplication of such records can be detected at recovery; see the sketch after this list.)
- The memory available for this operation is only on the order of several blocks, which is a very small percentage of the overall file size
- Assume that records range from roughly 10 bytes to 1 KB, with an average size of around 100 bytes, that the fixed-size blocks are 4 KB or 8 KB, and that the file is on the order of thousands of blocks.
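To make the add-before-remove requirement concrete, here is a rough sketch of the single-record move step I have in mind, reusing the hypothetical helpers above; the block layout, fsync policy, and recovery behaviour are assumptions, not a finished design:

```python
import os

def write_block(f, block_index, block):
    """Write one whole block and force it to disk before proceeding."""
    f.seek(block_index * BLOCK_SIZE)
    f.write(block)
    f.flush()
    os.fsync(f.fileno())

def pack_records(payloads):
    """Rebuild a full block image from a list of record payloads."""
    body = b"".join(RECORD_HEADER.pack(len(p)) + p for p in payloads)
    assert BLOCK_HEADER.size + len(body) <= BLOCK_SIZE, "records must fit in one block"
    header = BLOCK_HEADER.pack(len(payloads), len(body))
    return (header + body).ljust(BLOCK_SIZE, b"\0")

def move_record(f, src_index, dst_index, record_pos):
    """Move one record, writing the destination block before the source.

    If the process dies between the two writes, the record exists in both
    blocks on disk; recovery is assumed to detect and drop the duplicate.
    """
    src = list(parse_records(read_block(f, src_index)))
    dst = list(parse_records(read_block(f, dst_index)))
    record = src.pop(record_pos)

    # 1. Add the record to its new block and persist that block first.
    write_block(f, dst_index, pack_records(dst + [record]))
    # 2. Only then rewrite the source block with the record removed.
    write_block(f, src_index, pack_records(src))
```

The open question is the higher-level policy: which source and destination blocks to pair up, and in what order, so that nearly-full blocks are left alone and the tail of the file can be truncated at the end.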