I'm trying to use the System.IO.Log features to build a recoverable transaction system. I understand it to be implemented on top of the Common Log File System (CLFS).
The usual ARIES approach to write-ahead logging involves persisting log sequence numbers (LSNs) in places other than the log itself (for example, in the header of the database page modified by the logged action).
Interestingly, the documentation for CLFS says that such sequence numbers are always 64-bit integers.
Confusingly, however, the .NET wrapper type for those sequence numbers, SequenceNumber, can be constructed from a byte[] but not from a UInt64. Its value can also be read as a byte[], but not as a UInt64. Inspecting the implementation of SequenceNumber.GetBytes() reveals that it can in fact return arrays of either 8 or 16 bytes.
This raises a few questions:
- Why do the .NET sequence numbers differ in size from the CLFS sequence numbers?
- Why are the .NET sequence numbers variable in length?
- Why would you need 128 bits to represent such a sequence number? It seems like you would truncate the log well before using up a 64-bit address space (16 exbibytes, or around 10^19 bytes; more if each address refers to a unit larger than a byte).
- If log sequence numbers are going to be represented as 128-bit integers, why not provide a way to serialize and deserialize them as pairs of UInt64s, instead of rather pointlessly incurring heap allocations for short-lived new byte[]s every time you need to write or read one? Alternatively, why bother making SequenceNumber a value type at all?
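To illustrate what the pair-of-UInt64s alternative could look like, here is a minimal Python sketch. It assumes, hypothetically, that a 16-byte LSN can be treated as a single unsigned 128-bit big-endian integer; the actual byte layout produced by SequenceNumber.GetBytes() is not documented here and may differ. The helper names lsn_to_pair and pair_to_lsn are made up for this example.

```python
import struct

# Assumption (hypothetical): a 16-byte LSN is one unsigned 128-bit integer
# stored big-endian. The real layout behind SequenceNumber.GetBytes() may differ.
_PAIR = struct.Struct(">QQ")  # two unsigned 64-bit integers, big-endian


def lsn_to_pair(lsn_bytes):
    """Split a 16-byte LSN into (high, low) 64-bit unsigned halves."""
    if len(lsn_bytes) != _PAIR.size:
        raise ValueError("expected exactly 16 bytes")
    return _PAIR.unpack(lsn_bytes)


def pair_to_lsn(hi, lo):
    """Rebuild the 16-byte big-endian form from two 64-bit halves."""
    return _PAIR.pack(hi, lo)
```

Because Python tuples compare element by element, comparing (hi, lo) pairs gives the same ordering as comparing the underlying 128-bit values. In .NET terms, the analogous design would be a value type holding two ulong fields, which could be written to a page header without allocating an intermediate array.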
It seems an odd tradeoff to double the storage overhead of log sequence numbers just so you can have an untruncated log longer than a million terabytes, so I feel like I'm missing something here, or maybe several things. I'd much appreciate it if someone in the know could set me straight.
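For scale, a quick check of the figures above. The 8 KiB page size is an assumption chosen only for illustration:

```latex
2^{64}\,\text{B} = 16\,\text{EiB} \approx 1.84 \times 10^{19}\,\text{B} \approx 1.8 \times 10^{7}\,\text{TB}

\frac{8\,\text{B}}{8192\,\text{B}} \approx 0.098\%, \qquad \frac{16\,\text{B}}{8192\,\text{B}} \approx 0.195\%
```

So a 64-bit byte offset already spans about 18 million terabytes, and on the assumed page size, going from 8 to 16 bytes per LSN costs roughly an extra 0.1% of every page header.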
Clarification
I agree with what Damien and Andras are saying. Those concerns are by far the most likely explanation for the byte[] return type. But on inspection of the disassembly, the current implementation on top of CLFS has code paths that create 64-bit LSNs and other code paths that create 128-bit LSNs. Why? And can a client using System.IO.Log on top of CLFS safely store LSNs in a fixed-length 64-bit field? A 128-bit field? A field of any fixed length?
It's next-door to useless if LSNs can be of arbitrary length, since you need an LSN field somewhere in the page header to implement physio-logical recovery. If the field is variable-length, addressing the non-header portion of the page becomes noticeably more complex. If there is no bound on that length, you can't even be sure the page will have room to expand the LSN field without spilling either the header or the page contents onto a new page. Neither option is viable in the general case: the point where you would detect this condition is far less abstract than the point where you would know how to perform such a recovery, assuming the data structure you are storing even permits something of that kind.
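A minimal Python sketch of why a known upper bound matters. The page layout, the 8 KiB page size, and the 16-byte LSN slot below are all hypothetical assumptions for illustration; whether System.IO.Log guarantees any such bound is exactly the open question. With a fixed-size slot, the payload always starts at the same offset, and an over-long LSN can only be rejected at write time.

```python
import struct

# Hypothetical page header (not taken from System.IO.Log or CLFS):
#   page_id : uint32
#   lsn_len : uint8, number of meaningful LSN bytes
#   pad     : 3 bytes
#   lsn     : fixed 16-byte slot, zero-padded when the LSN is shorter
PAGE_SIZE = 8192  # assumed page size
LSN_SLOT = 16     # assumed upper bound on LSN length
HEADER = struct.Struct(">IB3x16s")
PAYLOAD_OFFSET = HEADER.size  # constant, because the LSN slot never changes size


def write_header(page, page_id, lsn):
    """Store page_id and the LSN in the fixed header slot of a bytearray page."""
    if len(lsn) > LSN_SLOT:
        # Without a bound on LSN length there is no good way to handle this
        # here: the header would have to grow and displace the payload.
        raise ValueError("LSN longer than the reserved header slot")
    # struct pads the "16s" field with zero bytes when lsn is shorter.
    HEADER.pack_into(page, 0, page_id, len(lsn), lsn)


def read_lsn(page):
    """Return only the meaningful bytes of the stored LSN."""
    _, lsn_len, slot = HEADER.unpack_from(page, 0)
    return slot[:lsn_len]
```

Storing the length alongside a zero-padded slot lets 8-byte and 16-byte LSNs share one layout; how two LSNs of different lengths should compare is a separate question this sketch does not answer.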