I'm trying to use the System.IO.Log features to build a recoverable transaction system. I understand it to be implemented on top of the Common Log File System.

The usual ARIES approach to write-ahead logging involves persisting log record sequence numbers in places other than the log (for example, in the header of the database page modified by the logged action).

Interestingly, the documentation for CLFS says that such sequence numbers are always 64-bit integers.

Confusingly, however, the .Net wrapper around those sequence numbers, SequenceNumber, can be constructed from a byte[] but not from a UInt64. Its value can also be read as a byte[], but not as a UInt64. Inspecting the implementation of SequenceNumber.GetBytes() reveals that it can in fact return arrays of either 8 or 16 bytes.
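For illustration, here is a minimal sketch of the round trip in question. It assumes the byte[] layout accepted by the constructor matches the little-endian encoding BitConverter produces, which as far as I can tell is not documented:

    using System;
    using System.IO.Log;

    static class LsnHelper
    {
        // Hypothetical helper: wrap a 64-bit CLFS LSN in a SequenceNumber.
        // ASSUMPTION: the byte[] layout the constructor expects matches
        // BitConverter's little-endian encoding; this is not documented.
        public static SequenceNumber FromUInt64(ulong lsn)
        {
            return new SequenceNumber(BitConverter.GetBytes(lsn));
        }

        // Hypothetical inverse: read the value back as a UInt64, which can
        // only work when GetBytes() happens to return 8 bytes.
        public static ulong ToUInt64(SequenceNumber sn)
        {
            byte[] bytes = sn.GetBytes();   // may be 8 or 16 bytes long
            if (bytes.Length != 8)
                throw new InvalidOperationException("LSN is not 64 bits wide.");
            return BitConverter.ToUInt64(bytes, 0);
        }
    }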

This raises a few questions:

  1. Why do the .Net sequence numbers differ in size from the CLFS sequence numbers?
  2. Why are the .Net sequence numbers variable in length?
  3. Why would you need 128 bits to represent such a sequence number? It seems like you would truncate the log well before using up a 64-bit address space (16 exbibytes, or around 10^19 bytes; more if you address longer words).
  4. If log sequence numbers are going to be represented as 128 bit integers, why not provide a way to serialize/deserialize them as pairs of UInt64s instead of rather-pointlessly incurring heap allocations for short-lived new byte[]s every time you need to write/read one? Alternatively, why bother making SequenceNumber a value type at all?

It seems an odd tradeoff to double the storage overhead of log sequence numbers just so you can have an untruncated log longer than a million terabytes, so I feel like I'm missing something here, or maybe several things. I'd much appreciate it if someone in the know could set me straight.
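To make question 4 above concrete, here is a hypothetical sketch of the kind of allocation-free representation being asked about; nothing like this exists on the real System.IO.Log.SequenceNumber, and the names are mine:

    // Hypothetical alternative: expose the 128-bit value as two UInt64s
    // instead of a freshly allocated byte[]. This type does not exist in
    // System.IO.Log; it only illustrates question 4.
    public struct SequenceNumber128
    {
        private readonly ulong high;
        private readonly ulong low;

        public SequenceNumber128(ulong high, ulong low)
        {
            this.high = high;
            this.low = low;
        }

        // Reading the halves back costs no heap allocation, unlike
        // calling GetBytes() on every write/read of an LSN.
        public ulong High { get { return high; } }
        public ulong Low { get { return low; } }
    }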

Clarification

I agree with what Damien and Andras are saying. Those concerns are by far the most likely explanation for the byte[] return type. But the current implementation on top of CLFS has, on inspection of the disassembly, code paths where it creates 64-bit LSNs and code paths where it creates 128-bit LSNs. Why? And can a client using System.IO.Log on top of CLFS safely store LSNs in a fixed-length 64-bit field? 128-bit field? A field of any fixed length?

It's next-door to useless if the LSNs can be of arbitrary length, since you need an LSN field somewhere in the page header to implement physio-logical recovery. If the field is of variable length, then there is a not-insignificant increase in the complexity of addressing the non-header portion of the page. If there is no bound on the variable length, then you can't even be sure that you will have space on the page to expand the LSN header field without spilling either the header or the page contents to a new page, neither of which is viable in the general case: the point where you would detect this condition is far less abstract than the point where you would have the information needed to handle it, if the data structure you are storing even permits something of that kind.
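For context on why a fixed width matters, here is an illustrative page header layout; the field names and sizes are hypothetical, not from any real system:

    using System.Runtime.InteropServices;

    // Illustrative page header (hypothetical layout). Recovery compares
    // PageLsn against log record LSNs to decide whether a logged update
    // has already been applied to this page; that comparison is only
    // cheap and safe if the field's width is fixed when the page format
    // is designed.
    [StructLayout(LayoutKind.Sequential, Pack = 1)]
    struct PageHeader
    {
        public ulong PageLsn;     // fixed 64-bit slot for the last LSN applied
        public uint PageId;
        public ushort SlotCount;
        public ushort FreeOffset;
        // Page contents follow at a fixed offset, which a variable-length
        // LSN field would destroy.
    }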

+1  A: 

Well, your first link mentions two implementations of the IRecordSequence interface, only one of which is the CLFS-based one. And, of course, there could be other, future implementations too. So maybe they're aware of other systems that use longer sequence numbers, and don't want people to write code that assumes sequence numbers are always 64 bits.

Damien_The_Unbeliever
Could be, but the interface is defined in terms of the SequenceNumber value type, and the implementation of that type has two UInt64s instead of a byte[]. It makes (some degree of) sense for future-proofing, even though many or most clients will want to persist the sequence numbers in fixed-length fields, but it doesn't permit current implementations to use more than 64 bits. There's also the matter of the implementation, which does (at least apparently on inspection of the IL) sometimes create 128-bit (or at least 65-bit, implemented as 128-bit) values even on top of CLFS.
Doug McClean
+2  A: 

The most obvious reason for this is that UInt64 is not CLS-compliant, whereas the System.IO.Log assembly is expressly marked CLSCompliant(true) (open it in Reflector to see).

And since the platform defines the underlying type as ULONGLONG, it's not safe to force the result into an Int64: half the results would come out negative, and the value space would wrap around.
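To see the pitfall concretely (this example is mine, not from the answer): reinterpreting a ULONGLONG LSN as an Int64 makes the upper half of the value space negative, so signed comparisons no longer agree with the unsigned LSN ordering:

    using System;

    class LsnCastPitfall
    {
        static void Main()
        {
            ulong small = 0x0000000000000001UL;   // early LSN
            ulong large = 0x8000000000000000UL;   // later LSN, high bit set

            // Unsigned comparison gives the correct log order.
            Console.WriteLine(small < large);     // True

            // Forcing the values into Int64 makes the later LSN negative,
            // so the signed comparison reverses the ordering.
            long a = unchecked((long)small);      //  1
            long b = unchecked((long)large);      // -9223372036854775808
            Console.WriteLine(a < b);             // False
        }
    }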

The best solution, therefore, short of changing the CLS spec to accept unsigned ints, was to adopt a byte array result, which also has the added advantage, as Damien suggests, of future-proofing should a future version of Windows extend it to return more bits.

Andras Zoltan
An unchecked cast to/from Int64 would be CLS-compliant and safe, but I agree there would be potential for confusion with the negative numbers, and someone might assume that comparing the Int64s would give the same result as comparing the LSNs they represent.
Doug McClean
@Doug - yes, regarding my assertion that it's not safe: I actually meant that it's misleading and not good practice, so they went for an extra level of indirection over the value, which makes sense to me. If it had been me, I'd probably have (wrongly) sacrificed the CLS compliance in favour of the UInt64!
Andras Zoltan
+1  A: 

My personal intuition around the variable-length LSN is that it wasn't meant to leave applications unable to predict the size of their LSNs (given that they don't change providers). As for the actual reason, I suspect it wouldn't be helpful for me to speculate without getting in touch with the folks who know better than me.

Insofar as we can ever say anything with certainty about the future, I think it is safe to say that users of CLFS can assume that its LSNs will not change in length for a reasonable amount of time without a lot of churn in its Win32 API. (I say this as someone who worked on CLFS for a few years.)

I concur that there are a lot of applications in which it would technically suck to have to support variable-length LSNs.

jrtipton