OLE Variants, as used by older versions of Visual Basic and pervasively in COM Automation, can store lots of different types: basic types like integers and floats, more complicated types like strings and arrays, and all the way up to IDispatch implementations and pointers in the form of ByRef variants.
Variants are also weakly typed: they silently convert a value to another type, depending on which operator you apply and on the current types of the operands. For example, comparing two variants for equality, one containing the integer 1 and the other containing the string "1", will return True.
So, assuming that I'm working with variants at the underlying data level (e.g. VARIANT in C++ or TVarData in Delphi, i.e. the big union of different possible values), how should I hash variants consistently so that they obey the right rules?
Rules:
- Variants that hash unequally should compare as unequal, both in sorting and direct equality
- Variants that compare as equal for both sorting and direct equality should hash as equal
It's OK if I have to use different sorting and direct comparison rules in order to make the hashing fit.
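To make the rules concrete, here is a rough sketch in Python rather than C++ or Delphi, so it runs without the Windows headers. The two rules are contrapositives of each other, so a single check covers both: look for pairs that compare equal but hash differently. Everything here is an illustrative assumption, not a real COM API: `loose_equal` is a toy stand-in for weakly typed variant equality, and `raw_hash`, `text_hash` and `rule_violations` are hypothetical helper names.

```python
import hashlib
from itertools import combinations


def loose_equal(a, b):
    """Toy stand-in for weakly typed equality: compare the textual forms."""
    return str(a) == str(b)


def raw_hash(value):
    """Hash that mixes in the type tag, so 1 and "1" get different hashes."""
    tagged = f"{type(value).__name__}:{value}"
    return hashlib.sha256(tagged.encode("utf-8")).hexdigest()


def text_hash(value):
    """Hash of the textual form only, so 1 and "1" get the same hash."""
    return hashlib.sha256(str(value).encode("utf-8")).hexdigest()


def rule_violations(values, equal, hash_fn):
    """Return every pair that compares equal but hashes differently."""
    return [
        (a, b)
        for a, b in combinations(values, 2)
        if equal(a, b) and hash_fn(a) != hash_fn(b)
    ]


samples = [1, "1", 2, "2", "abc"]

# Hashing the typed representation breaks the rules under loose equality.
bad = rule_violations(samples, loose_equal, raw_hash)

# Hashing the same normalized form that equality uses keeps them consistent.
good = rule_violations(samples, loose_equal, text_hash)
```

With these samples, `bad` contains the pairs `(1, "1")` and `(2, "2")`, while `good` is empty: the hash only obeys the rules when it is computed from the same normalized form that the comparison uses.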
The way I'm currently working is that I normalize the variants to strings (if they fit) and treat them as strings; otherwise I treat the variant data as an opaque blob, hashing and comparing its raw bytes. That has some limitations, of course: the numbers 1..10 sort as [1, 10, 2, ... 9], etc. This is mildly annoying, but it is consistent and it is very little work. However, I do wonder if there is an accepted practice for this problem.
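For reference, the approach described above looks roughly like the following sketch, again in Python rather than against the real VARIANT or TVarData layout. It assumes that only integers and strings "fit" as text and that anything else arrives as raw bytes. `canonical_key`, `variant_hash` and `variant_equal` are hypothetical names made up for this illustration.

```python
import hashlib


def canonical_key(value):
    """Map a variant-like value to a (tag, bytes) key.

    The same key drives hashing, equality and sorting, which is what
    keeps the three consistent with each other.
    """
    if isinstance(value, str):
        return ("text", value.encode("utf-8"))
    if isinstance(value, int) and not isinstance(value, bool):
        # Integers are normalized to their decimal text form.
        return ("text", str(value).encode("utf-8"))
    if isinstance(value, (bytes, bytearray)):
        # Anything else is treated as an opaque blob of raw bytes.
        return ("blob", bytes(value))
    raise TypeError(f"unsupported value type: {type(value).__name__}")


def variant_hash(value):
    """Hash the canonical key, never the original typed value."""
    tag, data = canonical_key(value)
    return hashlib.sha256(tag.encode("ascii") + b"\0" + data).hexdigest()


def variant_equal(a, b):
    """Two values are equal exactly when their canonical keys match."""
    return canonical_key(a) == canonical_key(b)


# Sorting by the canonical key compares text byte by byte, which
# reproduces the [1, 10, 2, ... 9] ordering mentioned above.
ordered = sorted(range(1, 11), key=canonical_key)
```

Here `variant_equal(1, "1")` holds and both values hash identically, while `ordered` comes out as `[1, 10, 2, 3, 4, 5, 6, 7, 8, 9]`. A numeric-aware key would fix the ordering, but the key would then have to be the one used for hashing and equality as well.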