I am writing algorithms that work on series of numeric data, where sometimes, a value in the series needs to be null. However, because this application is performance critical, I have avoided the use of nullable types. I have perf tested the algorithms to specifically compare the performance of using nullable types vs non-nullable types, and in the best case scenario nullable types are 2x slower, but often far worse.

The data type most often used is double, and currently the chosen alternative to null is double.NaN. However I understand this is not the exact intended usage for the NaN value, so am unsure whether there are any issues with this I cannot foresee and what the best practise would be.

I am interested in finding out what the best null alternatives are for the following data types in particular: double/float, decimal, DateTime, int/long (although others are more than welcome)

Edit: I think I need to clarify my requirements about performance. Gigabytes of numerical data are processed through these algorithms at a time, which takes several hours. Therefore, although the difference between e.g. 10ms and 20ms is usually insignificant, in this scenario it really does make a significant impact on the time taken.

+12  A: 

Well, if you've ruled out Nullable<T>, you are left with domain values - i.e. a magic number that you treat as null. While this isn't ideal, it isn't uncommon either - for example, a lot of the main framework code treats DateTime.MinValue the same as null. This at least moves the damage far away from common values...

Edit: to highlight that this applies only where there is no NaN.

So where there is no NaN, maybe use .MinValue - but just remember what evils happen if you accidentally use that same value as a genuine number...

Obviously for unsigned data you'll need .MaxValue (avoid zero!!!).

Personally, I'd try to use Nullable<T> as expressing my intent more safely... there may be ways to optimise your Nullable<T> code, perhaps. And also - by the time you've checked for the magic number in all the places you need to, perhaps it won't be much faster than Nullable<T>?
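
A minimal sketch of the magic-number approach described above. The `Sentinels` class and its member names are hypothetical and only illustrate the idea; `int.MinValue`, `uint.MaxValue` and `DateTime.MinValue` are the real framework values mentioned in the answer.

```csharp
using System;

// Hypothetical helper: one agreed "missing" marker per type that has no NaN.
public static class Sentinels
{
    public const int MissingInt = int.MinValue;
    public const uint MissingUInt = uint.MaxValue;            // unsigned: avoid zero
    public static readonly DateTime MissingDate = DateTime.MinValue;

    public static bool IsMissing(int value) { return value == MissingInt; }
    public static bool IsMissing(uint value) { return value == MissingUInt; }
    public static bool IsMissing(DateTime value) { return value == MissingDate; }

    // Sums a series while skipping entries that carry the marker.
    public static long SumIgnoringMissing(int[] values)
    {
        long total = 0;
        foreach (int v in values)
        {
            if (!IsMissing(v))
            {
                total += v;
            }
        }
        return total;
    }
}
```

Keeping the marker behind a named constant and a single `IsMissing` check at least makes it easier to find every place that depends on it, should the chosen value ever need to change.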

Marc Gravell
I agree, I think this is a much better alternative than doubles unless you must have long.MaxValue be valid.
BobbyShaftoe
For double or float values NaN or one of the infinities might be used as a "null" value, if you don't need them.
Joey
With regard to checks, nullable types require the same number of checks: where I check for a magic number, I would check for null. So the perf tests I performed did take that into account. I agree it's not ideal, but in this scenario, performance is the no. 1 priority. And in this scenario, the perf difference between operations as simple as int + int and int? + int? is significant.
Ryan
@Ryan - but how is the performance of (i == int.MinValue || j == int.MinValue) ? int.MinValue : (i+j);?
Marc Gravell
@Marc - but that needs to be compared with (i == null || j == null) ? null : (i+j); which I have tested in isolation and is still much slower.
Ryan
For Nullable<T> you just need to compare to i+j, since the compiler will do the rest. If you are doing this yourself you are doubling the work.
Marc Gravell
Ahh you are correct. Although in actual usage in my algorithms, normally an explicit check is required, as a null often leads to a different path in the algorithm.
Ryan
I just performed a quick simple perf test on (i == int.MinValue || j == int.MinValue) ? int.MinValue : (i+j); versus nullable i+j and to my surprise nullables are ~3 times slower!
Ryan
Performance may vary per type; I get ~2 using int... re the explicit check: once you know something has a value (via `!=null` or `.HasValue`), use `GetValueOrDefault()` (not Value or a cast) - this is the fastest route.
Marc Gravell
I have been using doubles - just tested int and also got ~2x rather than ~3x. Also tested GetValueOrDefault(), and wow, it is much faster - I had no idea, but very useful - thanks!
Ryan
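
The variants compared in the comments above can be sketched side by side. The class and method names here are hypothetical; `HasValue` and `GetValueOrDefault()` are real members of `Nullable<T>`. The sketch only shows the shape of the code and makes no claim about timings, which should be measured on the actual data.

```csharp
using System;

public static class NullableVersusSentinel
{
    // Sentinel version: int.MinValue stands for "missing" and is checked by hand.
    public static int AddSentinel(int i, int j)
    {
        return (i == int.MinValue || j == int.MinValue) ? int.MinValue : i + j;
    }

    // Nullable version: the compiler "lifts" the + operator,
    // so a null operand already yields a null result.
    public static int? AddNullable(int? i, int? j)
    {
        return i + j;
    }

    // Explicit check: once HasValue is known to be true, GetValueOrDefault()
    // reads the value without the validation that .Value performs.
    public static int SumPresent(int?[] values)
    {
        int sum = 0;
        foreach (int? v in values)
        {
            if (v.HasValue)
            {
                sum += v.GetValueOrDefault();
            }
        }
        return sum;
    }
}
```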
+2  A: 

I somewhat disagree with Gravell on this specific edge case: a null-ed variable is considered 'not defined'; it doesn't have a value. So whatever is used to signal that is OK, even magic numbers. With magic numbers, though, you have to take into account that one will always come back to haunt you when it suddenly becomes a 'valid' value. With Double.NaN you don't have to be afraid of that: it's never going to become a valid double. You do have to consider, though, that within the sequence of doubles NaN can only be used as a marker for 'not defined'; you obviously can't use it as an error code in the sequences as well.

So whatever is used to mark 'undefined': it has to be clear in the context of the set of values that that specific value is considered the value for 'undefined' AND that won't change in the future.

If Nullable gives you too much trouble, use NaN, or whatever else, as long as you consider the consequences: the value chosen represents 'undefined' and that will stay.

Frans Bouma
You are right, and I had been unclear. I had only meant the MinValue etc for those times where there is no NaN - int, long, decimal, DateTime etc. For double/float, NaN is the obvious answer (that I had assumed, from the question).
Marc Gravell
A: 

Partial answer:

Float and Double provide NaN (Not a Number). NaN is a little tricky since, per spec, NaN != NaN. If you want to know if a number is NaN, you'll need to use Double.IsNaN().

See also Binary floating point and .NET.
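
A short sketch of the check described above, assuming the series is a plain `double[]` with NaN marking missing entries. `NaNChecks` and `CountMissing` are hypothetical names.

```csharp
using System;

public static class NaNChecks
{
    // Counts the entries marked as missing with double.NaN.
    public static int CountMissing(double[] series)
    {
        int missing = 0;
        foreach (double x in series)
        {
            // x == double.NaN is always false, so double.IsNaN is required.
            if (double.IsNaN(x))
            {
                missing++;
            }
        }
        return missing;
    }
}
```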

bendin
As an aside... in most databases, null != null too, so this isn't necessarily unexpected territory... but yes: it is different to how C# handles equality of Nullable<T>.
Marc Gravell
+2  A: 

I am working on a large project that uses NaN as a null value. I am not entirely comfortable with it - for similar reasons as yours: not knowing what can go wrong. We haven't encountered any real problems so far, but be aware of the following:

NaN arithmetics - While, most of the time, "NaN promotion" is a good thing, it might not always be what you expect.

Comparison - Comparison of values gets rather expensive if you want NaNs to compare equal. Now, testing floats for equality isn't simple anyway, but ordering (a < b) can get really ugly, because NaNs sometimes need to be smaller, sometimes larger than normal values.

Code Infection - I see lots of arithmetic code that requires specific handling of NaNs to be correct. So you end up with "functions that accept NaNs" and "functions that don't" for performance reasons.

Other non-finites - NaN is not the only non-finite value. This should be kept in mind...

Floating Point Exceptions - These are not a problem when disabled. Until someone enables them. True story: static initialization of a NaN in an ActiveX control. Doesn't sound scary, until you change installation to use InnoSetup, which uses a Pascal/Delphi(?) core, which has FPU exceptions enabled by default. Took me a while to figure out.

So, all in all, nothing serious, though I'd prefer not to have to consider NaNs that often.
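
The arithmetic, comparison and non-finite points above can be illustrated with a small sketch in C#. The `NaNPitfalls` class and its method names are hypothetical and exist only for illustration.

```csharp
using System;

public static class NaNPitfalls
{
    // NaN propagates through arithmetic: a sum that touches one NaN is NaN.
    public static double Sum(double[] series)
    {
        double total = 0.0;
        foreach (double x in series)
        {
            total += x;
        }
        return total;
    }

    // Ordering operators are false whenever an operand is NaN,
    // so "not less than" does not imply "greater than or equal".
    public static bool IsUnordered(double a, double b)
    {
        return !(a < b) && !(a >= b);
    }

    // Infinities are non-finite too; a NaN check alone does not catch them.
    public static bool IsFinite(double x)
    {
        return !double.IsNaN(x) && !double.IsInfinity(x);
    }
}
```

Whether propagation is the desired behaviour depends on the algorithm: a single missing entry turns the whole sum into NaN, which is sometimes exactly right and sometimes not what you expect.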


I'd use Nullable types as often as possible, unless they are (proven to be) a performance / resource constraint. One case could be large vectors / matrices with occasional NaNs, or large sets of named individual values where the default NaN behavior is correct.


Alternatively, you can use an index vector for vectors and matrices, standard "sparse matrix" implementations, or a separate bool/bit vector.
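
As a rough sketch of the last option, a separate bit vector can record which positions hold a value while the data itself stays in a plain `double[]`. `MaskedSeries` is a hypothetical container written for illustration; `System.Collections.BitArray` is the real framework class.

```csharp
using System;
using System.Collections;

// Hypothetical container: values live in a plain double[], and a separate
// bit vector records which positions actually hold a value.
public sealed class MaskedSeries
{
    private readonly double[] values;
    private readonly BitArray present;

    public MaskedSeries(int length)
    {
        values = new double[length];
        present = new BitArray(length);   // all bits start as false (missing)
    }

    public void Set(int index, double value)
    {
        values[index] = value;
        present[index] = true;
    }

    public void Clear(int index)
    {
        present[index] = false;
    }

    public bool HasValue(int index)
    {
        return present[index];
    }

    // Sums only the positions that are marked as present.
    public double SumPresent()
    {
        double total = 0.0;
        for (int i = 0; i < values.Length; i++)
        {
            if (present[i])
            {
                total += values[i];
            }
        }
        return total;
    }
}
```

This keeps every double value, including NaN, available as real data, at the cost of one extra lookup per element.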

peterchen
A: 

Maybe the significant performance decrease happens when calling one of Nullable's members or properties (boxing).

Try to use a struct with the double + a boolean telling whether the value is specified or not.

Stefan Schultze
But nullable types are structs already...
Ryan
That's exactly what Nullable<T> does - it is a struct, it has a value (e.g. of double type) and a boolean indicating whether or not a value has been assigned. And without boxing overhead.
Jozef Izso
Properties (HasValue and Value) are methods internally (get_HasValue and get_Value). So they are subject to boxing (provided that no special compiler magic for Nullable occurs here).
Stefan Schultze
@stefan-mg; boxing is not required on struct methods unless the method is virtual and not overridden (i.e. GetHashCode(), Equals(), ToString() etc).
Marc Gravell
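
The layout these comments describe can be sketched with a hand-written struct. `OptionalDouble` is a hypothetical type for illustration and is not the actual `Nullable<T>` source; it only shows a value plus a flag, with non-virtual members that are called directly on the struct.

```csharp
using System;

// Hypothetical struct with the same shape the comments describe:
// a value plus a flag saying whether the value was assigned.
public struct OptionalDouble
{
    private readonly bool hasValue;
    private readonly double value;

    public OptionalDouble(double value)
    {
        this.value = value;
        this.hasValue = true;
    }

    // Non-virtual members declared on the struct itself.
    public bool HasValue
    {
        get { return hasValue; }
    }

    public double GetValueOrDefault()
    {
        return value;   // 0.0 when no value was assigned
    }
}
```

`default(OptionalDouble)` leaves the flag false, which plays the role of null.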