Update:
I was quite interested in this topic, so I sat down and implemented it (using this very fast and memory conservative implementation). I also read this one (thanks celion) and found out that you even dont have to split the floats into mantissa and exponent to sort it. You just have to take the bits one-to-one and perform an int sort. You just have to care about the negative values, that have to be inversely put in front of the positive ones at the end of the algorithm (I made that in one step with the last iteration of the algorithm to save some cpu time).
So heres my float radixsort:
public static float[] RadixSort(this float[] array)
{
    // temporary array and the array of converted floats to ints
    int[] t = new int[array.Length];
    int[] a = new int[array.Length];
    for (int i = 0; i < array.Length; i++)
        a[i] = BitConverter.ToInt32(BitConverter.GetBytes(array[i]), 0);
    // set the group length to 1, 2, 4, 8 or 16
    // and see which one is quicker
    int groupLength = 4;
    int bitLength = 32;
    // counting and prefix arrays
    // (dimension is 2^r, the number of possible values of a r-bit number) 
    int[] count = new int[1 << groupLength];
    int[] pref = new int[1 << groupLength];
    int groups = bitLength / groupLength;
    int mask = (1 << groupLength) - 1;
    int negatives = 0, positives = 0;
    for (int c = 0, shift = 0; c < groups; c++, shift += groupLength)
    {
        // reset count array 
        for (int j = 0; j < count.Length; j++)
            count[j] = 0;
        // counting elements of the c-th group 
        for (int i = 0; i < a.Length; i++)
        {
            count[(a[i] >> shift) & mask]++;
            // additionally count all negative 
            // values in first round
            if (c == 0 && a[i] < 0)
                negatives++;
        }
        if (c == 0) positives = a.Length - negatives;
        // calculating prefixes
        pref[0] = 0;
        for (int i = 1; i < count.Length; i++)
            pref[i] = pref[i - 1] + count[i - 1];
        // from a[] to t[] elements ordered by c-th group 
        for (int i = 0; i < a.Length; i++){
            // Get the right index to sort the number in
            int index = pref[(a[i] >> shift) & mask]++;
            if (c == groups - 1)
            {
                // We're in the last (most significant) group, if the
                // number is negative, order them inversely in front
                // of the array, pushing positive ones back.
                if (a[i] < 0)
                    index = positives - (index - negatives) - 1;
                else
                    index += negatives;
            }
            t[index] = a[i];
        }
        // a[]=t[] and start again until the last group 
        t.CopyTo(a, 0);
    }
    // Convert back the ints to the float array
    float[] ret = new float[a.Length];
    for (int i = 0; i < a.Length; i++)
        ret[i] = BitConverter.ToSingle(BitConverter.GetBytes(a[i]), 0);
    return ret;
}
It is slightly slower than an int radix sort, because of the array copying at the beginning and end of the function, where the floats are bitwise copied to ints and back. The whole function nevertheless is again O(n). In any case much faster than sorting 3 times in a row like you proposed. I dont see much room for optimizations anymore, but if anyone does: feel free to tell me.
To sort descending change this line at the very end:
ret[i] = BitConverter.ToSingle(BitConverter.GetBytes(a[i]), 0);
to this:
ret[a.Length - i - 1] = BitConverter.ToSingle(BitConverter.GetBytes(a[i]), 0);
Measuring:
I set up some short test, containing all special cases of floats (NaN, +/-Inf, Min/Max value, 0) and random numbers. It sorts exactly the same order as Linq or Array.Sort sorts floats: 
NaN -> -Inf -> Min -> Negative Nums -> 0 -> Positive Nums -> Max -> +Inf
So i ran a test with a huge array of 10M numbers:
float[] test = new float[10000000];
Random rnd = new Random();
for (int i = 0; i < test.Length; i++)
{
    byte[] buffer = new byte[4];
    rnd.NextBytes(buffer);
    float rndfloat = BitConverter.ToSingle(buffer, 0);
    switch(i){
        case 0: { test[i] = float.MaxValue; break; }
        case 1: { test[i] = float.MinValue; break; }
        case 2: { test[i] = float.NaN; break; }
        case 3: { test[i] = float.NegativeInfinity; break; }
        case 4: { test[i] = float.PositiveInfinity; break; }
        case 5: { test[i] = 0f; break; }
        default: { test[i] = test[i] = rndfloat; break; }
    }
}
And stopped the time of the different sorting algorithms:
Stopwatch sw = new Stopwatch();
sw.Start();
float[] sorted1 = test.RadixSort();
sw.Stop();
Console.WriteLine(string.Format("RadixSort: {0}", sw.Elapsed));
sw.Reset();
sw.Start();
float[] sorted2 = test.OrderBy(x => x).ToArray();
sw.Stop();
Console.WriteLine(string.Format("Linq OrderBy: {0}", sw.Elapsed));
sw.Reset();
sw.Start();
Array.Sort(test);
float[] sorted3 = test;
sw.Stop();
Console.WriteLine(string.Format("Array.Sort: {0}", sw.Elapsed));
And the output was (update: now ran with release build, not debug):
RadixSort: 00:00:03.9902332
Linq OrderBy: 00:00:17.4983272
Array.Sort: 00:00:03.1536785
roughly more than four times as fast as Linq. That is not bad. But still not yet that fast as Array.Sort, but also not that much worse. But i was really surprised by this one: I expected it to be slightly slower than Linq on very small arrays. But then I ran a test with just 20 elements:
RadixSort: 00:00:00.0012944
Linq OrderBy: 00:00:00.0072271
Array.Sort: 00:00:00.0002979
and even this time my Radixsort is quicker than Linq, but way slower than array sort. :)
Update 2:
I made some more measurements and found out some interesting things: longer group length constants mean less iterations and more memory usage. If you use a group length of 16 bits (only 2 iterations), you have a huge memory overhead when sorting small arrays, but you can beat Array.Sort if it comes to arrays larger than about 100k elements, even if not very much. The charts axes are both logarithmized:
