I would like to get a byte[] from a float[] as quickly as possible, without looping through the whole array (via a cast, probably). Unsafe code is fine. Thanks!

I am looking for a byte array 4 times longer than the float array, since each float is composed of 4 bytes. I'll pass this to a BinaryWriter.

EDIT: To those critics screaming "premature optimization": I benchmarked this using ANTS profiler before I optimized. There was a significant speed increase because the file has a write-through cache and the float array is exactly sized to match the sector size on the disk. The binary writer wraps a file handle created with the P/Invoke'd Win32 API. The speedup comes from reducing the number of function calls.

And, with regard to memory, this application creates massive caches which use plenty of memory. I can allocate the byte buffer once and re-use it many times--the double memory usage in this particular instance amounts to a roundoff error in the overall memory consumption of the app.

So I guess the lesson here is not to make premature assumptions ;)

+2  A: 

Although you can obtain a byte* pointer using unsafe and fixed, you cannot convert the byte* to a byte[] that the writer will accept as a parameter without copying the data. You do not want to do that, as it will double your memory footprint and add an extra pass over the data on top of the unavoidable pass needed to write it to disk.

Instead, you are still better off iterating over the array of floats and writing each float to the writer individually, using the Write(float) method. It will still be fast because of buffering inside the writer. See sixlettervariables's numbers.
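For reference, the byte* access mentioned at the top of this answer can be sketched roughly as follows. This is only an illustration: the class and method names (FloatBytePeek, ByteAt) are made up for the example, and it assumes the project is compiled with unsafe code enabled. It gives byte-level indexing into the float array, but it does not produce a byte[].

```csharp
using System;

public static class FloatBytePeek
{
    // Hypothetical helper: returns the raw byte at a given byte offset
    // into the float array's storage. Needs /unsafe (AllowUnsafeBlocks).
    public static unsafe byte ByteAt(float[] data, int byteIndex)
    {
        if (data == null) throw new ArgumentNullException(nameof(data));
        if (byteIndex < 0 || byteIndex >= data.Length * sizeof(float))
            throw new ArgumentOutOfRangeException(nameof(byteIndex));

        // Pin the array so the GC cannot move it while we hold a pointer.
        fixed (float* pFloats = data)
        {
            // Reinterpret the same memory as bytes; nothing is copied.
            byte* pBytes = (byte*)pFloats;
            return pBytes[byteIndex];
        }
    }
}
```

The pointer is only valid inside the fixed block, which is why it cannot simply be handed to an API that expects a byte[].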

Cheers, V.

vladr
Not sure what you mean. I just want byte-level indexing into the floating-point array (actually, I'm passing the array to a Writer).
Nick
@Vlad: What is this supposed to mean? How can a datatype not be representable as bytes? See my answer.
ryeguy
it means that the binary representation of (float)0 and that of (byte)0 are not the same (for one they don't have the same size.)
vladr
Doesn't seem to work: error CS1503: Argument '1': cannot convert from 'byte*' to 'byte[]'
Nick
Vlad is correct: you cannot fake the bits in memory that constitute a float[] as a byte[]. You CAN get a byte* to the front of the array, which is likely sufficient for your needs, but a byte* cannot be magicked into a byte[].
ShuggyCoUk
Please see my edit which explains why, in my specific case, Jeremy's answer does indeed speed up execution as confirmed by a profiler.
Nick
Actually you CAN fake the bits in memory to represent a byte[]. Check out my answer to see how it's done.
Omer Mor
+3  A: 

If you do not want any conversion to happen, I would suggest Buffer.BlockCopy().

public static void BlockCopy(
    Array src,
    int srcOffset,
    Array dst,
    int dstOffset,
    int count
)

For example:

float[] floatArray = new float[1000];
byte[] byteArray = new byte[floatArray.Length * 4];

Buffer.BlockCopy(floatArray, 0, byteArray, 0, byteArray.Length);
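A minimal sketch of how this could be combined with a reusable buffer and a BinaryWriter, as the question describes. The class name FloatBlockWriter and the scratch parameter are assumptions made for illustration; Buffer.ByteLength is used so the byte count is not hard-coded.

```csharp
using System;
using System.IO;

public static class FloatBlockWriter
{
    // Hypothetical helper: copies the raw bytes of 'source' into 'scratch'
    // and writes them in one call. 'scratch' is owned by the caller and
    // can be allocated once and reused across calls.
    public static void Write(BinaryWriter writer, float[] source, byte[] scratch)
    {
        // Equivalent to source.Length * sizeof(float).
        int byteCount = Buffer.ByteLength(source);
        if (scratch.Length < byteCount)
            throw new ArgumentException("scratch buffer is too small", nameof(scratch));

        // Raw memory copy; no per-element conversion takes place.
        Buffer.BlockCopy(source, 0, scratch, 0, byteCount);
        writer.Write(scratch, 0, byteCount);
    }
}
```

The bytes land in the machine's native byte order, so the output matches writer.Write(float) only on little-endian hardware.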
Jeremy
This will double the amount of memory allocation *in addition* to iterating over your *two* arrays (once to copy, once to write). Very inefficient both speed-wise and memory-wise. Not recommended.
vladr
Doesn't the last parameter need to be multiplied by sizeof(float)?
jdmichal
@jdmichal - Yes, you are correct.
Jeremy
Actually, you should probably just use Buffer.ByteLength: http://msdn.microsoft.com/en-us/library/system.buffer.bytelength.aspx
jdmichal
You are better off to just iterate over the float[] array and call Write for each float. This solution is highly inefficient.
vladr
Didn't know about that method, thanks! As for efficiency, whenever I have used BlockCopy, I had a byte[] and needed a float[] so there was no unneeded duplication. Plus if you stick with BlockCopy, you do not need unsafe code which can be advantageous. Pick the best method for your needs.
Jeremy
@Jeremy: I didn't either, until 5 seconds before that comment :) @Vlad: Please just rate it up or down. No need to repeatedly post the same comment (while advertising your answer). Let the asker and the users decide what is helpful. That's why the rating system exists.
jdmichal
Posted answer which confirms @Vlad's suspicions
sixlettervariables
I would always use sizeof(float) instead of a hard-coded 4!
rstevens
@rstevens: you would have to use Marshal.SizeOf(typeof(float)), but the CLI standard says sizeof(float) should be 32bits.
sixlettervariables
A: 

An array of floats into an array of bytes? So 2 floats would take 8 bytes? Or have I misunderstood?

Chris S
A: 

Although it basically does a for loop behind the scenes, it does the job in one line:

// Requires: using System.Linq;
// SelectMany flattens each float's 4 bytes into one sequence, avoiding the
// repeated list copies of an Aggregate-based version (and it handles an empty array).
byte[] byteArray = floatArray.SelectMany(f => System.BitConverter.GetBytes(f)).ToArray();
Jacob Adams
+4  A: 

You're better-off letting the BinaryWriter do this for you. There's going to be iteration over your entire set of data regardless of which method you use, so there's no point in playing with bytes.

+11  A: 

Premature optimization is the root of all evil! @Vlad's suggestion to iterate over each float is a much more reasonable answer than switching to a byte[]. Take the following table of runtimes for increasing numbers of elements (average of 50 runs):

Elements      BinaryWriter(float)      BinaryWriter(byte[])
-----------------------------------------------------------
10               8.72ms                    8.76ms
100              8.94ms                    8.82ms
1000            10.32ms                    9.06ms
10000           32.56ms                   10.34ms
100000         213.28ms                  739.90ms
1000000       1955.92ms                10668.56ms

There is little difference between the two for small numbers of elements. Once you get into the huge number of elements range, the time spent copying from the float[] to the byte[] far outweighs the benefits.

So go with what is simple:

float[] data = new float[...];
foreach(float value in data)
{
    writer.Write(value);
}
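A rough sketch of how a comparison like the one in the table might be timed. This is not the code that produced the numbers above; the class and method names are made up for illustration, and a real measurement would repeat each run many times and write to the actual target file rather than an arbitrary stream.

```csharp
using System;
using System.Diagnostics;
using System.IO;

public static class WriteTiming
{
    // Times one pass of writing each float individually.
    public static TimeSpan TimePerFloat(float[] data, Stream output)
    {
        var writer = new BinaryWriter(output); // not disposed: keeps 'output' open
        var sw = Stopwatch.StartNew();
        foreach (float value in data)
        {
            writer.Write(value);
        }
        writer.Flush();
        sw.Stop();
        return sw.Elapsed;
    }

    // Times one pass of copying to a byte[] and writing it in a single call.
    // The allocation and copy are deliberately included in the measurement.
    public static TimeSpan TimeBlockCopy(float[] data, Stream output)
    {
        var writer = new BinaryWriter(output);
        var sw = Stopwatch.StartNew();
        byte[] bytes = new byte[Buffer.ByteLength(data)];
        Buffer.BlockCopy(data, 0, bytes, 0, bytes.Length);
        writer.Write(bytes);
        writer.Flush();
        sw.Stop();
        return sw.Elapsed;
    }
}
```

Results depend heavily on the underlying stream, its buffering, and the hardware, so the figures from one setup should not be assumed to carry over to another.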
sixlettervariables
Actual numbers, nice. :)
Jeremy
I have benchmarked this using ANTS profiler before I optimized. There was a significant speed increase because the file has a write-through cache and the float array is exactly sized to match the sector size on the disk. The binary writer wraps a file handle created with win32 API. ;)
Nick
Good good, but I would add that unless you're writing millions of floats or executing this thousands of times, ~200ms is an unimportant number in the grand scheme of program execution.
sixlettervariables
+1  A: 

There is a quick and dirty way of doing this that does not need unsafe code:

[StructLayout(LayoutKind.Explicit)]
struct BytetoDoubleConverter
{
    [FieldOffset(0)]
    public Byte[] Bytes;

    [FieldOffset(0)]
    public Double[] Doubles;
}
//...
static Double Sum(byte[] data)
{
    BytetoDoubleConverter convert = new BytetoDoubleConverter { Bytes = data };
    Double result = 0;
    for (int i = 0; i < convert.Doubles.Length / sizeof(Double); i++)
    {
        result += convert.Doubles[i];
    }
    return result;
}

This will work, but I'm not sure of the support on Mono or newer versions of the CLR. The only strange thing is that array.Length is the length in bytes. That is because it reads the length stored with the array, and since the array was created as a byte array, that length is still a byte count. The indexer does take into account that a Double is 8 bytes large, so no calculation is necessary there.

I've looked for it some more and it's actually described on msdn, so chances are this will be supported in future versions, not sure about mono though.

Davy Landman
A: 

We have a class called LudicrousSpeedSerialization and it contains the following unsafe method:

public static byte[] ConvertFloatsToBytes(float[] data)
{
    int n = data.Length;
    byte[] ret = new byte[n * sizeof(float)];
    if (n == 0) return ret;

    unsafe
    {
        // Pin the destination and view it as a float* so each element
        // is stored directly into the byte array's memory.
        fixed (byte* pByteArray = &ret[0])
        {
            float* pFloatArray = (float*)pByteArray;
            for (int i = 0; i < n; i++)
            {
                pFloatArray[i] = data[i];
            }
        }
    }

    return ret;
}
+2  A: 

There is a way that avoids memory copying and iteration.

You can use a really ugly hack to temporarily change your array to another type using memory manipulation:

public static class ArrayCaster
{
    [StructLayout(LayoutKind.Explicit)]
    private struct Union
    {
        [FieldOffset(0)] public byte[] bytes;
        [FieldOffset(0)] public float[] floats;
    }
    private static readonly int byteId;
    private static readonly int floatId;

    static ArrayCaster()
    {
        byteId = getByteId();
        floatId = getFloatId();
    }

    public static void AsByteArray(this float[] floats, Action<byte[]> action)
    {
        if(floats == null)
        {
            action(null);
            return;
        }
        if(floats.Length == 0)
        {
            action(new byte[0]);
            return;
        }

        var union = new Union {floats = floats};

        union.bytes.fixArray(union.floats.Length * sizeof(float), byteId);
        try
        {
            action(union.bytes);
        }
        finally
        {
            union.bytes.fixArray(union.bytes.Length / sizeof(float), floatId);
        }
    }

    private static unsafe void fixArray(this byte[] bytes, int newSize, int newId)
    {
        fixed (byte* pBytes = bytes)
        {
            var pSize = (int*)(pBytes - 4);
            var pId = (int*)(pBytes - 8);

            *pSize = newSize;
            *pId = newId;
        }
    }

    private static unsafe int getByteId()
    {
        fixed (byte* pBytes = new byte[1])
        {
            return *(int*)(pBytes - 8);
        }
    }

    private static unsafe int getFloatId()
    {
        fixed (float* pFloats = new float[1])
        {
            var pBytes = (byte*) pFloats;
            return *(int*)(pBytes - 8);
        }
    }
}

And the usage is:

var floats = new float[] {0, 1, 0, 1};
floats.AsByteArray(bytes =>
{
    foreach (var b in bytes)
    {
        Console.WriteLine(b);
    }
});
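As a point of comparison, and not part of the hack above: on runtimes that provide Span<T> (.NET Core 2.1 and later, as far as I know), a copy-free view of the same memory can be obtained with MemoryMarshal.AsBytes, without touching the array header. It yields a span rather than a real byte[], so it only helps when the consumer accepts a span. The class name SpanFloatWriter is made up for this sketch.

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;

public static class SpanFloatWriter
{
    // Hypothetical helper: writes the raw bytes of 'data' to 'output'
    // without first copying them into a separate byte[].
    public static void Write(Stream output, float[] data)
    {
        // Reinterpret the float storage as bytes; this is a view, not a copy.
        ReadOnlySpan<byte> bytes = MemoryMarshal.AsBytes(data.AsSpan());
        output.Write(bytes);
    }
}
```

As with the other raw-memory approaches, the bytes are in the machine's native byte order.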
Omer Mor
-1 for being completely non-portable. Have you even tried this on a 64-bit machine?
Gabe
Nope, it's a hack. If and when I get access to a 64-bit machine, I might check it out and perhaps adapt it. It is also not future-proof. In CLR v.Next it might be completely broken. There is a trade-off here: you can use a more robust solution and pay in performance, or use the fastest way I can think of and live on the edge :-)
Omer Mor