I have the 4 bytes that represent an integer stored in 2 separate byte arrays. I would like to convert these into an Int32 WITHOUT copying them to a third byte array and reading that with a MemoryStream.

The data is split across two byte arrays because this is a simplified example of my real problem, which involves huge amounts of data that cannot fit into a single byte array.

Is there any way to achieve this? I do not wish to concatenate the two byte arrays into a third because of the performance implications, which are critical to me.

Moon

+1  A: 

Something like this?

int x = (array1[index] << 16) + array2[index];

Of course, you didn't specify a language, but that's the gist of it.

tylerl
Does this solution assume 2 bytes in each array? What I am trying to do is "read" across array boundaries. So I might have 1 byte at the end of the first array and 3 at the start of the second, OR 3 at the end of the first array and 1 at the start of the second. Is there a way to make your solution more general? It is C#, by the way.
ManInMoon
This is C, and therefore apparently not relevant to your question.
tylerl
Shame - is there no C# version?
ManInMoon
There is, obviously.
oneat
@ManInMoon I would say my answer in C# is the same as tylerl's. You just need some casting and `BitConverter` in C# so that you avoid sign extension.
lasseespeholt
Would you be able to give me a simple example of how to use it?
ManInMoon
@ManInMoon Look at my answer. I've added an extra line...
lasseespeholt
+4  A: 

You can use a struct layout like this:

[StructLayout(LayoutKind.Explicit, Size=4)]
struct UnionInt32Value
{
[FieldOffset(0)] public byte byte1;
[FieldOffset(1)] public byte byte2;
[FieldOffset(2)] public byte byte3;
[FieldOffset(3)] public byte byte4;
[FieldOffset(0)] public Int32 iVal;
}

Assign your bytes in the correct order, then read your Int32 from iVal.

EDIT: Sample code

using System;
using System.Runtime.InteropServices;
namespace Test
{
 class Program
 {
  [StructLayout(LayoutKind.Explicit, Size=4)]
  struct UnionInt32Value
  {
   [FieldOffset(0)] public byte byte1;
   [FieldOffset(1)] public byte byte2;
   [FieldOffset(2)] public byte byte3;
   [FieldOffset(3)] public byte byte4;
   [FieldOffset(0)] public Int32 iVal;
  }
  public static void Main(string[] args)
  {
   UnionInt32Value v = new UnionInt32Value();
   v.byte1=1;
   v.byte2=0;
   v.byte3=0;
   v.byte4=0;
   Console.WriteLine("this is one " + v.iVal);

   v.byte1=0xff;
   v.byte2=0xff;
   v.byte3=0xff;
   v.byte4=0xff;
   Console.WriteLine("this is minus one " + v.iVal);

   Console.Write("Press any key to continue . . . ");
   Console.ReadKey(true);
  }
 }
}
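For the case asked about in the comments (1 byte at the end of one array and 3 at the start of the next), the same union could be filled from two arrays roughly as follows. This is only a sketch: `SplitReader`, `ReadAcross` and `ByteAt` are hypothetical names that are not part of the answer above, and the value read from `iVal` depends on the byte order of the machine.

```csharp
using System;
using System.Runtime.InteropServices;

// Same union as in the answer above: four bytes overlaid on one Int32.
[StructLayout(LayoutKind.Explicit, Size = 4)]
struct UnionInt32Value
{
    [FieldOffset(0)] public byte byte1;
    [FieldOffset(1)] public byte byte2;
    [FieldOffset(2)] public byte byte3;
    [FieldOffset(3)] public byte byte4;
    [FieldOffset(0)] public int iVal;
}

// Hypothetical helper, not part of the original answer.
static class SplitReader
{
    // Reads the 4 bytes that start at 'offset' in 'first' and, once 'first'
    // runs out, continue at index 0 of 'second'.
    public static int ReadAcross(byte[] first, int offset, byte[] second)
    {
        if (offset < 0 || offset > first.Length ||
            (long)offset + 4 > (long)first.Length + second.Length)
            throw new ArgumentOutOfRangeException("offset");

        UnionInt32Value v = new UnionInt32Value();
        v.byte1 = ByteAt(first, second, offset);
        v.byte2 = ByteAt(first, second, offset + 1);
        v.byte3 = ByteAt(first, second, offset + 2);
        v.byte4 = ByteAt(first, second, offset + 3);
        return v.iVal; // interpreted in the machine's native byte order
    }

    // Treats the two arrays as one logical sequence and returns element i.
    static byte ByteAt(byte[] first, byte[] second, int i)
    {
        return i < first.Length ? first[i] : second[i - first.Length];
    }
}
```

Only four single-byte assignments into a stack-allocated struct take place; no third array is created.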
renick
This looks interesting, but I have not used this kind of approach before. Would you mind giving me a very simple example of how it would be used with two byte arrays, 1 byte in the first array and 3 in the second? I would be most grateful.
ManInMoon
I have been reading up on structs and fieldoffset. To use this I would still need to copy the bytes from my existing arrays into the struct right? So that would probably be very slow? Or can "struct" just point to offsets in my arrays?
ManInMoon
There is no way to do what you need without some sort of copying. This is probably as efficient as you can get in C#. You only use one struct as a temporary buffer. Anyway, I would worry about performance when the issue arises, not before.
renick
@ManInMoon: You need to "copy" the bytes *somehow*, otherwise you can't get an int. Copying bytes like this is probably the most efficient way.
Patrick
Thank you for the example - that really helps. At the moment I read my byte array with MemoryStream.readInt32. I believe this does NOT make a copy. Performance is already an issue, as my data is 10G in size. If, instead of putting single bytes in the struct, I put byte arrays, would this still work? And would it sort of "pass" by ref so that there would be no copying?
ManInMoon
@ManInMoon: Since you have three bytes of every int in one byte array and one in the other, how does the readInt32 work? I would think it skips bytes unless your data is not contiguous in the byte array. Can you show some data?
renick
I read data from disk into a byte array. The data is much larger than the maximum size of a single byte array, so I read x bytes, then start putting the next x bytes in a second array. However, records are variable size, so I don't know where the data is split until I actually process it. When I process it, I know I am looking for, say, an Int32. Then I check how many bytes are left to read in the current array (say 1); I therefore need to concatenate 3 bytes from the next array, and carry on from there. I currently do this by writing to a MemoryStream and then reading from it. However, this means I have to copy the data.
ManInMoon
If your data is on disk, performance is governed by disk I/O throughput. Is your CPU utilization moving towards 100% on any core when the program runs?
renick
I read the entire data into a byte array in memory - I have 32G available.
ManInMoon
All the processing is done in memory - that is key for me. Because I have so far read from a single byte array, I have not "skipped" bytes. However, now that I have had to split my data across several byte arrays, that could be an issue - it is reading across such boundaries that I am trying to solve. That is why I am trying to "read" across parts of two separate byte arrays.
ManInMoon
OK, I see. Well, I cannot think of anything faster in C#. Copying is minimal. Maybe you should consider writing parts in native code (C).
renick
+1  A: 

The BitConverter class is intended for this:

byte[] parts = { byte1, byte2, byte3, byte4 };
int value = BitConverter.ToInt32(parts, 0);
Guffa
+1  A: 

You can use BitConverter twice, like:

byte[] bytes0 = new byte[] { 255, 255 };
byte[] bytes1 = new byte[] { 0, 0 };

int res = BitConverter.ToInt16(bytes0, 0) << 16;
res |= BitConverter.ToUInt16(bytes1, 0);

Which yields -65536 (0b11111111 11111111 00000000 00000000)

If your integer parts aren't at position 0 in their arrays, just replace the 0 passed to ToInt16 and ToUInt16 with the actual positions.

A small helper method:

public static class BitConverterExt
{
    public static int ToInt32(byte[] arr0, int index0, byte[] arr1, int index1)
    {
        int partRes = BitConverter.ToInt16(arr1, index1) << 16;
        return partRes | BitConverter.ToUInt16(arr0, index0);
    }
}

Usage:

byte[] bytes0 = new byte[] { 0x0, 0xA };
byte[] bytes1 = new byte[] { 0x64, 0xFF };

int res = BitConverterExt.ToInt32(bytes0, 0, bytes1, 0);

// res == -10221056 (0xFF640A00)
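The two-plus-two split above can be generalised to any split (1+3, 2+2, 3+1, or no split at all) by assembling the value one byte at a time with shifts. The following is a sketch under the assumption that the data is stored little-endian; `SplitInt32` and `ReadLittleEndian` are hypothetical names, and the second array is assumed to continue at its index 0.

```csharp
using System;

// Hypothetical helper, not part of the original answer.
public static class SplitInt32
{
    // Reads a little-endian Int32 whose bytes start at arr0[index0] and,
    // if arr0 ends before 4 bytes are read, continue at arr1[0].
    public static int ReadLittleEndian(byte[] arr0, int index0, byte[] arr1)
    {
        int available = arr0.Length - index0; // bytes left in arr0
        if (index0 < 0 || available < 0 ||
            (available < 4 && arr1.Length < 4 - available))
            throw new ArgumentOutOfRangeException("index0");

        int result = 0;
        for (int i = 0; i < 4; i++)
        {
            // Take byte i from arr0 while it lasts, then from arr1.
            byte b = i < available ? arr0[index0 + i] : arr1[i - available];
            result |= b << (8 * i); // byte i fills bits 8*i .. 8*i+7
        }
        return result;
    }
}
```

No temporary array is allocated, and because the byte order is fixed by the shifts, the result does not depend on `BitConverter.IsLittleEndian`.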
lasseespeholt
How do you think this would compare speedwise to just copying data to new joint bytearray and directly converting that?
ManInMoon
@ManInMoon I would say it is about the fastest you can do. If you want it faster, you should use pointers in C# (which BitConverter does internally, although it makes some checks that can be redundant), but unless it is REALLY important you shouldn't do that. A good way to check what happens inside various methods is to use Reflector.
lasseespeholt
(I have just changed `res += ...` to `res |= ...` which should be minimally faster and give same result)
lasseespeholt
A: 

If I understand correctly, you are having a problem whilst reading across the boundary of the two arrays. If that is so, this routine will read an integer anywhere in the two arrays, even if it is across the two of them.

    int ReadInteger(byte[] array1, byte[] array2, int offset)
    {
        if (offset < 0 || (offset + 4) > (array1.Length + array2.Length))
            throw new ArgumentOutOfRangeException();

        if (offset <= (array1.Length - 4))
            return BitConverter.ToInt32(array1, offset);
        else if (offset >= array1.Length)
            return BitConverter.ToInt32(array2, offset - array1.Length);
        else
        {
            var buffer = new byte[4];
            var numFirst = array1.Length - offset;

            Array.Copy(array1, offset, buffer, 0,        numFirst);
            Array.Copy(array2, 0,      buffer, numFirst, 4 - numFirst);

            return BitConverter.ToInt32(buffer, 0);
        }
    }

Note: depending on how your integers are stored, you might want to change the order in which bytes are copied.
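If the heap allocation of the 4-byte buffer is the concern, one possible variant keeps the same three cases but puts the scratch buffer on the stack. This sketch assumes a runtime that provides `Span<T>` and `System.Buffers.Binary.BinaryPrimitives` (for example .NET Core 2.1 or later), which postdates the routine above; `BoundaryReader` is a hypothetical name, and the data is assumed to be little-endian.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical variant of the routine above, not the original author's code.
static class BoundaryReader
{
    public static int ReadInteger(byte[] array1, byte[] array2, int offset)
    {
        if (offset < 0 || (long)offset + 4 > (long)array1.Length + array2.Length)
            throw new ArgumentOutOfRangeException("offset");

        // Entirely inside the first array.
        if (offset <= array1.Length - 4)
            return BinaryPrimitives.ReadInt32LittleEndian(array1.AsSpan(offset, 4));

        // Entirely inside the second array.
        if (offset >= array1.Length)
            return BinaryPrimitives.ReadInt32LittleEndian(
                array2.AsSpan(offset - array1.Length, 4));

        // Straddles the boundary: gather the 4 bytes in a stack buffer.
        Span<byte> buffer = stackalloc byte[4];
        int numFirst = array1.Length - offset;
        array1.AsSpan(offset, numFirst).CopyTo(buffer);
        array2.AsSpan(0, 4 - numFirst).CopyTo(buffer.Slice(numFirst));
        return BinaryPrimitives.ReadInt32LittleEndian(buffer);
    }
}
```

At most 3 bytes are still copied in the straddling case, but nothing is allocated on the managed heap, so the garbage collector is not involved.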

Mohammad
Hi Mohammad, yes - this would do the job. BUT I am desperate to avoid the Array.Copy parts! That will add hugely to my processing time. The data is already in memory - I just want to read from two places WITHOUT the copy. I would like to do something like:
ManInMoon
return BitConverter.ToInt32(MyJoin(bufferA[2,3],bufferB[7,1]), 0);
ManInMoon
Where MyJoin does not include any copying...
ManInMoon
Is this possible?
ManInMoon
@ManInMoon YES, see my answer, which is equivalent to tylerl's answer.
lasseespeholt