views:

194

answers:

7

We need to represent huge numbers in our application, and we're doing this using integer arrays. The final product should be optimized for performance. We were thinking about encapsulating our array in a class so we could add properties related to the array, such as isNegative, numberBase, and the like.

We're afraid, however, that using classes will kill our performance. We ran a test where we created a fixed number of arrays and set their values, once through pure array usage and once through a class wrapping the array:

for (int i = 0; i < 10000; i++)
{
 if (createClass)
 {
  BigNumber b = new BigNumber(new int[5000], 10);
  for (int j = 0; j < b.Number.Length; j++)
  {
   b[j] = 5;
  }
 }
 else
 {
  int[] test = new int[5000];
  for (int j = 0; j < test.Length; j++)
  {
   test[j] = 5;
  }
 }
}

And it appears that using classes slows down the running time of the above code by a factor of almost 6. We tried the same test with the array encapsulated in a struct instead, which made the running time almost equal to pure array usage.

What is causing this huge overhead when using classes compared to structs? Is it really just the performance gain you get when you use the stack instead of the heap?

Edit: BigNumber just stores the array in a private variable exposed by a property. Simplified:

public class BigNumber
{
  private int[] number;

  public BigNumber(int[] number) { this.number = number; }

  public int[] Number { get { return number; } }
}
+3  A: 

You should profile the code when you run it and see where the time is being spent.

Also consider another language that makes it easy to use big ints.

Tim
+1. Never try to figure out a priori where your code is spending its time. Measure.
David Seiler
+1  A: 

You're using an integer data type to store a single digit, which is part of a really large number???

I think you're lacking some fundamental computer science understanding here.

The numerals 0-9 can be represented in 4 bits. A byte contains 8 bits. So, you can stuff 2 digits into a single byte (there's your first speed up hint).

Now, go look up how many bytes an integer is taking up (hint: it will be way more than you need to store a single digit).

What's killing performance is the use of integers, which consumes about four times as much memory as you need. Use bytes or, worst case, a character array (2 digits per byte or character) to store the numerals. It doesn't take a whole lot of logic to "pack" and "unpack" the numerals into a byte.
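The pack/unpack logic mentioned above can be sketched with shifts and masks (illustrative only; it assumes base-10 digits):

```csharp
public static class DigitPacking
{
  // Pack two decimal digits (0-9) into one byte:
  // high digit in the upper nibble, low digit in the lower nibble.
  public static byte Pack(byte hi, byte lo)
  {
    return (byte)((hi << 4) | lo);
  }

  // Recover the two digits from a packed byte.
  public static void Unpack(byte packed, out byte hi, out byte lo)
  {
    hi = (byte)(packed >> 4);
    lo = (byte)(packed & 0x0F);
  }
}
```

For example, Pack(3, 7) yields 0x37 (55 decimal), and Unpack recovers 3 and 7.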

inked
Come on, how is any part of your answer relevant to my question? I can't believe anyone upvoted it. Both methods make use of integers, so they have the same amount of data to read from memory. Your solution doesn't suffice anyway.
Qua
@Qua: I disagree with you that inked's answer is irrelevant. If you are indeed storing big numbers as one digit per 32-bit int, then you could potentially see a huge speed-up with a more efficient implementation, making other concerns less important. In this reimplementation, you might also discover what's causing your current perceived slowdown. Sometimes you really just can't see the forest for the trees, you know.
P Daddy
@Qua: I should also point out that so harshly criticizing someone for trying to help you is not a good way to build bridges.
P Daddy
P Daddy, instead of answering my original question, inked started going on about how I should learn computer science and look up how many numbers can be stored in a certain number of bits. My original question was about the performance difference between arrays and classes - not integer vs. bit-wise representation of numbers. And for your information, the numbers can be stored in an arbitrary base, from 1 through a million, which ruins the idea of storing the digits in bits.
Qua
Maybe your original question was myopic? It's like you're asking, "I have this gun, pointed at my head. I have a pile of bullets. Which one do I put into the gun to figure out how it works?" And I'm telling you, "Move the gun away from your head because bad things could happen." But hey, you don't know what you don't know. Just imagine what an operating system would be like with similar programmers. Bloated? Slow? Yeah. Get the picture yet?
inked
Enlighten/humour me: how would you store numbers of an arbitrary base by setting individual bits without ending up with complex code that wouldn't run faster than using ints after all?
Qua
Forget it, I can't be arsed arguing with people like you. After having a look at your answers, it's clear to me that you repeatedly give answers like this that don't provide any benefit to the original question, and that often get no votes or even end up with a negative score.
Qua
@Qua: WTF? Why the hostility? inked provided an answer addressing your problem as he perceived it. For free. He didn't harm you, and he didn't call your mother a bad name. There's no reason to be nasty.
P Daddy
+4  A: 

It's not surprising that the second loop is much faster than the first one. What's happening is not that the class is extraordinarily slow; it's that the plain-array loop is really easy for the compiler to optimize.

As the loop ranges from 0 to test.Length-1, the compiler can tell that the index variable can never be outside of the array, so it can remove the range check when accessing the array by index.

In the first loop the compiler can't make the connection between the loop and the array, so it has to check the index against the boundaries for each item that is accessed.

There will always be a bit of overhead when you encapsulate an array inside a class, but it's not as much as the difference you got in your test. You have chosen a situation where the compiler is able to optimize the plain array access very well, so what you are testing is more the compiler's ability to optimize the code than what you set out to test.
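One way to check this explanation (a sketch, assuming the simplified BigNumber from the question) is to copy the wrapped array into a local before the loop, which gives the compiler the same loop/array connection it has with a plain array:

```csharp
public class BigNumber
{
  private int[] number;
  public BigNumber(int[] number) { this.number = number; }
  public int[] Number { get { return number; } }
}

public static class FillDemo
{
  public static void Fill(BigNumber b)
  {
    // Caching the reference in a local lets the JIT prove that the
    // loop bound and the indexed array are the same object, so the
    // per-access range check can be elided, just as in the plain
    // int[] version of the benchmark.
    int[] inner = b.Number;
    for (int j = 0; j < inner.Length; j++)
    {
      inner[j] = 5;
    }
  }
}
```

If this version runs at plain-array speed, the bounds check (plus the indexer call) was indeed what the benchmark was measuring.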

Guffa
Good point - I completely forgot about the removal of bounds checking on loops involving arrays.
Greg Beech
If this is really the answer, then why is the loop using a struct instead of a class as fast as using a pure array?
Qua
@Qua: If you get the same performance using a struct, then you expose the array in a way that the compiler accesses the array directly. It's hard to tell exactly why without seeing the relevant code. The simplified code for the BigNumber class that you posted doesn't contain the indexer that you used in the test code, or perhaps the test code that you posted is not the actual code used...
Guffa
+1  A: 

Without seeing your BigNumber implementation, it is very difficult to tell. However, I have two guesses.

1) Your loop over the plain array can get special handling by the JIT, which removes the array bounds checking. This can give you a significant boost, especially since you're not doing any "real work" in the loop:

for (int j = 0; j < test.Length; j++) // This removes bounds checking by JIT

2) Are you timing this in release mode, outside of Visual Studio? If not, that alone could explain your 6x slowdown, since the Visual Studio hosting process slows down class access artificially. Make sure you're in a Release configuration and use Ctrl+F5 to test your timings.

Reed Copsey
Don't confuse not running in the debugger with "release mode". To run a release build, you have to change the build configuration to "Release" (or edit the current configuration to build with optimizations). What you're referring to is instead the difference between running with and without the debugger attached.
P Daddy
Yes. I'm saying that just setting optimizations on is not enough. You also have to run outside of the VS test host to get good timings (which is Ctrl+F5 instead of just F5) in addition to being in the "Release" configuration.
Reed Copsey
I just want to make sure nobody gets confused between running in the debugger and a debug build. I've seen plenty of debug builds deployed to production because of this semantic misunderstanding. To say that Ctrl+F5 (the particular key combination is configurable and not the same on everybody's installation, by the way) runs in "release mode" is inaccurate and confusing.
P Daddy
No. There are two separate issues. You need to run in a release configuration AND you need to run outside of the test host. You can build a release build with optimizations, and still get poor results if you run within Visual Studio.
Reed Copsey
But your text doesn't make that clear. "Release mode, outside of Visual Studio" sounds a lot like equating running without debugging with a release build.
P Daddy
A: 

As pointed out by Guffa, the difference is mostly bounds checking.

To guarantee that bounds checking will not ruin performance, you can also put your tight loops in an unsafe block, which eliminates bounds checking. To do this you'll need to compile with the /unsafe option.
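A minimal sketch of that approach, applied to the loop from the question (compile with /unsafe; illustrative only):

```csharp
public static class UnsafeFill
{
  public static unsafe void Fill()
  {
    int[] test = new int[5000];

    // Pinning the array yields a raw pointer; writes through the
    // pointer bypass the CLR's array bounds checks entirely.
    fixed (int* p = test)
    {
      for (int j = 0; j < test.Length; j++)
      {
        p[j] = 5;
      }
    }
  }
}
```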

Paul
+1  A: 

Rather than reinventing (and debugging and perfecting) the wheel, you might be better served using an existing big integer implementation, so you can get on with the rest of your project.

This SO topic is a good start.
You might also check out this CodeProject article.

P Daddy
Yeah, I would, but we're already done with the operations that work on the number arrays. Now we're just wondering whether we should store the array in a class or a struct, or use direct array references, to get optimal performance while still having a manageable abstraction level.
Qua
A: 
    //pre-load the bits -- do this only ONCE
    byte[] baHi = new byte[16];
    baHi[0]=0;
    baHi[1] = 000 + 00 + 00 + 16; //0001
    baHi[2] = 000 + 00 + 32 + 00; //0010
    baHi[3] = 000 + 00 + 32 + 16; //0011
    baHi[4] = 000 + 64 + 00 + 00; //0100
    baHi[5] = 000 + 64 + 00 + 16; //0101
    baHi[6] = 000 + 64 + 32 + 00; //0110
    baHi[7] = 000 + 64 + 32 + 16; //0111
    baHi[8] = 128 + 00 + 00 + 00; //1000
    baHi[9] = 128 + 00 + 00 + 16; //1001
    //not needed for 0-9
    //baHi[10] = 128 + 00 + 32 + 00; //1010
    //baHi[11] = 128 + 00 + 32 + 16; //1011
    //baHi[12] = 128 + 64 + 00 + 00; //1100
    //baHi[13] = 128 + 64 + 00 + 16; //1101
    //baHi[14] = 128 + 64 + 32 + 00; //1110
    //baHi[15] = 128 + 64 + 32 + 16; //1111

    //-------------------------------------------------------------------------
    //START PACKING
    //load TWO digits (0-9) at a time
    //this means if you're loading a big number from
    //a file, you read two digits at a time
    //and put them into bLoVal and bHiVal
    //230942034371231235  see that '37' in the middle?
    //         ^^
    //

    byte bHiVal = 3; //0000 0011
    byte bLoVal = 7; //0000 0111

    byte bShiftedLeftHiVal = baHi[bHiVal]; //0011 0000 = 3, shifted left (48)

    //fuse the two together into a single byte
    byte bNewVal = (byte)(bShiftedLeftHiVal + bLoVal); //0011 0111 = 55 decimal

    //now store bNewVal wherever you want to store it
    //for later retrieval, like a byte array
    //END PACKING

    //-------------------------------------------------------------------------


    Response.Write("PACKING: hi: " + bHiVal + " lo: " + bLoVal + " packed: " + bNewVal);
    Response.Write("<br>");

    //-------------------------------------------------------------------------
    //START UNPACKING
    byte bUnpackedLoByte = (byte)(bNewVal & 15);  //will yield 7
    byte bUnpackedHiByte = (byte)(bNewVal & 240); //will yield 48

    //now we need to change '48' back into '3'
    string sHiBits = "00000000" + Convert.ToString(bUnpackedHiByte, 2); //drops leading 0s, so we pad...
    sHiBits = sHiBits.Substring(sHiBits.Length - 8, 8);                 //and get the last 8 characters
    sHiBits = ("0000" + sHiBits).Substring(0, 8);                       //shift right
    bUnpackedHiByte = (byte)Convert.ToSByte(sHiBits, 2);                //and, finally, get back the original byte

    //the above method, reworked, could also be used to PACK the data,
    //though it might be slower than hitting an array.
    //You can also loop through baHi to unpack, comparing the original
    //bUnpackedHyByte to the contents of the array and return
    //the index of where you found it (the index would be the 
    //unpacked digit)

    Response.Write("UNPACKING: input: " + bNewVal + " hi: " + bUnpackedHiByte + " lo: " + bUnpackedLoByte);

    //now create your output with bUnpackedHiByte and bUnpackedLoByte,
    //then move on to the next two bytes in where ever you stored the
    //really big number
    //END UNPACKING
    //-------------------------------------------------------------------------
  • Even if you just changed your INT to SHORT in your original solution, you'd chop your memory requirements in half. The above takes memory down to almost the bare minimum (I'm sure someone will come along screaming about a few wasted bytes).
inked