ansaurus

Question

Could you please review my Quick Int Parser implementation?

Answer 1

+2 A:

One though; consider renaming the class; calling it Int32 is just asking for trouble, although you get a bit of scope here because it is nested (Numbers.Int32). I wouldn't get too excited about this, though (the nesting does save a lot of pain for the consumer).

Don't call arguments thing like str - that isn't meaningful. That said, int.Parse calls it s, so I can't be too critical.

But I've got to grant you, a 10x performance figure isn't bad ;-p (although it only supports one format, of course).

One other thing; a start and length is a more common pattern than a start and end.

Finally - is there an off-by-one in the ParseSimple checks?

if (end > str.Length) throw new ArgumentOutOfRangeException("end");

Shouldn't that be >=? If length is 1, end can only be 0?

Marc Gravell 2009-01-07 11:27:08

Thanks. I was thinking the name "Numbers.Int32" was the most intuitive, in the sense that methods would be called as "Numbers.Int32.ParseSimple(str)". And if someone wants to import it directly (instead of using the "Number." prefix) they could say "using IntUtil = Numbers.Int32". What do you think?

Hosam Aly 2009-01-07 11:31:53

Perhaps don't worry about the Int32 thing; I've updated - the nesting seems to make it fairly safe. Of course, if Int32 was a *property* that returned a singleton instance of some type called something safer , then it might be clearer (but would use virtcall).

Marc Gravell 2009-01-07 11:40:36

I think I first (and last) saw the property style in NUnit, but I didn't think much about it. But I don't feel it would be useful for this class in any way, would it? (And the virtcall would decrease performance a little, wouldn't it?)

Hosam Aly 2009-01-07 11:44:23

Start and end was all over the STL, and that's where I learned it from. It seemed quicker at first, but now when I look it over I think length would be more usable. I'll probably change it.

Hosam Aly 2009-01-07 11:51:13

I was going to comment on the assertions not checking input in release, but on reflection that *might* be OK for "Trusted" mode - but it will blow up with very unusual exceptions when the input is bad. A tricky one to call... if it isn't *significantly* quicker, I might simply cut this method.

Marc Gravell 2009-01-07 11:53:09

The if condition is not an off-by-one. As said in the method documentation, end should be *after* the last valid index. (Same as in C++, where the end iterator points after the last item.) So end can be equal to length (and that's how other overloads call the method).

Hosam Aly 2009-01-07 12:17:57

... Anyway, I intend to follow your advice and replace "end" with "length", so this won't matter.

Hosam Aly 2009-01-07 12:19:13

Regarding debugging assertions for "Trusted" methods, I explicitly wrote in the documentation that unless the input is *known* to be valid, it should not be used. The assumption is that you "Trust" your input (which probably passed other validations first).

Hosam Aly 2009-01-07 12:21:30

... From a performance point of view, ParseTrusted is almost twice as fast as ParseSimple, and that's why I believe it's valuable.

Hosam Aly 2009-01-07 12:22:15

I have updated the implementation. Would you have time to look at it again? I updated the question to include the update link. Thank you!

Hosam Aly 2009-01-07 13:15:09

May I ask a side question? Do you think the code would have been easier to review if I had published it without the documentation comments?

Hosam Aly 2009-01-08 06:20:51

Answer 2

+1 A:

So if you have parse trusted, then maybe you should consider implementing ParsePositive ParseNegative

And you should be watching the IL while editing your code

erdogany 2009-01-07 11:34:34

Thanks for the suggestion. I'll probably implement ParsePositive and ParseNegative. I've also been watching the IL. Is there anything specific I should be looking at?

Hosam Aly 2009-01-07 11:41:17

Answer 3

+2 A:

One somewhat bizarre thought: rather than subtracting '0' on every iteration, you could have a precalculated amount to subtract at the very end, depending on the number of digits. (length=1, subtract '0'; length=2, subtract ('0'*10 + '0')). You may well only be able to get away with this for relatively short strings though, due to overflow. (Or use a long, but that may slow it down too much.)

Also, I haven't tried it, but you might check whether

result = (result << 3) + result + result + [...]

or

result = (result << 3) + (result << 1) + [...]

is faster than

result = result*10 + [...]

I suspect that it won't be, but it may be worth at least trying. (Certainly bitshifting is faster than multiplying, but I suspect the extra additions involved counterbalance this.)

Have you profiled the code to see where any bottlenecks are?

Do you have any benchmark results of how much faster this is than Int32.Parse, btw?

Jon Skeet 2009-01-07 12:08:47

I shall try to check these. I haven't actually profiled this code.(Partly because I don't have a good profiler. I'd be thankful if you could refer me to a good free one.)

Hosam Aly 2009-01-07 12:24:32

My current benchmarks show that for 25 million iterations over a constant string, my methods are 6 to 12 times faster than Int32.Parse. Marc also got similar results.

Hosam Aly 2009-01-07 12:26:15

I have tried the shift operations, but they didn't provide a performance improvement. On larger strings they were slower, and on shorter strings they were faster, but in both cases the difference was negligible (1% or so).

Hosam Aly 2009-01-07 12:34:09

The subtractions actually enhanced the performance, sometimes by 10%. :) And there is no fear for overflow, the int justifies itself correctly (as long as the bits are concerned). I shall add this to the code. Thanks!

Hosam Aly 2009-01-07 12:43:38

I have updated the implementation. Would you have time to look at it again? I updated the question to include the update link. Thank you!

Hosam Aly 2009-01-07 13:15:43

May I ask a side question? Do you think the code would have been easier to review if I had published it without the documentation comments?

Hosam Aly 2009-01-08 07:17:11

(I couldn't think of anything else to say, btw, hence lack of later comment.) Yes, it would have been easier without doc comments.

Jon Skeet 2009-01-08 09:27:08

Answer 4

+4 A:

I think your implementation is pretty decent. I've made an unsafe version using pointer math, which is indeed faster, but is dirtier in my opinion.

Results (running 30 samples of 10M random int strings):

Int32.Parse                      1766.17 ms (stddev: 1.83 ms)
Numbers.Int32.ParseTrusted       262.07 ms (stddev: 0.44 ms)
ParseTrustedUnsafe               146.13 ms (stddev: 0.43 ms)

here is my unsafe implementation:

public static unsafe int ParseTrustedUnsafe(string str, int start, int end)
{
    unsafe
    {
     Int32 result = 0;
     Int32 length = end - start;
     Boolean isNegative = false;
     fixed (Char* startChar = str)
     {
      Byte* currentChar = ((Byte*)startChar) + (start * 2);
      if (*currentChar == 0x2D)
      {
       isNegative = true;
       currentChar += 2;
       length--;
      }
      else if (*currentChar == 0x2B)
      {
       currentChar += 2;
       length--;
      }
      while (length >= 4)
      {
       result = (result * 10) + (*currentChar - 0x30);
       result = (result * 10) + (*(currentChar + 2) - 0x30);
       result = (result * 10) + (*(currentChar + 4) - 0x30);
       result = (result * 10) + (*(currentChar + 6) - 0x30);

       currentChar += 8;
       length -= 4;
      }
      while (length > 0)
      {
       result = (result * 10) + (*currentChar - 0x30);

       currentChar += 2;
       length -= 1;
      }
     }
     return isNegative ? -result : result;
    }
}

Below is my validation and benchmark program:

static void Main(string[] args)
{
    String[] testValues = new String[10 * 100 * 100 * 100];
    Random rand = new Random();
    for (int i = 0; i < testValues.Length; i++)
    {
     testValues[i] = rand.Next().ToString();
    }
    // validate
    for (int i = 0; i < testValues.Length; i++)
    {
     Debug.Assert(String.CompareOrdinal(testValues[i], ParseTrustedUnsafe(testValues[i], 0, testValues[i].Length).ToString()) == 0, "ParseTrustedUnsafe failed on " + testValues[i]);
     Debug.Assert(String.CompareOrdinal(testValues[i], Numbers.Int32.ParseTrusted(testValues[i], 0, testValues[i].Length).ToString()) == 0, "Numbers.Int32.ParseTrusted failed on " + testValues[i]);
    }
#if DEBUG
    return;
#endif    
    List<List<Double>> results = new List<List<double>>();
    for (int i = 0; i < 3; i++)
    {
     results.Add(new List<double>());
    }

    Thread.CurrentThread.Priority = ThreadPriority.Highest;
    Process.GetCurrentProcess().PriorityClass = ProcessPriorityClass.RealTime;
    if (Environment.ProcessorCount > 1)
    {
     Process.GetCurrentProcess().ProcessorAffinity = new IntPtr(1 << (Environment.ProcessorCount - 1)); // use last core only
    }
    Stopwatch sw = new Stopwatch();
    for (int testrun = 0; testrun < 30; testrun++)
    {
     sw.Reset();
     sw.Start();
     for (int i = 0; i < testValues.Length; i++)
     {
      Int32.Parse(testValues[i]);
     }
     sw.Stop();
     results[0].Add(sw.ElapsedMilliseconds);

     sw.Reset();
     sw.Start();
     for (int i = 0; i < testValues.Length; i++)
     {
      Numbers.Int32.ParseTrusted(testValues[i]);
     }
     sw.Stop();
     results[1].Add(sw.ElapsedMilliseconds);

     sw.Reset();
     sw.Start();
     for (int i = 0; i < testValues.Length; i++)
     {
      ParseTrustedUnsafe(testValues[i], 0, testValues[0].Length);
     }
     sw.Stop();
     results[2].Add(sw.ElapsedMilliseconds);

    }
    Console.WriteLine("Int32.Parse \t\t\t {0:0.00}ms (stddev: {1:0.00}ms)", results[0].Average(), StandardDeviation(results[0]));
    Console.WriteLine("Numbers.Int32.ParseTrusted \t {0:0.00}ms (stddev: {1:0.00}ms)", results[1].Average(), StandardDeviation(results[1]));
    Console.WriteLine("ParseTrustedUnsafe \t\t {0:0.00}ms (stddev: {1:0.00}ms)", results[2].Average(), StandardDeviation(results[2]));
}

private static double StandardDeviation(IEnumerable<double> doubleList)
{
    double average = doubleList.Average();
    double sumOfDerivation = 0;
    foreach (double value in doubleList)
    {
     sumOfDerivation += (value) * (value);
    }
    double sumOfDerivationAverage = sumOfDerivation / doubleList.Count();
    return Math.Sqrt(sumOfDerivationAverage - (average * average));
}

Davy Landman 2009-01-07 19:19:55

Well, thanks for sharing! I had thought about creating an unsafe implementation, but on my machine I didn't observe a noticeable difference, but I didn't try loop unrolling. Maybe I should learn to do this. I never thought about using byte pointers instead of char. Would it really make a difference?

Hosam Aly 2009-01-07 19:52:05

You may enhance performance further by removing the subtraction of 0x30 from the loop, and performing it once at the end (as per Jon Skeet's suggestion; see my updated implementation).

Hosam Aly 2009-01-07 19:52:52

Hosam Aly 2009-01-07 19:53:56

adding the reduction later does not save that much ParseTrustedUnsafe 442,13ms(stddev:3,50ms),ParseTrustedUnsafe2 428,70ms(stddev: 3,61ms)...In my opinion, trusted really trusts the caller.. so allows the caller to shoot himself in the foot. And for other questions, make it a challenge for yourself?

Davy Landman 2009-01-08 08:56:20

ansaurus

tags:

views:

answers:

Could you please review my Quick Int Parser implementation?

related questions