I'll be the first to admit that my overall knowledge of low-level programming is a bit sparse. I understand many of the core concepts, but I do not use them on a regular basis. That being said, I was absolutely astounded at how much code was needed for dtoa.c.

For the past couple of months I have been working on an ECMAScript implementation in C#, and I've been slowly filling in the holes in my engine. Last night I started working on Number.prototype.toString, which is described in section 15.7.4.2 of the ECMAScript specification. In section 9.8.1, NOTE 3 offers a link to dtoa.c, but I was looking for a challenge, so I waited to view it. The following is what I came up with.

private IDynamic ToString(Engine engine, Args args)
{
    var thisBinding = engine.Context.ThisBinding;
    if (!(thisBinding is NumberObject) && !(thisBinding is NumberPrimitive))
    {
        throw RuntimeError.TypeError("The current 'this' must be a number or a number object.");
    }

    var num = thisBinding.ToNumberPrimitive();

    if (double.IsNaN(num))
    {
        return new StringPrimitive("NaN");
    }
    else if (double.IsPositiveInfinity(num))
    {
        return new StringPrimitive("Infinity");
    }
    else if (double.IsNegativeInfinity(num))
    {
        return new StringPrimitive("-Infinity");
    }

    var radix = !args[0].IsUndefined ? args[0].ToNumberPrimitive().Value : 10D;

    if (radix < 2D || radix > 36D)
    {
        throw RuntimeError.RangeError("The parameter [radix] must be between 2 and 36.");
    }
    else if (radix == 10D)
    {
        return num.ToStringPrimitive();
    }

    var sb = new StringBuilder();
    var isNegative = false;

    if (num < 0D)
    {
        isNegative = true;
        num = -num;
    }

    var integralPart = Math.Truncate(num);
    var decimalPart = (double)((decimal)num - (decimal)integralPart);
    var radixChars = RadixMap.GetArray((int)radix);

    if (integralPart == 0D)
    {
        sb.Append('0');
    }
    else
    {
        var integralTemp = integralPart;
        while (integralTemp > 0)
        {
            sb.Append(radixChars[(int)(integralTemp % radix)]);
            integralTemp = Math.Truncate(integralTemp / radix);
        }
    }

    for (int i = 0, k = sb.Length - 1; i < k; i++, k--)
    {
        var swap = sb[i];
        sb[i] = sb[k];
        sb[k] = swap;
    }

    if (isNegative)
    {
        sb.Insert(0, '-');
    }

    if (decimalPart == 0D)
    {
        return new StringPrimitive(sb.ToString());
    }

    var runningValue = 0D;
    var decimalIndex = 1D;
    var decimalTemp = decimalPart;

    sb.Append('.');
    while (decimalIndex < 100 && decimalPart - runningValue > 1.0e-50)
    {
        var result = decimalTemp * radix;
        var integralResult = Math.Truncate(result);
        runningValue += integralResult / Math.Pow(radix, decimalIndex++);
        decimalTemp = result - integralResult;
        sb.Append(radixChars[(int)integralResult]);
    }

    return new StringPrimitive(sb.ToString());
}

Can anyone with more experience in low level programming explain why dtoa.c has roughly 40 times as much code? I just cannot imagine C# being that much more productive.

+3  A: 

Based on a quick glance at it, a fair amount of the C version is dealing with multiple platforms: the file looks like it is meant to be generically usable across compilers (C & C++), word sizes, floating-point implementations, and platforms, with tons of `#define` configurability.

dkackman
I agree. I also had a short look at the code, and I felt sick at the amount of work it represents for such a simple operation. With .NET, those days are fortunately behind us for application programming.
jdehaan
Wouldn't it just be easier to have an archive of c files with each supporting a separate platform? Reading all those `#ifdef` blocks gave me a brain aneurysm...
ChaosPandion
Yeah, the problem isn't the languages, it's just an awkward design overall. The same effect could be achieved with better modularity.
Cogwheel - Matthew Orlando
@ChaosPandion: depends whether you want your platform-specific configuration to be done in a C header file (setting the defines), or in a makefile (selecting the right .c file to build). Also, if there are enough defines to give you an aneurysm, that could be 2^aneurysm distinct c files if all the possibilities were multiplied out. In practice, when writing this kind of ultra-portable stuff you usually want a small number of source files (in this case 1, but sometimes a few more) driven by a large number of configuration options, rather than the other way around.
Steve Jessop
... and the reason you might want to write the code to support platforms based on their behaviour / properties, rather than writing exactly one dtoa implementation for each platform, is so that you can port to new platforms just by writing a header file with the right combination of defines, rather than re-writing all of stdlib every time. A massively-configurable implementation like this isn't always the best, but it may not be as bad as it looks when you consider the alternative.
Steve Jessop
@Steve - You know, I think it just may have been the initial shock of seeing so much code. Looking at it in more detail, it actually comes together a bit better. I also found @Tyler's comment in @Lajnold's answer to be a bit enlightening about the subject.
ChaosPandion
@dkackman - What do you think contributes more to the amount of code, cross-platform compatibility or the extreme accuracy that dtoa provides?
ChaosPandion
@ChaosPandion hard for me to judge without digging into it (rough guess: about 50/50). One way to separate the wheat from the chaff would be to look at the preprocessed output for a compilation that had the #defines set the way you want. Using MS tools you can do this with something close to `CL /E /C dtoa.c > dtoa.pre.c`. This will show you what the preprocessor is actually submitting to the compiler.
dkackman
+4  A: 

I also think that the code in dtoa.c might be more efficient (independent of language). For example, it seems to do some bit-fiddling, which in the hands of an expert often means speed. I assume it simply uses a less intuitive algorithm for speed reasons.

Lajnold
A lot of C library functions (and C library extensions) are written like this. It makes sense, because they are generally not updated, edited, or otherwise maintained very frequently, but they are called all over the place from vast numbers of programs. It only makes sense to make them as fast as possible even at the expense of some maintainability.
Tyler McHenry
+4  A: 

Producing good results for conversions between decimal and binary floating point representations is a rather difficult problem.

The major source of difficulty is that many decimal fractions, even simple ones, cannot be accurately expressed using binary floating point -- for example, 0.5 can (obviously), but 0.1 cannot -- and, conversely, many of the values expressible by a binary floating point representation cannot be written as a terminating decimal expansion.

So, conversion (in either direction) often involves approximation. Good conversion routines guarantee to produce the closest possible approximation within particular (word size or number of digits) constraints. This is where most of the complexity comes from.
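The 0.1 case is easy to see from any ECMAScript console (a quick illustration of the approximation, nothing to do with dtoa.c itself):

```javascript
// 0.1 is stored as the nearest double, which is slightly above one
// tenth; asking for extra digits exposes the binary approximation.
console.log((0.1).toPrecision(21));   // "0.100000000000000005551"
console.log(0.1 + 0.2 === 0.3);       // false: the approximations compound
```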

Take a look at the paper cited in a comment at the top of the dtoa.c implementation, Clinger's How to Read Floating Point Numbers Accurately, for a flavour of the problem; and perhaps David M. Gay (the author)'s paper, Correctly Rounded Binary-Decimal and Decimal-Binary Conversions.

(Also, more generally: What Every Computer Scientist Should Know About Floating Point Arithmetic.)

Matthew Slattery
I am well aware of the challenges that floating point presents. My question has more to do with the difference in the amount of code. When I was testing my example against Firefox, my code produced identical results. *Firefox does use dtoa.c*.
ChaosPandion
+9  A: 

dtoa.c contains two main functions: dtoa(), which converts a double to string, and strtod(), which converts a string to a double. It also contains a lot of support functions, most of which are for its own implementation of arbitrary-precision arithmetic. dtoa.c's claim to fame is getting these conversions right, and that can only be done, in general, with arbitrary-precision arithmetic. It also has code to round conversions correctly in four different rounding modes.

Your code only tries to implement the equivalent of dtoa(), and since it uses floating-point arithmetic to do its conversions, it will not always get them right.

(I've written a lot about this on my blog, http://www.exploringbinary.com/ . Six of my last seven articles have been about strtod() conversions alone. Read through them to see how complicated it is to do correctly rounded conversions.)

Rick Regan
Very interesting, I have some reading to do and I will definitely need to explore this code in greater detail.
ChaosPandion
It really isn't hard to believe that I would miss the "both-ways" conversion seen in this file. The code is really hard for me to read.
ChaosPandion
I think it's hard for most people to read!
Rick Regan
How do you explain the fact that the number `1024.1`, when converted to base 2, produces the same output in Firefox as it does in my implementation? Could it be that the incorrect results are very specific and will only be found through extensive testing?
ChaosPandion
Not every conversion will be wrong. Try a few more, you'll find one Firefox gets right and yours gets wrong. (BTW: how are you verifying that they are the same? Are you printing all the digits of the double?)
Rick Regan
The cases it gets wrong will be fairly rare, getting rarer the better your implementation is. If your implementation is really good you can prove that it never gets any of them wrong. If your implementation is crap, it might barely work at all. For instance if you're just testing 1024.1, I can pass that test with `return "1024.1";`. :-)
R..
+1. Not to mention that Gay's `dtoa` (which is the smaller of the two pieces of functionality in dtoa.c) also allows the user to specify how many digits of output, and what rounding method to use. Put bluntly, the C# code in the question does a tiny fraction of what dtoa.c does, and does it inaccurately, slowly, and buggily (try it with negative numbers, for example).
Mark Dickinson
I just wanted to clarify something: you said "1024.1 when converted to base-2". I assumed you meant converted to binary floating point, and then printed in decimal (after all, printing in decimal is what makes the problem hard, and why dtoa.c exists). But on second reading of your comment, I thought maybe you were saying you were setting your radix to 2, and printing in binary. That will work (assuming there's no error in your code -- I haven't run it), as it will for any power of 2 base -- it's essentially bit shifting. (But then I'd have to ask -- how did you get Firefox to print in binary?)
Rick Regan
@Mark - Woah, take it easy. I make no claim that this is production quality code. This is a simple side project I am doing for fun. I am just a curious coder trying to understand how all this stuff works. I don't care that mine is slower. I don't think my code is as buggy as you are claiming as negative numbers work just fine. As for testing I simply called `(1024.1).toString(2);` or `(144.15).toString(36);`. Obviously nothing formal but the output matches Firefox exactly.
ChaosPandion
@Rick - What do you think contributes more to the amount of code, cross-platform compatibility or the extreme accuracy that dtoa provides?
ChaosPandion
Did you try just plain old toString() (base 10)? It's interesting -- to my knowledge, Javascript uses dtoa.c, but dtoa.c only prints decimal strings. I wonder what is used to print to non-decimal bases? As for the amount of code, I'd say it's the "extreme accuracy" requirement (but Mark is more qualified to answer, having worked with dtoa.c extensively).
Rick Regan
@Rick - I wonder if that is why he is tearing my code a new one. :)
ChaosPandion
@Rick - It looks like I didn't need to perform that extensive test to find it. @sth found that `(1.3333333333333333).toString(3)` is an example where my code fails.
ChaosPandion
Two things: 1) dtoa.c only handles base 10, so we're no longer comparing your code to dtoa.c (unless someone in Javascript has modified it to do so). 2) 1.3333333333333333 decimal is conceptually closest to 1.1 base 3, but don't forget it is going through binary floating-point first and then to base 3; I think it's allowed not to be 1.1 in this case, as long as what it prints represents the value in the floating-point variable faithfully.
Rick Regan
@Rick - You know what? My output matches the output of IE. I can't help but wonder what exactly these two browsers are doing. It is a bit comforting that my output matches a widely used implementation though.
ChaosPandion
Could you give some examples of matching output, for conversions to decimal strings? I'd like to understand better exactly what you're comparing. (Also, are you running random tests, or just hand picked examples? I don't think you can say the output is the same unless you run millions of examples).
Rick Regan
@Rick - I am not foolish enough to think that my code works 100% correctly without a formal test.
ChaosPandion
You know despite people busting on my code I feel like I learned 100 times more than I would have had I just plugged in a native DLL and used *dtoa.c*. Anyway I appreciate your responses and thanks for the blog link.
ChaosPandion
It's rather unexpected that your simple program's double to decimal conversions match IE and Firefox exactly. If that's true, then there'd be no need for dtoa.c. I'm just suggesting that maybe there's an error in your test methodology. Could you at least give me one example of a double to decimal conversion that matches, just so I can see what you are comparing? (And you're welcome for the blog link!)
Rick Regan
@ChaosPandion: (1) Apologies for the tone of the earlier comment. (2) It's the 'correctly rounded' requirement that's responsible for most of the size of `dtoa.c`: it essentially contains its own little arbitrary-precision integer library just to be sure that the results of the computations are correctly rounded. And getting all the corner cases exactly right (not to mention doing thorough testing) is really surprisingly difficult.
Mark Dickinson
@Chaos: Your question basically amounts to: "My code does everything that dtoa does, so why is dtoa so large?" That is incorrect: your code does not at all do what dtoa does, so don't be surprised when people call you on it.
Ed Swangren
@Ed Swangren - Way to sound like a pompous jerk, I asked the question to be educated not to pump up my ego.
ChaosPandion
A: 

Short answer: because dtoa.c works.

This is exactly the difference between a well-debugged product and a NIH prototype.

Pavel Radzivilovsky
What does *NIH* mean?
ChaosPandion
"Not invented here" as in "custom ad-hok code, as an alternative to popular 3rd party code"
Pavel Radzivilovsky
I mean, you are right, but that explains nothing as to why there is so much code. Who knows, maybe someone else wrote an implementation that converts correctly with half as much code and twice as much functionality.
ChaosPandion