views:

453

answers:

7

Hi,

I have a large file where each line contains space-separated integers. The task is to sparse this file line-by-line. For the string to int conversion I have three solutions:

static int stringToIntV1(const string& str) {
    return (atoi(str.c_str()));
}

However, if I pass a malformed string, it doesn't produce any error. For instance the string "123error" is converted to 123.

Second solution:

static int stringToIntV2(const string& str)
{
    int result;
    istringstream myStream(str);

    if (myStream >> result) {
        return result;
    }
    // else
    throw domain_error(str + " is not an int!");
}

I have the same problem here, malformed strings don't raise an error.

Third solution with Boost (found at http://stackoverflow.com/questions/149268/boost-library):

static int stringToIntV3(const string& str)
{
    int iResult = 0;
    try {
        iResult = lexical_cast<int>(str);
    }
    catch(bad_lexical_cast &) {
        throw domain_error(str + " is not an int!");
    }
    return iResult;
}

This one gives correct result.

However, there is a significant difference in the execution time. Testing on a large text file (32 MB), I got the following times:

  • (1) with atoi: 4.522s (winner)
  • (2) with istringstream: 15.303s (very slow)
  • (3) with lexical_cast: 10.958s (between the two)

My question: do you know how to notice malformed strings with atoi? It would give the fastest solution. Or do you know a better solution?

Update: Thanks for the answers. Following the tips, I came up with this solution:

static int stringToIntV4(const string& str)
{
    char * pEnd;
    const char * c_str = str.c_str();
    int result = strtol(c_str, &pEnd, 10);
    if (pEnd == c_str+str.length()) {
        return result;
    }
    // else
    throw domain_error("'" + str + "'" + " is not an int!");
}

The good news is that it yields if there is a problem and as efficient as the atoi version.

A: 

Can you simply check the string before calling atoi?

isdigit() is a wonderful little function, and you should have no problem writing isalldigits() to check a string in a tight, fast loop.

(if you happen to work with decimals, or +/- signs, you might want to add that in too)

So, simply verify that the string is all digits, then call atoi. Should be pretty fast.

abelenky
"simply verify that the string is all digits, then call atoi" You should see this is unnecessary work. Why not parse numbers as you go?
GMan
@GMan: That's what `strtol` does.
David Thornley
...right. That's why I'm saying his suggestion is wasteful.
GMan
If OP were up for parsing the numbers as he went, I think he already would've done so. Instead, his question is about using existing functions and libraries to accomplish the task.
abelenky
A: 

Any leading zeroes in the numbers? if atoi returns zero and the first digit character of the number is not '0', then we have an error.

Seva Alekseyev
Counter example: -0
anon
Since when is -0 a valid integer constant? It's an expression at best.
Seva Alekseyev
+1  A: 

I did my own small test. It seem to take about the same amount of time to use atoi() as with stream operators. I have a file with 2,000,000 numbers. 10 numbers on each line (though the code does not use this fact).

The first version uses atoi() which I admit took me a while to get correct and could be more effecient. Updates accepted and to get it more effecient.

The stream one. Took 20 seconds to write and worked out of the box.
Timing Results Are:

> time ./a.exe 1
AtoI()
6.084u 0.156s 0:06.33 98.4%     0+0k 0+0io 8619pf+0w
> time ./a.exe
Iterator()
4.680u 0.296s 0:05.05 98.4%     0+0k 0+0io 6304pf+0w

Code:

#include <vector>
#include <iostream>
#include <iterator>
#include <fstream>
#include <iostream>

int main(int argc,char* argv[])
{
    std::vector<int>    data;
    std::ifstream       vals("rand.txt");

    if (argc > 1)
    {
        std::cout << "AtoI()\n";

        std::string line;
        while(std::getline(vals,line))
        {
            std::string::size_type loop = 0;
            while(loop < line.size())
            {
                while(isspace(line[loop]) && (loop < line.size()))
                {   ++loop;
                }
                std::string::size_type end = line.find_first_not_of("0123456789",loop);
                data.push_back(atoi(line.substr(loop,end - loop).c_str()));

                loop = end;
            }

        }
    }
    else
    {
        std::cout << "Iterator()\n";
        std::copy(  std::istream_iterator<int>(vals),
                    std::istream_iterator<int>(),
                    std::back_inserter(data)
                 );
    }
}
Martin York
A: 

Why not convert it while you have a valid file pointer rather than loading it into a string then parsing the integer? Basically you're wasting memory and time.

In my opinion you should do this:

// somewhere in your code
ifstream fin("filename");
if(!fin) {
 // handle error
}

int number;
while(getNextNumber(fin, number))
{
 cout << number << endl;
}
fin.close();
// done

// some function somewhere
bool getNextNumber(ifstream& fin, int& number)
{
 while(!(fin >> number))
 {
  fin.clear();
  fin.ignore(numeric_limits<streamsize>::max(),'\n');
 }

 if(!fin || fin.eof())
  return false;
 return true;
}
Daniel
Actually my function needs some tweeking but you get the idea, I think. I think it might do an infinite loop.
Daniel
+12  A: 

I'd use strtol. It takes a parameter that it sets to point at the first character it couldn't convert, so you can use that to determine whether the entire string was converted.

Edit: as far as speed goes, I'd expect it to be slightly slower than atoi, but faster than the others you tried.

Jerry Coffin
Guesswork of course, but if I were implementing atoi() I'd probably do it in terms of strtol() - but then I'm lazy :-)
anon
@Neil:If I were starting from scratch, I'd probably do the same -- but I'd guess *most* libraries had `atoi` working for years before `strtol` was even dreamed of...
Jerry Coffin
glibc's implementation of `atoi` is in terms of `strtol`.
sixlettervariables
+2  A: 

The strtol function gives you back a pointer to the next character in the string. You could check that character after conversion to see if it is not whitespace, indicating error. I ran a simple test and the performance of strtol seems to be comparable to atoi.

iboisver
+2  A: 

I know this one has been covered with existing functions. If performance is a paramount concern, it is not much more work to write your own conversion to do exactly what you need. For example, if you know there will not be any leading spaces or all your numbers are positive, don't deal with those cases. The compiler should be able to inline this function too.

#include <ctype.h>
#include <string>

inline int customStringToInt( const std::string &str ) {
  register const char *p = str.c_str(), *pEnd = p + str.size();

  // Remove leading whitespace (remove if no leading whitespace).
  while( p != pEnd && isspace(*p) ) ++p;

  // Handle neg/pos sign (remove if no sign).
  int sign = 1;
  if( p != pEnd ) {
    if(      *p == '-' ) { ++p; sign = -1; }
    else if( *p == '+' ) { ++p; }
  } 

  // String without any digits is not OK (blank != 0) (remove if it is OK).
  if( p == pEnd ) throw domain_error("'" + str + "'" + " has no digits."); 

  // Handle digits.
  register int i = 0;
  while( p != pEnd )
    if( isdigit(*p) ) i = i * 10 + (*p++ - '0');
    else throw domain_error("'" + str + "'" + " contains non-digits.");

  return sign * i;
}