views:

688

answers:

8

I'm trying to write a function that is able to determine whether a string contains a real or an integer value.

This is the simplest solution I could think of:

int containsStringAnInt(char* strg){
  for (int i =0; i < strlen(strg); i++) {if (strg[i]=='.') return 0;}
  return 1;
}

But this solution is really slow when the string is long... Any optimization suggestions? Any help would really be appreciated!

+3  A: 

Is your string hundreds of characters long? Otherwise, don't care about any possible performance issues. The only inefficiency is that you are using strlen() in a bad way, which means a lot of iterations over the string (inside strlen). For a simpler solution, with the same time complexity (O(n)), but probably slightly faster, use strchr().

erikkallen
Yes, you should use strchr() for searching - library functions are often written in optimized assembly code so they outgun simple C-compiled versions. +1.
paxdiablo
As was the point with my answer, it will not matter so you took a good decision (upvoting) for the wrong reason.
erikkallen
+7  A: 

What's the syntax of your real numbers?

1e-6 is valid C++ for a literal, but will be passed as integer by your test.

Paul
+2  A: 

Your function does not take into account exponential notation of reals (1E7, 1E-7 are both doubles)

Use strtol() to try to convert the string to integer first; it will also return the first position in the string where the parsing failed (this will be '.' if the number is real). If the parsing stopped at '.', use strtod() to try to convert to double. Again, the function will return the position in the string where the parsing stopped.

Don't worry about performance, until you have profiled the program. Otherwise, for fastest possible code, construct a regular expression that describes acceptable syntax of numbers, and hand-convert it first into a FSM, then into highly optimized code.

zvrba
"(this will be '.' if the number is real)" Not if it's "1e-7" (to use your example)
Paul
also note german locales where numbers are "1,0". i had problems with that previously, until i figured out the "," in the number is causing them :)
Johannes Schaub - litb
+2  A: 

You are using strlen, which means you are not worried about unicode. In that case why to use strlen or strchr, just check for '\0' (Null char)

int containsStringAnInt(char* strg){ 

  for (int i =0;strg[i]!='\0'; i++) {
      if (strg[i]=='.') return 0;}   
  return 1; }

Only one parsing through the string, than parsing through the string in each iteration of the loop.

Adarsha
This answer is effectively the same as my answer (below), and should have nearly identical performance.
abelenky
No, this might be a partial parsing, while your code parses teh entire string atleast once for strlen, which is essentially searches for '\0'
Adarsha
return (NULL != strchr(strg,'.'));
Andrew Rollings
strchr() is a better option since library functions are frequently optimized assembly whereas this code would not compile that way.
paxdiablo
A: 

Strlen walks the string to find the length of the string.

You are calling strlen with every pass of the loop. Hence, you are walking the string way many more times than necessary. This tiny change should give you a huge performance improvement:

int containsStringAnInt(char* strg){
  int len = strlen(strg);
  for (int i =0; i < len; i++) {if (strg[i]=='.') return 0;}
  return 1;
}

Note that all I did was find the length of the string once, at the start of the function, and refer to that value repeatedly in the loop.

Please let us know what kind of performance improvement this gets you.

abelenky
A: 

@Aaron, with your way also you are traversing the string twice. Once within strlen, and once again in for loop. Best way for ASCII string traversing in for loop is to check for Null char in the loop it self. Have a look at my answer, that parses the string only once within for loop, and may be partial parsing if it finds a '.' prior to end. that way if a string is like 0.01xxx (anotther 100 chars), you need not to go till end to find the length.

Adarsha
True, true.Ultimately, your code walks a fraction of the string (to a max of the whole string).My code walks the whole string once + a fraction (to a max of the whole string).In big O-notation, both codes run in O(length) time, but yours has a smaller constant factor.
abelenky
+1  A: 
Johannes Schaub - litb
A: 
#include <stdlib.h>
int containsStringAnInt(char* strg){ 
    if (atof(strg) == atoi(strg))
        return 1;
    return 0;
}
Igor Oks