I often use the Boost lexical_cast library to parse text data into numeric values. In several situations, however, I only need to check whether values are numeric; I don't actually need or use the conversion.

So, I was thinking about writing a simple function to test if a string is a double:

template<typename T> 
bool is_double(const T& s)
{
  try 
  {
    boost::lexical_cast<double>(s); 
    return true;
  }
  catch (...) 
  {
    return false;
  }
}

My question is, are there any optimizing compilers that would drop out the lexical_cast here since I never actually use the value?

Is there a better technique to use the lexical_cast library to perform input checking?

+2  A: 

Since the cast might throw an exception, a compiler that just dropped the cast would be seriously broken. You can assume that all major compilers will handle this correctly.

Trying to do the lexical_cast might not be optimal from a performance point of view, but unless you check millions of values this way it won't be anything to worry about.
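To make the point concrete, here is a minimal sketch of the same pattern without Boost. It assumes std::stod (C++11) as a stand-in for lexical_cast, and the name is_double_stod is made up for illustration. The return value depends on whether the discarded call throws, so the call has an observable effect and cannot simply be optimized away. Note that std::stod is more permissive than lexical_cast in some respects (it skips leading whitespace, for example), so this is not a drop-in replacement.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the question's is_double, using std::stod
// instead of boost::lexical_cast so the sketch has no Boost dependency.
bool is_double_stod(const std::string& s)
{
    try
    {
        std::size_t used = 0;
        std::stod(s, &used);       // value discarded; only success or failure matters
        return used == s.size();   // reject trailing characters such as "1.5x"
    }
    catch (const std::invalid_argument&) { return false; } // no number found
    catch (const std::out_of_range&)     { return false; } // e.g. "1e999"
}
```

Calling is_double_stod("3.14") takes the normal path, while is_double_stod("abc") reaches the first catch block; the two calls must therefore produce different results.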

sth
+1  A: 

I think you want to re-write that function slightly:

template<typename T>  
bool tryConvert(std::string const& s) 
{ 
    try         { boost::lexical_cast<T>(s);} 
    catch (...) { return false; }

    return true; 
} 
Martin York
`bool` to `double` conversions are all the rage.
GMan
Is this just a style change? Is there any other reason to take the return out of the try?
Inverse
@Inverse: Look at your original function name and return type. That said, moving the return out is a cosmetic change, as far as I can tell.
GMan
@GMan: *slaps forehead* you're right, I typed the wrong function name/return type!
Inverse
A: 

The compiler is pretty unlikely to manage to throw out the conversion no matter what. Exceptions are just the icing on the cake. If you want to optimize this, you'll have to write your own parser to recognize the format for a float. Use regexps or manually parse, since the pattern is simple:

// Accepts [+-] digits [. digits] [e|E [+-] digits]; ASCII only, no locale.
string::const_iterator si = s.begin();
if ( si != s.end() && ( * si == '+' || * si == '-' ) ) ++ si;
bool digits = false; // set once the mantissa has at least one digit
while ( si != s.end() && '0' <= * si && * si <= '9' ) { ++ si; digits = true; }
if ( si != s.end() && * si == '.' ) {
    ++ si;
    while ( si != s.end() && '0' <= * si && * si <= '9' ) { ++ si; digits = true; }
}
if ( ! digits ) return false; // rejects "", "+", "." and ".e5"
if ( si != s.end() && ( * si == 'e' || * si == 'E' ) ) {
    ++ si;
    if ( si != s.end() && ( * si == '-' || * si == '+' ) ) ++ si;
    if ( si == s.end() || * si < '0' || '9' < * si ) return false; // exponent needs a digit
    while ( si != s.end() && '0' <= * si && * si <= '9' ) ++ si;
}
return si == s.end(); // anything left over means trailing garbage

Not tested… I'll let you run through all the possible format combinations ;v)

Edit: Also, note that this is totally incompatible with localization. You have absolutely no hope of internationally checking without converting.

Edit 2: Oops, I thought someone else already suggested this. boost::lexical_cast is actually deceptively simple. To at least avoid throwing+catching the exception, you can reimplement it somewhat:

istringstream ss( s );
double d;
ss >> d >> ws; // ws discards whitespace
return ss && ss.eof(); // omit ws and eof if you know no trailing spaces

This code, on the other hand, has been tested ;v)

Potatoswatter
Depends on "wrong". This will fail `123 `.
GMan
@GMan: Unclear whether that should succeed. If he extracts a string at a time from an input string, it won't happen anyway. Is that the reason for a downvote? edit - I was aware of that anyway, hoping for a more constructive correction :vP, can't remember how to discard whitespace besides a `sentry`
Potatoswatter
I didn't downvote you, by the way. Discard by: `ss >> std::ws`. See http://stackoverflow.com/questions/1243428/convert-string-to-int-with-bool-fail-in-c/1243435#1243435 on how I mimic `lexical_cast`.
GMan
@GMan: thx, good man lol
Potatoswatter
A: 

You could try something like this.

#include <sstream>

//Try to convert arg to result in a similar way to boost::lexical_cast
//but return true/false rather than throwing an exception.
template<typename T1, typename T2>
bool convert( const T1 & arg, T2 & result )
{
    std::stringstream interpreter;
    return interpreter<<arg && 
           interpreter>>result && 
           interpreter.get() == std::stringstream::traits_type::eof();
}

template<typename T>
double to_double( const T & t )
{
   double retval=0;
   if( ! convert(t,retval) ) { /* Do something about failure */ }
   return retval;
}

template<typename T>
bool is_double( const T & t )
{
   double retval=0;
   return convert(t,retval);
} 

The convert function does basically the same things as boost::lexical_cast, except lexical cast is more careful about avoiding allocating dynamic storage by using fixed buffers etc.
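A short, self-contained sketch of how the check behaves on a few inputs may help. The convert function is repeated from above so the snippet compiles on its own, and the sample inputs are only illustrative.

```cpp
#include <sstream>
#include <string>

// Repeated from the answer above: true only if the whole of arg converts.
template<typename T1, typename T2>
bool convert( const T1 & arg, T2 & result )
{
    std::stringstream interpreter;
    return interpreter << arg &&
           interpreter >> result &&
           interpreter.get() == std::stringstream::traits_type::eof();
}

// Check-only wrapper: the converted value is thrown away.
template<typename T>
bool is_double( const T & t )
{
    double discarded = 0;
    return convert( t, discarded );
}
```

With these definitions, is_double("0.5") and is_double(42) succeed, while is_double("123.4 blah"), is_double("123 ") and is_double("abc") fail, because in each failing case either the extraction fails or a character is left in the stream afterwards.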

It would be possible to refactor the boost::lexical_cast code into this form, but that code is pretty dense and tough going. IMHO it's a pity that lexical_cast wasn't implemented using something like this under the hood; then it could look like this:

template<typename T1, typename T2>
T1 lexical_cast( const T2 & t )
{
  T1 retval;
  if( ! try_cast<T1,T2>(t,retval) ) throw bad_lexical_cast();
  return retval;
}
Michael Anderson
Note that your streams code doesn't check if the number is followed by something else separated by whitespace. `123.4 blah` converts fine.
Potatoswatter
Indeed that was the case. boost hides prevention of partial matches inside its custom stream implementation. I've integrated their method, interpreter.get()== std::stringstream::traits_type::eof(), into my code so that partial matches will no longer occur.
Michael Anderson
A: 

Better use regexes first and lexical_cast just to convert to the real type.
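As a rough illustration of the regex-first idea, the sketch below uses std::regex from the standard library (C++11). The name looks_like_decimal and the pattern itself are assumptions made for this example: the pattern covers only plain ASCII decimal notation with an optional exponent, and makes no attempt to match every input that lexical_cast accepts.

```cpp
#include <regex>
#include <string>

// Hypothetical pre-check: plain ASCII decimal notation only, no locale support.
bool looks_like_decimal(const std::string& s)
{
    // [sign] (digits [. [digits]] | . digits) [e|E [sign] digits]
    static const std::regex pattern(
        R"([+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?)");
    return std::regex_match(s, pattern); // the whole string must match
}
```

A string that passes this check could then be handed to lexical_cast for the actual conversion; a string that fails it might still be valid in another locale or notation, so the pre-check is a filter under the stated assumptions, not a full specification of the format.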

piotr
And how do you maintain regular expressions for all valid conversions and make sure that they *exactly* match the format used by `lexical_cast`, down to culture-dependent formats? Good luck with that.
Konrad Rudolph
A: 

As the type T is a templated typename, I believe your answer is the right one, as it will be able to handle all cases already handled by boost::lexical_cast.

Still, don't forget to specialize the function for known types, like char *, wchar_t *, std::string, wstring, etc.

For example, you could add the following code:

template<>
bool is_double<int>(const int & s)
{
   return true ;
}

template<>
bool is_double<double>(const double & s)
{
   return true ;
}

template<>
bool is_double<std::string>(const std::string & s)
{
   char * p = 0 ;
   strtod(s.c_str(), &p) ; // include <cstdlib> for strtod
   return (p != s.c_str() && *p == 0) ; // p == s.c_str() means nothing was parsed, e.g. an empty string
}

This way, you can "optimize" the processing for the types you know, and delegate the remaining cases to boost::lexical_cast.

paercebal