I have a string (char*), and I need to find its underlying datatype, such as int, float, double, short, or long, or just a character array containing letters with or without digits (like varchar in SQL). For example:

    char* str1 = "12312";
    char* str2 = "231.342";
    char* str3 = "234234243234";
    char* str4 = "4323434.2432342";
    char* str5 = "i contain only alphabets";

Given these strings, I need to determine that the first string is of type int and convert it to an int, and so on. For example:

    int no1 = atoi(str1);
    float no2 = atof(str2);
    long no3 = atol(str3);
    double no4 = strtod(str4, NULL);
    char* varchar1 = strdup(str5);


Clarifying a bit more...

I have a string whose contents could be letters and/or digits and/or special characters. Right now, I am able to parse the string and identify whether it contains:

  1. Only digits.
    Here I convert the string into a short, int, or long, based on best fit. (How do I know whether the string can be converted to a short, int, or long?)
  2. Only letters. Leave it as a string.
  3. Digits with a single decimal point.
    Here I need to convert the string into a float or double. (Same question here.)
  4. Other. Leave it as a string.
+1  A: 

First, check whether the problem hasn't already been solved for you. It could be that your library functions for converting strings to numbers already do the checks that you need.

Failing that, you're going to need to do some pattern matching on strings, and that's what regular expressions are for!

E.g. if the string matches the regexp:

[+-]?\d+

then you know that it's an int or a long. Convert it to a long, and then check its size. If your long can fit into an int, convert it to an int.

You can do the same for floats and doubles, although the regular expression is a bit more complicated.

Watch out for awkward cases like the empty string, a lone decimal point, numbers too large for a long, and so on. You also need to decide whether you will allow exponent notation.
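
As a rough sketch of the pattern-matching idea, the following uses the POSIX `<regex.h>` functions, which are an assumption here (they are not part of ISO C and may not exist on every platform). POSIX extended syntax has no `\d`, so `[0-9]` is used instead. The names `Kind`, `matches`, and `classify` are made up for this illustration, and exponent notation is deliberately not handled.

```c
#include <regex.h>   /* POSIX, not part of ISO C */
#include <stddef.h>

/* Hypothetical result codes for this sketch. */
typedef enum { KIND_INTEGER, KIND_REAL, KIND_TEXT } Kind;

/* Returns 1 if `s` matches the POSIX extended regular expression `pattern`. */
static int matches(const char *pattern, const char *s)
{
    regex_t re;
    int ok;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return 0;                       /* treat a bad pattern as "no match" */
    ok = (regexec(&re, s, 0, NULL, 0) == 0);
    regfree(&re);
    return ok;
}

Kind classify(const char *s)
{
    /* Optional sign, then one or more digits. */
    if (matches("^[+-]?[0-9]+$", s))
        return KIND_INTEGER;

    /* Optional sign, one decimal point, and at least one digit on
       one side of it. A lone "." is rejected. */
    if (matches("^[+-]?([0-9]+\\.[0-9]*|\\.[0-9]+)$", s))
        return KIND_REAL;

    return KIND_TEXT;
}
```

The classification only says what the text looks like; whether an integer-looking string actually fits into a short, int, or long still has to be checked after conversion.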

dysfunctor
+1  A: 

Try getting it into a long with sscanf. If that fails, try getting it into a double with sscanf. If that fails, it's a string. You can use the %n conversion to tell whether all of the input was consumed successfully. The constants in <limits.h> and <float.h> may help you decide if the numeric results can fit into narrower types on your platform. If this isn't homework, your destination types are probably externally defined - e.g. by a database schema - and the latter comment is irrelevant.
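
A minimal sketch of that sequence might look like the following. The names `Parsed` and `parse_with_sscanf` are invented for illustration. One limitation to keep in mind: the C standard leaves the behaviour undefined when a scanned value cannot be represented in the target type, so this does not reliably detect overflow.

```c
#include <stdio.h>

/* Hypothetical result codes for this sketch. */
typedef enum { PARSED_LONG, PARSED_DOUBLE, PARSED_STRING } Parsed;

Parsed parse_with_sscanf(const char *s, long *lv, double *dv)
{
    int used = 0;

    /* %n stores the number of characters consumed so far; it does not
       count towards sscanf's return value. The conversion is accepted
       only if it consumed the string up to the terminating '\0'. */
    if (sscanf(s, "%ld%n", lv, &used) == 1 && s[used] == '\0')
        return PARSED_LONG;

    used = 0;
    if (sscanf(s, "%lf%n", dv, &used) == 1 && s[used] == '\0')
        return PARSED_DOUBLE;

    return PARSED_STRING;
}
```

With this check, an input such as "0.1234E-39" is not accepted as a long, because only the leading "0" is consumed by "%ld"; it falls through to the double attempt instead.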

fizzer
sscanf() using "%ld" will happily convert the zero in "0.1234E-39" into an integer, so it doesn't help all that much. It also won't convert 0xABCD without assistance in the form of "0x%ld" in the scan format string. Etc.
Jonathan Leffler
1st point - that's why I said to use %n. 2nd point - %x will happily convert 0xABCD. The OP's examples didn't include hex input; if he needs it, he will obviously require an additional call (as he would with strtoul and friends). I'm not going to post a complete solution to a homework question.
fizzer
A: 

First of all, you should decide which representations you want to recognize. For example, is 0xBAC0 an unsigned short expressed in hex? Same goes for 010 (in octal) and 1E-2 (for 0,01).

Once you have decided on the representation, you can use regular expressions to determine the general forms. For example:

  • -?\d*\.\d*([eE]?[+-]?\d*\.\d*)? is a floating point number (almost: it accepts weird things like .e-. so you should define the regex that is most appropriate for you)
  • -?\d+ is an integer
  • 0x[0-9A-Fa-f]+ is a hex constant

and so on. If you are not using a regex library, you will have to write a small parser for those representations from scratch.

Now you can convert it to the largest possible type (e.g. long long for integers, double for floating point) and then use the values in limits.h to see whether the value would fit in a smaller type.

For example, if the integer lies between SHRT_MIN and SHRT_MAX, you can assume it's a short.

You might also have to make arbitrary decisions: for example, with a 16-bit short, 54321 can only be an unsigned short, but 12345 could be a signed short or an unsigned short.
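
The "convert to the widest type, then narrow" step could be sketched as below for signed decimal integers only. `Fit` and `smallest_signed_fit` are hypothetical names, and the choice to prefer signed types is one of the arbitrary decisions mentioned above.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical result codes for this sketch. */
typedef enum { FITS_SHORT, FITS_INT, FITS_LONG, FITS_LONG_LONG, FITS_NONE } Fit;

/* Parses `s` as a decimal long long and reports the narrowest signed
   type that can hold it. On success the value is stored in *out. */
Fit smallest_signed_fit(const char *s, long long *out)
{
    char *end;
    long long v;

    errno = 0;
    v = strtoll(s, &end, 10);

    /* Reject: nothing parsed, trailing characters, or out of range. */
    if (end == s || *end != '\0' || errno == ERANGE)
        return FITS_NONE;

    *out = v;
    if (v >= SHRT_MIN && v <= SHRT_MAX) return FITS_SHORT;
    if (v >= INT_MIN  && v <= INT_MAX)  return FITS_INT;
    if (v >= LONG_MIN && v <= LONG_MAX) return FITS_LONG;
    return FITS_LONG_LONG;
}
```

Which category a given number lands in depends on the platform's type widths, so the same input can be reported differently on different systems.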

Remo.D
1E-2 is normally 0.01, isn't it?
Jonathan Leffler
+1  A: 

In C (not in C++), I would use a combination of strtod/strtol and max values from <limits.h> and <float.h>:

#include <stdlib.h>
#include <stdio.h>
#include <errno.h>
#include <limits.h>
#include <float.h>

/*    Now, we know the following values:
      INT_MAX, INT_MIN, SHRT_MAX, SHRT_MIN, CHAR_MAX, CHAR_MIN, etc.    */

typedef union tagMyUnion
{
   char TChar_ ; short TShort_ ; long TLong_ ; double TDouble_ ;
} MyUnion ;

typedef enum tagMyEnum
{
   TChar, TShort, TLong, TDouble, TNaN
} MyEnum ;

void whatIsTheValue(const char * string_, MyEnum * enum_, MyUnion * union_)
{
   char * endptr ;
   long lValue ;
   double dValue ;

   *enum_ = TNaN ;

   /* integer value */
   errno = 0 ;
   lValue = strtol(string_, &endptr, 10) ;

   /* endptr == string_ means nothing was converted (e.g. empty string) ;
      errno == ERANGE means the value does not fit in a long, in which
      case we fall through and try a double instead */
   if((endptr != string_) && (*endptr == 0) && (errno != ERANGE))
   {
      if((lValue >= CHAR_MIN) && (lValue <= CHAR_MAX)) /* is it a char ? */
      {
         *enum_ = TChar ;
         union_->TChar_ = (char) lValue ;
      }
      else if((lValue >= SHRT_MIN) && (lValue <= SHRT_MAX)) /* is it a short ? */
      {
         *enum_ = TShort ;
         union_->TShort_ = (short) lValue ;
      }
      else /* anything else that strtol accepted is a long */
      {
         *enum_ = TLong ;
         union_->TLong_ = lValue ;
      }

      return ;
   }

   /* real value */
   dValue = strtod(string_, &endptr) ;

   if((endptr != string_) && (*endptr == 0)) /* It is a real value ! */
   {
      /* rejects infinities and NaN, which strtod can also return */
      if((dValue >= -DBL_MAX) && (dValue <= DBL_MAX)) /* is it a double ? */
      {
         *enum_ = TDouble ;
         union_->TDouble_ = dValue ;
      }

      return ;
   }

   return ;
}

void studyValue(const char * string_)
{
   MyEnum enum_ ;
   MyUnion union_ ;

   whatIsTheValue(string_, &enum_, &union_) ;

   switch(enum_)
   {
      case TChar    : printf("[%s] is a char : %li\n", string_, (long) union_.TChar_) ; break ;
      case TShort   : printf("[%s] is a short : %li\n", string_, (long) union_.TShort_) ; break ;
      case TLong    : printf("[%s] is a long : %li\n", string_, (long) union_.TLong_) ; break ;
      case TDouble  : printf("[%s] is a double : %f\n", string_, (double) union_.TDouble_) ; break ;
      case TNaN     : printf("[%s] is not a number\n", string_) ; break ;
      default       : printf("I really don't know : %s\n", string_) ; break ;
   }
}

int main(void)
{
   studyValue("25") ;
   studyValue("-25") ;
   studyValue("30000") ;
   studyValue("-30000") ;
   studyValue("300000") ;
   studyValue("-300000") ;
   studyValue("25.5") ;
   studyValue("-25.5") ;
   studyValue("25555555.55555555") ;
   studyValue("-25555555.55555555") ;
   studyValue("Hello World") ;
   studyValue("555-55-55") ;

   return 0;
}

Which results in the following (on a platform with a signed 8-bit char and a 16-bit short):

[25] is a char : 25
[-25] is a char : -25
[30000] is a short : 30000
[-30000] is a short : -30000
[300000] is a long : 300000
[-300000] is a long : -300000
[25.5] is a double : 25.500000
[-25.5] is a double : -25.500000
[25555555.55555555] is a double : 25555555.555556
[-25555555.55555555] is a double : -25555555.555556
[Hello World] is not a number
[555-55-55] is not a number

Sorry for my rusty C.

:-)

So, in substance, after the call to whatIsTheValue, you retrieve the type through the MyEnum enum, and then, according to the value in this enum, retrieve the right value, correctly typed, from the union MyUnion.

Note that deciding whether the number is a double or a float is a bit more complicated, because the difference is in the precision, i.e. whether your number is representable in a double or in a float. As most "decimal real" numbers are not exactly representable in a double, I would not bother.
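
For anyone who does want to try, one possible exact test is a round trip through float, sketched below. `fits_float_exactly` is a hypothetical helper name. Because it tests exact representability, it will report that most parsed decimal fractions (0.1, for instance) do not fit a float, which illustrates why the distinction is rarely worth making.

```c
#include <float.h>

/* Returns 1 if the double `d` can be stored in a float without any
   loss, 0 otherwise (including for NaN and out-of-range values). */
int fits_float_exactly(double d)
{
    /* Range check first: converting an out-of-range double to float
       is undefined behaviour. */
    if (d > FLT_MAX || d < -FLT_MAX)
        return 0;

    /* Round trip: narrow to float, widen back, compare. */
    return (double)(float)d == d;
}
```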

Note, too, that there is a catch, as 25.0 could be both a real and an integer number. By comparing "dValue == (double)(long)dValue", I guess you could find out whether it is an integer, again without taking into account the usual precision problems that come with the binary real numbers used by computers.
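
That comparison could be wrapped as in the sketch below. `is_integral_double` is a hypothetical name, and the range check is an addition not present in the comparison above: it is there because converting a double outside the range of long is undefined behaviour. The upper bound is written as -(double)LONG_MIN on the assumption of a two's-complement long, where that value is exactly representable.

```c
#include <limits.h>

/* Returns 1 if `d` holds a whole number that fits in a long. */
int is_integral_double(double d)
{
    /* Also rejects NaN, since every comparison with NaN is false. */
    if (!(d >= (double)LONG_MIN && d < -(double)LONG_MIN))
        return 0;

    return d == (double)(long)d;
}
```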

paercebal
fizzer
The example is not supposed to be complete: it just offers a correct pattern combining an enum, a union, and a type-finder function. Users should adapt these symbols to their needs, such as the supported types, how to handle a double that is an integer, hexadecimal notation, etc.
paercebal