views:

69

answers:

3

Hello there, I have a string which looks like this "a 3e,6s,1d,3g,22r,7c 3g,5r,9c 19.3", how do I go through it and extract the integers and assign them to its corresponding letter variable?. (i have integer variables d,r,e,g,s and c). The first letter in the string represents a function, "3e,6s,1d,3g,22r,7c" and "3g,5r,9c" are two separate containers . And the last decimal value represents a number which needs to be broken down into those variable numbers.

my problem is extracting those integers with the letters after it and assigning them into there corresponding letter. and any number with a negative sign or a space in between the number and the letter is invalid. How on earth do i do this?

+2  A: 

A simple state machine seems like the way to go about this. I'm not sure the rules you have given are complete enough, In particular I don't understand the function of spaces, or what you mean by "separate containers". You should add more code to test for invalid states, but this should get you started.

// the string we want to parse.
char * psz = "a 3e,6s,1d,3g,22r,7c 3g,5r,9c 19.3";

// this is the states that our parser can be in.
enum {
   state_init,
   state_number,
   state_letter,
   state_comma,
   state_space,
   state_decimal,
   };

// storage for our letter values
int letter_vals['z' - 'a' + 1] = 0;

int val = 0; 
int state = state_init;
while (psz[0])
{
   char ch = psz[0];
   if (ch >= '0' && ch <= '9')
   {
      if (state == state_decimal)
      {
         // this is the last value that needs special treatment.
         double dval = (double)val + (ch / 10.0);
      }
      else if (state == state_number)
      {
         val = (val * 10) + ch - '0';
      }
      else
      {
         // we expect state to be state_space or state_comma here
         val = ch;
      }

      state = state_num;
   }
   else if (ch >= 'a' && ch <= 'z')
   {
      if (state == state_num)
      {
         letter_vals[ch - 'a'] = val;
         val = 0;
      }
      else if (state == state_init)
      {
         // ch is our "function"
      }
      else
      {
         // this is a letter that isn't after a number 
      }
      state = state_letter;
   }
   else if (ch == ',')
   {
      // state should be state_letter here
      state = state_comma;
   }
   else if (ch == ' ')
   {
      if (state == state_number)
      {
         // a space in the middle of the number or after a number is invalid!
      }
      else if (state == state_letter)
      {
         // this is a space after a letter, this means what?
      }
      else if (state == state_space)
      {
         // are multiple spaces invalid?
      }
      state = state_space;
   }
   else if (ch == '.')
   {
      if (state == state_number)
      {
         // this is normal 
      } 
      else
      {
         // this is an invalid state, a decimal not inside a number.
      }
      state = state_decimal;
   }
   else if (ch == '-')
   {
      // this is an invalid character
   }
   else
   {
      // this is an invalid letter.
   }


   ++psz;
}
John Knoeller
your state machine looks a bit cumbersome actually. A switch on 'ch' will help.
NomeN
@NomeN: switches in C don't work so well with ranged values like 1=9 or a-z, but otherwise, I would agree, state machines and switch statements usually go well together.
John Knoeller
+2  A: 

How about using a regular expression for parsing the different parts into variables. After that you could convert the parsed variables to your target types.

A regex that uses grouping could look something like that ugly monster:

^([a-zA-Z]) (-?\d{1,2}) ?e,(-?\d{1,2}) ?s,(-?\d{1,2}) ?d,(-?\d{1,2}) ?g,(-?\d{1,2}) ?r,(-?\d{1,2}) ?c (-?\d{1,2}) ?g,(-?\d{1,2}) ?r,(-?\d{1,2}) ?c ([0-9.]{1,4})

Perhaps not yet perfect, but it is a start.

Here is a code sample to get you started:

#include <regex>

using std::string;
using std::tr1::cmatch;
using std::tr1::regex;

const regex pattern("\\.([^\\.]+)$");
cmatch result;

string dateiname("test.abc");
string erweiterung;

if(regex_search(dateiname.c_str(), result, pattern) == true)
    erweiterung = result[1];
nabulke
+1  A: 

The description of the string format is not really clear but I think I can answer your question anyway (extracting the integers with the letters and adding(?) them to the proper int variable).

So beginning with this string:

char* was = "3e,6s,1d,3g,22r,7c"; // was == weird ass string

it is probably easiest to tokenize it using strtok.

char* token = strtok (was,",");
while (token != NULL) {
    assign(token); // first token is 3e, second 6s etc...
    token = strtok (NULL, ",");
}

Now you can use sscanf to find the number and letter.

void assign(char* token) {
    char letter;
    int number;
    if (0 != sscanf(token, "%d%c", number, letter)) {
        // the first token produces letter 'e' and number '3'
        // now you can switch on letter and add number 
        // to the proper variable in each case
    } else {
        //matching failure!!
    }
}

With regards to the other quirks with your string format (the separate containers and the float at the end (others??)), you can handle those in similar ways. Just think of it like pealing an onion, work your way through the format layer by layer until you get to the letter number combination.

Additionally any format faults will be caught at the very least when sscanf is invoked.

NomeN