Hi,

I have the following file/line:

pc=1 ct=1 av=112 cv=1100 cp=1700 rec=2 p=10001 g=0 a=0 sz=5 cr=200
pc=1 ct=1 av=113 cv=1110 cp=1800 rec=2 p=10001 g=0 a=10 sz=5 cr=200

and so on. I wish to parse this and take the key value pairs and put them in a structure:

struct pky
{
    pky() :
      a_id(0),
      sz_id(0),
      cr_id(0),
      cp_id(0),
      cv_id(0),
      ct_id(0),
      fr(0),
      g('U'),
      a(0),
      pc(0),
      p_id(0)
    { }
};

wherein either all the structure fields are used or some might be omitted.

How do I create a C++ class, which will do the same? I am new to C++ and not aware of any functions or library which would do this work.

Each line is to be processed, and the structure will be populated with one line each time and used, before it is flushed. The structure is later used as a parameter to a function.

+1  A: 

Unfortunately, your source data file is human-oriented, which means that you're going to have to do a bunch of string parsing in order to get it into the structure. Otherwise, if the data had been written directly as a binary file, you could just use fread() to pop it directly into the struct.

If you want an "elegant" (i.e. ugly but minimalistic) approach, you could write a loop that parses each line: use strchr() to find the '=' character and then the next space, use atoi() to convert each number into an int, and then use some pointer hackery to push the values into the structure. The obvious disadvantage is that if the structure changes, or is even just reordered, the whole algorithm silently breaks.

So, for something more maintainable and readable (at the cost of more code), you could push each value into a vector, and then go through the vector and copy each value into the appropriate structure field.
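A minimal sketch of that vector-based approach follows. Some assumptions: the question's pky does not show its member declarations, so Record below is a hypothetical stand-in with one int field per key seen in the sample input, and split_pairs and to_record are made-up names. Keys that are missing from a line simply keep the constructor's default of 0.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for pky: one int field per key in the sample input.
struct Record {
    int pc, ct, av, cv, cp, rec, p, g, a, sz, cr;
    Record() : pc(0), ct(0), av(0), cv(0), cp(0), rec(0),
               p(0), g(0), a(0), sz(0), cr(0) {}
};

typedef std::vector<std::pair<std::string, int> > PairList;

// Step 1: split one line into (key, value) pairs.
PairList split_pairs(const std::string& line)
{
    PairList pairs;
    std::istringstream tokens(line);
    std::string token;
    while (tokens >> token) {
        std::string::size_type eq = token.find('=');
        if (eq == std::string::npos)
            continue;                          // not a key=value token
        std::istringstream number(token.substr(eq + 1));
        int value = 0;
        if (number >> value)                   // skip tokens whose value is not an int
            pairs.push_back(std::make_pair(token.substr(0, eq), value));
    }
    return pairs;
}

// Step 2: copy each pair into the matching field. Unknown keys are ignored.
Record to_record(const PairList& pairs)
{
    Record r;
    for (std::size_t i = 0; i < pairs.size(); ++i) {
        const std::string& key = pairs[i].first;
        const int value = pairs[i].second;
        if      (key == "pc")  r.pc  = value;
        else if (key == "ct")  r.ct  = value;
        else if (key == "av")  r.av  = value;
        else if (key == "cv")  r.cv  = value;
        else if (key == "cp")  r.cp  = value;
        else if (key == "rec") r.rec = value;
        else if (key == "p")   r.p   = value;
        else if (key == "g")   r.g   = value;
        else if (key == "a")   r.a   = value;
        else if (key == "sz")  r.sz  = value;
        else if (key == "cr")  r.cr  = value;
    }
    return r;
}
```

Calling to_record(split_pairs(line)) once per line gives a freshly defaulted structure each time, so nothing has to be cleared by hand between lines.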

Nik Reiman
can you give some code for doing the above? i am unable to figure out how i can push the same to a vector and then to the structure to be used later. also, i might need to convert structure to a hash later...
gagneet
+6  A: 

You can do something like this:

std::string line;
std::map<std::string, std::string> props;
std::ifstream file("foo.txt");
while(std::getline(file, line)) {
    std::string token;
    std::istringstream tokens(line);
    while(tokens >> token) {
        std::size_t pos = token.find('=');
        if(pos != std::string::npos) {
            props[token.substr(0, pos)] = token.substr(pos + 1);
        }
    }

    /* work with those keys/values by doing props["name"] */
    Line l(props["pc"], props["ct"], ...);

    /* clear the map for the next line */
    props.clear();
}

i hope it's helpful. Line can be like this:

struct Line { 
    std::string pc, ct; 
    Line(std::string const& pc, std::string const& ct):pc(pc), ct(ct) {

    }
};

now that works only if the delimiter is a space. you can make it work with other delimiters too. change

while(tokens >> token) {

into for example the following, if you want to have a semicolon:

while(std::getline(tokens, token, ';')) {
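for reference, a small self-contained sketch of the semicolon variant. parse_semicolon is a made-up name, and the input format "key=value;key=value" is only an assumption for illustration:

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>

// Parse "key=value;key=value" into a map of strings.
// Tokens without '=' are skipped.
std::map<std::string, std::string> parse_semicolon(const std::string& line)
{
    std::map<std::string, std::string> props;
    std::istringstream tokens(line);
    std::string token;
    while (std::getline(tokens, token, ';')) {
        std::size_t pos = token.find('=');
        if (pos != std::string::npos)
            props[token.substr(0, pos)] = token.substr(pos + 1);
    }
    return props;
}
```

one difference to keep in mind: unlike operator>>, std::getline does not skip whitespace, so an input such as "pc=1; ct=2" produces the key " ct" with a leading space unless you trim it yourself.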

actually, it looks like you have only integers as values, and whitespace as delimiters. you might want to change

    std::string token;
    std::istringstream tokens(line);
    while(tokens >> token) {
        std::size_t pos = token.find('=');
        if(pos != std::string::npos) {
            props[token.substr(0, pos)] = token.substr(pos + 1);
        }
    }

into this then:

    int value;
    std::string key;
    std::istringstream tokens(line);
    while(tokens >> std::ws && std::getline(tokens, key, '=') && 
          tokens >> std::ws >> value) {
            props[key] = value;
    }

std::ws just eats whitespace. you should change the type of props to

std::map<std::string, int> props;

then too, and make Line accept int instead of std::string's. i hope this is not too much information at once.
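put together, the integer version might look roughly like the sketch below. parse_ints, get_or and make_line are made-up helper names, and Line only carries the two fields shown earlier. get_or uses find() instead of operator[] so that looking up an absent key returns a fallback without inserting anything into the map:

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Int-based Line; only the two fields from the earlier example are shown.
struct Line {
    int pc, ct;
    Line(int pc, int ct) : pc(pc), ct(ct) {}
};

// Parse one "key=int key=int ..." line with the std::ws / getline loop.
std::map<std::string, int> parse_ints(const std::string& line)
{
    std::map<std::string, int> props;
    std::istringstream tokens(line);
    std::string key;
    int value;
    while (tokens >> std::ws && std::getline(tokens, key, '=') &&
           tokens >> std::ws >> value) {
        props[key] = value;
    }
    return props;
}

// Look a key up without inserting it; absent keys yield the fallback.
int get_or(const std::map<std::string, int>& props,
           const std::string& key, int fallback)
{
    std::map<std::string, int>::const_iterator it = props.find(key);
    return it == props.end() ? fallback : it->second;
}

// Build a Line from one line of text; missing keys become 0.
Line make_line(const std::string& text)
{
    const std::map<std::string, int> props = parse_ints(text);
    return Line(get_or(props, "pc", 0), get_or(props, "ct", 0));
}
```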

Johannes Schaub - litb
thanks, that is quite helpful. i shall try this out and let you know the result of the same... :-)
gagneet
the second solution for integers worked well. thanks
gagneet
+1  A: 

This seemed to do the trick. Of course you'd extract the code I've written in main and stick it in a class or something, but you get the idea.

#include <sstream>
#include <string>
#include <vector>
#include <map>

using namespace std;

vector<string> Tokenize(const string &str, const string &delim)
{
    vector<string> tokens;

    size_t p0 = 0, p1 = string::npos;
    while(p0 != string::npos)
    {
        p1 = str.find_first_of(delim, p0);
        if(p1 != p0)
        {
            string token = str.substr(p0, p1 - p0);
            tokens.push_back(token);
        }
        p0 = str.find_first_not_of(delim, p1);
    }

    return tokens;
}

int main()
{
    string data = "pc=1 ct=1 av=112 cv=1100 cp=1700 rec=2 p=10001 g=0 a=0 sz=5 cr=200 pc=1 ct=1 av=113 cv=1110 cp=1800 rec=2 p=10001 g=0 a=10 sz=5 cr=200";
    vector<string> entries = Tokenize(data, " ");
    map<string, int> items;

    for (size_t i = 0; i < entries.size(); ++i)
    {
        string item = entries[i];

        size_t pos = item.find_first_of('=');
        if(pos == string::npos)
            continue;

        string key = item.substr(0, pos);
        int value;
        stringstream stream(item.substr(pos + 1));
        stream >> value;
        // insert() keeps the first value for a duplicate key, so with both
        // records in one string the second record's values are ignored here
        items.insert (pair<string, int>(key, value));
    }

}
korona
thanks, this is helpful :-)
gagneet
You should let the stream operators do more of the work.
Martin York
this seems to work ok, only change was making the class and taking the input from a file. thanks :)
gagneet
+4  A: 

This is the perfect place to define the stream operators for your structure:

#include <string>
#include <fstream>
#include <sstream>
#include <istream>
#include <limits>
#include <vector>
#include <algorithm>
#include <iterator>

std::istream& operator>> (std::istream& str,pky& value)
{
    std::string line;
    std::getline(str,line);

    std::stringstream dataStr(line);

    static const std::streamsize max = std::numeric_limits<std::streamsize>::max();

    // Code assumes the ordering is always as follows
    // pc=1 ct=1 av=112 cv=1100 cp=1700 rec=2 p=10001 g=0 a=0 sz=5 cr=200
    dataStr.ignore(max,'=') >> value.pc;
    dataStr.ignore(max,'=') >> value.ct_id;
    dataStr.ignore(max,'=') >> value.a; // Guessing av=
    dataStr.ignore(max,'=') >> value.cv_id;
    dataStr.ignore(max,'=') >> value.cp_id;
    dataStr.ignore(max,'=') >> value.fr; // Guessing rec=
    dataStr.ignore(max,'=') >> value.p_id;
    dataStr.ignore(max,'=') >> value.g;
    dataStr.ignore(max,'=') >> value.a_id;
    dataStr.ignore(max,'=') >> value.sz_id;
    dataStr.ignore(max,'=') >> value.cr_id;

    return str;
}

int main()
{
    std::ifstream  file("plop");

    std::vector<pky>  v;
    pky data;

    while(file >> data)
    {
        // Do something with data
        v.push_back(data);
    }

    // Even use the istream_iterators
    std::ifstream    file2("plop2");
    std::vector<pky> v2;

    std::copy(std::istream_iterator<pky>(file2),
              std::istream_iterator<pky>(),
              std::back_inserter(v2)
             );
}
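The positional reads above assume every key is present and in the same order on every line. If keys can be omitted, a keyed variant of the operator is one option. The following is only a sketch under assumptions: Record is a hypothetical stand-in for pky (whose member declarations are not shown in the question) with one int field per input key, and FieldEntry and kFields are made-up names. Each read starts from a default-constructed record, so any key missing from the line is left at 0.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical stand-in for pky: one int field per key in the sample input.
struct Record {
    int pc, ct, av, cv, cp, rec, p, g, a, sz, cr;
    Record() : pc(0), ct(0), av(0), cv(0), cp(0), rec(0),
               p(0), g(0), a(0), sz(0), cr(0) {}
};

// Table tying each key to the field it fills (pointer-to-member).
struct FieldEntry { const char* key; int Record::*field; };
static const FieldEntry kFields[] = {
    {"pc", &Record::pc}, {"ct", &Record::ct}, {"av", &Record::av},
    {"cv", &Record::cv}, {"cp", &Record::cp}, {"rec", &Record::rec},
    {"p",  &Record::p},  {"g",  &Record::g},  {"a",  &Record::a},
    {"sz", &Record::sz}, {"cr", &Record::cr}
};

std::istream& operator>>(std::istream& in, Record& out)
{
    std::string line;
    if (!std::getline(in, line))
        return in;                      // EOF or error: leave 'out' untouched

    Record fresh;                       // every field starts at 0
    std::istringstream tokens(line);
    std::string key;
    int value;
    while (tokens >> std::ws && std::getline(tokens, key, '=') &&
           tokens >> value) {
        // Unknown keys match nothing in the table and are ignored.
        for (std::size_t i = 0; i < sizeof kFields / sizeof kFields[0]; ++i) {
            if (key == kFields[i].key)
                fresh.*(kFields[i].field) = value;
        }
    }
    out = fresh;
    return in;
}
```

With this, the `while(file >> data)` loop and the istream_iterator usage work the same way as before, and adding a field means adding one member and one table row.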
Martin York
this works ok, but i am having a problem with using the structure. actually, after i populate the structure, i need to use it as a parameter for a function. so, if my input does not contain some keys, these need to be as 0 in the structure. the structure may have more keys than the input.
gagneet
also, i am unable to find the library <istream>
gagneet
<istream> is not a "library". It is the header file for the declaration of the std::basic_istream template (among other things). It is in one of your compiler's include directories.
jmucchiello
A: 

What you are being taught here are monstrosities.

http://en.wikipedia.org/wiki/Scanf

Do not use this function to extract strings from untrusted data, but as long as you either trust the data or only read numbers, why not?
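A rough sketch of what that could look like with sscanf, under some assumptions: parse_with_sscanf is a made-up name, keys are assumed to be at most 15 characters, and values are assumed to be ints. The width in %15[^= ] is what keeps the string conversion bounded, and %n reports how many characters were consumed so the scan can advance through the line.

```cpp
#include <cstdio>
#include <map>
#include <string>

// Parse "key=int key=int ..." tokens with sscanf.
std::map<std::string, int> parse_with_sscanf(const char* line)
{
    std::map<std::string, int> props;
    char key[16];                 // 15 characters plus the terminating '\0'
    int value = 0;
    int used = 0;
    // The leading space skips whitespace; %15[^= ] reads up to 15 characters
    // that are neither '=' nor a space; %n stores the number of characters read.
    while (std::sscanf(line, " %15[^= ]=%d%n", key, &value, &used) == 2) {
        props[key] = value;
        line += used;             // continue after the pair just parsed
    }
    return props;
}
```

Note that this stops at the first token that is not of the form key=number, rather than skipping it.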

If you are familiar with regular expressions from another language, use std::tr1::regex or boost::regex; their interfaces are essentially the same. If you are not, you will do yourself a favor by learning them.
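A short sketch of the regular expression route. It assumes a C++11 compiler, where the TR1 class is available as std::regex; parse_with_regex and the pattern are made up for illustration, and values are assumed to fit in an int (std::stoi throws std::out_of_range otherwise).

```cpp
#include <map>
#include <regex>
#include <string>

// Collect every "key=int" pair found in the line; anything else is skipped.
std::map<std::string, int> parse_with_regex(const std::string& line)
{
    // Group 1: the key (word characters). Group 2: an optionally negative integer.
    static const std::regex pair_re("(\\w+)=(-?\\d+)");
    std::map<std::string, int> props;
    const std::sregex_iterator end;
    for (std::sregex_iterator it(line.begin(), line.end(), pair_re);
         it != end; ++it) {
        props[(*it)[1].str()] = std::stoi((*it)[2].str());
    }
    return props;
}
```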

3yE