I have some string data in the following format: "Ronit","abc""defgh","abcdef,"fdfd",

Can somebody suggest some good code in C++ to return the comma-separated tokens, when the commas are not inside a string?

I.e. it should return

  1. "Ronit"
  2. "abc""defgh"
  3. "abcdef,"fdfd"

To be clearer, and with thanks to all of you for your kind help:

Below is my sample file, which is given as input.

The first line tells me how many columns I have. #

Name1,Name2,Name3,Name4

"user1","user,user2","user3", "userrr

rrr4",

"user1","user2","user3","us

er4",

"user1","user,2","user3","user4"

"","user2,"", "", #

Below is the expected output for the CSV file. Please give me code that compiles so that I can test it. Thanks again for your kind help.

1st row: 1)user1 2)user,user2 3)user3 4)userrrr4

Note that rr4 is on the next line.

2nd row: 1)user1 2)user2 3)user3 4)us er4

Note that er4 is on the next line.

3rd row: 1)user1 2)user,2 3)user3 4)user4

4th row: 1) 2)user2 3) 4)

+1  A: 

It's not the best way but you can use the strtok function.

Nick D
strtok() is not a good idea. It modifies the input string while parsing it. There are cleaner methods than that.
Martin York
@Martin York, that's why I said "not the best way". Thanks for mentioning that.
Nick D
You can always make a copy of input string to avoid modifying it.
Pawka
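A minimal sketch of the copy-then-strtok idea from the comments above. The helper name `strtok_split` is hypothetical and not from the thread. It also shows why strtok fits this question poorly: it knows nothing about quotes, so a comma inside a quoted string still splits the token, and empty fields are silently skipped.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: runs strtok on a private copy so that the caller's
// string is left untouched (strtok overwrites delimiters with '\0').
std::vector<std::string> strtok_split(const std::string& input)
{
  std::vector<char> copy(input.begin(), input.end());
  copy.push_back('\0');                 // strtok needs a C string

  std::vector<std::string> tokens;
  for (char* tok = std::strtok(&copy[0], ",");
       tok != 0;
       tok = std::strtok(0, ","))
    tokens.push_back(tok);              // quotes are not treated specially
  return tokens;
}
```

With the input `"user1","user,user2","user3"` this yields four tokens rather than three, because the quoted comma is treated like any other delimiter.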
A: 

This returns the split tokens exactly as you asked:

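#include <string>
#include <vector>
using namespace std;
vector<string> splitqc(std::string const& s) {
 vector<string> tokens;
 char last=0;
 string::size_type start=0;
 for (string::size_type i=0,n=s.length();i!=n;++i) {
  char c=s[i];
  // a comma that directly follows a quote ends the current token
  if (c==',' && last=='"') {
    tokens.push_back(s.substr(start,i-start)); // keeps the closing quote
    start=i+1;
  }
  last=c;
 }
 // note: text after the last "quote, comma" pair is not returned
 return tokens;
}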

Here's a more general facility (the functor f gets called with each token; note that it won't have the close quote that's part of your delimiter; you'd have to add that yourself):

template <class Func>
inline void split_noquote(
    const std::string &csv,
    Func f,
    const std::string &delim=","
    )
{
    using namespace std;
    string::size_type pos=0,nextpos;
    string::size_type delim_len=delim.length();
    if (delim_len==0) delim_len=1;
    while((nextpos=csv.find(delim,pos)) != string::npos) {
        if (! f(string(csv,pos,nextpos-pos)) )
            return;
        pos=nextpos+delim_len;
    }
    if (csv.length()!=0)
        f(string(csv,pos,csv.length()-pos));
}

Usage: split_noquote(s,func,"\",")
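A possible way to call it, as a sketch only. The functor `QuoteCollector` is a hypothetical name, not part of the answer. It assumes the input ends with `",` as in the question, skips the empty remainder after that final delimiter, and puts back the closing quote that the delimiter swallowed. `split_noquote` is repeated from above so that the block compiles on its own.

```cpp
#include <string>
#include <vector>

// split_noquote as given in the answer, repeated so this compiles alone.
template <class Func>
inline void split_noquote(const std::string &csv, Func f,
                          const std::string &delim=",")
{
    using namespace std;
    string::size_type pos=0,nextpos;
    string::size_type delim_len=delim.length();
    if (delim_len==0) delim_len=1;
    while((nextpos=csv.find(delim,pos)) != string::npos) {
        if (! f(string(csv,pos,nextpos-pos)) )
            return;
        pos=nextpos+delim_len;
    }
    if (csv.length()!=0)
        f(string(csv,pos,csv.length()-pos));
}

// Hypothetical functor: collects tokens and restores the closing quote.
// It holds a pointer because split_noquote takes the functor by value.
struct QuoteCollector {
    std::vector<std::string> *out;
    explicit QuoteCollector(std::vector<std::string> *o) : out(o) {}
    bool operator()(const std::string &token) const {
        if (token.empty()) return true;   // remainder after the trailing ",
        out->push_back(token + '"');      // assumes every token lost its quote
        return true;                      // true means "keep going"
    }
};
```

Calling `split_noquote(s, QuoteCollector(&tokens), "\",")` on the string from the question should leave the three quoted tokens in `tokens`. An input whose last field is not followed by `",` would end up with a doubled closing quote, so that case needs separate handling.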

wrang-wrang
The first function will not compile.
Vijay Mathew
+2  A: 

This looks like parsing a CSV file to me (even if it's not technically a file) - you could take a look at this question and answer.

MadKeithV
A: 

The following assumes that the input comes from some stream (you used the C++ tag, after all). If that's not the case, look into string streams.

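#include <istream>
#include <stdexcept>
#include <string>
#include <vector>

// format_error is not defined in the original answer; this minimal
// definition is an assumption so that the code compiles.
struct format_error : std::runtime_error {
  format_error(const std::string& msg, const std::string& text)
    : std::runtime_error(msg + ": " + text) {}
};

std::string read_quoted_string(std::istream& is)
{
  is >> std::ws;
  std::string garbage;
  std::getline(is,garbage,'"'); // everything up to opening quote
  if(!garbage.empty()) throw format_error("garbage outside of quotes", garbage);
  if(!is.good()) return std::string();

  std::string a_string;
  std::getline(is,a_string,'"'); // the string up to closing quote
  if(!is) return std::string();
  return a_string;
}

std::vector<std::string> split_input(std::istream& is)
{
  std::vector<std::string> result;
  std::string last_token; // declared outside the loop for the error message below
  while(is) {
    const std::string& a_string = read_quoted_string(is);
    if(is) {
      result.push_back(a_string);
      last_token = a_string;
      is >> std::ws;
      std::string garbage;
      std::getline(is,garbage,','); // next delimiter
      if(!garbage.empty()) throw format_error("garbage outside of quotes", garbage);
    }
  }
  if(!is.eof()) throw format_error("error reading token", last_token);
  return result;
}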

This isn't the fastest solution you can have, but it's simple and very likely fast enough.

sbi
-1 for posting code that won't even compile.
Vijay Mathew
@Vijay: Thanks for pointing out these two cut'n'paste errors in such a friendly manner. I suppose this was so hard to fix that, even if someone managed to fix it, their attempts would certainly have broken the code to the point where it wouldn't do what was specified anymore. Of course, such a terrible error certainly justifies a down-vote. Anyway, thanks to your kind help I figured it out and __fixed it__. Would you now be so kind as to remove your down-vote? Thank you in advance. Oh, and BTW: While revenge indeed is sweet, sweets might make your teeth rot.
sbi
`<sigh>` Would the second down-voter please be so kind as to tell me what's wrong with my solution?
sbi
Hi sbi, I am a new user. Could you take a look at my query? I have edited my question. Many thanks.
@sbi I have removed my down-vote. I am hypoglycemic, so sweets are good for me :)
Vijay Mathew
@Vijay: Thanks!
sbi
A: 

I don't think something like "abcdef,"fdfd" can be parsed. That is illegal in any language and any data format, because one of the quotes is not terminated. It should be "abcdef,fdfd". Provided that all strings are properly terminated, the following function will give the output you want.

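std::istream& tokenize_quoted_strings(std::istream& in,
                                      std::string& dest,
                                      char delim)
{
  dest.erase();
  char ch = 0;
  bool in_quotes = false;
  while (in)
    {
      if (!in.get(ch)) break;
      if (!in_quotes && ch == delim) break;
      dest.push_back(ch);
      if (ch == '"') in_quotes = !in_quotes;
    }
  // If the input ended while a token was being read, clear the failure so the
  // caller still receives that last token; the next call fails on the eofbit.
  if (in.eof() && !dest.empty()) in.clear(std::ios_base::eofbit);
  return in;
}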

The following function uses *tokenize_quoted_strings* to split a string into a vector of tokens:

typedef std::vector<std::string> StringList;

void tokenize_line(const std::string& line,
        StringList& tokens)
{
  std::istringstream iss(line);
  std::string token;
  tokens.clear();
  while (tokenize_quoted_strings(iss, token, ','))
    tokens.push_back(token);
}

Usage:

#include <iostream>
#include <string>
#include <fstream>
#include <sstream>
#include <vector>

int main()
{  
  std::fstream in("test.txt", std::ios_base::in);
  std::string line;
  StringList tokens;
  while (getline(in, line))
    {
      tokenize_line(line, tokens);
      size_t sz = tokens.size();
      for (size_t i=0; i<sz; ++i)
        std::cout << (i+1) << ") " << tokens[i] << ' ';
      std::cout << '\n';
    }
  return 0;
}

Note that it does not handle C-style escaped quotes (\").

Vijay Mathew
Fixed the bug reported by sbi.
Vijay Mathew
Thanks Vijay, I just realized my mistake. Thanks!
@Ronit This is kind of home work. Anyway, I have updated my code with your latest input.
Vijay Mathew
@Ronit Please remember to reward my effort with an up-vote!
Vijay Mathew
I wish I could, but it says I don't have enough reputation :).
@Vijay Thanks for the much-needed help. Suppose I have the data "a,bc,d"",CDEF""","BCD,E","CE,e,""", can you modify your function to take care of it? Here ("",) is part of the data and (""",) is the string terminator, so the tokens are 1) "a,bc,d"",CDEF""", 2) "BCD,E", 3) "CE,e,""". Thanks a lot, waiting for your reply.
@Ronit Why don't you make the modification yourself? This is a damn simple task, right?
Vijay Mathew
Yes, I did, but being new to C/C++ I am not so confident. When I make one change, it breaks something else :)
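For the follow-up data in the comments above, here is a rough sketch of one way to go. It is not Vijay's function: the names `Record` and `read_record` are hypothetical, and the behaviour rests on assumptions that may not match what the asker wants. It treats `""` inside a quoted field as one literal quote, strips the enclosing quotes, keeps commas and newlines that occur inside quotes as data (so a record may span several physical lines), keeps whitespace outside quotes, and reports an empty last field when a record ends with a comma.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

typedef std::vector<std::string> Record;

// Hypothetical helper: reads one record, which may span several physical
// lines when a newline occurs inside quotes. Returns false once the stream
// has no more characters.
bool read_record(std::istream& in, Record& fields)
{
  fields.clear();
  std::string field;
  bool in_quotes = false;
  bool got_any = false;
  char ch = 0;
  while (in.get(ch)) {
    got_any = true;
    if (in_quotes) {
      if (ch != '"') {
        field.push_back(ch);       // commas and newlines are data here
      } else if (in.peek() == '"') {
        in.get(ch);                // "" inside quotes is one literal quote
        field.push_back('"');
      } else {
        in_quotes = false;         // closing quote, not stored
      }
    } else if (ch == '"') {
      in_quotes = true;            // opening quote, not stored
    } else if (ch == ',') {
      fields.push_back(field);     // unquoted comma ends the field
      field.clear();
    } else if (ch == '\n') {
      break;                       // unquoted newline ends the record
    } else {
      field.push_back(ch);
    }
  }
  if (!got_any) return false;
  fields.push_back(field);         // last field of the record
  return true;
}
```

To try it on a string, wrap the data in a `std::istringstream` and call `read_record` in a loop until it returns false. Under these assumptions the first token of the follow-up data comes out as `a,bc,d",CDEF"`, without the enclosing quotes.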
+1  A: 

Just download Boost and use Boost.Tokenizer.
It's the best solution there is.

the_drow