ansaurus

Question

CSV parser in C++

Answer 1

+3 A:

You might want to look at my FOSS project CSVfix, which is a CSV stream editor written in C++. The CSV parser is no prize, but does the job and the whole package may do what you need without you writing any code.

anon 2009-07-13 15:29:15

Seems great. What is its status: beta or production?

neuro 2009-07-13 15:30:57

The status is "in development", as suggested by the version numbers. I really need more feedback from users before going to version 1.0. Plus I have a couple more features I want to add, to do with XML production from CSV.

anon 2009-07-13 15:36:35

Bookmarking it, and will give it a try next time I have to deal with those wonderful standard CSV files ...

neuro 2009-07-13 15:44:00

+1 I found a project I can learn from :)

AraK 2009-09-25 02:27:47

Answer 2

+14 A:

If you don't care about escaping commas and newlines,
and you don't need to embed commas and newlines in quotes (if you can't escape them, then...),
then it's only about three lines of code (OK, 14, but it's only 15 to read the whole file).

std::vector<std::string> getNextLineAndSplitIntoTokens(std::istream& str)
{
    std::vector<std::string>   result;
    std::string                line;
    std::getline(str,line);

    std::stringstream          lineStream(line);
    std::string                cell;

    while(std::getline(lineStream,cell,','))
    {
        result.push_back(cell);
    }
    return result;
}
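
To make the "read the whole file" remark concrete, here is a minimal sketch that applies the same comma-splitting to every line of a stream. The name readAllRows is hypothetical, chosen only for this illustration. It carries the limitation stated above: a comma inside quotes still splits the field.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of the answer above): reads every row of a
// stream using the same comma-splitting approach.
std::vector<std::vector<std::string> > readAllRows(std::istream& str)
{
    std::vector<std::vector<std::string> > rows;
    std::string                            line;

    while (std::getline(str, line))
    {
        std::vector<std::string> row;
        std::stringstream        lineStream(line);
        std::string              cell;

        // Splits on every comma, including commas inside quotes.
        while (std::getline(lineStream, cell, ','))
        {
            row.push_back(cell);
        }
        rows.push_back(row);
    }
    return rows;
}
```

Given the line 1,"x,y",3 this returns four cells rather than three, which is exactly the case the caveat warns about.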

I would just create a class representing a row.
Then stream into that object:

#include <iterator>
#include <iostream>
#include <fstream>
#include <sstream>
#include <vector>
#include <string>

class CVSRow
{
    public:
        std::string const& operator[](std::size_t index) const
        {
            return m_data[index];
        }
        std::size_t size() const
        {
            return m_data.size();
        }
        void readNextRow(std::istream& str)
        {
            std::string         line;
            std::getline(str,line);

            std::stringstream   lineStream(line);
            std::string         cell;

            m_data.clear();
            while(std::getline(lineStream,cell,','))
            {
                m_data.push_back(cell);
            }
        }
    private:
        std::vector<std::string>    m_data;
};

std::istream& operator>>(std::istream& str,CVSRow& data)
{
    data.readNextRow(str);
    return str;
}   
int main()
{
    std::ifstream       file("plop.csv");

    CVSRow              row;
    while(file >> row)
    {
        std::cout << "4th Element(" << row[3] << ")\n";
    }
}

But with a little work we could technically create an iterator:

class CVSIterator
{   
    public:
        typedef std::input_iterator_tag     iterator_category;
        typedef CVSRow                      value_type;
        typedef std::size_t                 difference_type;
        typedef CVSRow*                     pointer;
        typedef CVSRow&                     reference;

        CVSIterator(std::istream& str)  :m_str(str.good()?&str:NULL) { ++(*this); }
        CVSIterator()                   :m_str(NULL) {}

        // Pre Increment
        CVSIterator& operator++()               {if (m_str) { (*m_str) >> m_row;m_str = m_str->good()?m_str:NULL;}return *this;}
        // Post increment
        CVSIterator operator++(int)             {CVSIterator    tmp(*this);++(*this);return tmp;}
        CVSRow const& operator*()   const       {return m_row;}
        CVSRow const* operator->()  const       {return &m_row;}

        bool operator==(CVSIterator const& rhs) const {return ((this == &rhs) || ((this->m_str == NULL) && (rhs.m_str == NULL)));}
        bool operator!=(CVSIterator const& rhs) const {return !((*this) == rhs);}
    private:
        std::istream*       m_str;
        CVSRow              m_row;
};


int main()
{
    std::ifstream       file("plop.csv");

    for(CVSIterator loop(file);loop != CVSIterator();++loop)
    {
        std::cout << "4th Element(" << (*loop)[3] << ")\n";
    }
}

Martin York 2009-07-13 15:37:44

This is exactly what I wanted! Now, some extra credit: how would I make this into a class with a constructor and two methods, firstLine() and nextLine()? std::istream doesn't have a default constructor, so what do I use instead? Thanks for the help!!

User1 2009-07-14 03:19:43

Can somebody do two fixes above: lineSteam instead of linestream. Missing ")" on while.

User1 2009-07-14 03:20:24

first() next(). What is this Java! Only Joking.

Martin York 2009-07-14 05:15:08
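
One possible shape for the class asked about above, as a sketch only: since std::istream has no default constructor, the class can hold a reference to a stream that the caller owns. The names CsvReader and row() are hypothetical; firstLine() and nextLine() follow the comment. It assumes the stream is seekable (a file or string stream), because firstLine() rewinds to the start.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical class for illustration; not from any library.
class CsvReader
{
    public:
        // The caller owns the stream and must keep it alive.
        explicit CsvReader(std::istream& str) : m_str(str) {}

        // Rewind to the start of the stream and read the first row.
        bool firstLine()
        {
            m_str.clear();                  // drop eof/fail flags from earlier reads
            m_str.seekg(0, std::ios::beg);  // assumes a seekable stream
            return nextLine();
        }

        // Read the next row; returns false when no line could be read.
        bool nextLine()
        {
            std::string line;
            if (!std::getline(m_str, line))
                return false;

            m_row.clear();
            std::stringstream lineStream(line);
            std::string       cell;
            while (std::getline(lineStream, cell, ','))
            {
                m_row.push_back(cell);
            }
            return true;
        }

        std::vector<std::string> const& row() const { return m_row; }
    private:
        std::istream&               m_str;
        std::vector<std::string>    m_row;
};
```

Holding a reference keeps the class usable with any stream type; owning a std::ifstream member and opening it in the constructor would be the alternative if the class should manage the file itself.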

or you could use some boost libraries to parse csv ... see below

stefanB 2010-03-21 00:15:16

Answer 3

A:

Well, if you need only simple CSV parsing, Neil Butterworth's library might be overkill in your case; you can just use istream& getline(char* s, streamsize n, char delim). It will only handle simple cases, but it can be enough as a starting point.
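
As a rough sketch of that suggestion, the member getline can split a single line on commas. The helper name splitWithMemberGetline and the 256-character buffer are assumptions made for this example. No quoting or escaping is handled.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits one line with the member function
// std::istream::getline(char*, std::streamsize, char).
std::vector<std::string> splitWithMemberGetline(const std::string& line)
{
    std::vector<std::string> fields;
    std::istringstream       lineStream(line);
    char                     buf[256];   // assumed upper bound on field length

    // getline stores at most sizeof(buf)-1 characters; a longer field sets
    // failbit and ends the loop, so size the buffer generously.
    while (lineStream.getline(buf, sizeof buf, ','))
    {
        fields.push_back(buf);
    }
    return fields;
}
```

The fixed buffer is the main drawback compared with the std::getline overload that takes a std::string, which grows as needed.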

neuro 2009-07-13 15:39:28

@Martin: arghhh not fast enough :-)

neuro 2009-07-13 15:40:10

/me really hate downvotes without comment ...

neuro 2009-10-09 14:28:57

Answer 4

A:

The Boost Tokenizer documentation specifically mentions parsing CSV files as one of the examples. It still might be overkill for what you need, but less so than writing a full-blown LL parser.

Kristo 2009-07-13 17:58:48

Answer 5

+9 A:

Solution using Boost Tokenizer:

std::vector<std::string> vec;
using namespace boost;
tokenizer<escaped_list_separator<char> > tk(
   line, escaped_list_separator<char>('\\', ',', '\"'));
for (tokenizer<escaped_list_separator<char> >::iterator i(tk.begin());
   i!=tk.end();++i) 
{
   vec.push_back(*i);
}

dtw 2009-07-13 23:41:01

The boost tokenizer doesn't fully support the complete CSV standard, but there are some quick workarounds. See http://stackoverflow.com/questions/1120140/csv-parser-in-c/1595366#1595366

Rolf Kristensen 2010-04-13 23:03:15

Answer 6

+2 A:

Excuse me, but this all seems like a great deal of elaborate syntax to hide a few lines of code.

Why not this:

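/**

  Read line from a CSV file

  @param[in] fp file pointer to open file
  @param[out] vls reference to vector of strings to hold next line

  */
void readCSV( FILE *fp, std::vector<std::string>& vls )
{
    vls.clear();
    if( ! fp )
     return;
    char buf[10000];
    // pass the real buffer size (the original passed 999)
    if( ! fgets( buf, sizeof buf, fp ) )
     return;
    std::string s = buf;
    // drop the trailing line ending so the last column is handled uniformly
    while( ! s.empty() && ( s[s.size()-1] == '\n' || s[s.size()-1] == '\r' ) )
     s.erase( s.size()-1 );
    std::string::size_type p = 0;
    // loop over columns
    while( 1 ) {
     std::string::size_type q = s.find( ',', p );
     if( q == std::string::npos ) {
      // last column: also kept when the file lacks a final newline
      vls.push_back( s.substr(p) );
      break;
     }
     vls.push_back( s.substr(p,q-p) );
     p = q + 1;
    }
}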

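int main(int argc, char* argv[])
{
    if( argc < 2 )
     return 1;
    std::vector<std::string> vls;
    FILE * fp = fopen( argv[1], "r" );
    if( ! fp )
     return 1;
    // read three rows; vls ends up holding the third
    readCSV( fp, vls );
    readCSV( fp, vls );
    readCSV( fp, vls );
    fclose( fp );
    // guard against rows with fewer than four columns
    if( vls.size() > 3 )
     std::cout << "row 3, col 4 is " << vls[3] << "\n";

    return 0;
}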

ravenspoint 2009-07-14 14:39:29

Answer 7

A:

You could also take a look at the capabilities of the Qt library.

It has regular expression support, and the QString class has nice methods, e.g. split(), which returns a QStringList: a list of strings obtained by splitting the original string at a provided delimiter. That should suffice for a CSV file.

To get a column with a given header name I use the following: http://stackoverflow.com/questions/970330/c-inheritance-qt-problem-qstring/1011601#1011601

MadH 2009-09-18 10:28:20

Answer 8

+4 A:

The String Toolkit Library has a token grid class that allows you to load data either from text files, strings or char buffers, and to parse/process them in a row-column fashion.

You can specify the row delimiters and column delimiters or just use the defaults.

void foo()
{
   std::string data;
   data += "1,2,3,4,5\n";
   data += "0,2,4,6,8\n";
   data += "1,3,5,7,9\n";

   strtk::token_grid grid(data,data.size(),",");

   for(std::size_t i = 0; i < grid.row_count(); ++i)
   {
      strtk::token_grid::row_type r = grid.row(i);
      for(std::size_t j = 0; j < r.size(); ++j)
      {
         std::cout << r.get<int>(j) << "\t";
      }
      std::cout << std::endl;
   }
   std::cout << std::endl;
}

Beh Tou Cheh 2009-09-25 02:15:25

Answer 9

+2 A:

When using the Boost Tokenizer escaped_list_separator for CSV files, one should be aware of the following:

It requires an escape character (default back-slash - \)
It requires a splitter/separator character (default comma - ,)
It requires a quote character (default quote - ")

The CSV format specified by wiki states that data fields can contain separators in quotes (supported):

1997,Ford,E350,"Super, luxurious truck"

The CSV format specified by wiki states that a quote character inside a quoted field should be written as two double-quotes (escaped_list_separator will strip away all quote characters):

1997,Ford,E350,"Super ""luxurious"" truck"

The CSV format doesn't specify that any back-slash characters should be stripped away (escaped_list_separator will strip away all escape characters).

A possible work-around to fix the default behavior of the boost escaped_list_separator:

First, replace every back-slash character (\) with two back-slash characters (\\) so they are not stripped away.
Second, replace every doubled quote ("") with a back-slash followed by a quote (\").

This work-around has the side effect that an empty data field written as a pair of quotes ("") is transformed into a token holding a single quote character. When iterating through the tokens, one must check whether a token is a lone quote and treat it as an empty string.

Not pretty but it works.

Rolf Kristensen 2009-10-20 15:15:02

Answer 10

+3 A:

It is not overkill to use Spirit for parsing CSVs. Spirit is well suited for micro-parsing tasks. For instance, with Spirit 2.1, it is as easy as:

bool r = phrase_parse(first, last,

    //  Begin grammar
    (
        double_ % ','
    )
    ,
    //  End grammar

    space, v);

The vector, v, gets stuffed with the values. There is a series of tutorials touching on this in the new Spirit 2.1 docs that's just been released with Boost 1.41. I suggest you go check it out here:

http://tinyurl.com/yfucedn

The tutorial progresses from simple to complex. The CSV parsers are presented somewhere in the middle and touch on various techniques in using Spirit. The generated code is as tight as hand-written code. Check out the assembler generated!

Joel de Guzman 2009-11-19 16:00:52

Answer 11

+3 A:

You can use Boost Tokenizer with escaped_list_separator.

escaped_list_separator parses a superset of the CSV format (see Boost::tokenizer).

This only uses the Boost tokenizer header files; no linking to Boost libraries is required.

Here is an example (see Parse CSV File With Boost Tokenizer In C++ or Boost::tokenizer for details):

#include <iostream>     // cout, endl
#include <fstream>      // ifstream
#include <vector>
#include <string>
#include <algorithm>    // copy
#include <iterator>     // ostream_iterator
#include <boost/tokenizer.hpp>

int main()
{
    using namespace std;
    using namespace boost;
    string data("data.csv");

    ifstream in(data.c_str());
    if (!in.is_open()) return 1;

    typedef tokenizer< escaped_list_separator<char> > Tokenizer;
    vector< string > vec;
    string line;

    while (getline(in,line))
    {
        Tokenizer tok(line);
        vec.assign(tok.begin(),tok.end());

        // vector now contains strings from one row, output to cout here
        copy(vec.begin(), vec.end(), ostream_iterator<string>(cout, "|"));

        cout << "\n----------------------" << endl;
    }
}

stefanB 2010-02-24 00:02:34

Downvotes? Because ....? Well anyway thanks, your annoying and childish response is very constructive because now we all know why you don't like this response ... er ... you did not sleep well?

stefanB 2010-04-11 13:27:20

Answer 12

+3 A:

If you DO care about parsing CSV correctly, this will do it, though relatively slowly, as it works one char at a time.

 int ParseCSV(const string& csvSource, vector<vector<string> >& lines)
    {
       int result(0);

       bool inQuote(false);
       bool lastCharWasAQuote(false);
       bool newLine(false);
       string field;
       lines.clear();
       vector<string> line;

       string::const_iterator aChar = csvSource.begin();
       while (aChar != csvSource.end())
       {
          switch (*aChar)
          {
          case '"':
             newLine = false;
             if (inQuote && (aChar + 1) != csvSource.end() && *(aChar + 1) == '"')
             {
                // a doubled quote inside a quoted field is one literal quote
                field += *aChar;
                ++aChar;
             }
             else
             {
                inQuote = !inQuote;
             }
             break;

          case ',':
             newLine = false;
             if (inQuote == true)
             {
                field += *aChar;
             }
             else
             {
                line.push_back(field);
                field.clear();
             }
             break;

          case '\n':
          case '\r':
             if (inQuote == true)
             {
                field += *aChar;
             }
             else
             {
                if (newLine == false)
                {
                   line.push_back(field);
                   lines.push_back(line);
                   field.clear();
                   line.clear();
                   newLine = true;
                }
             }
             break;

          default:
             newLine = false;
             field.push_back(*aChar);
             break;
          }

          aChar++;
       }

       // flush the last record when the input does not end with a newline
       if (!line.empty() || !field.empty())
       {
          line.push_back(field);
          lines.push_back(line);
       }

       return result;
    }

Michael 2010-03-19 23:18:07
