ansaurus

Question

c++ stl: a good way to parse a sensor response

Answer 1

A:

It seems to me that something as simple as the following should solve the problem:

string line;
while( getline( input, line ) && line != "" ) {    
  int val = atoi( line.substr(0, 3 ).c_str() );
  string data = line.substr( 3, line.size() - 4 );
  char csum = line[ line.size() - 1 ];
  // process val, data and csum
}

In a real implementation you would want to add error checking, but the basic logic should remain the same.

anon 2010-01-30 10:27:07

er, Neil, you copy the string. string data = line.substr....I dont want to be naggy, but it is a really BIG string in some cases.

AndreasT 2010-01-30 10:38:13

@ AndreasT How big? And how much memory do you have? If you are parsing, you typically need to make copies. Or you could use iterators. You need to make your question more explicit - for example, what does the consumer of the parsed data want to see?

anon 2010-01-30 10:45:53

@ AndreasT And in your original question, you said "after at most 64 byte", so the copying at the place you pointed to in your comment won't be significant.

anon 2010-01-30 10:50:19

In other words, is the checksum always followed by a linefeed? Or does the linefeed occur after some multiple of 64 bytes?

gary 2010-01-30 11:11:24

@Neil: since strtok() for example jumps through some hoops to avoid copying, I don't think that parsing usually means copying.

AndreasT 2010-02-01 07:58:36

@AndreasT I don't think strtok can be held up as an example of good practice. In the parsers I use and write, copies are always made.

anon 2010-02-01 09:58:53

Answer 2

A:

Why are you concerned with copying? Is it the time overhead or the space overhead?

It sounds like you are reading all of the unparsed data into a big buffer, and now you want to write code that makes the big buffer of unparsed data look like a slightly smaller buffer of parsed data (the data minus the checksums and linefeeds), to avoid the space overhead involved in copying it into the slightly smaller buffer.

Adding a complicated layer of abstraction isn't going to help with the time overhead unless you only need a small portion of the data. And if that's the case, maybe you could just figure out which small portion you need, and copy that. (Then again, most of the abstraction layer may be already written for you, e.g. the Boost iterator library.)

An easier way to reduce the space overhead is to read the data in smaller chunks (or a line at a time), and parse it as you go. Then you only need to store the parsed version in a big buffer. (This assumes you're reading it from a file / socket / port, rather than being passed a large buffer that is not under your control.)

Another way to reduce the space overhead is to overwrite the data in place as you parse it. You will incur the cost of copying it, but you will only need one buffer. (This assumes that the 64-byte data doesn't grow when you parse it, i.e. it is not compressed.)

bk1e 2010-01-30 17:29:24

I am concerned with copying because it is the single thing that will have any performance impact whatsoever. This runs on a robot and resources are generally scarce. The rest is just simple math that cannot be avoided in any case. The data comes in big chunks as I stated in the question. Thanks for your thoughts, but really, the question is about an stl solution, as generic and fast as possible. Nothing else. I already implemented this stuff, just not in any way generic or elegant. This question is motivated by my wanting to learn more tricks with the stl, not the problem itself.

AndreasT 2010-02-01 09:03:05

@AndreasT, thanks for clarifying that. It's not purely STL, but have you looked into the Boost iterator library? It has iterator adapters for filtering and transforming data at the iterator level: http://www.boost.org/doc/libs/1_41_0/libs/iterator/doc/index.html

bk1e 2010-02-01 16:53:19

Answer 3

+2 A:

As much as I like STL, I don't think there's anything wrong with doing things manually, especially if the problem does not really fall into the cases the STL has been made for. Then again, I'm not sure why you ask. Maybe you need an STL input iterator that (checks and) discards the check sums and LF characters and emits the integers?

I assume the encoding is such that LF can only appear at those places, i.e., some kind of Base-64 or similar?

Christopher Creutzig 2010-01-30 21:07:41

Why I ask: I have come to love the simplicity and generality of stl solutions once you get over the initial rise in the learning curve. I just hoped that there was some algorithm/iterator/functor combination that can cope with this data format. But as you rightly state, this seems to be a situation that the stl has not been made for.

AndreasT 2010-02-01 08:56:46

Other than that, I think using a custom iterator might just be the answer that currently fits the question best. (+1) This rather adds more complexity than it solves, since the logic in the iterator is almost the same as I use for the "manual" solution + the iterator declaration overhead. It doesn't really provide any "divide and conquer" advantages, but it fits most of the requirements. Thank you.

AndreasT 2010-02-01 09:17:14

Answer 4

A:

You should think of your communication protocol as being layered. Treat

|<--64byte-data-->|1byte checksum|LF

as fragments to be reassembled into larger packets of contiguous data. Once the larger packet is reconstituted, it is easier to parse its data contiguously (you don't have to deal with measurements being split up across fragments). Many existing network protocols (such as UDP/IP) does this sort of reassembly of fragments into packets.

It's possible to read the fragments directly into their proper "slot" in the packet buffer. Since your fragments have footers instead of headers, and there is no out-of-order arrival of your fragments, this should be fairly easy to code (compared to copyless IP reassembly algorithms). Once you receive an "empty" fragment (the duplicate LF), this marks the end of the packet.

Here is some sample code to illustrate the idea:

#include <vector>
#include <cassert>

class Reassembler
{
public:
    // Constructs reassembler with given packet buffer capacity
    Reassembler(int capacity) : buf_(capacity) {reset();}

    // Returns bytes remaining in packet buffer
    int remaining() const {return buf_.end() - pos_;}

    // Returns a pointer to where the next fragment should be read
    char* back() {return &*pos_;}

    // Advances the packet's position cursor for the next fragment
    void push(int size) {pos_ += size; if (size == 0) complete_ = true;}

    // Returns true if an empty fragment was pushed to indicate end of packet
    bool isComplete() const {return complete_;}

    // Resets the reassembler so it can process a new packet
    void reset() {pos_ = buf_.begin(); complete_ = false;}

    // Returns a pointer to the accumulated packet data
    char* data() {return &buf_[0];}

    // Returns the size in bytes of the accumulated packet data
    int size() const {return pos_ - buf_.begin();}

private:
    std::vector<char> buf_;
    std::vector<char>::iterator pos_;
    bool complete_;
};


int readFragment(char* dest, int maxBytes, char delimiter)
{
    // Read next fragment from source and save to dest pointer
    // Return number of bytes in fragment, except delimiter character
}

bool verifyChecksum(char* fragPtr, int size)
{
    // Returns true if fragment checksum is valid
}

void processPacket(char* data, int size)
{
    // Extract measurements which are now stored contiguously in packet
}

int main()
{
    const int kChecksumSize = 1;
    Reassembler reasm(1000); // Use realistic capacity here
    while (true)
    {
        while (!reasm.isComplete())
        {
            char* fragDest = reasm.back();
            int fragSize = readFragment(fragDest, reasm.remaining(), '\n');
            if (fragSize > 1)
                assert(verifyChecksum(fragDest, fragSize));
            reasm.push(fragSize - kChecksumSize);
        }
        processPacket(reasm.data(), reasm.size());
        reasm.reset();
    }
}

The trick will be making an efficient readFragment function that stops at every newline delimiter and stores the incoming data into the given destination buffer pointer. If you tell me how you acquire your sensor data, then I can perhaps give you more ideas.

Emile Cormier 2010-01-31 09:16:08

I had thought of that too. It is essentially what I meant with solution one.

AndreasT 2010-02-01 08:30:46

Ah, sorry about that Andreas. I'll leave my answer here for others to laugh.. er look at for inspiration.

Emile Cormier 2010-02-01 19:02:33

Answer 5

A:

An elegant solution this isn't. It would be more so by using a "transition matrix", and only reading one character at a time. Not my style. Yet this code has a minimum of redundant data movement, and it seems to do the job. Minimally C++, it really is just a C program. Adding iterators is left as an exercise for the reader. The data stream wasn't completely defined, and there was no defined destination for the converted data. Assumptions noted in comments. Lots of printing should show functionality.

// convert series of 3 ASCII decimal digits to binary
// there is a checksum byte at least once every 64 bytes - it can split a digit series
// if the interval is less than 64 bytes, it must be followd by LF (to identify it)
// if the interval is a full 64 bytes, the checksum may or may not be followed by LF
// checksum restricted to a simple sum modulo 10 to keep ASCII format
// checksum computations are only printed to allowed continuation of demo, and so results can be
// inserted back in data for testing
// there is no verification of the 3 byte sets of digits
// results are just printed, non-zero return indicates error


int readData(void) {
    int binValue =  0, digitNdx = 0,  sensorCnt = 0, lineCnt = 0;
    char oneDigit;
    string sensorTxt;

    while( getline( cin, sensorTxt ) ) {
        int i, restart = 0, checkSum = 0, size = sensorTxt.size()-1;
        if(size < 0)
            break;
        lineCnt++;
        if(sensorTxt[0] == '#')
            continue;
        printf("INPUT: %s\n", &sensorTxt[0]);           // gag

        while(restart<size) {
            for(i=0; i<min(64, size); i++) {
                oneDigit = sensorTxt[i+restart] & 0xF;
                checkSum += oneDigit;
                binValue = binValue*10 + oneDigit;
                //printf("%3d-%X ", binValue, sensorTxt[i+restart]);
                digitNdx++;
                if(digitNdx == 3) {
                    sensorCnt++;
                    printf("READING# %d (LINE %d) = %d  CKSUM %d\n",
                           sensorCnt, lineCnt, binValue, checkSum);
                    digitNdx = 0;
                    binValue = 0;
                }
            }
            oneDigit = sensorTxt[i+restart] & 0x0F;
            char compCheckDigit = (10-(checkSum%10)) % 10;
            printf("    CKSUM at sensorCnt %d  ", sensorCnt);
            if((checkSum+oneDigit) % 10)
                printf("ERR got %c exp %c\n", oneDigit|0x30, compCheckDigit|0x30);
            else
                printf("OK\n");
            i++;
            restart += i;
        }
    }
    if(digitNdx)
        return -2;
    else
        return 0;
}

The data definition was extended with comments, you you can use the following as is:

# normal 64 byte lines with 3 digit value split across lines
00100200300400500600700800901001101201301401501601701801902002105
22023024025026027028029030031032033034035036037038039040041042046
# short lines, partial values - remove checksum digit to combine short lines
30449
0451
0460479
0480490500510520530540550560570580590600610620630641
# long line with embedded checksums every 64 bytes
001002003004005006007008009010011012013014015016017018019020021052202302402502602702802903003103203303403503603703803904004104204630440450460470480490500510520530540550560570580590600610620630640
# dangling digit at end of file (with OK checksum)
37

gary 2010-01-31 17:31:14

<:| I am really sorry for the effort you made. But this solution is neither particularly stl'y nor avoids copying. I mentioned that the problem in general is solved. I just wanted to know if there is some clever combination of algorithms, iterators, streams and a functor to do this elegantly.

AndreasT 2010-02-01 08:38:11

Answer 6

+1 A:

As others have said, there is no silver bullet in stl/boost to elegantly solve your problem. If you want to parse your chunk directly via pointer arithmetic, perhaps you can take inspiration from std::iostream and hide the messy pointer arithmetic in a custom stream class. Here's a half-arsed solution I came up with:

#include <cctype>
#include <iostream>
#include <vector>
#include <boost/lexical_cast.hpp>

class Stream
{
public:
    enum StateFlags
    {
        goodbit = 0,
        eofbit  = 1 << 0,   // End of input packet
        failbit = 1 << 1    // Corrupt packet
    };

    Stream() : state_(failbit), csum_(0), pos_(0), end_(0) {}
    Stream(char* begin, char* end) {open(begin, end);}
    void open(char* begin, char* end)
        {state_=goodbit; csum_=0; pos_=begin, end_=end;}
    StateFlags rdstate() const {return static_cast<StateFlags>(state_);}
    bool good() const {return state_ == goodbit;}
    bool fail() const {return (state_ & failbit) != 0;}
    bool eof() const {return (state_ & eofbit) != 0;}
    Stream& read(int& measurement)
    {
        measurement = readDigit() * 100;
        measurement += readDigit() * 10;
        measurement += readDigit();
        return *this;
    }

private:
    int readDigit()
    {
        int digit = 0;

        // Check if we are at end of packet
        if (pos_ == end_) {state_ |= eofbit; return 0;}

        /* We should be at least csum|lf|lf away from end, and we are
            not expecting csum or lf here. */
        if (pos_+3 >= end_ || pos_[0] == '\n' || pos_[1] == '\n')
        {
            state_ |= failbit;
            return 0;
        }

        if (!getDigit(digit)) {return 0;}
        csum_ = (csum_ + digit) % 10;
        ++pos_;

        // If we are at checksum, check and consume it, along with linefeed
        if (pos_[1] == '\n')
        {
            int checksum = 0;
            if (!getDigit(checksum) || (checksum != csum_)) {state_ |= failbit;}
            csum_ = 0;
            pos_ += 2;

            // If there is a second linefeed, we are at end of packet
            if (*pos_ == '\n') {pos_ = end_;}
        }
        return digit;
    }

    bool getDigit(int& digit)
    {
        bool success = std::isdigit(*pos_);
        if (success)
            digit = boost::lexical_cast<int>(*pos_);
        else
            state_ |= failbit;
        return success;
    }

    int csum_;
    unsigned int state_;
    char* pos_;
    char* end_;
};


int main()
{
    // Use (8-byte + csum + LF) fragments for this example
    char data[] = "\
001002003\n\
300400502\n\
060070081\n\n";

    std::vector<int> measurements;
    Stream s(data, data + sizeof(data));
    int meas = 0;

    while (s.read(meas).good())
    {
        measurements.push_back(meas);
        std::cout << meas << " ";
    }

    return 0;
}

Maybe you'll want to add extra StateFlags to determine if failure is due to checksum error or framing error. Hope this helps.

Emile Cormier 2010-02-01 18:58:55

ansaurus

tags:

views:

answers:

c++ stl: a good way to parse a sensor response

related questions