I'm maintaining a legacy embedded device which interacts with the real world. Generally speaking, this device collects data from sensors, processes the data using its internal algorithm, and displays a warning when the data reaches a certain "bad" state.

For debugging purposes, we want this device to send us, on a regular basis, much of the data it receives, as well as the data after processing.

We reached the conclusion that most of the data can be described in tabular form, something along the lines of

sensor|time|temperature|moisture
------+----+-----------+--------
1     |3012|20         |0.5
2     |3024|22         |0.9

We obviously need to support more than one form of table.

So basically we need a protocol that can accept a set of table descriptions, and then deliver table data according to those descriptions.

An example pseudo code for sending data is:

table_t table = select_table(SENSORS_TABLE);
sensors_table_data_t data[] = {
    {1,3012,20,0.5},
    {2,3024,22,0.9}
    };
send_data(table,data);

An example pseudo code for receiving data is:

data_t *data = receive();
switch (data->table) {
    case SENSORS_TABLE:
         puts("sensor|time|temperature|moisture");
         /* the fields are numeric, so print them with numeric specifiers */
         for (int i=0;i<data->length;i++) printf(
             "%6d|%4d|%11d|%8.1f\n",
              data->cell[i]->sensor,
              data->cell[i]->time,
              data->cell[i]->temperature,
              data->cell[i]->moisture);
         break;
    case USER_INPUT_TABLE:
         ...
}

Defining the tables could be done either offline, both at the device and at the client computer communicating with it, or online. We can add a simple handshake protocol to agree upon the table formats at the device's boot time.

Since this is a legacy device, it supports only RS232 communication, and since its CPU is pretty slow (roughly equivalent to a 486), we cannot afford any XML-like data transfer methods. Those are too expensive, either in computation time or in bandwidth. Sending raw SQL commands was also considered and rejected due to bandwidth considerations.

[edit]

For clarification: to reduce overhead, I'm trying to avoid sending the table header every time I send data, so that each time I send a table row I only have to send the table's ID.

I would also like to note that most of the data I wish to pass is numerical, so text-based protocols are too wasteful.

Lastly, I've seen Google's Protocol Buffers. It's close enough, but it doesn't support C.

[/edit]

Any idea about a known protocol or implementation like what I described? Any better idea to send this data?

I'm aware that this protocol is not very hard to design. I had in mind a two-phase protocol:

1) Handshake: send the headers of all tables you wish to fill. Each table description would include information about the size of each column.

2) Data: send the table index (according to handshake) followed by the actual data. Data will be followed by a checksum.

But I wish to avoid the small details of such a design and use some ready-made protocol or, even better, an available implementation.
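
A minimal sketch of the host side of the two-phase idea above, in C. All names (`table_desc_t`, `register_table`, `row_size`), the limits, and the choice of one size byte per column are assumptions made for illustration; they do not come from any existing protocol. The host remembers each description received during the handshake, then uses the table index carried by a data message to work out how many bytes one row occupies.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_TABLES  8   /* hypothetical limit */
#define MAX_COLUMNS 8   /* hypothetical limit */

/* One table description, as it might be sent during the handshake. */
typedef struct {
    uint8_t n_columns;              /* number of columns in the table */
    uint8_t col_size[MAX_COLUMNS];  /* size of each column, in bytes  */
} table_desc_t;

static table_desc_t tables[MAX_TABLES];
static uint8_t      table_known[MAX_TABLES];

/* Phase 1: remember a description. Returns 0 on success, -1 on bad input. */
int register_table(uint8_t index, const uint8_t *col_size, uint8_t n_columns)
{
    if (index >= MAX_TABLES || col_size == NULL ||
        n_columns == 0 || n_columns > MAX_COLUMNS)
        return -1;
    tables[index].n_columns = n_columns;
    memcpy(tables[index].col_size, col_size, n_columns);
    table_known[index] = 1;
    return 0;
}

/* Phase 2: bytes per row for a table index, or -1 if it was never described. */
int row_size(uint8_t index)
{
    int total = 0;
    if (index >= MAX_TABLES || !table_known[index])
        return -1;
    for (uint8_t i = 0; i < tables[index].n_columns; i++)
        total += tables[index].col_size[i];
    return total;
}
```

For the sensors table with column sizes 1, 2, 1 and 4 bytes, `row_size` would report 8 bytes per row, so the receiver knows how much to read after the table index without any per-row header.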

+1  A: 

I am not aware of any protocol which does this (there might be one, but I don't know it.)

I'm sure you've thought of this: why not pass the format as a binary data stream as well?

pseudocode:

struct table_format_header {
  int number_of_fields; /* number of fields that will be defined in table */
                        /* sent before the field descriptions themselves  */
};

struct table_format {
   char column_name[8];   /* name of column ("sensor");  */
   char fmt_specifier[5]; /* format specifier for column */

   ... (etc)
};

Then you can compute the fields/columns (somehow), transmit the header struct so that the recipient can allocate buffers, and then iteratively transmit a table_format struct for each of those fields. The struct would have all the information you need pertaining to that column: name, number of bytes in the field, whatever. If space is really constrained, you can use bit-fields (int precision:3) to specify the different attributes.
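
As a rough illustration of squeezing column attributes into a few bits, here is a sketch in C. Because the in-memory layout of C bit-fields depends on the compiler, the sketch packs the bits by hand with shifts instead. The attribute set (`type`, `size`, `precision`) and their bit widths are assumptions chosen for the example, not something defined by the answer.

```c
#include <stdint.h>

/* Hypothetical column attributes; the field widths are illustrative only. */
typedef struct {
    uint8_t type;       /* 0..3, e.g. unsigned, signed, float, text */
    uint8_t size;       /* 1..8 bytes per value                      */
    uint8_t precision;  /* 0..7 digits after the decimal point       */
} column_attr_t;

/* Pack into one byte laid out as [type:2][size-1:3][precision:3]. */
uint8_t pack_attr(column_attr_t a)
{
    return (uint8_t)(((a.type & 0x03u) << 6) |
                     (((a.size - 1u) & 0x07u) << 3) |
                     (a.precision & 0x07u));
}

/* Inverse of pack_attr(). */
column_attr_t unpack_attr(uint8_t b)
{
    column_attr_t a;
    a.type      = (uint8_t)(b >> 6);
    a.size      = (uint8_t)(((b >> 3) & 0x07u) + 1u);
    a.precision = (uint8_t)(b & 0x07u);
    return a;
}
```

With this layout each column costs one attribute byte in the handshake, plus whatever is spent on the column name.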

rascher
Are you suggesting that I send the table header every time I send data? I'm trying to avoid that. I'd rather have the embedded device and the host computer agree upon a limited set of tables, and then identify the struct by the table's index.
Elazar Leibovich
Hm. Let's say you had 3 types of tables worth of data: type[0], type[1], type[2]. You could send these table structures at the start of your connection, before the device actually sends the table data. Then, along with the data, you could send a field indicating *which* table structure you want to use (either a 0, 1, or 2). Basically, do all of that setup (whatever kind of setup is entailed) at the very start, before transmitting the data itself.
rascher
That's what I had in mind; however, I thought somebody had already written the code for that. :)
Elazar Leibovich
A: 

In embedded work, it is generally suggested that the embedded device do as little work as possible, and let the client computer take advantage of its own speed and availability of tools. Given your example, I could collect the data, then format the table just from looking at the max size of the data I received, or the max size of the column header (my choice). And since it is debugging info, it wouldn't matter too much if the table size changed from one collection to the next. Or, your device could "force" the column size just by sending header labels, or it could even transmit a first line of dummy data where all the data is zeros, but in the desired format and length.

gbarry
+1  A: 

You may want to try protocol buffers.

http://code.google.com/p/protobuf/

Protocol Buffers are a way of encoding structured data in an efficient yet extensible format. Google uses Protocol Buffers for almost all of its internal RPC protocols and file formats.

Building off rascher's comment, protobufs compile the format, so it's ridiculously efficient to transmit and receive. It's also extensible in case you want to add/remove fields later. And there are great APIs (e.g. protobuf python).

ramanujan
I'm aware of Google's protocol buffers; unfortunately, it does not have a C backend, which is a must for my project. Still, protocol buffers is indeed a good example of something that (over)fulfills my needs.
Elazar Leibovich
A: 

I would vote for CSV (see RFC 4180 for the best description of CSV) since it is the simplest format (see gbarry's answer).

As explained in the RFC (section 2, item 3), you will need an optional header with the column names.

The main thing to take care of in the CSV sender is just the escaping of "special" characters.
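
A minimal sketch of that escaping step in C, following the quoting rules of RFC 4180: a field containing a comma, a double quote, or a line break is wrapped in double quotes, and any double quote inside it is doubled. The function name `csv_escape` and its return convention are assumptions made for this example.

```c
#include <stddef.h>
#include <string.h>

/* Write `field` to `out` as one CSV field.
 * Returns the number of characters written (excluding the terminating NUL),
 * or -1 if `out` is too small. */
int csv_escape(const char *field, char *out, size_t out_size)
{
    int quote = strpbrk(field, ",\"\r\n") != NULL;  /* needs enclosing quotes? */
    size_t need = strlen(field) + (quote ? 2u : 0u) + 1u; /* quotes + NUL */
    size_t n = 0;
    const char *p;

    for (p = field; *p != '\0'; p++)
        if (*p == '"')
            need++;                 /* each embedded quote is doubled */
    if (need > out_size)
        return -1;

    if (quote)
        out[n++] = '"';
    for (p = field; *p != '\0'; p++) {
        if (*p == '"')
            out[n++] = '"';         /* escape by doubling */
        out[n++] = *p;
    }
    if (quote)
        out[n++] = '"';
    out[n] = '\0';
    return (int)n;
}
```

Purely numeric fields such as `3012` pass through unchanged, so for the sensor data in the question the escaping would rarely trigger.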

bortzmeyer
I'm mostly sending numbers. Sending numbers as ASCII is wasteful. Also see my edit about sending the table headers each time.
Elazar Leibovich
May be wasteful but also simple and *very* robust. I still vote for it.
bortzmeyer
A: 

Fundamentals of serial communication ...

[header] [data] [check-sum]

The [data] is the most important part, but the [header] and [checksum] really help to solve weird real-world issues. However small the message, always try to include a [header] and a [checksum].

Reducing the [header] and [checksum] overhead by sending larger chunks of data definitely helps.

After reading the data, you can display it in any format you like on your host PC (which will be your debugging PC).

Alphaneo
Saying that adding a header and a checksum is good is true. But I don't get the relation to my question.
Elazar Leibovich
+1  A: 

If all your data are of constant length, then you don't need any separator between them. So you could directly send the binary content. For example, the line:

sensor|time|temperature|moisture
------+----+-----------+--------
1     |3012|20         |0.5

will be sent as:

0x01 0x0B 0xC4 0x14 [4 bytes for float 0.5]

I am assuming one byte representations for sensor and temperature, two bytes for time and 4 bytes (float) for moisture. You don't need to send the header.

Each row will now be of constant length, and the receiver will have to do the conversion job. The embedded device can easily send data in this format.

Now, there is also the problem of encapsulating the data in a message, so that the receiver knows when a message starts. You usually do this by adding a header and a footer:
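
The packing of that example row can be sketched in C as follows. The function name `pack_sensor_row` is hypothetical, and the sketch assumes that `float` is a 32-bit IEEE 754 value and that multi-byte fields go on the wire most significant byte first (which matches the `0x0B 0xC4` shown for 3012).

```c
#include <stdint.h>
#include <string.h>

#define SENSOR_ROW_SIZE 8   /* 1 + 2 + 1 + 4 bytes */

/* Pack one sensors-table row into a fixed-length, big-endian byte sequence. */
void pack_sensor_row(uint8_t sensor, uint16_t time, uint8_t temperature,
                     float moisture, uint8_t out[SENSOR_ROW_SIZE])
{
    uint32_t bits;

    /* Copy the float's raw bits (assumes a 32-bit IEEE 754 float). */
    memcpy(&bits, &moisture, sizeof bits);

    out[0] = sensor;
    out[1] = (uint8_t)(time >> 8);      /* high byte of time */
    out[2] = (uint8_t)(time & 0xFF);    /* low byte of time  */
    out[3] = temperature;
    out[4] = (uint8_t)(bits >> 24);
    out[5] = (uint8_t)(bits >> 16);
    out[6] = (uint8_t)(bits >> 8);
    out[7] = (uint8_t)(bits);
}
```

Under those assumptions, the row `1, 3012, 20, 0.5` becomes `01 0B C4 14 3F 00 00 00`, since 0.5 is `0x3F000000` in IEEE 754 single precision.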

[STX] message [ETX]

Typically ASCII characters STX and ETX are used (0x02 and 0x03 I think). The problem is that these values can also appear in the message body. So you need to add another layer in your transmission. When the byte 0x02 or 0x03 is to be sent, send it twice. On the receiver a single 0x02 byte denotes the start of the message. Additional 0x02 and 0x03 bytes within the message body must be removed.

Finally if the communication link is unreliable, you also need to add a checksum.

These techniques are typically used by serial protocols like PPP.
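
A sender-side sketch of the STX/ETX framing with byte doubling described above, in C. The function name `frame_encode` and its "return 0 on overflow" convention are assumptions for illustration. Only the encoder is shown; a matching receiver has to treat runs of 0x02 or 0x03 carefully, for example when the payload itself ends in 0x03.

```c
#include <stddef.h>
#include <stdint.h>

#define STX 0x02
#define ETX 0x03

/* Wrap `msg` as STX ... ETX, sending every 0x02 or 0x03 payload byte twice.
 * Returns the frame length, or 0 if `out` is too small. */
size_t frame_encode(const uint8_t *msg, size_t len,
                    uint8_t *out, size_t out_size)
{
    size_t n = 0;

    if (out_size < 2)
        return 0;                       /* no room for STX and ETX */
    out[n++] = STX;

    for (size_t i = 0; i < len; i++) {
        int special = (msg[i] == STX || msg[i] == ETX);
        /* Always keep one byte free for the closing ETX. */
        if (n + (special ? 2u : 1u) + 1u > out_size)
            return 0;
        if (special)
            out[n++] = msg[i];          /* doubled copy */
        out[n++] = msg[i];
    }

    out[n++] = ETX;
    return n;
}
```

For the payload `01 02 03` this produces `02 01 02 02 03 03 03`. In the worst case the frame is twice the payload length plus two bytes, which is worth keeping in mind when sizing buffers.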

kgiannakakis
There's a small detail you forgot. I don't need to send the headers every time, but I do need to send the number of the table first (the sensors table is not the only one I can send), and I also need to send the headers once at some point (or "send" them through the client's and server's code). The protocol must be robust to modifications in the table layout.
Elazar Leibovich
In that case you need to have both "command" and "data" messages. Allocate the first byte after the STX to denote the message type. Then you are free to define the structure of the messages as it fits your needs.
kgiannakakis
I've seen STX is 0x02 and ETX is 0x04. It probably varies. Serial communication must be the least-regulated format ever.
Stephen Friederichs
According to http://www.asciitable.com/ STX is 0x02 and ETX is 0x03. 0x04 is EOT (End of Transmission). These are ASCII codes, dating back to the terminal years.
kgiannakakis
A: 

As someone said:

[header][data][checksum]

But if you want to extend that you could use:

[header][table_id][elements][data][checksum]

[header]   : start of frame
[table_id] : table
[elements] : payload size
[data]     : raw data
[checksum] : checksum/crc, just to be on the safe side

You can use "elements" as the number of fixed-size pieces of data or even the number of bytes in the "data" segment.

Headers and checksums can make your life easier when looking at thousands of hex characters on the screen.
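
A sketch of building such a frame in C. The start-of-frame value, the one-byte `elements` field counting payload bytes, and the simple 8-bit additive checksum over `table_id`, `elements` and `data` are all assumptions chosen to keep the example short; a CRC would catch more errors.

```c
#include <stddef.h>
#include <stdint.h>

#define FRAME_HEADER 0x02   /* hypothetical start-of-frame marker */

/* Build [header][table_id][elements][data][checksum] into `out`.
 * Returns the frame length, or 0 if `out` is too small. */
size_t build_frame(uint8_t table_id, const uint8_t *data, uint8_t n_bytes,
                   uint8_t *out, size_t out_size)
{
    size_t total = (size_t)n_bytes + 4u;   /* header, id, count, checksum */
    uint8_t sum;

    if (out_size < total)
        return 0;

    out[0] = FRAME_HEADER;
    out[1] = table_id;
    out[2] = n_bytes;

    /* 8-bit sum of everything after the header, wrapping modulo 256. */
    sum = (uint8_t)(table_id + n_bytes);
    for (size_t i = 0; i < n_bytes; i++) {
        out[3 + i] = data[i];
        sum = (uint8_t)(sum + data[i]);
    }
    out[3 + n_bytes] = sum;
    return total;
}
```

The receiver recomputes the same sum over the bytes it read and discards the frame when it does not match the final byte.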

EDIT:

Headers are a good way of telling your host's program that the message has begun/ended. Have you thought about that?

On the other hand you have to think about the use of headers in a statistical way. 4 bytes every 10 bytes is 40%, but only 1.6% in 256 bytes. So, size accordingly.

Marcelo MD
A: 

I know you've said you don't want to use text, but you should consider using Base64 (B64). This allows for straightforward and relatively efficient binary-to-text and text-to-binary conversion. The overhead is 1/3: every three bytes of binary are converted to four bytes of text. After converting to text you can use simple text-style protocols. On the transmitting device you only need to implement the encoder. See full code below:

/********************************************************************/
/*                                                                  */
/* Functions:                                                       */
/* ----------                                                       */
/* TBase64Encode()                                                  */
/* TBase64Decode()                                                  */
/* TBase64EncodeBlock()                                             */
/* TBase64DecodeBlock()                                             */
/*                                                                  */
/********************************************************************/

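#include <string.h>     // memset(), memcpy()
#include "yourstuff.h"  // assumed to define BYTE (unsigned 8-bit), PSTR and CPSTR

// Forward declarations: the block functions are called before they are defined.
int TBase64EncodeBlock( const BYTE *input, int size, PSTR output);
int TBase64DecodeBlock( CPSTR input, BYTE *output);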


// This table is used to encode 6 bit binary to Base64 ASCII.
static char Base64Map[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdef"
           "ghijklmnopqrstuvwxyz0123456789+/";

// This table is used to decode Base64 ASCII back to 6 bit binary.
static char Base64Decode[]=
{
    62,           // '+'
    99, 99, 99,         // **** UNUSED ****
    63,           // '/'
    52, 53, 54, 55, 56, 57, 58, 59, 60, 61,  // '0123456789'
    99, 99, 99, 99, 99, 99, 99,     // **** UNUSED ****
    0, 1, 2, 3, 4, 5, 6, 7, 8, 9,    // 'ABCDEFGHIJ'
    10, 11, 12, 13, 14, 15, 16, 17, 18, 19,  // 'KLMNOPQRST'
    20, 21, 22, 23, 24, 25,      // 'UVWXYZ'
    99, 99, 99, 99, 99, 99,      // **** UNUSED ****
    26, 27, 28, 29, 30, 31, 32, 33, 34, 35,  // 'abcdefghij'
    36, 37, 38, 39, 40, 41, 42, 43, 44, 45,  // 'klmnopqrst'
    46, 47, 48, 49, 50, 51      // 'uvwxyz'
};




/** Convert binary data to Base64 data.
 *
 * @return  Size of output buffer if ok, -1 if problem (invalid parameters).
 *
 * @param   input  - Pointer to input data.
 * @param   size   - Number of bytes to encode.
 * @param   output - Pointer to output buffer.
 *
 * @note    Up to caller to ensure output buffer is big enough. As a rough
 *    guide your output buffer should be (((size/3)+1)*4) bytes.
 */
int TBase64Encode( const BYTE *input, int size, PSTR output)
{
    int i, rc=0, block_size;

    while (size>0)
    {
     if (size>=3)
      block_size = 3;
     else
      block_size = size;

     i = TBase64EncodeBlock( input, block_size, output);

     if (i==-1)
      return -1;

     input += 3;
     output += 4;
     rc += 4;
     size -= 3;
    }

    return rc;
}




/** Convert Base64 data to binary data.
 *
 * @return  Number of bytes in output buffer, negative number if problem
 *    as follows:
 *     -1 : Invalid parameters (bad pointers or bad size).
 *     -2 : Out-of-range value for Base64.
 *     -3 : Invalid Base64 character.
 *
 * @param   input  - Pointer to input buffer.
 * @param   size   - Size of input buffer (in bytes).
 * @param   output - Pointer to output buffer.
 *
 * @note    Up to caller to ensure output buffer is big enough. As a rough
 *    guide your output buffer should be (((size/4)+1)*3) bytes.
 *    NOTE : The input size parameter must be a multiple of 4 !!!!
 *    Note that error codes -2 and -3 essentially mean the same
 *    thing, just for debugging it means something slightly different
 *    to me :-). Calling function can just check for any negative
 *    response.
 */
int TBase64Decode( CPSTR input, int size, BYTE *output)
{
    int output_size=0, i;

    // Validate size parameter only (must be a positive multiple of 4).
    if (size<=0 || (size & 3))
     return -1;

    while (size>0)
    { 
     i = TBase64DecodeBlock( input, output);
     if (i<0)
      return i;

     output_size += i;
     output += i;
     input += 4;
     size -= 4;
    }

    return output_size;
}




/** Convert up to 3 bytes of binary data to 4 bytes of Base64 data.
 *
 * @return  0 if ok, -1 if problem (invalid parameters).
 *
 * @param   input  - Pointer to input data.
 * @param   size   - Number of bytes to encode(1 to 3).
 * @param   output - Pointer to output buffer.
 *
 * @note    Up to caller to ensure output buffer is big enough (4 bytes).
 */
int TBase64EncodeBlock( const BYTE *input, int size, PSTR output)
{
    int i;
    BYTE mask;
    BYTE input_buffer[3];

    // Validate parameters (rudimentary).
    if (!input || !output)
     return -1;
    if (size<1 || size>3)
     return -1;

    memset( input_buffer, 0, 3);
    memcpy( input_buffer, input, size);

    // Convert three 8bit values to four 6bit values.
    mask = input_buffer[2];
    output[3] = mask & 0x3f;      // Fourth byte done...

    output[2] = mask >> 6;
    mask = input_buffer[1] << 2;
    output[2] |= (mask & 0x3f);   // Third byte done...

    output[1] = input_buffer[1] >> 4;
    mask = input_buffer[0] << 4;
    output[1] |= (mask & 0x3f);   // Second byte done...

    output[0] = input_buffer[0]>>2;  // First byte done...

    // TEST
//  printf("[%02x,%02x,%02x,%02x]", output[0], output[1], output[2], output[3]);

    // Convert 6 bit indices to base64 characters.
    for (i=0; i<4; i++)
     output[i] = Base64Map[output[i]];

    // Handle special padding.
    switch (size)
    {
     case 1:
      output[2] = '=';
      // fall through: one input byte needs two padding characters
     case 2:
      output[3] = '=';
      break;
     default:
      break;
    }


    return 0;
}




/** Convert 4 bytes of Base64 data to 3 bytes of binary data.
 *
 * @return  Number of bytes (1 to 3) if ok, negative number if problem
 *    as follows:
 *     -1 : Invalid parameters (bad pointers).
 *     -2 : Out-of-range value for Base64.
 *     -3 : Invalid Base64 character.
 *
 * @param   input  - Pointer to input buffer (4 bytes).
 * @param   output - Pointer to output buffer (3 bytes).
 *
 * @comm    While there may be 1, 2 or 3 output bytes the output
 *    buffer must be 3 bytes. Note that error codes -2 and -3
 *    essentially mean the same thing, just for debugging it
 *    means something slightly different to me :-). Calling function
 *    can just check for any negative response.
 */
int TBase64DecodeBlock( CPSTR input, BYTE *output)
{
    int i, j;
    int size=3;
    BYTE mask;
    BYTE input_buffer[4];

    // Validate parameters (rudimentary).
    if (!input || !output)
     return -1;

    memcpy( input_buffer, input, 4);

    // Calculate size of output data.
    if (input_buffer[3]=='=')
    {
     input_buffer[3] = 43;
     size--;
    }
    if (input_buffer[2]=='=')
    {
     input_buffer[2] = 43;
     size--;
    }

    // Convert Base64 ASCII to 6 bit data.
    for (i=0; i<4; i++)
    {
     j = (int) (input_buffer[i]-43);
     if (j<0 || j>79)
      return -2;   // Invalid char in Base64 data.
     j = Base64Decode[j];
     if (j==99)   
      return -3;   // Invalid char in Base64 data.

     input_buffer[i] = (char) j;
    }

    // TEST
//  printf("[%02x,%02x,%02x,%02x]", input_buffer[0], input_buffer[1], input_buffer[2], input_buffer[3]);

    // Convert four 6bit values to three 8bit values.
    mask = input_buffer[1] >> 4;
    output[0] = (input_buffer[0]<<2) | mask; // First byte done.

    if (size>1)
    {
     mask = input_buffer[1] << 4;
     output[1] = input_buffer[2] >> 2;
     output[1] |= mask;    // Second byte done.

     if (size==3)
     {
      mask = input_buffer[2] << 6;
      output[2] = input_buffer[3] | mask;  // Third byte done.
     }
    }

    return size;
}
Tim Ring