I am working on a program that reads a stream of binary data from a serial port and needs to parse and format the input.

Data is read in constantly and needs to be displayed as a full string. Each string has a "start code" of 3 bytes and an "end code" of 3 bytes. I need to write a parser that will find the data based on the start and end codes; I'm assuming a regex parser is the way to do this.

I've read a lot about regular expressions over the last day or two, but it's just not clicking. Help?

start code: 0x16 < 0x02 (the bytes will not be separated by spaces)
end code: 0x03 > 0x17 (the bytes will not be separated by spaces)

Can anybody give the regex that will find these values? And is there a way to find them in C# without removing them from the string (i.e., without treating them as ordinary delimiters, as String.Split() does)?

A:

If it's as simple as matching a few byte values, you could look at writing a simple finite state machine (FSM) to match the start and end codes. It's easier to test and to represent as code.
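A minimal sketch of such a state machine in C#, assuming the whole input is available as one `IEnumerable<byte>`; a real serial reader would keep `inFrame`, `matched` and `payload` alive between reads. `ExtractFrames` is a hypothetical name. It also assumes neither code can partially overlap itself (true for 0x16 '<' 0x02 and 0x03 '>' 0x17), so on a mismatch only the current byte needs re-checking against the first byte of the code.

```csharp
using System;
using System.Collections.Generic;

// Start and end codes from the question.
byte[] startCode = { 0x16, (byte)'<', 0x02 };
byte[] endCode = { 0x03, (byte)'>', 0x17 };

// Two states: hunting for the start code, or inside a frame collecting bytes
// until the end code appears. `matched` counts how many bytes of the
// delimiter currently being watched for have matched so far.
List<byte[]> ExtractFrames(IEnumerable<byte> input)
{
    var frames = new List<byte[]>();
    var payload = new List<byte>();
    bool inFrame = false;
    int matched = 0;

    foreach (byte b in input)
    {
        if (!inFrame)
        {
            // State 1: looking for the start code.
            if (b == startCode[matched]) matched++;
            else matched = (b == startCode[0]) ? 1 : 0;

            if (matched == startCode.Length)
            {
                inFrame = true;
                matched = 0;
                payload.Clear();
            }
        }
        else
        {
            // State 2: collecting payload while watching for the end code.
            payload.Add(b);
            if (b == endCode[matched]) matched++;
            else matched = (b == endCode[0]) ? 1 : 0;

            if (matched == endCode.Length)
            {
                // Drop the end-code bytes that were appended to the payload.
                payload.RemoveRange(payload.Count - endCode.Length, endCode.Length);
                frames.Add(payload.ToArray());
                inFrame = false;
                matched = 0;
            }
        }
    }
    return frames;
}

// Usage: noise, one framed message "HI", then more noise.
byte[] sample = { 0xFF, 0x16, (byte)'<', 0x02, (byte)'H', (byte)'I', 0x03, (byte)'>', 0x17, 0x00 };
foreach (byte[] frame in ExtractFrames(sample))
    Console.WriteLine(BitConverter.ToString(frame)); // prints 48-49
```

Each byte is examined exactly once, which makes the matcher easy to unit-test with hand-built byte arrays like `sample`.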

Joe
A: 

I think a regex is overkill in this case. I would just buffer the data bytes as they arrive and, after each byte is received, check whether the buffer ends with your end code. Something approximately like this (written on the fly, so don't just paste and compile):

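using System.Collections.Generic;
using System.Linq;

var buffer = new List<byte>();
// '>' needs a cast: a char literal is not implicitly convertible to byte
var endCode = new byte[] { 0x03, (byte)'>', 0x17 };

// In a loop:

byte? received = ReceiveByte(); // Returns null if no new byte is available
if (received.HasValue) {
  buffer.Add(received.Value);
  // Does the buffer now end with the end code?
  if (buffer.Count >= endCode.Length &&
      buffer.Skip(buffer.Count - endCode.Length).SequenceEqual(endCode)) {
    // Process the received data in buffer
    buffer.Clear();
  }
}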
Konamiman
With some modifications, this idea worked wonderfully. Thanks a ton.
Slim
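A variant of the buffering idea that also answers the "without removing them" part of the question: search the buffer for the start and end codes as byte sequences and copy out each complete frame with both codes still attached. This is a sketch assuming data arrives in chunks (for example from a serial read) that are appended to a `List<byte>`; `IndexOfSequence`, `TakeFrames` and `rxBuffer` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

byte[] startCode = { 0x16, (byte)'<', 0x02 };
byte[] endCode = { 0x03, (byte)'>', 0x17 };

// Index of the first occurrence of `pattern` in `data` at or after `from`,
// or -1 if absent. A naive scan is fine for 3-byte codes.
int IndexOfSequence(IReadOnlyList<byte> data, byte[] pattern, int from)
{
    for (int i = from; i <= data.Count - pattern.Length; i++)
    {
        int j = 0;
        while (j < pattern.Length && data[i + j] == pattern[j]) j++;
        if (j == pattern.Length) return i;
    }
    return -1;
}

// Copies out every complete frame (start code + payload + end code), keeping
// the delimiters, and removes the consumed bytes from the buffer. An
// incomplete trailing frame is left in the buffer for the next read.
List<byte[]> TakeFrames(List<byte> buf)
{
    var frames = new List<byte[]>();
    while (true)
    {
        int start = IndexOfSequence(buf, startCode, 0);
        if (start < 0)
        {
            // No start code: keep only a possible partial start code at the tail.
            int keep = startCode.Length - 1;
            if (buf.Count > keep) buf.RemoveRange(0, buf.Count - keep);
            break;
        }
        int end = IndexOfSequence(buf, endCode, start + startCode.Length);
        if (end < 0)
        {
            buf.RemoveRange(0, start); // discard noise before the start code
            break;
        }
        int frameEnd = end + endCode.Length;
        frames.Add(buf.GetRange(start, frameEnd - start).ToArray());
        buf.RemoveRange(0, frameEnd);
    }
    return frames;
}

// Usage: a frame split across two reads.
var rxBuffer = new List<byte>();
rxBuffer.AddRange(new byte[] { 0x00, 0x16, (byte)'<', 0x02, (byte)'A' });
var first = TakeFrames(rxBuffer);   // no complete frame yet
rxBuffer.AddRange(new byte[] { 0x03, (byte)'>', 0x17 });
var second = TakeFrames(rxBuffer);
Console.WriteLine($"{first.Count} {second.Count} {BitConverter.ToString(second[0])}");
// prints "0 1 16-3C-02-41-03-3E-17"
```

Because the codes are located by index rather than used as separators, they stay in the extracted frame; drop the first and last three bytes if only the payload is wanted.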
A: 

A Regex in .NET works on Unicode character strings. When dealing with binary data bytes, a Regex needs the data to be decoded into Unicode first. Data kept as byte arrays is not suited to Regex use. Either find an Encoding that is meaningful for your data, or forget the regex engine.
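One encoding that fits arbitrary binary data is ISO-8859-1 (Latin-1), which maps each byte 0x00 to 0xFF onto the char with the same code, so decoding is lossless and character offsets equal byte offsets. A minimal sketch under that assumption; the sample bytes are made up for illustration.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Latin-1 decodes every byte to U+0000..U+00FF one-to-one.
Encoding latin1 = Encoding.GetEncoding("ISO-8859-1");

// \x16<\x02 is the start code and \x03>\x17 the end code. The lazy (.*?)
// stops at the first end code; Singleline lets '.' match any byte, including 0x0A.
var frameRegex = new Regex(@"\x16<\x02(.*?)\x03>\x17", RegexOptions.Singleline);

byte[] data = { 0x41, 0x16, 0x3C, 0x02, 0x48, 0x49, 0x03, 0x3E, 0x17, 0x42 };
string text = latin1.GetString(data);

foreach (Match m in frameRegex.Matches(text))
{
    // m.Value still includes the start and end codes; Groups[1] is the payload.
    byte[] payload = latin1.GetBytes(m.Groups[1].Value);
    Console.WriteLine($"frame at {m.Index}: {BitConverter.ToString(payload)}");
}
// prints "frame at 1: 48-49"
```

The match keeps the delimiters in `m.Value` and reports their position in `m.Index`, so nothing is stripped as it would be with `String.Split()`. With a live stream, a frame may still be incomplete at the end of the text, so unmatched trailing data has to be carried over to the next read.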

gimel