views:

153

answers:

2

I am interfacing with a hardware device that streams data to my app over Wifi. The data is streaming in just fine. The data contains a character header (DATA:) that indicates a new record has begun. The issues is that the data I receive doesn't necessarily fall on the header boundary, so I have to capture the data until what I've captured contains the header. Then, everything that precedes the header goes into the previous record and everything that comes after it goes into a new record. I have this working, but wondered if anyone has done this before and has a good computer-sciencey way to solve the problem.

Here's what I do:

  1. Convert the NSData of the current read to an NSString

  2. Append the NSString to a placeholder string

  3. Check placeholder string for the header (DATA:). If the header is not there, just wait for the next read.

  4. If the header exists, append whatever precedes it to a previous record placeholder and hand that placeholder off to an array as a complete record that I can further parse into fields.

  5. Take whatever shows up after the header and place it in the record placeholder so that it can be appended to in the next read. Repeat steps 3 - 5.

Let me know if you see any flaws with this or have a suggestion for a better way.

Seems there should be some design pattern for this, but I can't think of one.

Thanks.

UPDATE: Here is a little bit of code:

uint8_t buf[1024];
unsigned int len = 0;
len = [(NSInputStream *)stream read:buf maxLength:1024];
if(len) {    
    [data appendBytes:(const void *)buf length:len];
    int bytesRead;
    bytesRead += len;
} else {
    NSLog(@"No data.");
}

How would this code be changed then to implement a finite state machine?

+1  A: 

That seems pretty much how I'd do it. The only thing I might do differently is write an NSData category that does the linear search of DATA: for me, just to save the overhead of converting it to a string. It wouldn't be that hard to do, either. Something like:

@interface NSData (Search)

- (NSRange) rangeOfData:(NSData *)aData;

@end

@implementation NSData (Search)

- (NSRange) rangeOfData:(NSData *)aData {
  const void * bytes = [self bytes];
  NSUInteger length = [self length];

  const void * searchBytes = [aData bytes];
  NSUInteger searchLength = [aData length];
  NSUInteger searchIndex = 0;

  NSRange foundRange = {NSNotFound, searchLength};
  for (NSUInteger index = 0; index < length; index++) {
    if (bytes[index] == searchBytes[searchIndex]) {
      //the current character matches
      if (foundRange.location == NSNotFound) {
        foundRange.location = index;
      }
      searchIndex++;
      if (searchIndex >= searchLength) { return foundRange; }
    } else {
      searchIndex = 0;
      foundRange.location = NSNotFound;
    }
  }
  return foundRange;
}

@end

Then you can just use:

NSData * searchData = [@"DATA:" dataUsingEncoding:NSUTF8StringEncoding];
while(receivingData) {
  if ([receivedData rangeOfData:searchData].location != NSNotFound) {
    //WOOO!
  }
}

(warning: typed in a browser)

Dave DeLong
Actually, come to think of it, the problem of the word DATA: being divided over two batches of receivedData (i.e. DA TA: or D ATA:) crops up with this solution as well, unless the sender has some mechanism to prevent that, of course.
Elise van Looij
A: 

This is a classic finite state machine problem. A lot of data protocols that are stream based can be described with a finite state machine.

Basically you have a state, and transition. Boost has a finite state machine library, but it could be overkill. You can implement it as a switch.

while(stream.hasData) {
char nextInput = stream.get();
switch(currentState) {
  case D: {
     if(nextInput == A)
       currentState = A;
     else
       currentState = D; //die 
  } case A: {
    //Same for A
  }
}
}

Requested elaboration:
Basically look at the diagram below...it's a finite state machine. At any given time the machine is in exactly one state. Every time a character is input into the state machine a transition is taken, and the current state moves. (possibly back into the same state). So all you have to do is model your networked data as a finite state machine then implement that machine. There are libraries that lay it out for you, then all you have to do is implement exactly what happens on each transition. For you that you probably mean interpreting or saving the byte of data. The interpretation depends on what transition. The transition depends on the current state and the current input. Here is an example FSM.

alt text
Note that if the characters DATA: are entered the state moves to the last circle. Any other sequence will keep the state in one of first 5 states. (top row) You can also have splits. So the FSM can make decisions, so if you get a sequence like DATA2: then you can branch off of that machine into the data2: part and interpret differently in a totally different part of the machine.

Chris H
just don't forget to `break`!
Dave DeLong
@Chris. I'm intrigued by this answer as it sounds like it would be what I'm looking for, but I'm not completely clear on how to do this in my context. I've updated the question with some code to clarify. Can you elaborate using the context I've given?
Matt Long