We have a hardware system with some FPGAs and an FTDI USB controller. The hardware streams data to the PC over USB bulk transfers at around 5 MB/s, and the software is tasked with staying in sync, checking the CRC and writing the data to file.

The FTDI chip has a 'busy' pin which goes high while it's waiting for the PC to read the data. There is a limited amount of buffering in the FTDI chip and elsewhere on the hardware.

The busy line is going high for longer than the hardware can buffer (50-100ms), so we are losing data. To save us from having to redesign the hardware, I have been asked to 'fix' this issue!

I think my code is quick enough, as we've had it running at up to 15 MB/s, so that leaves an IO bottleneck somewhere. Are we just expecting too much from the PC/OS?

Here is my data entry point. Occasionally we get a dropped bit or byte. If the checksum doesn't compute, I shift through until it does. byte[] data is nearly always 4k.

    void ftdi_OnData(byte[] data)
    {
        List<byte> buffer = new List<byte>(data.Length);
        int index = 0;

        while ((index + rawFile.Header.PacketLength + 1) < data.Length)
        {
            if (CheckSum.CRC16(data, index, rawFile.Header.PacketLength + 2)) // <- packet length + 2 bytes of 16-bit checksum
            {
                buffer.AddRange(data.SubArray<byte>(index, rawFile.Header.PacketLength));
                index += rawFile.Header.PacketLength + 2; // <- skip the two checksum bytes, we don't want to save them...
            }
            else
            {
                index++; // shift through
            }
        }

        rawFile.AddData(buffer.ToArray(), 0, buffer.Count);
    }
+4  A: 

Tip: do not write to the file directly; queue the data instead.

Modern computers have multiple processors. If you want certain things to run as fast as possible, use more than one of them.

  • Have one thread deal with the USB data, check checksums etc. It queues (ONLY) the results to a thread-safe queue.
  • Another thread reads data from the queue and writes it to a file, possibly buffered.
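As a rough illustration of that two-thread split (the class and member names here are my own, and `BlockingCollection<byte[]>` is one of several .NET 4 queue choices, not necessarily what TomTom uses):

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading;

class PacketWriter : IDisposable
{
    // Bounded, so the producer blocks (and we notice) rather than exhausting memory.
    private readonly BlockingCollection<byte[]> queue =
        new BlockingCollection<byte[]>(boundedCapacity: 1024);
    private readonly Thread writerThread;
    private readonly FileStream file;

    public PacketWriter(string path)
    {
        file = new FileStream(path, FileMode.Create, FileAccess.Write,
                              FileShare.None, 1 << 16); // 64 KB internal buffer
        writerThread = new Thread(WriteLoop) { IsBackground = true };
        writerThread.Start();
    }

    // Called from the USB/data thread: enqueue ONLY, no file IO here.
    public void Enqueue(byte[] verifiedData)
    {
        queue.Add(verifiedData);
    }

    private void WriteLoop()
    {
        // Blocks when the queue is empty; exits after CompleteAdding() drains it.
        foreach (byte[] chunk in queue.GetConsumingEnumerable())
            file.Write(chunk, 0, chunk.Length);
        file.Flush();
    }

    public void Dispose()
    {
        queue.CompleteAdding();   // let the writer drain, then exit
        writerThread.Join();
        file.Dispose();
    }
}
```

The FTDI event handler then does its CRC work and calls `Enqueue`; only the writer thread ever touches the disk.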

Finished ;)

100ms is a lot of time for a decent implementation. I have successfully managed around 250,000 IO data packets per second (financial data) using C# without breaking a sweat.

Basically, make sure your IO threads do ONLY that and use your main memory as the buffer. Especially when dealing with hardware on one end, the thread doing that should ONLY do that, POSSIBLY running at high priority if needed.

TomTom
I suppose a simple test would be to power up the scope, comment out the writing to file and give it a go!
Tim
@Tim, this is the answer. The most important qualification is the "thread-safe" requirement. You'll create a mutex that is shared between the two threads, locked whenever the queue is written to/read from. The trick here is that in the file writing thread, you should lock, copy a chunk of data into a local buffer, and then unlock. Don't lock any longer than you have to or your two threads will have no advantage over one thread.
David Gladfelter
Actually I use NO mutex - they cost way too much time. I use a SpinLock (new with .NET 4.0) because my code does nothing else than take something out of the queue or put something in ;) On top of that, if you create new buffers for every item (as I do) you really ONLY lock on insert/retrieve... no copy operation.
TomTom
@TomTom, thanks for the tip about SpinLock, I'll have to check that out when I get off of 3.5. I program mostly C++, so I inherently favor fixed-size buffers, incurring the extra cost of copying since that is preferable to allocation/deallocation and the concomitant heap fragmentation in a non-garbage-collected system. I've run out of heap before after a high-speed data acquisition ran for a few days, due to heap fragmentation. With a garbage collector, continually allocating new buffers and avoiding the copying may well be more efficient.
David Gladfelter
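For reference, a minimal sketch of the SpinLock-guarded queue TomTom describes (my own naming; on .NET 4 `ConcurrentQueue<T>` gives you roughly this behaviour lock-free, without writing it yourself):

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

class SpinLockQueue<T>
{
    private readonly Queue<T> items = new Queue<T>();
    // SpinLock is a mutable struct: keep it in a field and never copy it.
    private SpinLock spinLock = new SpinLock();

    public void Enqueue(T item)
    {
        bool taken = false;
        try
        {
            spinLock.Enter(ref taken);
            items.Enqueue(item);        // lock held only for the enqueue itself
        }
        finally
        {
            if (taken) spinLock.Exit();
        }
    }

    public bool TryDequeue(out T item)
    {
        bool taken = false;
        try
        {
            spinLock.Enter(ref taken);
            if (items.Count > 0)
            {
                item = items.Dequeue(); // lock held only for the dequeue itself
                return true;
            }
            item = default(T);
            return false;
        }
        finally
        {
            if (taken) spinLock.Exit();
        }
    }
}
```

Because the critical sections are only a single enqueue or dequeue, spinning briefly is cheaper than putting a thread to sleep on a kernel mutex.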
A: 

So what does your receiving code look like? Do you have a thread running at high priority responsible solely for capturing the data and passing it in memory to another thread in a non-blocking fashion? Do you run the process itself at an elevated priority?

Have you designed the rest of your code to avoid the more expensive gen-2 garbage collections? How large are your buffers, and are they on the large object heap? Do you reuse them efficiently?

Hightechrider
I'm using the 'received data' event from the FTDI library. It calls my function and passes it a byte[]. It's nearly always 4k... I'll update the question with some code...
Tim
+1  A: 

To get good read throughput on Windows on USB, you generally need to have multiple asynchronous reads (or very large reads, which is often less convenient) queued onto the USB device stack. I'm not quite sure what the FTDI drivers / libraries do internally in this regard.

Traditionally I have written mechanisms with an array of OVERLAPPED structures and an array of buffers, and kept shovelling them into ReadFile as soon as they're free. I was doing 40+MB/s reads on USB2 like this about 5-6 years ago, so modern PCs should certainly be able to cope.
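The shape of that mechanism, translated into this question's C# setting, is "issue the next read before touching the data you just received". A sketch (my own naming; `source` stands in for whatever stream the driver exposes, and the real Win32 version would use ReadFile with several OVERLAPPED reads pending at once rather than one):

```csharp
using System;
using System.IO;
using System.Threading;

// Keep the device busy: re-issue the next read before processing the data
// just received, cycling through a small set of preallocated buffers.
class ContinuousReader
{
    private readonly Stream source;
    private readonly byte[][] buffers;            // rotating preallocated buffers
    private readonly Action<byte[], int> onData;  // hand-off to the processing side
    private readonly ManualResetEvent done = new ManualResetEvent(false);
    private int next;
    private int active;          // callbacks still in flight
    private volatile bool eof;

    public ContinuousReader(Stream source, int bufferCount, int bufferSize,
                            Action<byte[], int> onData)
    {
        this.source = source;
        this.onData = onData;
        buffers = new byte[bufferCount][];
        for (int i = 0; i < bufferCount; i++)
            buffers[i] = new byte[bufferSize];
    }

    public void Start() { Issue(); }
    public void WaitForEndOfStream() { done.WaitOne(); }

    private void Issue()
    {
        byte[] buf = buffers[next];
        next = (next + 1) % buffers.Length;
        Interlocked.Increment(ref active);
        source.BeginRead(buf, 0, buf.Length, ar =>
        {
            int n = source.EndRead(ar);
            if (n > 0)
            {
                Issue();         // start the next read FIRST...
                onData(buf, n);  // ...then hand the data off (must finish before buf cycles round)
            }
            else
            {
                eof = true;      // no more data
            }
            if (Interlocked.Decrement(ref active) == 0 && eof)
                done.Set();
        }, null);
    }
}
```

Here `onData` should do nothing more than queue the filled buffer to another thread, so the read pipeline never stalls on processing.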

It's very important that you (or your drivers/libraries) don't get into a "start a read, finish a read, deal with the data, start another read" cycle, because you'll find that the bus is idle for vast swathes of time. A USB analyser would show you if this was happening.

I agree with the others that you should get off the thread that the read is happening on as soon as possible - don't block the FTDI event handler for any longer than it takes to put the buffer into another queue.

I'd preallocate a circular queue of buffers, pick the next free one and throw the received data into it, then complete the event handling as quickly as possible.
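A pool along those lines can be as simple as this (a sketch with my own naming; all allocation happens once, up front, so the event handler only copies and enqueues):

```csharp
using System.Collections.Concurrent;

// Fixed pool of reusable buffers: no per-packet allocation, no GC pressure.
class BufferPool
{
    private readonly ConcurrentQueue<byte[]> free = new ConcurrentQueue<byte[]>();

    public BufferPool(int count, int size)
    {
        for (int i = 0; i < count; i++)
            free.Enqueue(new byte[size]);
    }

    // False when the pool is exhausted - i.e. the consumer has fallen behind.
    public bool TryRent(out byte[] buffer)
    {
        return free.TryDequeue(out buffer);
    }

    public void Return(byte[] buffer)
    {
        free.Enqueue(buffer);
    }
}
```

The event handler rents a buffer, copies the received bytes into it, queues it for the checksum/writer side, and that side returns the buffer to the pool when done.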

All that checksumming and concatenation, with its attendant memory allocation, garbage collection, etc., can be done on the other side of potentially hundreds of MB of buffer time/space on the PC. At the moment you may well be asking your FPGA/hardware buffer to cover the time taken for all sorts of ponderous PC work which could be done much later.

I'm optimistic though - if you can really buffer 100ms of data on the hardware, you should be able to get this working reliably. I wish I could persuade all my clients to allow so much...

Will Dean
We've got about 10ms of buffer :-)
Tim
I still think you'll probably cope with 10ms, as long as you don't stall the reader unnecessarily. Where in the UK are you?
Will Dean