Ok, one for the SO hive mind...

I have code which has - until today - run just fine on many systems and is deployed at many sites. It involves threads reading and writing data from a serial port.

While I was checking out a new device, my code was swamped with ERROR_OPERATION_ABORTED (995) errors from GetOverlappedResult after the ReadFile. Sometimes the read would work; other times I'd get this error. Just ignoring the error and retrying would, amazingly, work without dropping any data. No ClearCommError required.

Here's the snippet.

if (!ReadFile(handle, &c, 1, &read, &olap))
{
    // ERROR_IO_PENDING only means the overlapped read is still in flight.
    if (GetLastError() != ERROR_IO_PENDING)
    {
        logger().log_api(LOG_ERROR, "ser_rx_char:ReadFile");
        throw Exception("ser_rx_char:ReadFile");
    }
}

WaitForSingleObjectEx(r_event, INFINITE, true);  // alertable, so the thread can be closed correctly

if (GetOverlappedResult(handle, &olap, &read, TRUE) != 0)
{
    if (read != 1)
        throw Exception("ser_rx_char: no data");

    logger().log(LOG_VERBOSE, "read char %d ( read = %d) ", c, read);
}
else
{
    DWORD err = GetLastError();
    if (err != ERROR_OPERATION_ABORTED)  // filters out ERROR_OPERATION_ABORTED (995)
    {
        logger().log_api(LOG_ERROR, "ser_rx_char: GetOverlappedResult");
        throw Exception("ser_rx_char:GetOverlappedResult");
    }
}

My first guess is to blame the COM port driver, which I haven't used before (it's an RS422 port on a Blackmagic Decklink, FYI), but that feels like a cop-out.

Oh, and Vista SP1 Business 32-bit, for my sins.

Before I just put this down to "Someone else's problem", does anyone have any ideas of what might cause this?

+3  A: 

How are you setting up the OVERLAPPED structure before the ReadFile? I always zero them (other than the hEvent, obviously), which is perhaps part superstition, but I have a feeling that not doing so has caused me a problem in the past.

I'm afraid blaming the driver (if it's non-MS and not just a tiny tweak from the reference) is not completely unrealistic. To write a COM driver is an incredibly complex thing, and the difficulty with testing it is that every application ever written uses the serial ports and their IOCTLs slightly differently.

Another common problem is not to set the whole port up - for example not calling SetCommTimeouts or SetupComm. I've no idea if you're making this sort of mistake, but I have met people who say they're not using timeouts when they actually mean that they didn't call SetCommTimeouts so they're using them but don't have a notion what they're set to...

This kind of stuff can be murder for 3rd-party COM drivers, because people have often got away with any old crap with the MS driver, and it doesn't always work the same with another device.
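To make the timeout point concrete, here is a minimal sketch of setting the port up explicitly rather than inheriting whatever it was last left with. The `ReadTimeouts` struct, the helper names and the 4096-byte queue sizes are illustrative assumptions, not anything from the question. The arithmetic follows the documented COMMTIMEOUTS rule (total read timeout = multiplier × bytes requested + constant, with both zero meaning no total timeout); the special MAXDWORD combinations are not modelled. The `_WIN32` guard is only there so the arithmetic part also compiles off Windows.

```cpp
#include <cstdint>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical mirror of the read-side fields of COMMTIMEOUTS (milliseconds).
struct ReadTimeouts {
    std::uint32_t interval;          // ReadIntervalTimeout
    std::uint32_t total_multiplier;  // ReadTotalTimeoutMultiplier
    std::uint32_t total_constant;    // ReadTotalTimeoutConstant
};

// Total read timeouts are unused when multiplier and constant are both zero.
bool uses_total_timeout(const ReadTimeouts& t)
{
    return t.total_multiplier != 0 || t.total_constant != 0;
}

// Total read timeout = multiplier * bytes requested + constant.
std::uint64_t total_read_timeout_ms(const ReadTimeouts& t, std::uint32_t bytes)
{
    return std::uint64_t{t.total_multiplier} * bytes + t.total_constant;
}

#ifdef _WIN32
// Sets queue sizes and timeouts explicitly on an already opened port handle.
bool configure_port(HANDLE port, const ReadTimeouts& t)
{
    COMMTIMEOUTS ct = {};  // write timeouts stay zero, i.e. unused
    ct.ReadIntervalTimeout = t.interval;
    ct.ReadTotalTimeoutMultiplier = t.total_multiplier;
    ct.ReadTotalTimeoutConstant = t.total_constant;
    return SetupComm(port, 4096, 4096)  // queue sizes are an arbitrary choice
        && SetCommTimeouts(port, &ct);
}
#endif
```

With all three fields zero, `uses_total_timeout` returns false, which matches "no total timeout" rather than "unknown timeout".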

Will Dean
I agree that the timeouts look like a likely reason, especially since no input gets lost and the ReadFile call is only reading one byte at a time. However, if the application is setting up the port properly, it's realistic to blame the driver, even if it's MS's driver.
Windows programmer
I'm clearing the OVERLAPPED structure (apart from hEvent) and calling SetCommTimeouts with zeroes in all fields in COMMTIMEOUTS. I tried changing to non-overlapped I/O and got the same result. Time to try another comm port...
Roddy
A: 

In addition to zeroing the OVERLAPPED, you might also check how you're setting olap.hEvent; that is, what are your arguments to CreateEvent? If you're creating an event that's pre-signalled (i.e. the third argument to CreateEvent is TRUE), I would expect an immediate return. Also, don't forget that if you specify manualReset (the second argument to CreateEvent) as FALSE, GetOverlappedResult() will helpfully clear the event for you, which might explain why it works the second time around.

Can't really tell from your snippet whether either of these affect you - hope this helps.
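
For reference, a minimal sketch of the setup described above: an OVERLAPPED that is zeroed apart from hEvent, with a manual-reset event that starts out non-signalled. The helper name `make_overlapped` is hypothetical, and whether this matches the asker's actual setup is not known from the snippet. The code is Win32-only; the `_WIN32` guard just keeps it from breaking a build elsewhere.

```cpp
#ifdef _WIN32
#include <windows.h>
#include <stdexcept>

// Hypothetical helper: returns a zeroed OVERLAPPED whose hEvent is a
// manual-reset event that is initially non-signalled.
OVERLAPPED make_overlapped()
{
    OVERLAPPED olap = {};                // zeroes Internal, InternalHigh, Offset, OffsetHigh
    olap.hEvent = CreateEvent(nullptr,   // default security attributes
                              TRUE,      // bManualReset: not auto-cleared by a wait
                              FALSE,     // bInitialState: starts non-signalled
                              nullptr);  // unnamed event
    if (olap.hEvent == nullptr)
        throw std::runtime_error("CreateEvent failed");
    return olap;
}
#endif
```

The caller owns the event and should release it with CloseHandle(olap.hEvent) once the port is closed.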