I am having some problems with my network IO code on BlackBerry OS 5.

I keep getting sporadic hangs and eventually TCP timeout exceptions during my IO operations.

I am using the 5.0 networking APIs to establish the connection, which works flawlessly every time. The problem occurs during the actual IO. I have a single background worker thread that services IO requests from a queue, so all requests are serialized onto this thread.

Completion notification is done through a delegate interface that is passed in when the request is queued. The completion delegate is called on the background worker thread, but clients are free to repost to the event thread via invokeLater to do UI updates and so on.
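As an illustration only, the queue-and-worker arrangement described above could be sketched roughly as follows. This is not the actual code from the app: `RequestQueue`, `SerialWorker`, `CompletionDelegate`, `post`, and `stop` are hypothetical names, requests are plain `Object`s, and the network IO is left out. It is written without generics to stay close to the style of the code in this post.

```java
import java.util.Vector;

// Hypothetical sketch: a synchronized FIFO shared by posting threads and one worker.
class RequestQueue {
    private final Vector pending = new Vector(); // queued requests, oldest first
    private boolean running = true;

    // Called from any thread (for example the event thread) to queue a request.
    synchronized void post(Object request) {
        pending.addElement(request);
        notify(); // wake the worker if it is waiting
    }

    // Called only by the worker thread. Blocks until a request is queued.
    // Returns null once stop() has been called.
    synchronized Object waitForRequest() throws InterruptedException {
        while (running && pending.isEmpty()) {
            wait();
        }
        if (!running) {
            return null;
        }
        Object request = pending.elementAt(0);
        pending.removeElementAt(0);
        return request;
    }

    // Asks the worker to exit; requests still in the queue are dropped.
    synchronized void stop() {
        running = false;
        notifyAll();
    }
}

// Completion callback; it is invoked on the worker thread, not the event thread.
interface CompletionDelegate {
    void completed(Object request);
}

// The single background thread: requests are handled strictly one at a time.
class SerialWorker extends Thread {
    private final RequestQueue queue;
    private final CompletionDelegate delegate;

    SerialWorker(RequestQueue queue, CompletionDelegate delegate) {
        this.queue = queue;
        this.delegate = delegate;
    }

    public void run() {
        try {
            Object request;
            while ((request = queue.waitForRequest()) != null) {
                // The real network IO for this request would happen here.
                delegate.completed(request);
            }
        } catch (InterruptedException e) {
            // Treat interruption as a request to shut down.
        }
    }
}
```

In this sketch the queue is the only state shared between threads, and every access to it goes through a synchronized method.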

Notes:
HttpRequest is my own class that holds data about the request.
MutableData is my own class that holds the data that is read.
BUFFER_SIZE = 2048

HttpConnection getConnectionForRequest(final HttpRequest inRequest) {
    final String url = inRequest.getURL();
    final int[] availableTransportTypes = TransportInfo.getAvailableTransportTypes();
    final ConnectionFactory connectionFactory = new ConnectionFactory();
    connectionFactory.setPreferredTransportTypes(availableTransportTypes);
    connectionFactory.setConnectionMode(ConnectionFactory.ACCESS_READ);
    final ConnectionDescriptor connectionDescriptor = connectionFactory.getConnection(url);
    HttpConnection connection = null;
    if (connectionDescriptor != null) {
        connection = (HttpConnection) connectionDescriptor.getConnection();
    }
    return connection;
}

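public void run() {
    while (isRunning()) {
        final HttpRequest request = waitForRequest(); // Blocks until a request appears in the queue.
        final MutableData data = new MutableData();
        HttpConnection connection = null;
        InputStream inputStream = null;
        try {
            connection = getConnectionForRequest(request);
            if (connection == null) {
                continue; // No transport could supply a connection for this request.
            }
            inputStream = connection.openInputStream();
            final byte[] readBuffer = new byte[BUFFER_SIZE];
            int chunkSize;
            // *** The following read call sporadically hangs and eventually throws a TCP timeout exception.
            while ((chunkSize = inputStream.read(readBuffer, 0, BUFFER_SIZE)) != -1) {
                data.appendData(readBuffer, 0, chunkSize);
            }
            mDelegate.receivedDataForRequest(request, data);
        } catch (IOException e) {
            // The TCP timeout surfaces here as an IOException.
        } finally {
            // Release the stream and connection before servicing the next request.
            try { if (inputStream != null) inputStream.close(); } catch (IOException ignored) { }
            try { if (connection != null) connection.close(); } catch (IOException ignored) { }
        }
    }
}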

When it hangs, it always eventually throws a TCP timeout error after about 30 seconds. If this occurred occasionally, I would just chalk it up to normal network congestion, but it happens frequently enough to indicate a deeper problem.

Edit:

It happens on various simulators and on the 2 physical devices I have. The simulators I have tried are the Storm 9550, Tour 9630, Bold 9000, Pearl 9100, and Curve 8530.

My physical devices are a Curve 8530 and a Storm 9550, and it happens on both of them as well.

Any help would be appreciated.

A: 

You might want to try the input stream's available() method. Even though you are serializing data on one background thread, it looks like the request is created on the main thread. You may be running into some weird race condition there.

Byron Whitlock
I also thought I might be dealing with a strange race condition, but for the life of me I can't locate one. The only piece of shared data is the request queue, and that is properly synchronized between the posting thread and the processing thread.
Maven
A: 

Can you add some logging to display which transport type the device is choosing for each connection? Perhaps the transport-selection API is picking a transport it thinks will work when in fact it doesn't.

Marc Novakowski
Every single time it selects TRANSPORT_TCP_CELLULAR.
Maven
A: 

It was suggested elsewhere to put a stall detector in my network IO thread and, when a stall is detected, interrupt the thread and restart the request. I do this by starting a timer before I begin the request and resetting it as I read each chunk of data. If the timer expires before I can read a chunk, I assume the network has stalled, so I interrupt the thread and start over on that request.
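A minimal sketch of such a stall detector, assuming a `java.util.Timer` that is rescheduled after every chunk. `StallDetector`, `reset`, `stop`, `shutdown`, and the `onStall` callback are hypothetical names chosen for illustration, and what the callback does (interrupting the IO thread, closing the connection, requeueing the request) is left to the caller.

```java
import java.util.Timer;
import java.util.TimerTask;

// Hypothetical helper: runs onStall if reset() is not called again within timeoutMillis.
class StallDetector {
    private final Timer timer = new Timer();
    private final long timeoutMillis;
    private final Runnable onStall;
    private TimerTask pending; // the currently armed timeout; guarded by "this"

    StallDetector(long timeoutMillis, Runnable onStall) {
        this.timeoutMillis = timeoutMillis;
        this.onStall = onStall;
    }

    // Call once before the request starts and again after each chunk is read.
    // Must not be called after shutdown().
    synchronized void reset() {
        if (pending != null) {
            pending.cancel();
        }
        final TimerTask task = new TimerTask() {
            public void run() {
                synchronized (StallDetector.this) {
                    if (pending != this) {
                        return; // superseded by a later reset() or stop()
                    }
                    pending = null;
                }
                onStall.run(); // runs on the timer thread
            }
        };
        pending = task;
        timer.schedule(task, timeoutMillis);
    }

    // Call when the request completes so that no stall is reported afterwards.
    synchronized void stop() {
        if (pending != null) {
            pending.cancel();
            pending = null;
        }
    }

    // Disarms the detector and ends the timer thread.
    void shutdown() {
        stop();
        timer.cancel();
    }
}
```

In a read loop like the one in the question, `reset()` would be called just before the first `read` and again after each `appendData`, with `stop()` once the stream reports end of data. The callback runs on the timer thread, so anything it touches on the IO thread needs the usual synchronization.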

I've done this and it does improve things, at least by reducing how long I have to wait before I can continue the request, since I no longer have to wait for the TCP timeout, which can take a very long time.

Interrupting the current IO operation and restarting seems to prod the network back to life for a while, usually running fine for several minutes before stalling again. I log the stalls to the console when debugging, and I get quite a few of them.

This is a very strange problem and I'm not totally happy with the stall detection solution. It seems to be just masking the problem but it does allow me to somewhat address the long delays that I have been getting.

Maven