We use sockets a lot in a program that I work on, and at times we handle connections from up to about 100 machines simultaneously. We have a combination of non-blocking I/O, with a state table to manage it, and traditional Java sockets, which use threads.

We have quite a few problems with the non-blocking sockets, and I personally much prefer using threads to handle sockets. So my questions are:

How much of a saving is made by using non-blocking sockets on a single thread? How costly is the context switching involved in using threads, and how many concurrent connections can you scale to with the threaded model in Java?

+1  A: 

For your questions, the best approach might be to build a test program, gather some hard measurement data, and make the decision based on that data. I usually do this when trying to make such decisions, and it helps to have hard numbers to back up your argument.

Before starting, though, how many threads are you talking about? And what type of hardware are you running your software on?

Dr. Watson
Good idea. The program I work on is peer-to-peer, where one peer might be talking to 100+ others. The peers can be Linux/Windows/Mac (various flavours) and it will generally be running on fairly well specced PCs in an office environment (i.e. 2+ CPUs).
Benj
+6  A: 

The choice between blocking I/O and non-blocking I/O depends on your server's activity profile. For example, if you have long-lived connections and thousands of clients, blocking I/O may become too expensive because of system resource exhaustion. However, straightforward blocking I/O that doesn't crowd out the CPU cache is faster than non-blocking I/O. There is a good article about that - Writing Java Multithreaded Servers - whats old is new.
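For comparison, here is a minimal sketch of the thread-per-connection blocking model that article argues scales surprisingly well. The port number and echo handler are made up for illustration only:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;

public class ThreadPerConnectionServer {
    public static void main(String[] args) throws Exception {
        ServerSocket server = new ServerSocket(9000); // hypothetical port
        while (true) {
            final Socket socket = server.accept();
            // One dedicated thread per client; blocking reads keep the code simple.
            new Thread(new Runnable() {
                public void run() {
                    try {
                        BufferedReader in = new BufferedReader(
                                new InputStreamReader(socket.getInputStream()));
                        PrintWriter out = new PrintWriter(socket.getOutputStream(), true);
                        String line;
                        while ((line = in.readLine()) != null) {
                            out.println("echo: " + line); // trivial placeholder handler
                        }
                    } catch (Exception e) {
                        e.printStackTrace();
                    } finally {
                        try { socket.close(); } catch (Exception ignored) {}
                    }
                }
            }).start();
        }
    }
}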

As for the context switch cost - it's a rather cheap operation. Consider the simple test below:

package com;

import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.Set;
import java.util.concurrent.ConcurrentSkipListSet;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Micro-benchmark: each worker thread spins on CPU-bound busy work for DURATION,
// and the distribution of completed iterations across threads is printed at the end.
public class AAA {

    private static final long DURATION = TimeUnit.NANOSECONDS.convert(30, TimeUnit.SECONDS);
    private static final int THREADS_NUMBER = 2;
    private static final ThreadLocal<AtomicLong> COUNTER = new ThreadLocal<AtomicLong>() {
        @Override
        protected AtomicLong initialValue() {
            return new AtomicLong();
        }
    };
    private static final ThreadLocal<AtomicLong> DUMMY_DATA = new ThreadLocal<AtomicLong>() {
        @Override
        protected AtomicLong initialValue() {
            return new AtomicLong();
        }
    };
    private static final AtomicLong DUMMY_COUNTER = new AtomicLong();
    private static final AtomicLong END_TIME = new AtomicLong(System.nanoTime() + DURATION);

    private static final List<ThreadLocal<CharSequence>> DUMMY_SOURCE = new ArrayList<ThreadLocal<CharSequence>>();
    static {
        for (int i = 0; i < 40; ++i) {
            DUMMY_SOURCE.add(new ThreadLocal<CharSequence>());
        }
    }

    private static final Set<Long> COUNTERS = new ConcurrentSkipListSet<Long>();

    public static void main(String[] args) throws Exception {
        // startLatch releases all workers at (roughly) the same moment;
        // endLatch lets the main thread wait for every worker to finish.
        final CountDownLatch startLatch = new CountDownLatch(THREADS_NUMBER);
        final CountDownLatch endLatch = new CountDownLatch(THREADS_NUMBER);

        for (int i = 0; i < THREADS_NUMBER; i++) {
            new Thread() {
                @Override
                public void run() {
                    initDummyData();
                    startLatch.countDown();
                    try {
                        startLatch.await();
                    } catch (InterruptedException e) {
                        e.printStackTrace();
                    }
                    while (System.nanoTime() < END_TIME.get()) {
                        doJob();
                    }
                    COUNTERS.add(COUNTER.get().get());
                    DUMMY_COUNTER.addAndGet(DUMMY_DATA.get().get());
                    endLatch.countDown();
                }
            }.start();
        }
        startLatch.await();
        END_TIME.set(System.nanoTime() + DURATION);

        endLatch.await();
        printStatistics();
    }

    private static void initDummyData() {
        for (ThreadLocal<CharSequence> threadLocal : DUMMY_SOURCE) {
            threadLocal.set(getRandomString());
        }
    }

    private static CharSequence getRandomString() {
        StringBuilder result = new StringBuilder();
        Random random = new Random();
        for (int i = 0; i < 127; ++i) {
            result.append((char)random.nextInt(0xFF));
        }
        return result;
    }

    // CPU-bound busy work over thread-local data, so the threads never block on each other.
    private static void doJob() {
        Random random = new Random();
        for (ThreadLocal<CharSequence> threadLocal : DUMMY_SOURCE) {
            for (int i = 0; i < threadLocal.get().length(); ++i) {
                DUMMY_DATA.get().addAndGet(threadLocal.get().charAt(i) << random.nextInt(31));
            }
        }
        COUNTER.get().incrementAndGet();
    }

    private static void printStatistics() {
        long total = 0L;
        for (Long counter : COUNTERS) {
            total += counter;
        }
        System.out.printf("Total iterations number: %d, dummy data: %d, distribution:%n", total, DUMMY_COUNTER.get());
        for (Long counter : COUNTERS) {
            System.out.printf("%f%%%n", counter * 100d / total);
        }
    }
}

I ran the test four times for the two-thread and ten-thread scenarios, and it shows a performance loss of about 2.5% (78626 iterations with two threads versus 76754 with ten). System resources are used by the threads approximately equally.

Also, the 'java.util.concurrent' authors estimate context switch time to be about 2000-4000 CPU cycles:

public class Exchanger<V> {
   ...
   private static final int NCPU = Runtime.getRuntime().availableProcessors();
   ....
   /**
    * The number of times to spin (doing nothing except polling a
    * memory location) before blocking or giving up while waiting to
    * be fulfilled.  Should be zero on uniprocessors.  On
    * multiprocessors, this value should be large enough so that two
    * threads exchanging items as fast as possible block only when
    * one of them is stalled (due to GC or preemption), but not much
    * longer, to avoid wasting CPU resources.  Seen differently, this
    * value is a little over half the number of cycles of an average
    * context switch time on most systems.  The value here is
    * approximately the average of those across a range of tested
    * systems.
    */
   private static final int SPINS = (NCPU == 1) ? 0 : 2000;
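If you want a rough feel for that cost on your own hardware, one crude (and by no means precise) way is to time thread handoffs with a SynchronousQueue. Everything below is illustrative only; each round trip forces two handoffs between the threads:

import java.util.concurrent.SynchronousQueue;

public class HandoffCost {
    public static void main(String[] args) throws Exception {
        final SynchronousQueue<Long> ping = new SynchronousQueue<Long>();
        final SynchronousQueue<Long> pong = new SynchronousQueue<Long>();
        final int ROUNDS = 100000;

        // Echo thread: each take/put pair forces a handoff between the two threads.
        new Thread(new Runnable() {
            public void run() {
                try {
                    for (int i = 0; i < ROUNDS; i++) {
                        pong.put(ping.take());
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }).start();

        long start = System.nanoTime();
        for (int i = 0; i < ROUNDS; i++) {
            ping.put(1L);
            pong.take();
        }
        long elapsed = System.nanoTime() - start;
        // A round trip includes two handoffs, so this is only a crude upper bound.
        System.out.println("ns per round trip: " + elapsed / ROUNDS);
    }
}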
denis.zhdanov
Thanks very much, nice to have a test case.
Benj
Thanks for posting the link to "Writing Java Multithreaded Servers - whats old is new". I had forgotten its name and could not find it.
Adam Paynter
A: 

For 100 connections you are unlikely to have a problem with blocking IO, using two threads per connection (one for reading and one for writing). That's the simplest model IMHO.
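A rough sketch of that two-threads-per-connection idea, assuming a simple readUTF/writeUTF framing and a hypothetical PeerConnection class (both invented for illustration):

import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.net.Socket;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class PeerConnection {
    private final Socket socket;
    private final BlockingQueue<String> outbound = new LinkedBlockingQueue<String>();

    public PeerConnection(Socket socket) {
        this.socket = socket;
    }

    public void start() throws Exception {
        final DataInputStream in = new DataInputStream(socket.getInputStream());
        final DataOutputStream out = new DataOutputStream(socket.getOutputStream());

        // Reader thread: blocks on the socket and hands messages to the application.
        new Thread(new Runnable() {
            public void run() {
                try {
                    while (true) {
                        handle(in.readUTF());
                    }
                } catch (Exception e) {
                    // connection closed or broken
                }
            }
        }).start();

        // Writer thread: blocks on the queue and pushes messages to the socket.
        new Thread(new Runnable() {
            public void run() {
                try {
                    while (true) {
                        out.writeUTF(outbound.take());
                        out.flush();
                    }
                } catch (Exception e) {
                    // connection closed or broken
                }
            }
        }).start();
    }

    public void send(String message) {
        outbound.offer(message);
    }

    private void handle(String message) {
        // application-specific processing goes here
    }
}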

However, you may find that using JMS is a better way to manage your connections. If you use something like ActiveMQ you can consolidate all your connections.
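If you go the JMS route, a minimal producer against ActiveMQ might look roughly like the following; the broker URL and queue name are placeholders:

import javax.jms.Connection;
import javax.jms.MessageProducer;
import javax.jms.Queue;
import javax.jms.Session;
import org.apache.activemq.ActiveMQConnectionFactory;

public class JmsSendExample {
    public static void main(String[] args) throws Exception {
        // Placeholder broker URL; all peers talk to the broker instead of directly to each other.
        ActiveMQConnectionFactory factory =
                new ActiveMQConnectionFactory("tcp://localhost:61616");
        Connection connection = factory.createConnection();
        connection.start();
        try {
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            Queue queue = session.createQueue("peer.updates"); // placeholder queue name
            MessageProducer producer = session.createProducer(queue);
            producer.send(session.createTextMessage("hello from a peer"));
        } finally {
            connection.close();
        }
    }
}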

Peter Lawrey