Hi -

I am building a real-time embedded Linux application that carries a variety of network traffic. Of that traffic, two connections are time critical: one carries the input data and the other the output data. My application needs these two connections to have priority over the rest of the non-time-critical traffic.

I care about two things:

  1. Minimize the number of dropped packets due to overload on these two connections.
  2. Minimize the latency through the device (input to output) on these two connections.

I've come (somewhat!) up to speed on Linux traffic control, and understand that it primarily applies to egress traffic, since the remote device is responsible for the priority of the data it sends to me. I have set up my application as a real-time process and have worked through the issues related to what priority to run it at.

I now embark on setting up tc. For my test case, here is what I use:

tc qdisc add dev eth0 root handle 1: prio bands 3 priomap 2 2 2 2 2 2 2 0 2 2 2 2 2 2 2 2
tc qdisc add dev eth0 parent 1:1 handle 10: pfifo
tc qdisc add dev eth0 parent 1:2 handle 20: pfifo
tc qdisc add dev eth0 parent 1:3 handle 30: pfifo

Basically I am saying: send all priority 7 traffic over band 0, and all other traffic over band 2. Once I have this simple test working, I will do a better job of handling the other traffic.

First let's verify my expectations: any traffic having priority 7 should always go out before traffic having any other priority. This should leave the latency on such traffic relatively unaffected by other traffic on the box, no? My MTU is set to 1500, and I am getting about 10 MB/sec through the interface. The maximum additional latency on band 0 caused by band 2 traffic should be one in-flight packet (<= 1500 bytes), or about 150 us (1500 bytes / 10 MB/sec = 150 us).

Here is my test setup:

Two Linux boxes. Box 1 runs a TCP server that echoes input data. Box 2 connects to box 1, sends packets over TCP, and measures the latency (time sent to time received).

I use the same tc setup on both Linux boxes.

In the applications (both server and client), I set SO_PRIORITY on the socket as follows:

int so_priority = 7;
if (setsockopt(m_socket.native(), SOL_SOCKET, SO_PRIORITY,
               &so_priority, sizeof(so_priority)) < 0)
    perror("setsockopt(SO_PRIORITY)");

I use tc to verify that my traffic goes over band 0, and all other traffic over band 2:

tc -s qdisc ls dev eth0

Here's the rub: when there is no other traffic, I see latencies in the 500 us range. When there is other traffic (for example, an scp job copying a 100 MB file), the latencies jump to 10+ ms. What is really strange is that NONE of the tc work I did has any effect. In fact, if I swap the bands (so all my traffic goes over the lower-priority band 2, and other traffic over band 1), I see no difference in latency.

What I was expecting is that other traffic on the network would add about 150 us of latency, not 10 ms! By the way, I have verified that loading the box with other (non-real-time-priority) processes does not affect latency, nor does traffic on other interfaces.

One other item of note: if I drop the MTU to 500 bytes, the latency decreases to about 5 ms. Still, this is an order of magnitude worse than the unloaded case. And why does changing the MTU affect latency so much, while using tc to set up priority queuing has no effect?

Why is tc not helping me? What am I missing?

Thanks!

Eric

A: 

You didn't say anything about the rest of your network, but I'm guessing you're hitting a queue at an upstream router; routers usually have long queues to optimize for throughput. The best way to fix it is to feed your priority queue into a shaper with a bandwidth just under your upstream bandwidth. That way your bulk packets will queue up inside your box instead of at an external router, allowing your high-priority packets to jump to the front of the queue as you expect.
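For example, something like this, using HTB as the shaper with your prio qdisc underneath it (a sketch, not tested on your setup; the 90mbit rate is an assumption based on your ~10 MB/sec figure, so set it just under your real link rate):

# shape everything to just under line rate, then hang the prio qdisc under the shaped class
tc qdisc add dev eth0 root handle 1: htb default 1
tc class add dev eth0 parent 1: classid 1:1 htb rate 90mbit ceil 90mbit
tc qdisc add dev eth0 parent 1:1 handle 2: prio bands 3 priomap 2 2 2 2 2 2 2 0 2 2 2 2 2 2 2 2
tc qdisc add dev eth0 parent 2:1 handle 10: pfifo
tc qdisc add dev eth0 parent 2:2 handle 20: pfifo
tc qdisc add dev eth0 parent 2:3 handle 30: pfifo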

Karl Bielefeldt
Sounds right on. So I removed all switches between the two boxes (the Ethernet port has auto-crossover support) and reran the test: same results. Could this be related to using TCP?
Eric
Nope. Reran with UDP and am seeing the same problem. Aargh!!
Eric
Hmm. Maybe try this suggestion from the BUGS section of the man page: "Large amounts of traffic in the lower bands can cause starvation of higher bands. Can be prevented by attaching a shaper (for example, tc-tbf(8)) to these bands to make sure they cannot dominate the link."
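With your setup, that would mean replacing the pfifo on band 2 with a tbf, something like this (a sketch; the 80mbit rate and the burst/latency figures are guesses you'd tune for your link):

# replace the pfifo on the bulk band with a token bucket so it can't saturate the link
tc qdisc replace dev eth0 parent 1:3 handle 30: tbf rate 80mbit burst 32kb latency 20ms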
Karl Bielefeldt
Just had another thought. If your bulk traffic is going to the same host as your interactive traffic, it might be queueing up on ingress. I don't know much about the ingress side, but maybe try setting up the same priority queue on that side?
Karl Bielefeldt
That is a nice (not) comment in the bugs section. Seems like a glaring bug. I set up a token bucket to limit traffic flow. This definitely helped (latencies came down to ~8 ms max), but it did not solve the problem.
Eric
From my understanding, traffic control is egress only. Ingress can be used for policing, but otherwise all it can do is drop packets. I have the traffic control settings set up on both boxes, so the ingress side of one box is effectively controlled by the egress side of the other.
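For reference, an ingress policer would look something like this (illustrative only; the rate is a guess, and note it can only drop packets, not reorder them):

# attach the ingress qdisc and police all incoming IP traffic to a fixed rate
tc qdisc add dev eth0 handle ffff: ingress
tc filter add dev eth0 parent ffff: protocol ip prio 50 u32 match ip src 0.0.0.0/0 police rate 80mbit burst 32kb drop flowid :1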
Eric