I have two Debian boxes connected by a CX4 cable running between two 10 GbE cards. One will be generating data very quickly (between 4 Gbit/s and 16 Gbit/s), and the other needs to grab all of it and store it in RAM for later parsing. I'm new to this kind of low-level coding and would happily accept any ideas about what broad approach to use (do I need DMA? RDMA?), or any tips and tricks that might apply. Thanks!

+3  A: 

The only NICs I've heard of for ordinary PCs that can pull a saturated 10GbE link up to userspace for any kind of post-processing are the ones made by Napatech, and you'll have to use their custom API.

And you'd better put such a card in a pretty grown-up server with the bus plumbing to support that kind of speed. (I'd certainly steer clear of any NVIDIA chipsets for such a box.)

leeeroy
yep that stuff costs an arm and a t*st***ckle :D
Hassan Syed
+2  A: 

If you want to constantly process 1 GB of traffic a second, you need a very wide bus and a very fast processing rate; my experience here comes from NIDS. You need specialized hardware to consistently perform NIDS processing on 100 MB/s (roughly 1 Gig Ethernet) of data, and 10 Gb is another universe. RAM will not help you, because you can fill a GB in 5-10 seconds, and 1 GB holds a lot of requests.

If you are trying to do any form of business or web processing with 10 Gig, you probably need to put a load distributor at the front that can keep up with 10 Gb/s of traffic.

P.S. I must clarify that NIDS is 1:1, traffic processed on the machine that sees the traffic -- i.e., in the worst case you process every byte on the same machine; whereas business/web processing is 1:many, with many machines and an order of magnitude more bytes to process.

-- edit --

Now that you have mentioned that there are gaps between data deliveries (no standard 10Gb NIC can keep up with 10Gb anyway), we need to know what the processing involves before we can make suggestions.

-- edit 2 --

Berkeley DB (a database with a simple data model) behaves like an enterprise database (in terms of transaction rate) when you use multiple threads. If you want to write to disk at high rates, you should probably explore this solution. You probably also want a RAID setup to boost throughput -- RAID 0+1 is best in terms of IO throughput and protection.

Hassan Syed
I should have been a little more clear. The data will only be coming in at that rate for 5-10 seconds, so with a bunch of RAM (which I have), it shouldn't be a problem to capture it and then take my time doing the postprocessing after the fact -- at least, that's what I'm hoping.
mindloss
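To put rough numbers on the "capture the burst in RAM" idea, here is a back-of-the-envelope sketch using the rates and durations mentioned in the thread. The function name is made up for illustration, and the figures ignore framing and protocol overhead. Note that a single 10 GbE link tops out at 10 Gbit/s, so the 16 Gbit/s case would need more than one link.

```python
def burst_bytes(rate_gbit_per_s, seconds):
    """Bytes produced by a burst at the given rate (1 Gbit = 1e9 bits)."""
    return rate_gbit_per_s * 1e9 / 8 * seconds


if __name__ == "__main__":
    # Rates and durations taken from the question and the comment above.
    for rate in (4, 10, 16):
        for seconds in (5, 10):
            gb = burst_bytes(rate, seconds) / 1e9
            print(f"{rate:>2} Gbit/s for {seconds:>2} s -> {gb:5.1f} GB of RAM")
```

So the buffer has to be somewhere between a few GB and a couple of tens of GB, depending on which end of the range the burst lands on.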
Off-the-shelf NICs won't be able to handle even short bursts of data at that rate.
nos
I was being optimistic even for the 1 Gb/s case that I mentioned =D
Hassan Syed
There really won't be much processing... I'll more or less want to dump the data straight to disk as is. I knew there was no way a hard disk could keep up, so I hoped I could use the RAM as a buffer, and use layer 2 packets and DMA to try to get it up to speed. All the data will be coming from the one box that it's directly connected to, so there won't be any other overhead to worry about. Is it definitely impossible with off-the-shelf 10 Gig NICs?
mindloss
Well, according to your scenario you would certainly get burst rates much higher than 1 Gb Ethernet. However, your sustained rate will not be nearly as high. There is only one way to find out, though :D And yes, if you put some simple routines in the kernel you will benefit from it.
Hassan Syed
+1  A: 

Well, you're going to need money. One way might be to buy a load-sharing switch to split the incoming data between two computers and then post-process the results into a single database.

Paul Nathan
A: 

Because you have some aspects that simplify the situation (steady point-to-point traffic between only two machines, no processing), I would actually try the trivial or obvious method first: a single TCP stream between the systems, writing the data to disk using write(). Then measure the performance, and profile to determine where any bottlenecks are.

As a starting point, read about the C10K (10000 simultaneous connections) problem, which is what most high-performance servers are designed around. It should give you a strong background in high-performance server issues. Of course, you don't need to worry about select / poll / epoll for establishing new connections, which is a major simplification.
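A minimal sketch of that "try the obvious thing and measure it" approach might look like the following. It runs sender and receiver on the loopback interface in one process purely so it is self-contained; the function names, the 1 MiB chunk size and the 64 MiB transfer size are all arbitrary choices for illustration. The receiver reads into one preallocated buffer with `recv_into`, which matches the "store it in RAM first" plan from the question rather than writing to disk. Loopback numbers say nothing about the real NIC, so treat this only as a harness to adapt and run across the actual link.

```python
import socket
import threading
import time

CHUNK = 1 << 20          # 1 MiB per send/recv call (arbitrary choice)
TOTAL = 64 * CHUNK       # 64 MiB test transfer; scale up for a real run


def send_all(port, total):
    """Producer: push `total` bytes of dummy data over one TCP stream."""
    payload = bytes(CHUNK)
    with socket.create_connection(("127.0.0.1", port)) as s:
        sent = 0
        while sent < total:
            n = min(CHUNK, total - sent)
            s.sendall(payload[:n])
            sent += n


def receive_into_ram(listener, total):
    """Consumer: read straight into one preallocated buffer.

    Returns (buffer, bytes_received, elapsed_seconds).
    """
    buf = bytearray(total)
    view = memoryview(buf)
    conn, _ = listener.accept()
    with conn:
        got = 0
        start = time.perf_counter()
        while got < total:
            n = conn.recv_into(view[got:], min(CHUNK, total - got))
            if n == 0:       # sender closed the connection early
                break
            got += n
        elapsed = time.perf_counter() - start
    return buf, got, elapsed


def run_loopback_test(total=TOTAL):
    """Start a sender thread, receive everything, return (bytes, seconds)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))      # let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    sender = threading.Thread(target=send_all, args=(port, total))
    sender.start()
    _, got, elapsed = receive_into_ram(listener, total)
    sender.join()
    listener.close()
    return got, elapsed


if __name__ == "__main__":
    got, elapsed = run_loopback_test()
    print(f"received {got} bytes in {elapsed:.3f} s "
          f"({got * 8 / elapsed / 1e9:.2f} Gbit/s)")
```

Once the two halves are split across the real machines, the same timing code gives a baseline to compare against after each tuning step.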

mctylr
+1  A: 

Before you plan on any special programming, you should do some testing to see how much you can process with a vanilla system. Set up a mock data file and sending process on the producer machine and a simple accepter/parser on the consumer machine and do a bunch of profiling - where are you going to run into data problems? Can you throw better hardware at it, or can you tweak your processing to be faster?

Be sure you are starting with a HW platform that can support the data rates you are expecting. If you're working with something like Intel's 82598EB NIC, make sure you've got it plugged into a PCIe 2.0 slot, preferably an x16 slot, in order to get full bandwidth from the NIC to the chipset.
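As a rough check on why the slot matters, here is the raw arithmetic for PCIe link bandwidth. PCIe 1.x signals at 2.5 GT/s per lane and PCIe 2.0 at 5 GT/s, both with 8b/10b encoding, which leaves 2 and 4 Gbit/s of usable bandwidth per lane per direction. The helper name is made up for illustration, and the results are upper bounds before PCIe packet overhead.

```python
# Usable Gbit/s per lane, per direction, after 8b/10b encoding:
#   PCIe 1.x: 2.5 GT/s * 8/10 = 2.0
#   PCIe 2.0: 5.0 GT/s * 8/10 = 4.0
USABLE_GBIT_PER_LANE = {"1.x": 2.0, "2.0": 4.0}


def pcie_gbit_per_s(gen, lanes):
    """Upper bound on one-direction slot bandwidth, before packet overhead."""
    return USABLE_GBIT_PER_LANE[gen] * lanes


if __name__ == "__main__":
    for gen in ("1.x", "2.0"):
        for lanes in (4, 8, 16):
            print(f"PCIe {gen} x{lanes:<2}: {pcie_gbit_per_s(gen, lanes):5.1f} Gbit/s")
```

A PCIe 1.x x4 slot comes out below 10 Gbit/s before any overhead is counted, so a 10 GbE card in such a slot cannot reach line rate no matter how the software is written.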

There are ways to tune the NIC driver's parameters to your datastream to get the most out of your setup. For example, be sure you are using jumbo frames on the link in order to minimize the TCP overhead. Also, you might play with the driver's interrupt throttle rates to speed the low level handling.
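To see what jumbo frames buy you, the sketch below compares a 1500-byte MTU with a 9000-byte one on a 10 Gbit/s link. It assumes IPv4 and TCP headers without options (20 + 20 bytes) and the usual per-frame Ethernet cost on the wire (8 bytes preamble, 14 header, 4 FCS, 12 inter-frame gap); the function name is made up for illustration.

```python
ETH_OVERHEAD = 8 + 14 + 4 + 12   # preamble+SFD, header, FCS, inter-frame gap
IP_TCP_HEADERS = 20 + 20         # IPv4 + TCP, assuming no options


def frame_stats(mtu, link_bps=10e9):
    """Return (payload efficiency, frames per second at line rate)."""
    payload = mtu - IP_TCP_HEADERS      # application bytes per frame
    wire = mtu + ETH_OVERHEAD           # bytes the frame occupies on the wire
    return payload / wire, link_bps / (wire * 8)


if __name__ == "__main__":
    for mtu in (1500, 9000):
        eff, fps = frame_stats(mtu)
        print(f"MTU {mtu}: {eff:.1%} payload, {fps:,.0f} frames/s at line rate")
```

The efficiency gain is modest (roughly 95% versus 99%), but the number of frames per second drops from about 813,000 to about 138,000. Each frame costs per-packet work in the driver and stack, which is also why the interrupt throttle setting matters.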

Is the processing for your dataset parallelizable? If you have one task dumping the data into memory, can you set up several more tasks to process chunks of the data simultaneously? This would make good use of multi-core CPUs.
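If the data can be cut into independent chunks, the fan-out could be sketched like this. Hashing stands in for the real parsing step, and the function names, chunk size and worker count are illustrative assumptions. Threads are enough here because `hashlib` releases the GIL while hashing large inputs; pure-Python parsing would need processes instead. The sketch also assumes no record straddles a chunk boundary, which real data may not guarantee.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def process_chunk(chunk):
    """Stand-in for real parsing: hash one chunk of the captured buffer."""
    return hashlib.sha256(chunk).hexdigest()


def process_in_parallel(buf, chunk_size, workers=4):
    """Split `buf` into fixed-size chunks and process them concurrently.

    memoryview slices avoid copying the captured data; results come back
    in the same order as the chunks.
    """
    view = memoryview(buf)
    chunks = [view[i:i + chunk_size] for i in range(0, len(view), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process_chunk, chunks))


if __name__ == "__main__":
    captured = bytes(range(256)) * 4096          # 1 MiB of dummy data
    digests = process_in_parallel(captured, 256 * 1024)
    print(len(digests), "chunks processed")
```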

Lastly, if none of this is enough, use the profiling/timing data that you've gathered to find the parts of the system that you can adjust for better performance. Don't just assume you know where you need to tweak: back it up with real data - you may be surprised.

Shannon Nelson
A: 

I think recent Linux kernels can handle 10Gb traffic from the NIC into the kernel, but I doubt there is an efficient way to copy the data to user space, even on an i7/Xeon 5500 platform.

wirelesser