ansaurus

Question

Reading from a socket 1 byte at a time vs reading in large chunks

Answer 1

+1 A:

First and simplest:

cin.getline(buffer,1024);

Second, usually all IO is buffered, so you don't need to worry too much.

Third, CGI process start usually costs much more than input processing (unless it is a huge file)... So you may just not think about it.

Artyom 2009-05-31 09:26:27

Answer 2

+1 A:

G'day,

One of the big performance hits from doing it one byte at a time is that every read is a system call, so your process keeps switching from user mode into kernel mode over and over. And over. Not efficient at all.

Grabbing one big chunk, typically up to an MTU size, is measurably more efficient.

Why not scan the content into a vector and iterate over that looking out for \n's to separate your input into lines of web input?
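A minimal sketch of that idea, assuming the chunk has already been read into a std::vector<char>. The name split_lines and the `pending` carry-over string are hypothetical, not something from the question; the carry-over is there because a chunk boundary can fall in the middle of a line.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Appends `chunk` to `pending`, then cuts out every complete line.
// A trailing '\r' is dropped so HTTP-style "\r\n" endings work too.
// Bytes after the last '\n' stay in `pending` for the next chunk.
std::vector<std::string> split_lines(std::string& pending,
                                     const std::vector<char>& chunk) {
    pending.append(chunk.begin(), chunk.end());

    std::vector<std::string> lines;
    std::string::size_type start = 0;
    std::string::size_type nl;
    while ((nl = pending.find('\n', start)) != std::string::npos) {
        std::string line = pending.substr(start, nl - start);
        if (!line.empty() && line.back() == '\r') {
            line.pop_back();
        }
        lines.push_back(line);
        start = nl + 1;
    }
    pending.erase(0, start);  // keep only the incomplete tail
    return lines;
}
```

Call it once per chunk with the same `pending` string, and hand each returned line to the existing per-line parsing code.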

HTH

cheers,

Rob Wells 2009-05-31 09:26:56

Yes, depending on the number of calls, the relative overhead caused by function calls may actually become significant at some point.

none 2009-05-31 11:40:26

Answer 3

+4 A:

I can't comment on C++, but from other platforms - yes, this can make a big difference; particularly in the number of context switches the code needs to do, and the number of times it needs to worry about the async nature of streams etc.

But the real test is, of course, to profile it. Why not write a basic app that churns through an arbitrary file using both approaches, and test it for some typical files... the effect is usually startling, if the code is IO bound. If the files are small and most of your app runtime is spent processing the data once it is in memory, you aren't likely to notice any difference.
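For C++ on a POSIX system, such a test could be sketched roughly as follows. The names ReadStats and drain are made up for illustration; the function reads a descriptor to the end with a given chunk size and reports how many read() calls it took and how long the loop ran. Error handling is deliberately minimal.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>
#include <unistd.h>  // POSIX read()

struct ReadStats {
    std::size_t bytes = 0;  // total bytes received
    std::size_t calls = 0;  // number of read() system calls issued
    double seconds = 0.0;   // wall-clock time spent in the loop
};

// Reads `fd` until end of input (or an error) using `chunk_size`-byte reads.
ReadStats drain(int fd, std::size_t chunk_size) {
    assert(chunk_size > 0);
    std::vector<char> buf(chunk_size);
    ReadStats stats;
    auto t0 = std::chrono::steady_clock::now();
    for (;;) {
        ssize_t n = ::read(fd, buf.data(), buf.size());
        ++stats.calls;
        if (n <= 0) {
            break;  // 0 means end of input, -1 means error
        }
        stats.bytes += static_cast<std::size_t>(n);
    }
    auto t1 = std::chrono::steady_clock::now();
    stats.seconds = std::chrono::duration<double>(t1 - t0).count();
    return stats;
}
```

Running drain over the same data once with chunk_size 1 and once with something like 4096, then comparing `calls` and `seconds`, gives the comparison described above.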

Marc Gravell 2009-05-31 09:28:26

Answer 4

+1 A:

You are not reading one byte at a time from a socket; you are reading one byte at a time from the C/C++ I/O system, which, if you are using CGI, will already have buffered up all the input from the socket. The whole point of buffered I/O is to make the data available to the programmer in a way that is convenient for them to process, so if you want to process one byte at a time, go ahead.

Edit: On reflection, it is not clear from your question if you are implementing CGI or just using it. You could clarify this by posting a code snippet which indicates how you currently read that single byte.

If you are reading the socket directly, then you should simply read the entire response to the GET into a buffer and then process it. This has numerous advantages, including performance and ease of coding.

If you are limited to a small buffer, then use classic buffering algorithms like:

getbyte:
   if buffer is empty
      fill buffer
      set buffer pointer to start of buffer
   end
   get byte at buffer pointer
   increment pointer
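In C++ the pseudocode above might be sketched like this. ByteReader and its `fill` callback are hypothetical names; the callback stands in for whatever actually refills the buffer (for example a recv() on the socket) and is assumed to return 0 once the input is finished.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

class ByteReader {
public:
    // Hypothetical refill callback: writes up to `cap` bytes into `dst`
    // and returns how many were written, or 0 at end of input.
    using FillFn = std::function<std::size_t(char* dst, std::size_t cap)>;

    ByteReader(FillFn fill, std::size_t capacity)
        : fill_(std::move(fill)), buf_(capacity) {
        assert(capacity > 0);
    }

    // Returns the next byte as a value in 0..255, or -1 at end of input.
    int getbyte() {
        if (pos_ == len_) {                           // buffer is empty
            len_ = fill_(buf_.data(), buf_.size());   // fill buffer
            pos_ = 0;                                 // pointer back to start
            if (len_ == 0) {
                return -1;
            }
        }
        return static_cast<unsigned char>(buf_[pos_++]);
    }

private:
    FillFn fill_;
    std::vector<char> buf_;
    std::size_t pos_ = 0;  // index of the next unread byte
    std::size_t len_ = 0;  // number of valid bytes in buf_
};
```

The caller still consumes one byte at a time, but the expensive refill only happens once per `capacity` bytes.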

anon 2009-05-31 09:29:46

Nope. I'm reading from a socket. I'm making an HTTP GET request to the web server and reading the response from the socket. I do this because I need the completely rendered and parsed dynamic content.

teriz 2009-05-31 09:40:36

I think I could settle on this algorithm with a little modification. I can have two fixed-size buffers: one to read an entire chunk (say 512 bytes) into, which I then scan, and another to hold a single complete HTML line that my other parsing methods can access easily. That way I get a more efficient socket-reading routine and keep the ease of processing I have right now (i.e. my other methods assume one complete HTML line). Thanks. =)

teriz 2009-05-31 10:22:03

Answer 5

+3 A:

If you are reading directly from the socket, and not from an intermediate higher-level representation that can be buffered, then without any possible doubt it is better to read the full 1024 bytes at once, put them in a buffer in RAM, and then parse the data from RAM.

Why? Reading on a socket is a system call, and it causes a context switch on each read, which is expensive. Read more about it: IBM Tech Lib: Boost socket performances
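A rough sketch of that approach with plain POSIX sockets, assuming a blocking, already-connected socket and a server that closes the connection when the response is complete. The name recv_all is made up for illustration.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <string>
#include <sys/socket.h>
#include <sys/types.h>

// Reads from `sock` in 1024-byte chunks until the peer closes the
// connection, appending everything to `out`. Returns false on error.
bool recv_all(int sock, std::string& out) {
    char buf[1024];
    for (;;) {
        ssize_t n = ::recv(sock, buf, sizeof buf, 0);
        if (n > 0) {
            out.append(buf, static_cast<std::size_t>(n));
        } else if (n == 0) {
            return true;   // orderly shutdown by the peer
        } else if (errno != EINTR) {
            return false;  // real error; EINTR just retries
        }
    }
}
```

Each recv() is one system call, so a 10 KB response costs about ten calls here instead of roughly ten thousand when reading byte by byte. All parsing then happens on `out` in memory.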

NicDumZ 2009-05-31 10:02:54

+1 - I like your argument on why reading in large chunks is better performance-wise. I think I can settle for Neil Butterworth's answer for resolving my second concern. =)

teriz 2009-05-31 10:30:18

Answer 6

A:

There is no difference at the operating system level; data are buffered anyway. Your application, however, must execute more code to "read" bytes one at a time.

2009-05-31 12:11:37

Answer 7

+1 A:

You can wrap the socket file descriptor in a stream with the fdopen() function. Then you have buffered IO, so you can call fgets() or similar on the resulting stream.
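A small sketch of this, assuming a POSIX system and a descriptor that is only read from. The name read_lines and the 1024-byte line buffer are arbitrary choices for illustration.

```cpp
#include <cassert>
#include <stdio.h>  // fdopen() is POSIX and declared here
#include <string>
#include <vector>

// Wraps `fd` in a buffered FILE* and reads it line by line with fgets().
// Takes ownership of `fd`: fclose() also closes the descriptor.
std::vector<std::string> read_lines(int fd) {
    std::vector<std::string> lines;
    FILE* in = fdopen(fd, "r");
    if (in == nullptr) {
        return lines;
    }
    char line[1024];
    // fgets() keeps the trailing '\n'; a line longer than the buffer
    // arrives split across several entries.
    while (fgets(line, sizeof line, in) != nullptr) {
        lines.emplace_back(line);
    }
    fclose(in);
    return lines;
}
```

The stdio layer does the chunked reads from the descriptor internally, so the calling code can stay line-oriented.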

codymanix 2009-05-31 15:33:52

-1 for suggesting gets().

bk1e 2009-05-31 21:03:03

sorry, I meant fgets(), edited my answer now :-(

codymanix 2009-06-06 22:33:59

How could you!!

LukeN 2010-05-27 17:15:56
