
Hello All,

I am pretty new to socket programming. I have a function call similar to:

len = read(FD, buf, 1500);

[which is responsible for reading data from a telnet connection]

A printf on the next line shows buf to be more than 300 characters long, but (int)len is only 89! Because of this, all further parsing of the returned string fails.

I have read many questions about asynchronous socket reads returning less data than requested, but in this case the buffer appears to contain all of the data and only the reported length is wrong. (The returned string is always the same, and the length is always the same wrong value.)

The same code works properly when the returned string is short (typically around 100 characters).

Any pointers would be extremely helpful!

--Ashwin

A:

With async sockets you usually have to keep reading until the desired byte count has been received. That means managing the buffer yourself, i.e. advancing the buffer pointer by the number of bytes each call returned.
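A minimal sketch of that bookkeeping, assuming a blocking descriptor: the hypothetical helper `read_exact()` keeps calling read() and advances the write position by however many bytes each call returned, until `count` bytes have arrived, end-of-file is hit, or a real error occurs. For a genuinely non-blocking socket you would also have to wait for readiness (for example with poll() or select()) when read() fails with EAGAIN; that part is left out here.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read exactly `count` bytes from `fd` into `buf`.
 * Returns the number of bytes stored (less than `count` only if end-of-file
 * came first), or -1 on error. Assumes a blocking descriptor. */
ssize_t read_exact(int fd, void *buf, size_t count)
{
    char *p = buf;          /* start of the caller's buffer */
    size_t got = 0;         /* bytes received so far */

    while (got < count) {
        ssize_t n = read(fd, p + got, count - got);
        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted by a signal: just retry */
            return -1;      /* real error */
        }
        if (n == 0)
            break;          /* end-of-file: the peer closed the connection */
        got += (size_t)n;   /* advance past the bytes just received */
    }
    return (ssize_t)got;
}
```

Usage would look like `n = read_exact(FD, buf, expected);`, where `expected` has to come from somewhere, such as a length header, which is exactly what a free-form telnet response does not provide.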

Your observation that the buffer holds the correct data while the reported size is wrong is probably caused by stale data left over from a previous run. To confirm this, clear the buffer before each run.

Indeera
Before every run, something like "memset (buf, 0, len)" is done. Also, as I mentioned in a clarification below, the data in the buffer is genuine (its content changes every time the command is issued)...
"Genuine" in the sense that the full 303-byte response is the correct one; the 89-byte response is cut off in the middle of a word (an ordinary dictionary word, no special characters).
A: 

read() attempts to read up to 1500 bytes from file descriptor FD into the buffer starting at buf. On success, the number of bytes read is returned. It is not an error if this number is smaller than the number of bytes requested; this may happen for example because fewer bytes are actually available right now (maybe because we were close to end-of-file, or because we are reading from a pipe, or from a terminal), or because read() was interrupted by a signal.

Generally you need to call read() in a loop.
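To see that a short count is normal rather than an error, here is a small sketch in which a pipe stands in for the telnet socket (an assumption made only so it runs without a server). Only 5 bytes are waiting, so a request for 1500 returns 5, and `buf[5]` onward still holds whatever was there before the call.

```c
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical demo: ask read() for far more than is available.
 * Returns read()'s result and stores buf[n] (a byte read() did not
 * write) in *first_untouched. */
ssize_t short_read_demo(char *first_untouched)
{
    char buf[1500];
    int fds[2];
    ssize_t n;

    memset(buf, 'X', sizeof buf);        /* mark every byte as "stale" */
    if (pipe(fds) != 0)
        return -1;
    if (write(fds[1], "hello", 5) != 5) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    n = read(fds[0], buf, sizeof buf);   /* request 1500 bytes, 5 are available */
    if (n >= 0 && n < (ssize_t)sizeof buf)
        *first_untouched = buf[n];       /* still 'X': read() stopped at n */
    close(fds[0]);
    close(fds[1]);
    return n;
}
```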

dfa
The response is only 303 characters long, but we request 1500 bytes because the response length varies. The bigger problem is that we call read() in a loop until we hit an end-delimiter string: after each read() we scan the buffer (up to the length read() reported) and check whether the delimiter is present; if it is, we exit the read() loop. All of that logic breaks because the length reported by read() is wrong.
IMHO it is better to put the content length in a header (2 or 4 bytes): you first read the header, then call read() in a loop until that many bytes have arrived.
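A sketch of that delimiter loop driven by read()'s return value rather than by strlen(). Assumptions: the delimiter is a fixed byte string (for a telnet CLI it might be the prompt), the whole response fits in the caller's buffer, and `find_bytes()` and `read_until()` are illustrative names. The search is rerun over everything received so far, so a delimiter split across two read() calls is still found.

```c
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Offset of `needle` within the first `hay_len` bytes of `hay`, or -1.
 * Compares raw bytes, so an embedded '\0' does not stop the search. */
static ssize_t find_bytes(const char *hay, size_t hay_len,
                          const char *needle, size_t needle_len)
{
    if (needle_len == 0 || needle_len > hay_len)
        return -1;
    for (size_t i = 0; i + needle_len <= hay_len; i++)
        if (memcmp(hay + i, needle, needle_len) == 0)
            return (ssize_t)i;
    return -1;
}

/* Read from `fd` into `buf` (capacity `cap`) until `delim` has been seen.
 * Returns the total number of bytes received, or -1 on a read error,
 * on EOF before the delimiter, or when the buffer fills up first. */
ssize_t read_until(int fd, char *buf, size_t cap, const char *delim)
{
    size_t total = 0;              /* sum of read()'s return values */
    size_t dlen = strlen(delim);

    while (total < cap) {
        ssize_t n = read(fd, buf + total, cap - total);
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0)
            return -1;             /* error, or EOF before the delimiter */
        total += (size_t)n;
        if (find_bytes(buf, total, delim, dlen) >= 0)
            return (ssize_t)total;
    }
    return -1;                     /* buffer full, no delimiter seen */
}
```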
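A minimal sketch of that framing, assuming a 2-byte big-endian length header (the comment allows 2 or 4 bytes; the byte order is an assumption). It only helps when you control the sending side as well. `read_full()` and `read_message()` are hypothetical names.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Read exactly `count` bytes, retrying short reads. 0 on success, -1 otherwise. */
static int read_full(int fd, unsigned char *p, size_t count)
{
    while (count > 0) {
        ssize_t n = read(fd, p, count);
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0)
            return -1;             /* error or premature EOF */
        p += n;
        count -= (size_t)n;
    }
    return 0;
}

/* Read one message: a 2-byte big-endian length, then that many payload bytes.
 * Returns the payload length, or -1 on error or if it would not fit in `cap`. */
ssize_t read_message(int fd, unsigned char *buf, size_t cap)
{
    unsigned char hdr[2];

    if (read_full(fd, hdr, sizeof hdr) != 0)
        return -1;
    size_t len = ((size_t)hdr[0] << 8) | hdr[1];   /* network byte order */
    if (len > cap)
        return -1;
    if (read_full(fd, buf, len) != 0)
        return -1;
    return (ssize_t)len;
}
```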
dfa
That's a great idea, but unfortunately the node I am telnetting to runs third-party code :( I will have to find out whether a length header can be added; otherwise I'll fall back to calling strlen() on the buffer and using that instead of read()'s return value. Any opinions?
Is there a terminating character, like '\0' for C strings?
dfa
A:

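You can nul-terminate the buffer at the offset read() returned, as long as the buffer has room for one more byte (this assumes buf is an array, so that sizeof gives its size):

len = read(FD, buf, sizeof buf - 1);  /* leave room for the '\0' */
if (len >= 0) buf[len] = '\0';        /* read() returns -1 on error */

After that, the printf call should be fine.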
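The same fix wrapped in a small helper, as a sketch: `read_cstr()` is a hypothetical name, and it reads at most `size - 1` bytes so the terminator always fits. String functions then see exactly what this read() delivered, never the stale bytes behind it.

```c
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read at most size - 1 bytes into buf and
 * nul-terminate them. Returns read()'s result (-1 on error). */
ssize_t read_cstr(int fd, char *buf, size_t size)
{
    if (size == 0)
        return -1;                         /* no room even for the '\0' */
    ssize_t n = read(fd, buf, size - 1);   /* leave room for the terminator */
    if (n >= 0)
        buf[n] = '\0';                     /* strlen(buf) == n from now on
                                              (unless the data holds a '\0') */
    return n;
}
```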

Aif
A:

but in the above case it is returning sufficient data but the length reported is all wrong

No, you are wrong about that. It is returning what it says it's returning, 89 bytes. The problem is that those 89 bytes don't include a nul terminator so that, when you printf the buffer, it keeps going, printing whatever was already in the rest of the buffer before your read happened.
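The following sketch reproduces that situation, with a pipe standing in for the telnet connection and made-up response strings (both assumptions so it runs standalone). The buffer is zeroed once, a 28-byte response is read, then a 5-byte one. read() honestly reports 5 for the second response, but strlen() (and therefore printf("%s", ...)) sees 28 bytes, because the tail of the earlier response and the zero byte after it are still in the buffer.

```c
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical reproduction: two unterminated reads into one buffer, the
 * second shorter than the first. Stores strlen(buf) after the second read
 * in *visible_len and returns the second read()'s byte count. */
ssize_t stale_data_demo(size_t *visible_len)
{
    static const char first[]  = "a much longer first response";  /* 28 bytes */
    static const char second[] = "short";                          /* 5 bytes */
    char buf[100];
    int fds[2];
    ssize_t n = -1;

    memset(buf, 0, sizeof buf);             /* zeroed once, up front */
    if (pipe(fds) != 0)
        return -1;

    if (write(fds[1], first, strlen(first)) == (ssize_t)strlen(first)
        && read(fds[0], buf, sizeof buf) >= 0
        && write(fds[1], second, strlen(second)) == (ssize_t)strlen(second)) {
        n = read(fds[0], buf, sizeof buf);  /* overwrites only the first n bytes */
        *visible_len = strlen(buf);         /* runs on into the old response */
    }
    close(fds[0]);
    close(fds[1]);
    return n;
}
```

This is also why calling strlen() on the buffer instead of using read()'s return value cannot work: strlen() measures up to whatever zero byte happens to be in memory, not what the last read() delivered.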

What you should be doing (but see caveat below) is something like:

len = read(FD, buf, 1500);
if (len >= 0) printf ("%*.*s\n", (int)len, (int)len, buf);  /* '*' expects an int */

to ensure you don't print beyond the end of the buffer.

What you're seeing is equivalent to:

char buff[500];
strcpy (buff, "Hello there");
memcpy (buff, "Goodbye", 7);
printf ("%s", buff);

Because you're not transferring the nul character in the memcpy, the buffer you're left with is:

               +---+---+---+---+---+---+---+---+---+---+---+---+
After strcpy : | H | e | l | l | o |   | t | h | e | r | e | \0|
               +---+---+---+---+---+---+---+---+---+---+---+---+
After memcpy : | G | o | o | d | b | y | e | h | e | r | e | \0|
               +---+---+---+---+---+---+---+---+---+---+---+---+

giving the string "Goodbyehere".

Caveat:

If there are nul characters within your data stream, that printf won't work since it'll stop at the first nul character it finds. The read function reads binary data from a file descriptor and it doesn't have to stop at the first newline or nul character.

That would be equivalent to:

char buff[500];
strcpy (buff, "Hello there");
memcpy (buff, "Go\0dbye", 8);
printf ("%s", buff);

               +---+---+---+---+---+---+---+---+---+---+---+---+
After strcpy : | H | e | l | l | o |   | t | h | e | r | e | \0|
               +---+---+---+---+---+---+---+---+---+---+---+---+
After memcpy : | G | o | \0| d | b | y | e | \0| e | r | e | \0|
               +---+---+---+---+---+---+---+---+---+---+---+---+

giving the string "Go".
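If the stream may contain nul bytes, one alternative (not part of the answer above) is to skip the %s conversion entirely and hand read()'s byte count to fwrite(), which copies bytes without interpreting them. A minimal sketch; `write_bytes()` is an illustrative name:

```c
#include <stdio.h>
#include <sys/types.h>

/* Write exactly `len` bytes of `buf` to `out`, embedded '\0' bytes included.
 * Returns 0 on success, -1 for a negative length (e.g. read() failed) or a
 * short write. */
int write_bytes(FILE *out, const char *buf, ssize_t len)
{
    if (len < 0)
        return -1;
    if (fwrite(buf, 1, (size_t)len, out) != (size_t)len)
        return -1;
    return 0;
}
```

Typical use right after the read would be `write_bytes(stdout, buf, len);`.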

If you really want to process nul- or newline-terminated strings on what is a binary channel, the following pseudo-code is one way to do it:

while true:
    while buffer has no terminator character:
        read some more data into buffer, break on error or end-of-file.
    break on error or end-of-file.
    while buffer has at least one terminator character:
        process data up to first terminator character.
        remove that section from buffer.

It's a process that reads data until you have at least one "unit of work", then processes those units of work until you don't have a complete unit of work left.
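One way to turn that pseudo-code into C, as a sketch with '\n' as the terminator. Instead of processing every complete unit in an inner loop, `next_line()` hands out one line per call and removes it from the buffer on the next call, which is the same loop turned inside out. The struct and function names and the 4096-byte buffer are assumptions. Telnet lines normally end in CR LF, so a caller may want to drop a trailing '\r'; the sketch also does not interpret telnet IAC command sequences.

```c
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

struct line_reader {
    int fd;
    char buf[4096];
    size_t used;      /* bytes currently buffered */
    size_t consumed;  /* length of the line handed out last time, incl. '\n' */
};

/* Return the next complete line as a nul-terminated string (terminator
 * removed, length in *len), or NULL on error, EOF, or a line too long for
 * the buffer. Trailing bytes without a '\n' at EOF are not returned. The
 * pointer stays valid until the next call. */
char *next_line(struct line_reader *r, size_t *len)
{
    char *nl;

    if (r->consumed) {            /* remove the previous unit of work */
        r->used -= r->consumed;
        memmove(r->buf, r->buf + r->consumed, r->used);
        r->consumed = 0;
    }
    /* Read until the buffer holds at least one terminator. */
    while ((nl = memchr(r->buf, '\n', r->used)) == NULL) {
        if (r->used == sizeof r->buf)
            return NULL;          /* no room left: line too long */
        ssize_t n = read(r->fd, r->buf + r->used, sizeof r->buf - r->used);
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0)
            return NULL;          /* error or end-of-file */
        r->used += (size_t)n;
    }
    *nl = '\0';                   /* terminate the line in place */
    *len = (size_t)(nl - r->buf);
    r->consumed = *len + 1;
    return r->buf;
}
```

Usage: initialise with `struct line_reader r = { .fd = FD };` and call `next_line(&r, &len)` in a loop until it returns NULL.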

paxdiablo
Sorry for the delayed clarification: the data in the buffer really is the data returned by the telnet command. I verified this by telnetting into the node manually and issuing the command in the CLI (its response is what read() is collecting too). That is what I meant when I said the response is longer than the reported length. Also, at the reported length there is an ordinary letter (and the next character is a letter too), so I'm not sure a special character is cutting the length short.
The response definitely contains newline characters. As I said, printf("%s", buf) prints the entire response properly, so I guess there are no '\0' characters in between. But we cannot be sure of that...
@Ashwin, a telnetd daemon would be very unlikely to send a nul character (either in the stream or at the end of each line); it's not really part of the protocol. So you should be relying on line terminator characters, since it's a line-oriented protocol. That means nul-terminating the lines yourself if you want to pass them to printf (or using the length.length "trick" from my answer). You must use the return value from read() to find the real end of the buffer; there's really no other way.
paxdiablo
How about using strlen() on the returned buffer, if you are sure that '\0' characters won't be present in the stream itself? Is that feasible? And if '\0' is not present, how come printf("%s", buf) prints the full response in the first place?