So I have a problem that I cannot figure out. I am writing some code in C, and I keep running into an issue where reading from the network works seemingly at random.

I finally traced it down to the number of strings in the code. I can't believe it, but I have verified it pretty thoroughly.

The code base is rather massive, so I am not sure of the parity of the overall number of strings. However, I know that if I add an odd number of strings the program works, and if I add an even number it doesn't.

Just to clarify: when I say it doesn't work, it still builds and executes, but every time I try to read anything over the network all I get is 0s. When it's working I get the correct data.

Has anyone ever heard of anything like this, or have any idea what could be causing it? I could understand it if the data portion of the program were getting too large and starting to encroach on other code's space, but the fact that it's an odd/even thing completely confuses me.

thanks

EDIT (Adding more info):

The platform is a custom-designed device. The code base is RedBoot, but it has been altered significantly for the custom device.

Snippet, for example:

//This will work because it's an odd number of strings.

char* str1 = "test";
char* str2 = "test2";
char* str3 = "test3";

int i = strlen(str1) + strlen(str2) + strlen(str3);

......................................

If I were to change the last line to

int i = strlen(str1) + strlen(str2);

so that str3 gets optimized out by the compiler, then the code will no longer work. I have tested this many times with strings of various lengths, and all result in the same odd/even behavior. (i is just sent to a debug log so that it is not optimized out; nothing fancy is done with it.)

Edit2: The above code can be placed anywhere in the code base and it causes the same problem. It doesn't matter whether it is ever executed, which leads me to believe it's not a stack overflow.

+4  A: 

I've not heard of a problem like this before. You sound like you're very frustrated and you say that your code base is rather massive. If solving the problem is important, I would suggest that you try to reproduce the issue with a smaller amount of code. It may also help you get answers here if you post some samples of your code to illustrate the question.

Rice Flour Cookies
+6  A: 

Random stab-in-the-dark time...

A common misunderstanding when reading from network sockets is that a read() of 10 bytes will return the next 10 bytes. It won't. It will return UP TO 10 bytes, and you may need to call read() multiple times to get all the data you require.
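
For illustration, a minimal sketch of such a loop, assuming a POSIX-style read() on a file descriptor. The helper name read_full is made up for this example, and the network API on a customized RedBoot build may well look different, so treat it as a pattern rather than drop-in code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not part of any library): keep calling read()
   until len bytes have arrived, the stream ends, or an error occurs.
   Returns the number of bytes read (less than len only at end of
   stream), or -1 on error. */
static ssize_t read_full(int fd, void *buf, size_t len)
{
    unsigned char *p = buf;
    size_t got = 0;

    assert(buf != NULL);
    while (got < len) {
        ssize_t n = read(fd, p + got, len - got);
        if (n > 0) {
            got += (size_t)n;   /* partial read: ask for the rest */
        } else if (n == 0) {
            break;              /* end of stream before len bytes */
        } else if (errno != EINTR) {
            return -1;          /* real error; EINTR is simply retried */
        }
    }
    return (ssize_t)got;
}
```

A caller would compare the return value against the length it asked for and treat anything shorter as a closed connection or an error.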

Roddy
+4  A: 

What makes you conclude that it has to do with the parity of the number of strings? Reading what you say carefully, what it tells me is that small changes in the code trigger unexpected behavior.

Smells like stack overflow. Do you allocate large arrays or strings on the stack and then read from and write to them? In that case, try to allocate and deallocate these large buffers dynamically through malloc/free.

Jens Gustedt
I wanted to post the same answer. It is called a Heisenberg-Bug: add more debug code and the error moves to another point in the program.
Peter Miehle
Peter: Not a "Heisenberg-Bug", but a "Heisenbug".
Gabe
+1  A: 

If you have something like

char buf[10];
long var;
strcpy(buf, "ganz viel text");

you may or may not get a segmentation violation or strange behaviour with the variable "var", because the 15 bytes of the string (including the terminating NUL) do not fit into the 10-byte buffer. If you put more debug text into your code, the linker may relocate the variables, or the compiler may apply other optimizations and rearrange how space is allocated in memory.
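
One way to rule this kind of overflow out is to make every copy bounded by the destination size. A minimal sketch, assuming a C99 snprintf() is available on the target; the helper name copy_bounded is invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: copy src into dst without ever writing past
   dst[dst_size - 1]. The result is always NUL-terminated.
   Returns 1 if the whole string fit, 0 if it was truncated. */
static int copy_bounded(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL && dst_size > 0);
    /* snprintf returns the length the full string would have needed. */
    int needed = snprintf(dst, dst_size, "%s", src);
    return needed >= 0 && (size_t)needed < dst_size;
}
```

With char buf[10], a call such as copy_bounded(buf, sizeof buf, "ganz viel text") reports truncation instead of silently writing over whatever the linker placed after buf.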

Peter Miehle
+2  A: 

Here is a guess.

Assume the platform is 32 bit.

Perhaps the compiler aligns some of the data structures of your program in memory on eight-byte boundaries. You have a whole load of string pointers in your data segment and maybe some other stuff too. If there is an odd number of strings, the next thing that needs eight-byte alignment has four bytes of padding in front of it. If there is an even number of strings, there is no padding.

Whatever piece of data sits just before that eight-byte-aligned object has an overflow bug that destroys between one and four bytes after it. If there is padding after this thing, nothing bad happens. If there is no padding, the eight-byte-aligned object gets zapped.
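
The padding arithmetic can be sketched with structs. This is only an analogy: the real layout is decided by the compiler and linker for the data segment, not by a struct. The type and function names are invented for the example, and uint32_t stands in for a 4-byte pointer on the assumed 32-bit platform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* uint32_t models a 4-byte pointer on the assumed 32-bit target. */
static_assert(sizeof(uint32_t) == 4, "uint32_t must be 4 bytes");

/* Three "string pointers" followed by an object forced to 8-byte alignment. */
struct odd_layout {
    uint32_t str_ptrs[3];
    _Alignas(8) uint64_t victim;
};

/* Two "string pointers" followed by the same 8-byte-aligned object. */
struct even_layout {
    uint32_t str_ptrs[2];
    _Alignas(8) uint64_t victim;
};

/* Bytes of padding between the end of the pointers and the aligned object. */
static size_t padding_odd(void)
{
    return offsetof(struct odd_layout, victim) - sizeof(uint32_t[3]);
}

static size_t padding_even(void)
{
    return offsetof(struct even_layout, victim) - sizeof(uint32_t[2]);
}
```

In this model padding_odd() is 4 and padding_even() is 0, so a stray write of up to four bytes past the pointers lands in dead space in the odd case and in the aligned object in the even case.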

JeremyP