I'm writing a command-line program in ANSI C that parses a Quake 2 map file and reports how many entities and textures are being used. My development machine is a MacBook. I'm testing on OS X Snow Leopard (32-bit), Windows XP (32-bit) and Vista (64-bit), and Ubuntu 9.10 (32-bit).

I had a crash bug on Vista where the program would hang on a certain map file. It took a while to figure out that the problem wasn't the program but the map file itself. I didn't notice anything unusual about the text file. Re-opening and saving the map file fixed the issue.

My code loads the entire map file into memory, uses strtok() to split it into lines on '\n', parses each line, and loads the data into a singly linked list for processing. Is there a way to detect whether the map (text) file is corrupt?

The easiest non-programming solution is to add a FAQ file with the problem and solution.

+2  A: 

Parse each line as you read it and check whether it is valid. If the check fails, you can tell the user that the data is corrupt and still exit gracefully.
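
A minimal sketch of such a per-line check in C, working on a buffer that has already been loaded into memory. The helper name first_bad_line and the rules it applies are assumptions for illustration, not part of the original program: it assumes that every non-blank line of a map file starts with {, }, a double quote, ( or a // comment, that braces nest at most two deep (entity, then brush), and that the file holds only printable ASCII. Adjust the rules to whatever the real format allows.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: returns 0 if the buffer looks like plain map text,
   otherwise the 1-based number of the line where the scan gave up. */
unsigned long first_bad_line(const char *buf, size_t len)
{
    unsigned long line = 1;
    int depth = 0;            /* current brace nesting level */
    int at_line_start = 1;    /* no visible character seen yet on this line */
    size_t i;

    assert(buf != NULL || len == 0);

    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char)buf[i];

        if (c == '\n') {
            line++;
            at_line_start = 1;
            continue;
        }
        if (c == '\r' || c == '\t' || c == ' ')
            continue;
        if (!isprint(c))
            return line;      /* embedded NUL or other binary byte */

        if (at_line_start) {
            at_line_start = 0;
            if (c == '{') {
                if (++depth > 2)
                    return line;   /* nested deeper than entity + brush */
            } else if (c == '}') {
                if (--depth < 0)
                    return line;   /* closing brace without an opener */
            } else if (c != '"' && c != '(' && c != '/') {
                return line;       /* line does not start like map syntax */
            }
        }
    }
    return depth == 0 ? 0 : line;  /* unbalanced braces at end of file */
}
```

Because the function takes an explicit length, it can be run over the whole buffer before strtok() is called. That matters because strtok() stops at the first NUL byte, so an embedded NUL would otherwise silently truncate the input. A nonzero result can be printed as "line N looks corrupt" before exiting.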

karbon
I'm not certain where the crash is happening. It behaved much like a previous crash bug on Vista (64-bit), where I was using "unsigned int" in the data structure to count the number of references (e.g., 38564). I changed that to "unsigned long int" to fix the bug. Unless the file is corrupt in such a way that the lines being parsed and counted exceed a long int. Hmm...
C.D. Reimer
Are you sure you're getting an overflow? Might it be another issue, such as the values you read being incompatible with the object you're creating from them?
karbon
A: 

With parser generator tools, you can detect syntactical errors easily.

However, even if the syntax is ok, you should always assume that the contents might not be ok.

For example, if the file format is as follows:

  • n : number of entries
  • entry 1
  • entry 2
  • ...
  • end condition

your code should not just allocate an array of size n and read entries into it until the end condition appears. Instead, it should verify that exactly n entries were actually read (and, in this case, never read more than n entries, to avoid an overflow).

Thus, design the code so that it does not blindly trust the input.
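
A minimal sketch of that idea in C, for an in-memory text of the shape described above. Everything specific here is an assumption made for illustration: the entries are plain integers, the end condition is the literal word "end", and read_entries and MAX_ENTRIES are hypothetical names.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define MAX_ENTRIES 100000L   /* assumed sanity limit; pick one that fits the format */

/* Hypothetical parser for "n v1 v2 ... vn end" (whitespace or newline separated).
   Returns a malloc'ed array the caller must free, or NULL on bad input. */
long *read_entries(const char *text, size_t *out_n)
{
    const char *p = text;
    char *end;
    long n, i;
    long *items;

    assert(text != NULL && out_n != NULL);

    errno = 0;
    n = strtol(p, &end, 10);
    if (end == p || errno == ERANGE || n < 0 || n > MAX_ENTRIES)
        return NULL;                      /* missing or implausible count */

    items = malloc(((size_t)n + 1) * sizeof *items);   /* +1 avoids malloc(0) */
    if (items == NULL)
        return NULL;

    for (i = 0; i < n; i++) {             /* never reads more than n entries */
        p = end;
        errno = 0;
        items[i] = strtol(p, &end, 10);
        if (end == p || errno == ERANGE) {    /* fewer than n entries, or garbage */
            free(items);
            return NULL;
        }
    }

    p = end;
    while (isspace((unsigned char)*p))
        p++;
    if (strncmp(p, "end", 3) != 0) {      /* extra entries or missing end marker */
        free(items);
        return NULL;
    }

    *out_n = (size_t)n;
    return items;
}
```

The count is checked against a limit before anything is allocated, the loop is bounded by n rather than by the end marker, and the end marker is only accepted once exactly n entries have been read. A file that claims 3 entries but holds 2 or 4 is rejected instead of being read past the array.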

Anssi
A: 

I think I fixed the bug. I took a number of steps to get there and testing went fine.

  • Added -Wconversion to my debug flags for GCC. This reported some useful warnings and some not so useful ones. For the most part, the fixes were adding unsigned to the variable types and a few minor (int) casts.
  • While my data structures had the correct types (i.e., unsigned long int), the output variables that added everything together had the wrong types (i.e., int). I re-checked all my variable types to make sure they matched.
  • Added a check that halts the program with an error if the file size is zero or negative.
  • Added a check that halts the program with a message that the file has no usable data if the data lists have zero nodes (i.e., parsing returned no valid match).
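
The file size check could be sketched roughly like this in C. This is not the code from the released program: load_map is a hypothetical name, and the sketch assumes the stream was opened in binary mode ("rb"), so that the size reported by ftell() matches the number of bytes fread() returns on Windows as well.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical loader: reads the whole stream into a NUL-terminated buffer.
   Returns NULL if the size is zero, cannot be determined, or the read is short. */
char *load_map(FILE *fp, long *out_size)
{
    long size;
    char *buf;

    assert(fp != NULL && out_size != NULL);

    if (fseek(fp, 0L, SEEK_END) != 0)
        return NULL;
    size = ftell(fp);                 /* ftell() returns -1L on failure */
    if (size <= 0) {
        fprintf(stderr, "map file is empty or its size cannot be read\n");
        return NULL;
    }
    rewind(fp);

    buf = malloc((size_t)size + 1);   /* +1 for the terminating NUL */
    if (buf == NULL)
        return NULL;

    if (fread(buf, 1, (size_t)size, fp) != (size_t)size) {
        free(buf);                    /* short read: do not parse a partial file */
        return NULL;
    }
    buf[size] = '\0';

    *out_size = size;
    return buf;
}
```

The size stays a long from ftell() through to the caller and is only converted to size_t after it is known to be positive, which avoids the kind of signed/unsigned mixing that -Wconversion warns about.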

I left the parsing functions alone for now. If a corrupt or mangled map file produces a valid match, that "data" will still end up in the output. Garbage In/Garbage Out (GIGO) is still a factor. Something to revisit later. The released version of my program can be found here.

C.D. Reimer