I'm in great need of a way to dig through potentially huge amounts of CGI supplied POST data.

Reading the GET data is no big deal, as I can just re-read the QUERY_STRING environment variable as often as I want. POST data, however, is supplied via stdin, so I can only read it once and have to store it somewhere.

My current method consists of reading all the POST data into a temporary file, which is removed when the program exits, and scanning through it to find the keys I want. In the GET parsing approach I could just strtok() over the QUERY_STRING because GET data has pretty low limits, so it's safe to hold in RAM, but the POST data can be anything from empty to "name=Bob" to a 4 gigabyte movie file.

So, here's my current approach:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int get_post_data(const char *s_key, char *target, size_t target_size)
{
   FILE *tmp;
   char *buffer;
   size_t buffer_size = BUFFER_SIZE;
   size_t i = 0;
   int ret_val = -1;

   if (target == NULL || target_size == 0)
      return -1;

   /* postdata_tempfile = global variable containing the temporary file name */
   if ((tmp = fopen(postdata_tempfile, "r")) == NULL)
      return -1;

   if ((buffer = malloc(buffer_size)) == NULL)
   {
      fclose(tmp);
      return -1;
   }

   for (;;)
   {
      int c = fgetc(tmp);

      if ((c == '&') || (c == EOF))
      {
         char *key, *val;

         buffer[i] = '\0'; /* terminate the "key=value" pair just read */
         key = strtok(buffer, "=");
         val = strtok(NULL, "");

         if (key && strcmp(s_key, key) == 0)
         {
            if (val)
            {
               /* strncpy() does not terminate on truncation, so do it here */
               strncpy(target, val, target_size - 1);
               target[target_size - 1] = '\0';
               ret_val = (int) strlen(val);
            }
            else
            {
               target[0] = '\0';
               ret_val = 0;
            }

            break;
         }

         if (c == EOF)
            break;

         i = 0; /* start the next pair at the beginning of the buffer */
      }
      else
      {
         /* grow the buffer, always keeping one byte free for the '\0' */
         if (i + 1 >= buffer_size)
         {
            char *temp_buffer = realloc(buffer, buffer_size + BUFFER_SIZE);

            if (temp_buffer == NULL)
            {
               free(buffer);
               fclose(tmp);
               return -1;
            }

            buffer = temp_buffer;
            buffer_size += BUFFER_SIZE;
         }

         buffer[i++] = (char) c;
      }
   }

   free(buffer);
   fclose(tmp);

   return ret_val;
}

This does work: I can call get_post_data("user_password", pass, sizeof(pass)); and check the return value (<0 = error, =0 = key exists but value is NULL, >0 = data length), but it seems too heavyweight. I mean, a huge amount of I/O overhead for every single POST parameter I look up, just to avoid holding the whole string in RAM in case a large file is being uploaded?

What does Stack Overflow think?

A: 

I think it would be easier to just reject POST requests larger than a set limit, say 2MB.

That way:

  • You have a manageable-sized block of data to work with.
  • You prevent malicious 4GB POST requests.
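
A minimal sketch of such a check, assuming the server passes the request body size in the CONTENT_LENGTH environment variable, as CGI does. The function name and the MAX_POST_BYTES constant are made up for illustration:

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical limit; pick whatever suits the application. */
#define MAX_POST_BYTES (2UL * 1024UL * 1024UL)

/*
 * Returns 1 if the given CONTENT_LENGTH string is a plain decimal number
 * no larger than MAX_POST_BYTES, 0 otherwise (missing, malformed, too big).
 */
int post_size_acceptable(const char *content_length)
{
   char *end;
   unsigned long len;

   /* reject NULL, empty strings, signs and leading whitespace */
   if (content_length == NULL || *content_length < '0' || *content_length > '9')
      return 0;

   errno = 0;
   len = strtoul(content_length, &end, 10);

   /* reject overflow and trailing garbage */
   if (errno != 0 || *end != '\0')
      return 0;

   return len <= MAX_POST_BYTES;
}
```

It could be called as post_size_acceptable(getenv("CONTENT_LENGTH")) before touching stdin; if it returns 0, the program might answer with a 413 status and exit without reading the body.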
George Edison
I try not to limit what my program can accept. In this particular case (text articles) 2 MB is more than enough, but I write my modules so that I can reuse them anywhere, including file uploads, where 2 MB is a drop in the bucket. :)
LukeN
A: 

If you want to avoid loading a big file into RAM, you could use a memory-mapped file. It's not portable, but it's the right way to do it. If your platform is POSIX, you could use mmap() for this.
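
A rough sketch of what mapping the temporary file could look like on a POSIX system. The helper name map_file is hypothetical, and error handling is kept to a minimum:

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Maps the file at `path` read-only. On success returns the mapping and
 * stores its size in *len; on failure (or for an empty file) returns NULL.
 */
const char *map_file(const char *path, size_t *len)
{
   struct stat st;
   void *p;
   int fd = open(path, O_RDONLY);

   if (fd < 0)
      return NULL;

   /* a zero-length mapping is an error, so treat empty files as failure */
   if (fstat(fd, &st) < 0 || st.st_size <= 0)
   {
      close(fd);
      return NULL;
   }

   p = mmap(NULL, (size_t) st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
   close(fd); /* the mapping stays valid after the descriptor is closed */

   if (p == MAP_FAILED)
      return NULL;

   *len = (size_t) st.st_size;
   return p;
}
```

The returned pointer can then be scanned like an ordinary in-memory buffer, with the kernel paging data in on demand, and released with munmap((void *) data, len) when done.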

By the way, I didn't fully read or test your code, but I wonder whether strtok() is the right thing to use, because it destroys the data as it goes. I'd also wonder about using the str...() functions if your data may be a binary file, but I don't know how the CGI part works, so you might be fine there.
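
For illustration, here is one way to look up a key without modifying the data or relying on '\0' terminators, using memchr() and memcmp() on a buffer with an explicit length. This is only a sketch: find_value is a made-up name, it assumes plain "key=value&key=value" input, and it does no percent-decoding:

```c
#include <stddef.h>
#include <string.h>

/*
 * Looks up `key` in `len` bytes of "k1=v1&k2=v2" data without modifying it.
 * On a match, returns a pointer to the first byte of the value and stores
 * the value's length in *val_len. Returns NULL if the key is not present.
 */
const char *find_value(const char *data, size_t len, const char *key, size_t *val_len)
{
   size_t key_len = strlen(key);
   const char *p = data;
   const char *end = data + len;

   while (p < end)
   {
      /* the current pair runs up to the next '&' or the end of the data */
      const char *amp = memchr(p, '&', (size_t) (end - p));
      const char *pair_end = amp ? amp : end;
      const char *eq = memchr(p, '=', (size_t) (pair_end - p));
      const char *key_end = eq ? eq : pair_end;

      if ((size_t) (key_end - p) == key_len && memcmp(p, key, key_len) == 0)
      {
         const char *val = eq ? eq + 1 : pair_end;

         *val_len = (size_t) (pair_end - val);
         return val;
      }

      if (amp == NULL)
         break; /* that was the last pair */

      p = amp + 1;
   }

   return NULL;
}
```

Because the value is returned as a pointer plus a length, it may contain any bytes except a raw '&', and the same buffer can be searched again for other keys.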

Matt Curtis
The data arrives as "key=value" and I use `strtok()` to cut apart key and value. Only the value is binary in practice and will be handed unmodified to the caller :)
LukeN
@LukeN: Did any of this help?
Matt Curtis