tags:

views:

141

answers:

6

I am writing a fastcgi application for my site in C. Don't ask why, leave all that part.

Just help me with this problem- I want to replace spaces in the query string with %20. Here's the code I'm using, but I don't see 20 in the output, only %. Where's the problem?

Code:

unsigned int i = 0;

/*
 * Replace spaces with its hex %20
 * It will be converted back to space in the actual processing
 * They make the application segfault in strtok_r()
 */

char *qstr = NULL;
for(i = 0; i <= strlen(qry); i++) {
  void *_tmp;
  if(qry[i] == ' ') {
    _tmp = realloc(qstr, (i + 2) * sizeof(char));
    if(!_tmp) error("realloc() failed while allocting string memory (space)\n");
    qstr = (char *) _tmp;
    qstr[i] = '%'; qstr[i + 1] = '2'; qstr[i + 2] = '0';
  } else {
    _tmp = realloc(qstr, (i + 1) * sizeof(char));
    if(!_tmp) error("realloc() failed while allocating string memory (not space)\n");
    qstr = (char *) _tmp;
    qstr[i] = qry[i];
  }
}

In the code, qry is char *, comes as a actual parameter to the function. I tried with i + 3, 4, 5 in realloc() in the space replacer block, no success.

+13  A: 

String-handling in C can be tricky. I'd suggest going through the string first, counting the spaces, and then allocating a new string of the appropriate size (original string size + (number of spaces * 2)). Then, loop through the original string, maintaining a pointer (or index) to the position in both the new string and the original one. (Why two pointers? Because every time you encounter a space, the pointer into the new string will get two characters ahead of the pointer into the old one.)

Here's some code that should do the trick:

int new_string_length = 0;
for (char *c = qry; *c != '\0'; c++) {
    if (*c == ' ') new_string_length += 2;
    new_string_length++;
}
char *qstr = malloc((new_string_length + 1) * sizeof qstr[0]);
char *c1, *c2;
for (c1 = qry, c2 = qstr; *c1 != '\0'; c1++) {
    if (*c1 == ' ') {
        c2[0] = '%';
        c2[1] = '2';
        c2[2] = '0';
        c2 += 3;
    }else{
        *c2 = *c1;
        c2++;
    }
}
*c2 = '\0';
David
Hmm, good solution. Waiting for some others to answer.
Nilesh
If speed is preferred over memory use allocate a string with 3 times the space of the original string and skip searching for the spaces to determine the length of the new string.Like the method found in the link in Kangkan's answer.
Daniel Persson
Good approach but you malloced string is one character too short (you don't count the trailing '\0') so it should better start with `int new_string_length = 1;`.
x4u
@x4u: Oops! Fixed it now. (Although I opted to change the `malloc` call rather than the initial value of `new_string_length`, so that the name of `new_string_length` remained appropriate for its purpose.)
David
@Daniel - why would allocating a block 3 times the space pf the original string be faster? `strlen()` has to iterate over the string anyway - why not just iterate yourself and account for spaces while you're at it as David's solution does? `strlen()` might be a bit faster than iterating yourself, but I'd bet the difference would be hard to measure unless your strings are really, really long.
Michael Burr
@Michael - Sorry my mistake, I had in my head that the length of the string was known already. You are correct :)
Daniel Persson
A: 

This is known as url encode. You can refer to this page to see some similar implementation: http://www.geekhideout.com/urlcode.shtml

Kangkan
Oh no man. I am quite new to C and I would like to learn rather than using some predefined stuff.
Nilesh
+1  A: 

You are assigning using the same counter I you will need to have 2 counters since the strings have different lengths

your else case assigns qstr[i] = qry[i]; after you have written the %20 you are at least off by 2 on the result string.

rerun
+2  A: 

I agree with David.

It is advisable to do it in two-steps: in the first loop you just count the spaces:

int spaceCounter=0;
const int sourceLen = strlen(qry);
for(int i = 0; i < sourceLen; ++i) 
    if ( qry[i] == ' ')
        ++spaceCounter;

char* newString = (char*)malloc(sourceLen + 3*spaceCounter*sizeof(char) + 1)
//check for null!
for(int i = 0; i < sourceLen; ++i) 
    if ( qry[i] == ' ')
    {
        *newString++ = '%';
        *newString++ = '2';
        *newString++ = '0';
    }
    else
        *newString++ = qry[i];

*newString = '\0';

Warning: code not tested.

vulkanino
note that `sizeof(char)` is required by the standard to be 1.
James Curran
Also, you are putting a space into newString, which kind-of defeats the purpose of this function. It should be adding just the three character `%20`
James Curran
@James: While you are correct, I think the possible intent here might be to make it more portable to Unicode.
Steven Sudit
James you're right, forget the space :)I'm editing the answer to correct for this.sizeof(char) can be two with unicode.
vulkanino
sizeof(char) is always 1. sizeof is defined in terms of char. char could be 8 9-bit bytes but sizeof(char) would still be one. Unicode is irrelevant as far as sizeof(char) is concerned.
Logan Capaldo
sizeof(char) cannot ever be 2 in c/c++. It is required by the Standard to be 1. `sizeof(wchar_t)` might be 2.
James Curran
+1 I find your solution a bit cleaner than David's. Only problem is that you don't seem to be making a note of where newString starts before changing it.
JeremyP
@JeremyP: I'm not sure what makes this cleaner than David's - they're actually pretty dang close to the same. However, I'd have a few minor quibbles about this implementation compared with David's: 1) this one gets the string length and the number of spaces in 2 steps, David's does this in one step, 2) this one allocates more space than necessary - one extra byte per space found (which is certainly better than not allocating enough), and 3) this one loses track of the allocated buffer (as you mentioned). All these are easily fixed, but at that point you'd have nearly identical code to David's.
Michael Burr
@Michael Burr: they are the same aside from the bugs. I voted them both up to be fair.
JeremyP
+6  A: 
qstr[i] = '%'; qstr[i + 1] = '2'; qstr[i + 2] = '0'; 

That line writes three characters to your output buffer, so the next character you write needs to be written at qstr[i+3]. However, you only step i by 1, so the next character is written to qstr[i+1], overwriting the '2'.

You will need to keep separate indexes for stepping through qry & qstr.

James Curran
A: 
char* toHexSpace(const char *s)
{
  char *b=strcpy(malloc(3*strlen(s)+1),s),*p;
  while( p=strchr(b,' ') )
  {
    memmove(p+3,p+1,strlen(p));
    strncpy(p,"%20",3);
  }
  return b; 
}

needs "free" in calling context.