Is there a clean, preferably standard method of trimming leading and trailing whitespace from a string in C? I'd roll my own, but I would think this is a common problem with an equally common solution.

A: 

It's trivial with a regex library, so how "pure" C are we talking?

Greg Rogers
That'd work great, but adding a regex library would be a bit overkill, especially when one isn't already included in the project in question.
coledot
A: 

If you can modify the string:

#include <ctype.h>
#include <string.h>

char *trimwhitespace(char *str)
{
  char *end;

  // Trim leading space (cast: isspace() needs an unsigned char value)
  while(isspace((unsigned char)*str)) str++;

  if(*str == 0)  // All spaces?
    return str;

  // Trim trailing space
  end = str + strlen(str) - 1;
  while(end > str && isspace((unsigned char)*end)) end--;

  // Write new null terminator
  *(end+1) = 0;

  return str;
}

If you can't modify the string, then you can use basically the same method:

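#include <ctype.h>
#include <string.h>

// Copies the trimmed str into out (a buffer of len bytes), truncating if
// necessary, and returns the number of characters copied (excluding '\0').
size_t trimwhitespace(char *out, size_t len, const char *str)
{
  if(len == 0)
    return 0;

  const char *end;
  size_t out_size;

  // Trim leading space
  while(isspace((unsigned char)*str)) str++;

  if(*str == 0)  // All spaces?
  {
    *out = 0;
    return 0;    // nothing copied, consistent with the return value below
  }

  // Trim trailing space
  end = str + strlen(str) - 1;
  while(end > str && isspace((unsigned char)*end)) end--;
  end++;

  // Set output size to minimum of trimmed string length and buffer size minus 1
  out_size = (size_t)(end - str) < len-1 ? (size_t)(end - str) : len-1;

  // Copy trimmed string and add null terminator
  memcpy(out, str, out_size);
  out[out_size] = 0;

  return out_size;
}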
Adam Rosenfield
Sorry for the bad wording; I've edited it to add a solution for when you can't modify the string in-place
Adam Rosenfield
You should mention that you have to keep a copy of the original pointer in the first example when the string is malloc'ed, or you will never be able to free it again.
jkramer
Sorry, the first answer isn't good at all unless you don't care about memory leaks. You now have two overlapping strings (the original, which has its trailing spaces trimmed, and the new one). Only the original string can be freed, but if you do, the second one points to freed memory.
David Nehme
@Adam: Please modify the first function. It's not freeing the memory!
N 1.1
@nvl: There is no memory being allocated, so there is no memory to free.
Adam Rosenfield
@Adam: in the first method, you change the pointer to the string (if it has leading whitespace). So, if the string had been `malloc`ed in the main function, it cannot be `free`d now. So, there is a chance of a memory leak.
N 1.1
@nvl: No. `str` is a local variable, and changing it does not change the original pointer being passed in. Function calls in C are always pass-by-value, never pass-by-reference.
Adam Rosenfield
@Adam: yes, actually. Sorry for bugging. I didn't realize that you were just changing `str`, not `*str`. Apologies.
N 1.1
@Adam: I would also have done it in a similar way. Anyway, thanks :)
N 1.1
If str's length is zero, end is str-1, and the standard does not guarantee that it'll be a valid pointer. http://c-faq.com/aryptr/non0based.html
Nyan
@Nyan: whoops, good catch! That's fixed now, thanks.
Adam Rosenfield
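To make the pass-by-value point from this comment thread concrete, here is a minimal sketch. The names skip_leading() and caller_pointer_unchanged() are hypothetical, chosen only for illustration: the helper advances its own copy of the pointer, while the caller's variable keeps the address malloc() returned, so calling free() on it is still correct.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: moves ITS OWN copy of the pointer past leading
   whitespace. C passes the pointer by value, so the caller's variable
   is never changed by this function. */
char *skip_leading(char *str)
{
    while (isspace((unsigned char)*str))
        str++;
    return str;
}

/* Hypothetical demo: returns 1 if, after the call, the caller still holds
   the original malloc() address (and the trimmed view starts 3 bytes in). */
int caller_pointer_unchanged(void)
{
    char *buf = malloc(16);
    if (buf == NULL)
        return 0;
    strcpy(buf, "   hello");

    char *before = buf;
    char *trimmed = skip_leading(buf);

    int unchanged = (buf == before) && (trimmed == buf + 3);
    free(buf);   /* correct: buf is still the pointer malloc() returned */
    return unchanged;
}
```

The leak only appears if the caller overwrites its own variable with the returned pointer (buf = skip_leading(buf);) and then has nothing left to free.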
A: 

Personally, I'd roll my own. You can use strtok, but if you do, take care (particularly if you're removing leading characters) that you know which memory is which.

Getting rid of trailing spaces is easy, and pretty safe, as you can just write a 0 over the first space of the trailing run, found by counting back from the end. Getting rid of leading spaces means moving things around. If you want to do it in place (probably sensible), you can just keep shifting everything back one character until there's no leading space. Or, to be more efficient, you could find the index of the first non-space character and shift everything back by that number. Or, you could just use a pointer to the first non-space character (but then you need to be careful in the same way as you do with strtok).
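The more efficient in-place variant described above (terminate over the trailing spaces, find the index of the first non-space character, then shift by that amount) might look like the following minimal sketch. The name trim_in_place() is illustrative, and "space" is assumed to mean anything isspace() accepts.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical in-place trim following the steps described above. */
void trim_in_place(char *s)
{
    size_t len = strlen(s);

    /* Step 1: overwrite trailing whitespace with terminators,
       counting back from the end. */
    while (len > 0 && isspace((unsigned char)s[len - 1]))
        s[--len] = '\0';

    /* Step 2: find the index of the first non-whitespace character. */
    size_t skip = 0;
    while (skip < len && isspace((unsigned char)s[skip]))
        skip++;

    /* Step 3: shift the rest (including the '\0') back by that amount.
       memmove() is required because source and destination overlap. */
    if (skip > 0)
        memmove(s, s + skip, len - skip + 1);
}
```

Because the string stays at the same address, a malloc'ed buffer can still be freed through the original pointer.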

Ben
strtok is generally not a very good tool to use - not least because it is not re-entrant. If you stay inside a single function, it can be used safely, but if there's any possibility of threads or calling other functions which might themselves use strtok, you are in trouble.
Jonathan Leffler
A: 
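#include <stdlib.h>
#include <string.h>

/* Note: strrev() is not standard C; it is provided by some libraries
   (e.g. Microsoft's CRT). Only ' ' is trimmed here, not all whitespace. */
char *trim(const char *s)
{
    char *buf = strrev(strdup(s + strspn(s, " ")));
    char *result = strrev(strdup(buf + strspn(buf, " ")));
    free(buf);
    return result;
}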
finnw
While this may work, it involves unnecessary memory allocation, and it also uses the non-ANSI function strrev().
Adam Rosenfield
This is definitely a complex solution to a simple problem.
Jonathan Leffler
A: 

Here is a function to do what you want. It should take care of degenerate cases where the string is all whitespace. You must pass in an output buffer and the length of the buffer, which means that you have to pass in a buffer that you allocate.

#include <stdint.h>
#include <string.h>

void str_trim(char *output, const char *text, int32_t max_len)
{
    int32_t i, j, length;
    length = (int32_t)strlen(text);

    /* max_len is the size of output, including the '\0'. A non-positive
       size leaves no room at all, so there is nothing safe to write. */
    if (max_len <= 0) {
        return;
    }

    for (i=0; i<length; i++) {
        if ( (text[i] != ' ') && (text[i] != '\t') && (text[i] != '\n') && (text[i] != '\r')) {
            break;
        }
    }

    if (i == length) {
        // handle lines that are all whitespace
        output[0] = 0;
        return;
    }

    for (j=length-1; j>=0; j--) {
        if ( (text[j] != ' ') && (text[j] != '\t') && (text[j] != '\n') && (text[j] != '\r')) {
            break;
        }
    }

    length = j + 1 - i;
    if (length > max_len - 1) {
        length = max_len - 1;   // truncate to fit the output buffer
    }
    memcpy(output, text + i, (size_t)length);
    output[length] = 0;
}

The if statements in the loops can probably be replaced with isspace(text[i]) or isspace(text[j]) to make the lines a little easier to read. I think that I had them set this way because there were some characters that I didn't want to test for, but it looks like I'm covering nearly all whitespace now :-) (isspace() would additionally match '\f' and '\v', and its argument should be cast to unsigned char.)
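A sketch of the isspace() variant suggested above, with two assumptions that differ from the original: the buffer size is passed as a size_t, and it always includes room for the terminating '\0' (so the result is truncated rather than overflowing). The name str_trim_ctype() is hypothetical.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical isspace()-based version of str_trim(). max_len is assumed
   to be the size of output in bytes, including the '\0'. */
void str_trim_ctype(char *output, const char *text, size_t max_len)
{
    size_t start = 0, end = strlen(text);

    if (max_len == 0)
        return;                              /* no room even for '\0' */

    /* Skip leading whitespace, then back up over trailing whitespace;
       the start < end / end > start bounds handle all-whitespace input. */
    while (start < end && isspace((unsigned char)text[start]))
        start++;
    while (end > start && isspace((unsigned char)text[end - 1]))
        end--;

    size_t length = end - start;
    if (length > max_len - 1)
        length = max_len - 1;                /* truncate to fit */

    memcpy(output, text + start, length);
    output[length] = '\0';
}
```

Since isspace() also matches '\f' and '\v', this trims slightly more than the explicit four-character tests.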

Mark
The maxlen < 0 test leads to dangerous behaviour.
Jonathan Leffler
hmm...good point. I'll have to fix my code. Thanks for noting that.
Mark
A: 

I'm not sure what you consider "painless."

C strings are pretty painful. We can find the first non-whitespace character position trivially:

while (isspace((unsigned char)*p)) p++;

We can find the last non-whitespace character position with two similar trivial moves:

while (*q) q++;
do { q--; } while (isspace((unsigned char)*q));

(I have spared you the pain of using the * and ++ operators at the same time.)

The question now is what to do with this. The datatype at hand isn't a robust abstract String type that is easy to reason about; it is barely more than an array of bytes. Lacking a robust data type, it is impossible to write a function that will do the same as Perl's chomp function. What would such a function in C return?
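One possible answer to that closing question, sketched under the assumption that the caller owns a writable buffer: return a pointer into that same buffer, after writing a '\0' over the trailing run. The name trim_view() is hypothetical; the loops are the ones above, with a q > p guard added so an all-whitespace string cannot step before p.

```c
#include <ctype.h>

/* Hypothetical: returns a pointer into s (not a new allocation). */
char *trim_view(char *s)
{
    char *p = s;
    while (isspace((unsigned char)*p))               /* first non-whitespace */
        p++;

    char *q = p;
    while (*q)                                       /* walk to the '\0' */
        q++;
    while (q > p && isspace((unsigned char)q[-1]))   /* back over trailing space */
        q--;

    *q = '\0';                                       /* cut off the trailing run */
    return p;
}
```

The cost of this design is the one raised elsewhere in this thread: the returned pointer may be offset from the start of the buffer, so only the original pointer may be passed to free().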

jfm3
A: 

Here's one that shifts the string into the first position of your buffer. You might want this behavior so that if you dynamically allocated the string, you can still free it on the same pointer that trim() returns:

#include <ctype.h>
#include <stdio.h>
#include <string.h>

char *trim(char *str)
{
    size_t len = 0;
    char *frontp = str;
    char *endp = NULL;

    if( str == NULL )
            return NULL;

    if( str[0] == '\0' )
            return str;

    len = strlen(str);
    endp = str + len;

    /* Move frontp to the first non-whitespace character and endp
     * to one past the last one.  The endp > frontp guard keeps an
     * all-whitespace string from walking before the buffer, and the
     * unsigned char casts keep isspace() defined for negative chars.
     */
    while( isspace((unsigned char)*frontp) )
            ++frontp;
    while( endp > frontp && isspace((unsigned char)*(endp - 1)) )
            --endp;

    /* Terminate just after the last non-whitespace character (if
     * there is no trailing whitespace, this rewrites the existing '\0').
     */
    *endp = '\0';

    /* Shift the string so that it starts at str so
     * that if it's dynamically allocated, we can
     * still free it on the returned pointer.  Note
     * the reuse of endp to mean the front of the
     * string buffer now.
     */
    endp = str;
    if( frontp != str )
    {
            while( *frontp ) *endp++ = *frontp++;
            *endp = '\0';
    }

    return str;
}

I even tested it for correctness:

int main(int argc, char *argv[])
{
    char *sample_strings[] =
    {
            "nothing to trim",
            "    trim the front",
            "trim the back     ",
            " trim one char front and back ",
            " trim one char front",
            "trim one char back ",
            "                   ",
            " ",
            "a",
            "",
            NULL
    };
    char test_buffer[64];
    int index;

    for( index = 0; sample_strings[index] != NULL; ++index )
    {
            strcpy( test_buffer, sample_strings[index] );
            printf("[%s] -> [%s]\n", sample_strings[index],
                                     trim(test_buffer));
    }

    /* The test prints the following:
    [nothing to trim] -> [nothing to trim]
    [    trim the front] -> [trim the front]
    [trim the back     ] -> [trim the back]
    [ trim one char front and back ] -> [trim one char front and back]
    [ trim one char front] -> [trim one char front]
    [trim one char back ] -> [trim one char back]
    [                   ] -> []
    [ ] -> []
    [a] -> [a]
    [] -> []
    */

    return 0;
}

Source file was trim.c. Compiled with 'cc trim.c -o trim'.

indiv
A: 

I'm only including code because the code posted so far seems suboptimal (and I don't have the rep to comment yet).

#include <ctype.h>
#include <string.h>

void inplace_trim(char* s)
{
    size_t start, end = strlen(s);
    for (start = 0; s[start] && isspace((unsigned char)s[start]); ++start) {}
    if (s[start]) {
        while (end > 0 && isspace((unsigned char)s[end-1]))
            --end;
    }
    memmove(s, &s[start], end - start);
    s[end - start] = '\0';
}

char* copy_trim(const char* s)
{
    size_t start, end;
    for (start = 0; s[start] && isspace((unsigned char)s[start]); ++start) {}
    if (s[start] == '\0') return strdup("");
    for (end = strlen(s); end > 0 && isspace((unsigned char)s[end-1]); --end) {}
    return strndup(s + start, end - start);
}

strndup() is a POSIX function (it started out as a GNU extension) and is not available everywhere. If you don't have it or something equivalent, roll your own. For example:

r = strdup(s + start);
if (r != NULL)
    r[end-start] = '\0';
sfink
A: 

Update: As @Mark Ransom noted in the comments, this breaks when whitespace occurs inside the string. My bad. Sorry.

Using strspn and strcspn (shamelessly borrowing from @Adam Rosenfield, and assuming that you know what "whitespace" is)

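#include <string.h>

const char *WHITESPACE=" \t\n\r";

/* Known bug (see the update above): strcspn() stops at the first
   whitespace character after the leading run, so any interior
   whitespace truncates the result. */
char *trimwhitespace(char *str)
{
  size_t spacesAtStart = strspn(str, WHITESPACE);
  char *result = str + spacesAtStart;
  size_t lengthOfNonSpace = strcspn(result, WHITESPACE);
  result[lengthOfNonSpace] = 0;
  return result;
}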
Arkadiy
This doesn't work if there is whitespace in the middle of the string. I almost submitted a similar solution myself before realizing the error.
Mark Ransom
oops. I'll try to downvote it.
Arkadiy
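The strspn() idea from the answer above can still work if strspn() handles only the leading run and the trailing run is found by scanning backwards, which sidesteps the interior-whitespace problem Mark Ransom pointed out. A minimal sketch, assuming the same " \t\n\r" set (not the full isspace() set); the names trimwhitespace_spn and WHITESPACE_SET are hypothetical:

```c
#include <string.h>

/* Hypothetical whitespace set, mirroring the answer above. */
static const char *const WHITESPACE_SET = " \t\n\r";

char *trimwhitespace_spn(char *str)
{
    /* strspn() is fine for the leading run. */
    char *result = str + strspn(str, WHITESPACE_SET);
    size_t len = strlen(result);

    /* Scan backwards for the trailing run; the len > 0 guard keeps an
       all-whitespace string from reading before result. */
    while (len > 0 && strchr(WHITESPACE_SET, result[len - 1]) != NULL)
        len--;

    result[len] = '\0';
    return result;
}
```

Like the original, this returns a pointer into str, so the original pointer is the one to free().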
A: 

My solution. The string must be modifiable. The advantage over some of the other solutions is that it moves the non-space part to the beginning, so you can keep using the old pointer in case you have to free() it later.

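#include <ctype.h>
#include <string.h>

void trim(char * s) {
    char * p = s;
    size_t l = strlen(p);

    // The l > 0 guard stops an all-whitespace string from reading p[-1]
    while(l > 0 && isspace((unsigned char)p[l - 1])) p[--l] = 0;
    while(* p && isspace((unsigned char)* p)) ++p, --l;

    memmove(s, p, l + 1);
}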

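This version creates a copy of the string with strndup() instead of editing it in place. strndup() is not part of C89/C99 (it comes from POSIX), and some C libraries only declare it when a feature-test macro such as _GNU_SOURCE is defined, so you may need to make your own strndup() with malloc() and strncpy().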

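#include <ctype.h>
#include <string.h>

char * trim(const char * s) {
    size_t l = strlen(s);

    // Both loops are bounded by l, so an all-whitespace string can neither
    // read before s nor make the length wrap around
    while(l > 0 && isspace((unsigned char)s[l - 1])) --l;
    while(l > 0 && isspace((unsigned char)* s)) ++s, --l;

    return strndup(s, l);
}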
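If your C library has no strndup(), a replacement along the lines suggested above could look like this minimal sketch. The name my_strndup() is hypothetical, and it uses memcpy() rather than strncpy(), since the number of bytes to copy is already known at that point.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical strndup() stand-in: copies at most n bytes of s, stopping
   early at a '\0', and always NUL-terminates. Returns NULL if malloc fails. */
char *my_strndup(const char *s, size_t n)
{
    size_t len = 0;
    while (len < n && s[len] != '\0')   /* like strnlen(), without needing it */
        len++;

    char *copy = malloc(len + 1);
    if (copy == NULL)
        return NULL;

    memcpy(copy, s, len);
    copy[len] = '\0';
    return copy;
}
```

The result must be released with free() by the caller, exactly as with strndup().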
jkramer
A: 
James Antill
A: 
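#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char* argv[])
{
  const char *src = "            Hel  lo    wo           rl   d G    eo rocks!!!    by shahil    sucks b i          g       tim           e";

  /* Allocate room for the whole string plus its '\0'
     (a fixed 30 bytes would overflow with this input). */
  char *ptr = malloc(strlen(src) + 1);
  if (ptr == NULL)
      return 1;
  strcpy(ptr, src);

  int i = 0, j = 0;

  /* Note: this compacts the string by removing EVERY space,
     including the ones between words, not just leading/trailing ones. */
  while(ptr[j]!='\0')
  {

      if(ptr[j] == ' ' )
      {
          j++;
          ptr[i] = ptr[j];
      }
      else
      {
          i++;
          j++;
          ptr[i] = ptr[j];
      }
  }


  printf("\noutput-%s\n",ptr);
  free(ptr);
  return 0;
}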
Balkrishna Talele
This made me laugh because I thought dreamlax had edited the test string to include "sucks big time". Nope. The original author is just honest.
James Morris
A: 

A bit late to the game, but I'll throw my routines into the fray. They're probably not the most efficient, but I believe they're correct and they're simple (with rtrim() pushing the complexity envelope):

#include <ctype.h>
#include <string.h>

/*
    Public domain implementations of in-place string trim functions

    Michael Burr
    2010
*/

char* ltrim(char* s) 
{
    char* newstart = s;

    while (isspace((unsigned char)*newstart)) {
        ++newstart;
    }

    // newstart points to first non-whitespace char (which might be '\0')
    memmove( s, newstart, strlen( newstart) + 1); // don't forget to move the '\0' terminator

    return s;
}


char* rtrim( char* s)
{
    char* end = s + strlen( s);

    // find the last non-whitespace character
    while ((end != s) && isspace((unsigned char)*(end-1))) {
            --end;
    }

    // at this point either (end == s) and s is either empty or all whitespace
    //      so it needs to be made empty, or
    //      end points just past the last non-whitespace character (it might point
    //      at the '\0' terminator, in which case there's no problem writing
    //      another there).    
    *end = '\0';

    return s;
}

char*  trim( char* s)
{
    return rtrim( ltrim( s));
}
Michael Burr
A: 

Here's my C mini library for trimming left, right, both, all, in place and separate, and trimming a set of specified characters (or white space by default).

contents of strlib.h:

#ifndef STRLIB_H_
#define STRLIB_H_
enum strtrim_mode_t {
    STRLIB_MODE_ALL       = 0, 
    STRLIB_MODE_RIGHT     = 0x01, 
    STRLIB_MODE_LEFT      = 0x02, 
    STRLIB_MODE_BOTH      = 0x03
};

char *strcpytrim(char *d, // destination
                 char *s, // source
                 int mode,
                 char *delim
                 );

char *strtriml(char *d, char *s);
char *strtrimr(char *d, char *s);
char *strtrim(char *d, char *s); 
char *strkill(char *d, char *s);

char *triml(char *s);
char *trimr(char *s);
char *trim(char *s);
char *kill(char *s);
#endif

contents of strlib.c:

#include "strlib.h"

char *strcpytrim(char *d, // destination
                 char *s, // source
                 int mode,
                 char *delim
                 ) {
    char *o = d; // save orig
    char *e = 0; // end space ptr.
    char dtab[256] = {0};
    if (!s || !d) return 0;

    if (!delim) delim = " \t\n\f";
    while (*delim) 
        dtab[(unsigned char)*delim++] = 1;   // cast: chars above 127 may be negative

    while ( (*d = *s++) != 0 ) { 
        if (!dtab[(unsigned char)*d]) { // Not a match char
            e = 0;       // Reset end pointer
        } else {
            if (!e) e = d;  // Found first match.

            if ( mode == STRLIB_MODE_ALL || ((mode != STRLIB_MODE_RIGHT) && (d == o)) ) 
                continue;
        }
        d++;
    }
    if (mode != STRLIB_MODE_LEFT && e) { // for everything but trim_left, delete trailing matches.
        *e = 0;
    }
    return o;
}

// perhaps these could be inlined in strlib.h
char *strtriml(char *d, char *s) { return strcpytrim(d, s, STRLIB_MODE_LEFT, 0); }
char *strtrimr(char *d, char *s) { return strcpytrim(d, s, STRLIB_MODE_RIGHT, 0); }
char *strtrim(char *d, char *s) { return strcpytrim(d, s, STRLIB_MODE_BOTH, 0); }
char *strkill(char *d, char *s) { return strcpytrim(d, s, STRLIB_MODE_ALL, 0); }

char *triml(char *s) { return strcpytrim(s, s, STRLIB_MODE_LEFT, 0); }
char *trimr(char *s) { return strcpytrim(s, s, STRLIB_MODE_RIGHT, 0); }
char *trim(char *s) { return strcpytrim(s, s, STRLIB_MODE_BOTH, 0); }
char *kill(char *s) { return strcpytrim(s, s, STRLIB_MODE_ALL, 0); }

The one main routine does it all. It trims in place if src == dst; otherwise, it works like the strcpy routines. It trims the set of characters specified in the string delim, or whitespace if delim is null. It trims left, right, both, and all (like tr). There is not much to it, and it iterates over the string only once. Some folks might complain that trim right starts on the left; however, the alternative needs strlen(), which starts on the left anyway. (One way or another you have to get to the end of the string for right trims, so you might as well do the work as you go.) There may be arguments to be made about pipelining and cache sizes and such; who knows. Since the solution works from left to right and iterates only once, it can be expanded to work on streams as well. Limitations: it works byte by byte, so it cannot trim multibyte (Unicode) whitespace characters.

Shoots the Moon
A: 

Here is my attempt at a simple, yet correct in-place trim function.

#include <ctype.h>
#include <string.h>

void trim(char *str)
{
    int i;
    int begin = 0;
    int end = (int)strlen(str) - 1;

    while (isspace((unsigned char)str[begin]))
        begin++;

    // Test end >= begin first so an empty string never reads str[-1]
    while ((end >= begin) && isspace((unsigned char)str[end]))
        end--;

    // Shift all characters back to the start of the string array.
    for (i = begin; i <= end; i++)
        str[i - begin] = str[i];

    str[i - begin] = '\0'; // Null terminate string.
}
Swiss