I need to expand tabs in an input line into spaces, using a tab width of 8 columns. I tried adapting some earlier code of mine that replaced the last space in every line longer than 10 characters with a '\n' to start a new line. Is there a way in C to expand tabs to 8 columns? I'm sure it is simple, but I just can't seem to get it.

Here's my code:

int v = 0;
int w = 0;
int tab;
extern char line[];

while (v < length) {

   if(line[v] == '\t')
      tab = v;

   if (w == MAXCHARS) {
      // THIS IS WHERE I GET STUCK
      line[tab] = ' ';
      // reset w to 0, so the count starts over
      w = 0;
   }
   ++v;
   ++w;
}
+4  A: 

This isn't really a question about the C language; it's a question about finding the right algorithm -- you could use that algorithm in any language.

Anyhow, you can't do this at all without reallocating line[] to point at a larger buffer (unless it's a large fixed length, in which case you need to be worried about overflows); as you're expanding the tabs, you need more memory to store the new, larger lines, so character replacement such as you're trying to do simply won't work.

My suggestion: rather than trying to operate in place (or even in memory), write this as a filter that reads from stdin and writes to stdout one character at a time; that way you don't need to worry about memory allocation or deallocation or the changing length of line[].
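A minimal sketch of such a filter, assuming tabs should pad to the next tab stop rather than always become a fixed number of spaces. The function name `expand_tabs_stream` and its `tab_width` parameter are made up for illustration; the question uses a width of 8.

```c
#include <assert.h>
#include <stdio.h>

/* Copy `in` to `out`, replacing each tab with spaces up to the next
 * tab stop. Nothing is buffered, so line length does not matter. */
void expand_tabs_stream(FILE *in, FILE *out, int tab_width)
{
    int c;
    int col = 0;    /* current output column, 0-based */

    assert(tab_width > 0);
    while ((c = getc(in)) != EOF) {
        if (c == '\t') {
            /* A tab always yields at least one space, then pads
             * until the column is a multiple of tab_width. */
            do {
                putc(' ', out);
                ++col;
            } while (col % tab_width != 0);
        } else {
            putc(c, out);
            col = (c == '\n') ? 0 : col + 1;   /* new line restarts the count */
        }
    }
}
```

Used as a filter, `main` would only need to call `expand_tabs_stream(stdin, stdout, 8);`.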

If the context this code is being used in requires it to operate in memory, consider implementing an API similar to realloc(), wherein you return a new pointer; if you don't need to change the length of the string being handled you can simply keep the original region of memory, but if you do need to resize it, the option is available.

Charles Duffy
+2  A: 

I would probably do something like this:

  1. Iterate through the string once, only counting the tabs (and the string length if you don't already know that).
  2. Allocate original_size + 7 * number_of_tabs bytes of memory (where original_size counts the null byte).
  3. Iterate through the string another time, copying every non-tab byte to the new memory and inserting 8 spaces for every tab.

If you want to do the replacement in-place instead of creating a new string, you'll have to make sure that the passed-in pointer points to a location with enough memory to store the new string (which will be longer than the original, because 8 spaces take 7 bytes more than one tab).
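The three steps above can be sketched roughly as follows. This is the plain "one tab becomes 8 spaces" substitution, with no tab-stop alignment; `replace_tabs` and `SPACES_PER_TAB` are illustrative names, not anything from the question.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SPACES_PER_TAB 8   /* hypothetical name; the question asks for 8 */

/* Returns a malloc'd copy of `src` with every tab replaced by
 * SPACES_PER_TAB spaces, or NULL if allocation fails.
 * The caller must free the result. */
char *replace_tabs(const char *src)
{
    size_t len = 0, tabs = 0;
    const char *p;
    char *out, *dst;

    assert(src != NULL);

    /* Step 1: count the characters and the tabs. */
    for (p = src; *p != '\0'; ++p) {
        ++len;
        if (*p == '\t')
            ++tabs;
    }

    /* Step 2: each tab grows by SPACES_PER_TAB - 1 bytes; +1 for '\0'. */
    out = malloc(len + tabs * (SPACES_PER_TAB - 1) + 1);
    if (out == NULL)
        return NULL;

    /* Step 3: copy, writing SPACES_PER_TAB spaces in place of each tab. */
    dst = out;
    for (p = src; *p != '\0'; ++p) {
        if (*p == '\t') {
            memset(dst, ' ', SPACES_PER_TAB);
            dst += SPACES_PER_TAB;
        } else {
            *dst++ = *p;
        }
    }
    *dst = '\0';
    return out;
}
```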

sepp2k
Inserting 8 spaces for every tab doesn't align text in columns the way tab is expected to work. Step #3 needs to insert spaces until the position in the output line is a multiple of the tab width.
Neil
"Is there an way in C to make tabs 8 spaces in order to expand them?" Kinda sounds like the OP wants a plain substitution "1 tab -> 8 spaces". Not sure though.
sepp2k
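For reference, the difference between the two readings comes down to one line of arithmetic. A small sketch, with `spaces_to_next_stop` as a made-up helper name:

```c
#include <assert.h>
#include <stddef.h>

/* Number of spaces a tab at 0-based column `col` expands to when tabs
 * align to stops every `tab_width` columns. Always between 1 and
 * tab_width inclusive. */
size_t spaces_to_next_stop(size_t col, size_t tab_width)
{
    assert(tab_width > 0);
    return tab_width - col % tab_width;
}
```

With "ab\tc" and a width of 8, the tab sits at column 2 and becomes 6 spaces, so `c` lands in column 8. A plain "1 tab -> 8 spaces" substitution would put `c` in column 10 instead.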
+3  A: 

You need a separate buffer to write the output to, since it will in general be longer than the input:

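void detab(const char* in, char* out, size_t max_len) {
    size_t i = 0;

    if (max_len == 0) {
        return;                 /* no room even for the terminator */
    }
    while (*in && i < max_len - 1) {
        if (*in == '\t') {
            ++in;               /* consume the tab */
            out[i++] = ' ';     /* a tab always yields at least one space */
            while (i % 8 && i < max_len - 1) {
                out[i++] = ' '; /* pad to the next multiple of 8 */
            }
        } else {
            out[i++] = *in++;
        }
    }

    out[i] = 0;
}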

You must preallocate enough space for out (which in the worst case could be 8 * strlen(in) + 1), and out cannot be the same as in.

EDIT: As suggested by Jonathan Leffler, the max_len parameter now makes sure we avoid buffer overflows. The resulting string will always be null-terminated, even if it is cut short to avoid such an overflow. (I also renamed the function, and changed int to size_t for added correctness :).)
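The worst-case bound of 8 * strlen(in) + 1 is only reached when the input is nothing but tabs. If that feels wasteful, one option is to measure the exact expanded length first. The sketch below uses a hypothetical helper, `expanded_length`, that counts columns from the start of the buffer, as `detab()` does (so it does not reset at newlines).

```c
#include <assert.h>
#include <stddef.h>

/* Length, excluding the terminating null, that `in` would have after
 * tabs are expanded to stops every `tab_width` columns. Columns are
 * counted from the start of the string. */
size_t expanded_length(const char *in, size_t tab_width)
{
    size_t total = 0;

    assert(in != NULL && tab_width > 0);
    for (; *in != '\0'; ++in) {
        if (*in == '\t')
            total += tab_width - total % tab_width;  /* jump to next stop */
        else
            ++total;
    }
    return total;
}
```

A caller could then allocate `expanded_length(in, 8) + 1` bytes for `out` and pass that same value as `max_len`.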

j_random_hacker
It might be a good idea to encourage people to write code that avoids buffer overflows - so pass a size for `out` into the function and make sure you don't get a buffer overflow.
Jonathan Leffler
And also isn't the function removing tabs - so the name would be better as 'detab()' than 'entab()'?
Jonathan Leffler
Both good points Jonathan: code updated.
j_random_hacker
`expand_tabs` would be my choice. =]
strager
I would have gone with `copy_detabifyingly()`, but I was concerned that might conflict with something in the standard library. :)
j_random_hacker
+1  A: 

Untested, but something like this should work:

int v = 0;
int tab;
extern char line[];

while (v < length){
  if (line[v] == '\t') {
    /* Number of spaces needed to reach the next tab stop (1..TAB_WIDTH). */
    tab = TAB_WIDTH - (v % TAB_WIDTH);
    /* I'm assuming MAXCHARS is the size of your array. You either need
     * to bail, or resize the array if expanding the tab would make
     * the string too long. */
    assert((length + tab) < MAXCHARS);
    if (tab != 1) {
      memmove(line + v + tab - 1, line + v, length - v + 1);
    }
    memset(line + v, ' ', tab);
    length += tab - 1;
    v += tab;
  } else {
    ++v;
  }
}

Note that this is O(n*m) where n is the line size and m is the number of tabs. That probably isn't an issue in practice.
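If the repeated memmove ever did matter, one alternative (not part of the code above, just a sketch) is to compute the final length first and then fill the array from the right, so each byte is written only once. The name `expand_tabs_in_place` and its parameters are assumptions for illustration, and the sketch treats the buffer as a single line.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Expand tabs in `line` in place. `capacity` is the full size of the
 * array. Returns the new length, or (size_t)-1 (leaving `line`
 * untouched) if the result would not fit. Columns are counted from
 * line[0], so this assumes a single line. */
size_t expand_tabs_in_place(char *line, size_t capacity, size_t tab_width)
{
    size_t len, out_len = 0, src, dst;

    assert(line != NULL && tab_width > 0);
    len = strlen(line);

    /* Pass 1 (left to right): work out the expanded length. */
    for (src = 0; src < len; ++src) {
        if (line[src] == '\t')
            out_len += tab_width - out_len % tab_width;
        else
            ++out_len;
    }
    if (out_len + 1 > capacity)
        return (size_t)-1;

    /* Pass 2 (right to left): dst never drops below src, so unread
     * input is never overwritten. */
    src = len;
    dst = out_len;
    line[dst] = '\0';
    while (src > 0) {
        char c = line[--src];
        if (c != '\t') {
            line[--dst] = c;
        } else {
            /* The previous tab ended on a tab stop, so this tab's width
             * depends only on the run of ordinary characters before it. */
            size_t run = 0, pad;
            while (run < src && line[src - 1 - run] != '\t')
                ++run;
            pad = tab_width - run % tab_width;
            while (pad-- > 0)
                line[--dst] = ' ';
        }
    }
    return out_len;
}
```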

Laurence Gonsalves
A: 

There are a myriad ways to convert tabs in a string into 1-8 spaces. There are inefficient ways to do the expansion in-situ, but the easiest way to handle it is to have a function that takes the input string and a separate output buffer that is big enough for an expanded string. If the input is 6 tabs plus an X and a newline (8 characters + terminating null), the output would be 48 blanks, X, and a newline (50 characters + terminating null) - so you might need a much bigger output buffer than input buffer.

#include <stddef.h>
#include <assert.h>

static int detab(const char *str, char *buffer, size_t buflen)
{
    char *end = buffer + buflen;
    char *dst = buffer;
    const char *src = str;
    char c;

    assert(buflen > 0);
    while ((c = *src++) != '\0' && dst < end)
    {
         if (c != '\t')
             *dst++ = c;
         else
         {
             do
             {
                 *dst++ = ' ';
             } while (dst < end && (dst - buffer) % 8 != 0);
         }
    }
    if (dst < end)
    {
        *dst = '\0';
        return(dst - buffer);
    }
    else
        return -1;
}

#ifdef TEST
#include <stdio.h>
#include <string.h>

#ifndef TEST_INPUT_BUFFERSIZE
#define TEST_INPUT_BUFFERSIZE 4096
#endif /* TEST_INPUT_BUFFERSIZE */
#ifndef TEST_OUTPUT_BUFFERSIZE
#define TEST_OUTPUT_BUFFERSIZE (8 * TEST_INPUT_BUFFERSIZE)
#endif /* TEST_OUTPUT_BUFFERSIZE */

int main(void)
{
     char ibuff[TEST_INPUT_BUFFERSIZE];
     char obuff[TEST_OUTPUT_BUFFERSIZE];

     while (fgets(ibuff, sizeof(ibuff), stdin) != 0)
     {
          if (detab(ibuff, obuff, sizeof(obuff)) >= 0)
              fputs(obuff, stdout);
          else
              fprintf(stderr, "Failed to detab input line: <<%.*s>>\n",
                      (int)(strlen(ibuff) - 1), ibuff);
     }
     return(0);
}
#endif /* TEST */

The biggest trouble with this test is that it is hard to demonstrate that it handles overflows in the output buffer properly. That's why there are the two '#define' sequences for the buffer sizes - with very large defaults for real work and independently configurable buffer sizes for stress testing. If the source file is dt.c, use a compilation like this:

 make CFLAGS="-DTEST -DTEST_INPUT_BUFFERSIZE=32 -DTEST_OUTPUT_BUFFERSIZE=32" dt

If the 'detab()' function is to be used outside this file, you'd create a header to contain its declaration, and you'd include that header in this code, and the function would not be static, of course.

Jonathan Leffler
+2  A: 

Here's a reentrant, recursive version which automatically allocates a buffer of correct size:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct state
{
    char *dest;
    const char *src;
    size_t tab_size;
    size_t size;
    _Bool expand;
};

static void recexp(struct state *state, size_t di, size_t si)
{
    size_t start = si;
    size_t pos = si;

    for(; state->src[pos]; ++pos)
    {
        if(state->src[pos] == '\n') start = pos + 1;
        else if(state->src[pos] == '\t')
        {
            size_t str_len = pos - si;
            size_t tab_len = state->tab_size - (pos - start) % state->tab_size;

            recexp(state, di + str_len + tab_len, pos + 1);
            if(state->dest)
            {
                memcpy(state->dest + di, state->src + si, str_len);
                memset(state->dest + di + str_len, ' ', tab_len);
            }

            return;
        }
    }

    state->size = di + pos - si + 1;
    if(state->expand && !state->dest) state->dest = malloc(state->size);
    if(state->dest)
    {
        memcpy(state->dest + di, state->src + si, pos - si);
        state->dest[state->size - 1] = 0;
    }
}

size_t expand_tabs(char **dest, const char *src, size_t tab_size)
{
    struct state state = { dest ? *dest : NULL, src, tab_size, 0, dest };
    recexp(&state, 0, 0);
    if(dest) *dest = state.dest;
    return state.size;
}

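int main(void)
{
    char *expansion = NULL; // must be `NULL` for automatic allocation
    size_t size = expand_tabs(&expansion,
        "spam\teggs\tfoo\tbar\nfoobar\tquux", 4);
    if(!expansion) return EXIT_FAILURE; // allocation failed
    printf("expanded size: %lu\n", (unsigned long)size);
    puts(expansion);
    free(expansion); // the caller owns the allocated buffer
    return 0;
}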

If expand_tabs() is called with dest == NULL, the function will return the size of the expanded string, but no expansion is actually done; if dest != NULL but *dest == NULL, a buffer of correct size will be allocated and must be deallocated by the programmer; if dest != NULL and *dest != NULL, the expanded string will be put into *dest, so make sure the supplied buffer is large enough.

Christoph
A: 

Here is one that will malloc(3) a bigger buffer of exactly the right size and return the expanded string. It does no division or modulus ops. It even comes with a test driver. Safe with -Wall -Wno-parentheses if using gcc.

#include <stddef.h>
#include <stdlib.h>
#include <string.h>

static char *expand_tabs(const char *s) {
  int i, j, extra_space;
  char *r, *result = NULL;

  for(i = 0; i < 2; ++i) {
    for(j = extra_space = 0; s[j]; ++j) {
      if (s[j] == '\t') {
        int es0 = 8 - (j + extra_space & 7);
        if (result != NULL) {
          strncpy(r, "        ", es0);
          r += es0;
        }
        extra_space += es0 - 1;
      } else if (result != NULL)
        *r++ = s[j];
    }
    if (result == NULL)
      if ((r = result = malloc(j + extra_space + 1)) == NULL)
        return NULL;
  }
  *r = 0;
  return result;
}

#include <stdio.h>

int main(int ac, char **av) {
  char space[1000];
  while (fgets(space, sizeof space, stdin) != NULL) {
    char *s = expand_tabs(space);
    if (s == NULL)
      return 1;  /* out of memory */
    fputs(s, stdout);
    free(s);
  }
  return 0;
}
DigitalRoss