ansaurus

Question

How to implement a variable-length ‘string’-y in C

Answer 1

+2 A:

realloc will relocate your memory to a bigger buffer -- simply using this command will allow you to resize your string. Use the following struct for your string:

struct mystring
{
    char * string;
    size_t size;
};

The important part being keeping a track of the size of your string, and having every string manipulation function testing if the size makes sense.

You could pre-allocate a large buffer and keep adding to it, and only realloc when said buffer is full -- you have to implement all the functions for this. It is preferable (far less error prone, and more secure) to mutate string by moving from one immutable string to another (using the semantics of realoc).

Hassan Syed 2010-02-11 09:18:47

Yes, but how can I declare this in the struct? Could you include some code, based on the example I provided? I’d be much appreciative.

elliottcable 2010-02-11 09:21:01

Ah, I spoke too soon. Thanks for the edit, and useful information. (-:

elliottcable 2010-02-11 09:22:06

Answer 2

+3 A:

The common approach is to have a field for length and a pointer to a dynamically allocated region of memory to hold the characters.

typedef struct string {
    size_t length;
    unsigned char *native;
} string_t;

string_t String__create(char native[]) {
    string_t this;
    this.length = strlen(native);
    this.native = malloc(this.length+1);
    if (this.native) {
        strncpy(this.native, native, this.length+1);
    } else {
        this.length = 0;
    }
    return this;
}

If you want to dynamically allocate the string_t:

string_t* String__create(char native[]) {
    string_t* this;
    if (this = malloc(sizeof(*this))) {
        this->length = strlen(native);
        this->native = malloc(this->length+1);
        if (this->native) {
            strncpy(this->native, native, this->length+1);
        } else {
            free(this);
            this=NULL;
        }
    }
    return this;
}
void String__delete(string_t** str) {
    free((*str)->native);
    free((*str));
    *str=NULL;
}

outis 2010-02-11 09:19:18

Answer 3

+6 A:

Many C++ std::string implementations now use a "Small String Optimization". In pseudo-code:

struct string {
    Int32 length
    union {
        char[12] shortString
        struct {
           char* longerString
           Int32 heapReservedSpace
        }
    }
}

The idea is that string up to 12 characters are stored in the shortString array. The entire string will be contiguous and use only a single cache line. Longer strings are stored on the heap. This leaves you with 12 spare bytes in the string object. The pointer doesn't take all of that, so you can also remember how much memory you've allocated on the heap (>=length). That helps to support scenario's in which you grow a string in small increments.

MSalters 2010-02-11 10:01:38

I don’t quite understand how that’s legal; how do you access the struct portion of the union, if it has no name?

elliottcable 2010-02-11 10:16:22

Also, this solution is really neat. I’m *definitely* going to use this—+1, and thanks for responding ^_^

elliottcable 2010-02-11 10:19:12

Hey, you're designing the language - the language itself doesn't need to access things by name. Furthermore, your language could support the "anonymous struct" rule: names of an anonymous struct are visible in the enclosing scope instead. Same thing as the anonymous union I used, really.

MSalters 2010-02-11 10:24:38

Oh, you misunderstood me. I’m not writing a C-like language; I’m writing a language interpreter *in* C. I was asking how, in C, I access the anonymous struct/unions you used there.**Edit:** Ah, scratch that. I just did a bit of googling, found out it’s a C++ feature. Thanks!

elliottcable 2010-02-11 10:30:19

@elliottcable: This answer talks about how this is typically done in C++, which also allows anonymous unions. In C, just add an inner name to the union member of the struct and you'll be fine.

unwind 2010-02-11 10:33:21

Answer 4

+3 A:

In addition to what others have told you, I'd also include the concept of "capacity": It is not possible to know the size of the allocated vector in memory, you must store it. If you do a sizeof of the String struct, it will return you 4 bytes * 3 numeric fields = 12 bytes (probably bigger due to padding used in structures). Also, you cannot get the length of allocated memory through sizeof.

typedef struct _mystring {
        char * native;
        size_t size;
        size_t capacity;
} String;

This way, capacity always bears the actual size of the chunk in which your string is. Say that your string goes shorter: you don't have to realloc in order to get an exact match between the capacity and the size of your string. Also, you can alloc from the beginning the characters you expect the string to have, and not the characters the initial string has. Finally, you can mimic the C++ string char dynamic vector and double capacity each time the string grows beyond the capacity limit. All of these will keep memory operations to a minimum, which will translate in better performance (you will also waste some memory, but never as much as 1Kb).

String String__create(char native[], size_t capacity) {
  String this;

  this.size = strlen( native );
  if ( capacity < ( this.size + 1 ) )
        this.capacity = this.size + 1;
  else  this.capacity = capacity;

  this.native = (char *) malloc( capacity * sizeof( char ) );
  strcpy( this.native, native );

  return this;
}

String * String__set(String *this, char native[]) {
    this->size = strlen( native );

    if ( this->size >= this->capacity ) {
        do {
            this->capacity <<= 1;
        } while( this->size > this->capacity );

        this->native = realloc( this->native, this->capacity );
    }

    strcpy( this->native, native );

    return this;
}

void String__delete(String *this)
{
    free( this->native );
}

Baltasarq 2010-02-11 10:03:35

Why is this necessary/useful? Cannot I simply ‘sizeof’ the string to get the current capacity?

elliottcable 2010-02-11 10:21:11

Actually not. If you do a sizeof of the String struct, it will return you 4 bytes * 3 numeric fields = 12 bytes (probably bigger due to padding used in structures). Also, you cannot get the length of allocated memory through sizeof.

Baltasarq 2010-02-11 11:07:43

Ah, forgive me. I didn’t know that. I can’t upvote your answer again unless you edit it; if you do so, I’ll gladly add a +1 (-:

elliottcable 2010-02-11 11:23:07

ansaurus

tags:

views:

answers:

How to implement a variable-length ‘string’-y in C

related questions