What are the various ways in C/C++ to define a string without a terminating null character ('\0') at the end?

EDIT: I am interested in character arrays only and not in STL string.

+1  A: 

In C++ you can use the string class and not deal with the null char at all.

codaddict
+1  A: 

Use std::string.

There are dozens of other ways to store strings, but using a library is often better than making your own. I'm sure we could all come up with plenty of wacky ways of doing strings without null terminators :).

JoshD
+6  A: 

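C++ std::strings do not rely on a NUL terminator to mark their end. They store their length explicitly, which is why they can even contain embedded '\0' characters.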

P.S.: NULL is a macro[1]. NUL is '\0'. Don't mix them up.

[1] C.2.2.3 Macro NULL

The macro NULL, defined in any of <clocale>, <cstddef>, <cstdio>, <cstdlib>, <cstring>, <ctime>, or <cwchar>, is an implementation-defined C++ null pointer constant in this International Standard (18.1).

Prasoon Saurav
NULL and NUL are both just a fancy way of saying 0.
Alexander Rafferty
I wish I could give another +1 for that NULL clarification with footnote.
JoshD
@Alexander Rafferty: NUL is the name of the null character '\0', while NULL is a null pointer. In C it is usually defined as `(void*)0`, while in C++ it is just `0`. Note that the difference is the type, not the value.
David Rodríguez - dribeas
A: 

In C there generally won't be an easier solution. You could do what Pascal did and store the length of the string in the first character, but this is a bit of a pain and limits the string length to the largest value a char can hold (255 for an 8-bit unsigned char). In C++ I'd definitely use the std::string class, which you get with

#include <string>

Since it is a widely used standard library class, it will almost certainly be more reliable than rolling your own string class.

shuttle87
A: 

The reason for the null termination is so that whatever handles the string can determine its length. If you don't use a null terminator, you need to pass the string's length, either through a separate parameter/variable or as part of the string itself. Alternatively, you could use another delimiter, as long as it never appears within the string itself.
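To make the "pass the length separately" option concrete, here is a minimal C sketch. The function names count_char and print_counted are hypothetical; the point is that neither one looks for a terminator, they rely only on the length they are given. printf's "%.*s" conversion takes the maximum number of characters to print as an int argument, so it stops after len bytes even when the array has no '\0'.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: counts occurrences of ch in a counted
   (not NUL-terminated) character buffer of len bytes. */
size_t count_char(const char *s, size_t len, char ch)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        if (s[i] == ch)
            n++;
    }
    return n;
}

/* Hypothetical helper: prints exactly len characters. The precision
   in "%.*s" limits how many bytes printf reads, so no terminator
   is needed. */
void print_counted(const char *s, size_t len)
{
    printf("%.*s\n", (int)len, s);
}
```

For example, with char s[6] = {'s','t','r','i','n','g'};, count_char(s, sizeof s, 'i') returns 1 and print_counted(s, sizeof s) prints string.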

To be honest, I don't quite understand your question, or if it actually is a question.

Alexander Rafferty
+5  A: 

The terminating null is there to mark the end of the string. Without it, you need some other way to determine its length.

You can use a predefined length:

char s[6] = {'s','t','r','i','n','g'};

You can emulate pascal-style strings:

unsigned char s[7] = {6, 's','t','r','i','n','g'};
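A minimal sketch of how code might read such a Pascal-style (length-prefixed) array, assuming the length byte comes first as in the example above. The helpers pstr_len, pstr_chars and pstr_print are hypothetical names for illustration. Because the length lives in a single unsigned char, strings are limited to UCHAR_MAX characters (255 when char is 8 bits).

```c
#include <stdio.h>
#include <string.h>

/* Pascal-style string: byte 0 holds the length, the characters follow,
   and no terminator is stored. */
unsigned char pstr[7] = {6, 's', 't', 'r', 'i', 'n', 'g'};

/* Hypothetical helper: the length is simply the first byte. */
size_t pstr_len(const unsigned char *p)
{
    return p[0];
}

/* Hypothetical helper: the characters start right after the length byte. */
const char *pstr_chars(const unsigned char *p)
{
    return (const char *)(p + 1);
}

/* Hypothetical helper: prints the string using its stored length. */
void pstr_print(const unsigned char *p)
{
    printf("%.*s\n", (int)pstr_len(p), pstr_chars(p));
}
```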

You could also use std::string in C++, though you've said you're not interested in it.

Preferably you would use some pre-existing technology that handles Unicode, or at least understands string encoding (e.g., wchar.h).

And a comment: If you're putting this in a program intended to run on an actual computer, you might consider typedef-ing your own "string". This will encourage your compiler to barf if you ever accidentally try to pass it to a function expecting a C-style string.

typedef struct {
    char characters[10];
} ThisIsNotACString;
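Building on that typedef, a sketch that also carries the length, so the struct is usable without any terminator. The names CountedString, counted_set and counted_equals are hypothetical. Passing a pointer to such a struct to strlen or strcpy is an incompatible-pointer error the compiler will diagnose, which is the safety benefit described above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical counted-string type: the length is stored explicitly
   and no '\0' is kept in characters. */
typedef struct {
    size_t length;          /* number of characters in use */
    char characters[10];    /* raw characters, not terminated */
} CountedString;

/* Copies n bytes from src, truncating to the 10-byte capacity;
   returns the resulting length. */
size_t counted_set(CountedString *cs, const char *src, size_t n)
{
    if (n > sizeof cs->characters)
        n = sizeof cs->characters;
    memcpy(cs->characters, src, n);
    cs->length = n;
    return n;
}

/* Returns nonzero if cs holds exactly the n bytes in other. */
int counted_equals(const CountedString *cs, const char *other, size_t n)
{
    return cs->length == n && memcmp(cs->characters, other, n) == 0;
}
```

counted_set silently truncates to the fixed capacity; a real design might report that condition instead.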
Seth
+1 for the most complete answer so far, main thing missing is a discussion of `char s[3] = "abc";`...
Tony
A: 

Even the string class will store it with a null. If for some reason you absolutely do not want a null character at the end of your string in memory, you'd have to manually create a block of characters, and fill it out yourself.

I can't personally think of any realistic scenario where you'd want to do this, since the null character is what signals the end of the string. If you store the length of the string instead, you save the one byte of the terminator at the cost of the length variable (likely 4 bytes), but you gain faster access to the string's length.

Bryan
+8  A: 

Typically, as another poster wrote:

char s[6] = {'s', 't', 'r', 'i', 'n', 'g'};

or if your current C charset is ASCII, which is usually true (not much EBCDIC around today)

char s[6] = {115, 116, 114, 105, 110, 103};

There is also a largely ignored way that works only in C (not C++)

char s[6] = "string";

If the array is too small to hold the final 0 (but large enough to hold all the other characters of the string literal), the final zero isn't copied. This is valid C (but invalid C++).
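A small sketch to check this C-only form, assuming a C compiler (a C++ compiler rejects this initializer). sizeof s is 6, and since there is no terminator the array has to be printed with an explicit length; print_s is a hypothetical helper.

```c
#include <stdio.h>

/* Valid C, not C++: the array has room for exactly the six
   characters, so the literal's terminating '\0' is not stored. */
char s[6] = "string";

/* Prints the array without relying on a terminator. */
void print_s(void)
{
    printf("%.*s\n", (int)sizeof s, s);
}
```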

Obviously you can also do it at run time:

char s[6];
s[0] = 's';
s[1] = 't';
s[2] = 'r';
s[3] = 'i';
s[4] = 'n';
s[5] = 'g';

or (same remark on ASCII charset as above)

char s[6];
s[0] = 115;
s[1] = 116;
s[2] = 114;
s[3] = 105;
s[4] = 110;
s[5] = 103;

Or using memcpy (or memmove, or bcopy, though there is no benefit to those here):

memcpy(s, "string", 6);

or strncpy:

strncpy(s, "string", 6);
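A sketch of these two copying approaches, with hypothetical helper names. Both write exactly six bytes and no terminator: memcpy never adds one, and strncpy only pads with '\0' when the source is shorter than n. Here "string" has exactly 6 characters, so nothing is appended. Some compilers (e.g. recent GCC) may warn about the strncpy call for exactly this reason.

```c
#include <string.h>

/* Hypothetical helper: copies the 6 characters of "string" into dst;
   memcpy never writes a terminator. */
void fill_with_memcpy(char dst[6])
{
    memcpy(dst, "string", 6);
}

/* Hypothetical helper: strncpy writes at most n bytes. Because the
   source has at least n characters, no '\0' is written. */
void fill_with_strncpy(char dst[6])
{
    strncpy(dst, "string", 6);
}
```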

What should be understood is that there is no such thing as a string type in C (C++ has string objects, but that's another story entirely). So-called strings are just char arrays. Even the name char is misleading: it is not really a character but just a kind of small numeric type. We could probably have called it byte instead, but in the old days there was strange hardware using 9-bit registers and such, and byte usually suggests 8 bits.

Because a char will very often be used to store a character code, the C designers provided a simpler way than storing the number directly: you can put a character between single quotes, and the compiler will store that character's code in the char.

What I mean is that, for example, you don't have to write

char c = '\0';

to store a code 0 in a char; you can simply write:

char c = 0;
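A tiny sketch confirming that the two spellings store the same value: '\0' is just a character constant whose value is zero. The variable names are illustrative only.

```c
/* '\0' is a character constant with value zero, so both
   declarations store the same value in the char. */
char c1 = '\0';
char c2 = 0;
```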

As we very often have to work with sequences of chars of variable length, the C designers also chose a convention for "strings": just put a code 0 where the text should end. This representation has a name, "zero-terminated string", and if you see the two letters sz at the beginning of a variable name, it usually means that its content is a zero-terminated string.

"C sz strings" are not a type at all, just an array of chars as ordinary as, say, an array of int, but the string manipulation functions (strcmp, strcpy, strcat, printf and many, many others) understand and rely on the 0-ending convention. That also means that if you have a char array that is not zero-terminated, you shouldn't pass it to any of these functions, as they will likely read past its end (or you must be extra careful and use the variants with an n in their name, like strncpy, passing the correct length).
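When a counted, unterminated array does have to go through one of the str* functions, one option is to make a terminated copy first. A minimal sketch; to_c_string is a hypothetical name. POSIX strndup (also added in C23) does essentially the same job where it is available.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated, NUL-terminated copy
   of the len bytes at s (which need not be terminated), or NULL if
   allocation fails. The caller must free() the result. */
char *to_c_string(const char *s, size_t len)
{
    char *out = malloc(len + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, s, len);
    out[len] = '\0';   /* add the terminator the str* functions expect */
    return out;
}
```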

The biggest problem with this convention is that it is inefficient in many cases. A typical example: you want to append something to a zero-terminated string. If you had kept the size, you could jump straight to the end of the string; with the sz convention, you have to scan it char by char. Other kinds of problems occur when dealing with encoded Unicode and the like. But at the time C was created, this convention was very simple and did the job perfectly.
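A sketch of the "keep the size" alternative: a hypothetical Buffer type that records its length, so appending copies only the new bytes instead of first scanning for the terminator, as strcat must. The fixed 64-byte capacity is an arbitrary assumption to keep the example short.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical length-tracking buffer; data is not NUL-terminated. */
typedef struct {
    size_t len;
    char data[64];
} Buffer;

/* Appends n bytes from src. Returns 0 on success, or -1 (leaving the
   buffer unchanged) if they would not fit. The end is found in
   constant time via len, so the cost depends only on n. */
int buffer_append(Buffer *b, const char *src, size_t n)
{
    if (n > sizeof b->data - b->len)
        return -1;
    memcpy(b->data + b->len, src, n);
    b->len += n;
    return 0;
}
```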

Nowadays, text between double quotes like "string" is no longer treated as a plain modifiable char array as in the past, but as a const char *. That means what the pointer points to is a constant that should not be modified (if you want to modify it, you must copy it first), and that is a good thing because it helps catch many programming errors at compile time.

kriss
+1, but to nitpick, the type of a string literal (i.e. `"hi"`) is not `const char*`, but rather `const char[3]` where `3` is the number of characters + 1 for the trailing 0. It can be directly assigned to a `const char*` as arrays decay into pointers to the first element, but this simple test will show the difference: `assert( sizeof(const char*) != sizeof("Hi there!") )`
David Rodríguez - dribeas
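The sizeof test from this comment can be checked with a small C sketch (variable names are illustrative). The literal "Hi there!" has 9 characters, so its array type has 10 elements: char[10] in C, const char[10] in C++. A pointer initialized from it is only as large as a pointer.

```c
#include <stddef.h>

/* The literal is an array: sizeof counts its 9 characters plus the
   terminating '\0'. */
const size_t literal_size = sizeof("Hi there!");

/* After decay to a pointer, sizeof only reports the pointer's size. */
const char *p = "Hi there!";
const size_t pointer_size = sizeof p;
```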
@David Rodríguez - dribeas: Yes, you are right, but I believe my answer is already complex enough without adding details on differences between array types and pointers. For those interested by the subject I tried to explain it in that answer: http://stackoverflow.com/questions/3613302/passing-array-of-structures-to-function-c/3613350#3613350
kriss
A: 

Just for the sake of completeness, and to nail this down completely:

vector<char>

Chubsdad