ansaurus

Question

What does it mean to be "terminated by a zero"?

Answer 1

+13 A:

It's a reserved value to indicate the end of a sequence of (for example) characters in a string.

More correctly, this is known as being null (or NUL) terminated. The value used is the number zero, not the character code for the digit '0' (which is 48 in ASCII). To clarify the distinction, check out a table of the ASCII character set.
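As a rough illustration of that distinction, here is a minimal sketch. The helper name `is_terminator` is made up for this example and is not a standard function.

```c
#include <stdbool.h>

/* Hypothetical helper: true only for the terminator, i.e. the char
   whose numeric value is 0. */
bool is_terminator(char c)
{
    /* '\0' has the value 0; the digit '0' has the value 48 in ASCII. */
    return c == '\0';
}
```

Passing the digit '0' to this helper gives false, because its character code is not zero.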

This is necessary because languages like C have a char data type, but no string data type. Therefore it is left to the developer to decide how to manage strings in their application. The usual way of doing this is to have an array of chars with a null value used to terminate (i.e. signify the end of) the string.

Note that there is a distinction between the length of the string, and the length of the char array that was originally declared.

char name[50];

This declares an array of 50 characters. However, these values will be uninitialised. So if I want to store the string "Hello" (5 characters long) I really don't want to bother setting the remaining 45 characters to spaces (or some other value). Instead I store a NUL value after the last character in my string.
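A small sketch of what that looks like, assuming the caller owns a 50-char buffer like the one declared above. The function name `store_hello` is invented for the example.

```c
#include <string.h>

/* Hypothetical helper: writes "Hello" into a 50-char buffer and returns
   the string length as strlen() sees it. */
size_t store_hello(char name[50])
{
    name[0] = 'H';
    name[1] = 'e';
    name[2] = 'l';
    name[3] = 'l';
    name[4] = 'o';
    name[5] = '\0';   /* the terminator; name[6..49] are left untouched */
    return strlen(name);
}
```

The array is still 50 chars long, but the string stored in it is 5 characters long; the bytes after the terminator are simply ignored by string functions.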

Other languages, including more recent ones such as Java and C# (and also Pascal, which is not actually newer than C), have a specific string type defined. These have a header value to indicate the number of characters in the string. This has a couple of benefits: firstly, you don't need to walk to the end of the string to find out its length; secondly, your string can contain null characters.
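For comparison, a length-prefixed string could be sketched in C roughly like this. The struct, its field names, and the fixed capacity of 32 are all assumptions made for illustration; this is not how any particular language lays out its strings.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical counted string: the length lives in a header field. */
struct counted_string {
    size_t length;     /* number of bytes in use */
    char   data[32];   /* fixed capacity, chosen only for this sketch */
};

/* Copies up to 32 bytes into s. Zero bytes are ordinary data here. */
void counted_set(struct counted_string *s, const char *bytes, size_t n)
{
    if (n > sizeof s->data) {
        n = sizeof s->data;   /* clamp to the capacity of the sketch */
    }
    memcpy(s->data, bytes, n);
    s->length = n;
}
```

Reading `length` is a single field access, and a zero byte in the middle of `data` does not cut the string short.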

Wikipedia has further information in the String (computer science) entry.

Richard Ev 2010-04-19 13:18:26

Re: more recent languages: IIRC, that's called a Pascal string

Hasturkun 2010-04-19 14:40:05

Pascal strings specifically used a single byte to hold the string length. As you can quickly guess, that's not really enough! Modern `string` types are probably using a `size_t` instead; if your strings won't fit in that, the string isn't going to be held wholly in memory either.

Donal Fellows 2010-04-19 14:58:02

Answer 2

A:

Arrays and strings in C are handled as just pointers to a memory location. The pointer tells you where the array starts, but the end of the array is not recorded anywhere. For a character array holding a string, the end is marked by a zero byte.

So, in memory, the string hello is written as:

68 65 6c 6c 6f 00                                 |hello|
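One way to see those bytes for yourself is sketched below. The function name `hex_bytes` and its output format (two hex digits followed by a space per byte) are choices made for this example.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes every byte of s, including the terminating
   zero, into out as two hex digits followed by a space. */
void hex_bytes(const char *s, char *out, size_t out_size)
{
    size_t n = strlen(s) + 1;   /* +1 so the terminator is included */
    size_t pos = 0;

    if (out_size == 0) {
        return;
    }
    out[0] = '\0';

    /* Each byte needs 3 characters plus room for snprintf's own NUL. */
    for (size_t i = 0; i < n && pos + 3 < out_size; i++) {
        pos += (size_t)snprintf(out + pos, out_size - pos, "%02x ",
                                (unsigned char)s[i]);
    }
}
```

Called with "hello" and a large enough buffer, the result ends in 00, which is the terminator.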

Андрей Костенко 2010-04-19 13:21:46

Answer 3

A:

It refers to how C strings are stored in memory. The NUL character, represented by \0 in string literals, is present at the end of a C string in memory. There is no other metadata associated with a C string, such as its length. Note the difference in spelling between the NUL character and the NULL pointer.

pixelbeat 2010-04-19 13:21:48

Answer 4

+15 A:

Take the string Hi in ASCII. Its simplest representation in memory is two bytes:

0x48
0x69

But where does that piece of memory end? Unless you're also prepared to pass around the number of bytes in the string, you don't know - pieces of memory don't intrinsically have a length.

So C has a standard that strings end with a zero byte, also known as a NUL character:

0x48
0x69
0x00

The string is now unambiguously two characters long, because there are two characters before the NUL.
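Finding that length amounts to counting bytes until the zero is reached. The sketch below shows the idea; the name `count_until_nul` is made up here, and in real code you would normally just call the standard `strlen`.

```c
#include <stddef.h>

/* Hypothetical stand-in for strlen(): counts chars before the NUL. */
size_t count_until_nul(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0') {   /* stop at the terminating zero byte */
        n++;
    }
    return n;
}
```

For "Hi" this counts 2, while `sizeof "Hi"` is 3 because the string literal's storage includes the terminator.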

RichieHindle 2010-04-19 13:21:51

And buffer overflows happen when you fail to realize that you need three bytes to store two characters.

MSalters 2010-04-19 14:25:59

@MSalters: No, they happen when you realize that a length-two string consists of three characters. :-)

Donal Fellows 2010-04-19 14:55:44

Answer 5

A:

C-style strings are terminated by a NUL character ('\0'). This provides a marker that functions operating on strings (e.g. strlen, strcpy) use to identify the end of the string.
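To show how a copying routine relies on that marker, here is a sketch of a bounded copy. The function `copy_string` and its return convention are assumptions for this example; it is not the standard `strcpy`, which takes no size argument.

```c
#include <stddef.h>

/* Hypothetical bounded copy: returns 0 on success, or -1 if src plus its
   terminator does not fit in dst_size bytes. On failure the contents of
   dst are unspecified. */
int copy_string(char *dst, size_t dst_size, const char *src)
{
    size_t i = 0;

    while (src[i] != '\0') {          /* the NUL tells us where src ends */
        if (i + 1 >= dst_size) {
            return -1;                /* no room for this char and the NUL */
        }
        dst[i] = src[i];
        i++;
    }
    if (i >= dst_size) {
        return -1;                    /* not even room for the terminator */
    }
    dst[i] = '\0';                    /* the copy needs its own terminator */
    return 0;
}
```

Copying "Hi" succeeds into a 3-byte buffer and fails for a 2-byte one, since the terminator needs a byte of its own.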

TheJuice 2010-04-19 13:23:08

Answer 6

+3 A:

Terminated by a zero

It's when your pointy-haired boss fires you.

Assaf Lavie 2010-04-19 13:26:23

Answer 7

A:

There are two common ways to handle arrays that can have varying-length contents (like Strings). The first is to separately keep the length of the data stored in the array. Languages like Fortran and Ada and C++'s std::string do this. The disadvantage to doing this is that you somehow have to pass that extra information to everything that is dealing with your array.

The other way is to reserve an extra non-data element at the end of the array to serve as a sentinel. For the sentinel you use a value that should never appear in the actual data. For strings, 0 (or "NUL") is a good choice, as it is unprintable and serves no other purpose in ASCII. So what C (and many languages copied from C) does is assume that all strings end with (or "are terminated by") a 0.

There are several drawbacks to this. For one thing, it is slow. Any time a routine needs to know the length of the string, it is an O(n) operation (searching through the entire string looking for the 0). Another problem is that you may one day want to put a 0 in your string for some reason, so now you need a whole second set of routines that ignore the null and use a separate length anyway (e.g. memcpy() and memcmp() rather than strcpy() and strcmp()). The third big problem is that if someone forgets to put that 0 at the end (or it gets wiped out somehow), the next string operation to do a length check will go merrily marching through memory until it either happens to randomly find another 0, crashes, or the user loses patience and kills it. Such bugs can be a serious PITA to track down.
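The embedded-zero problem can be sketched as follows. The helper `same_bytes` is a made-up wrapper around the standard `memcmp`, with the length supplied separately by the caller.

```c
#include <string.h>

/* Hypothetical helper: compares n bytes of a and b. Zero bytes are treated
   as ordinary data because the length is passed in explicitly. */
int same_bytes(const char *a, const char *b, size_t n)
{
    return memcmp(a, b, n) == 0;
}
```

Given the five-byte sequences "ab\0cd" and "ab\0xy", `strcmp` reports them as equal because it stops at the first zero, whereas `same_bytes` with a length of 5 sees the difference.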

For all these reasons, the C approach is generally viewed with disfavor.

T.E.D. 2010-04-19 14:28:05
