views:

469

answers:

9

How to calculate the length of a string in C efficiently (in time)?

Right now I'm doing:

int calculate_length(char *string) {
    int length = 0;
    while (string[length] != '\0') {
        length++;
    }
    return length;
}

But it's very slow compared to strlen() for example, is there any other way to do it?

Thanks.

EDIT: I'm working in a freestanding environment, I'm not allowed to use any external lib including "string.h".

+6  A: 

Take a look at the source code of strlen in the standard libc. Functions in standard libraries are generally highly optimized. Check it out here (coded in assembly) - this is from the GNU libc.

size_t
DEFUN(strlen, (str), CONST char *str)
{
  int cnt;

  asm("cld\n"                   /* Search forward.  */
      /* Some old versions of gas need `repne' instead of `repnz'.  */
      "repnz\n"                 /* Look for a zero byte.  */
      "scasb" /* %0, %1, %3 */ :
      "=c" (cnt) : "D" (str), "0" (-1), "a" (0));

  return -2 - cnt;
}
Sudhanshu
+6  A: 

strlen(). Odds are, if somebody had found a better, faster generic method, strlen would have been replaced with that.

aib
+2  A: 

The easiest way is to call strlen(). Seriously. It's already optimized by your compiler and/or library vendors, to be as fast as possible for your architecture.

One common optimization is to remove the need to increase a counter, and compute the length from the pointer:

size_t my_strlen(const char *s)
{
  const char *anchor = s;

  while(*s)
   s++;

  return s - anchor;
}
unwind
+11  A: 

From the FreeBSD source code:

size_t
strlen(const char *str)
{
    const char *s;
    for (s = str; *s; ++s);
    return(s - str);
}

Compared to your code, this probably maps very nicely to an assembler instruction, which can explain a big performance difference.

Andomar
The compiler should be able to optimise this fairly effectively, which means the code is still readable, and should still run fairly quickly.
Matthew Iselin
A: 

printf strlen("Hello World!");

See: Wikipedia strlen for further info

Tanner
+6  A: 

Take a look at GNU C library's strlen() source.

It uses a number of non-obvious tricks to gain speed without dropping to assembly, including:

  • getting to a character that's properly aligned
  • reading those aligned parts of the string into an int (or some larger datatype) to read several chars at a time
  • using bit twiddling tricks to check if one of the chars embedded in that block of chars is zero

etc.

Michael Burr
Current FreeBSD one uses something similar, could come handy too: http://www.freebsd.org/cgi/cvsweb.cgi/src/lib/libc/string/strlen.c?rev=1.7.2.1.2.1;content-type=text%2Fplain
Xandy
What do you mean "without dropping to assembly"? On i386, it does use assembly (see Sudhanshu's reply).
bortzmeyer
Sudhanshu's is different than the one I linked to. It's certainly possible that when glibc is built for an x86 Sudhanshu's gets used (I'm honestly not sure); however, the example I pointed to is straight C code that can be used as an example of some possible optimizations.
Michael Burr
+2  A: 

C strings are intrinsically inefficient, there are two reasons for using the ASCIZ convention:

  • The standard C library uses it
  • The compiler uses it for literal string constants

The first of these is academic in this instance since you are not using the standard library, the second is easily overcome by creating functions or macros that provide conversions from C strings to a more efficient convention such as Pascal strings. The point is you need not be a slave to the C convention if you are not using the C library.

Clifford
++ You're right, but sometimes we look for cycles in all the wrong places. I haven't seen any real code where the speed of *strlen* was even on the radar, considering the multitude of macro-ways that typically make software slow.
Mike Dunlavey
@Mike: Couldn't agree more. Likely to be premature optimisation, but the article I linked gives a couple of real world examples where it has been critical. A strlen() function for a pascal string is both fast and deterministic.
Clifford
+1  A: 

Yet another way to speed up char counting is to use vectorization!

Here's an example of how to do this with respect to UTF8-encoded strings:

Even faster UTF-8 character counting,

http://www.daemonology.net/blog/2008-06-05-faster-utf8-strlen.html

qbitty
A: 

On i386 processors, libc often use an ultra-optimized version of strlen, often written in assembly language. The paper "String Length" explains how they work.

Here is one optimized version for OpenBSD. (They also have a portable version.) Here is the version for the GNU libc.

bortzmeyer