ansaurus

Question

How to calculate the length of a string in C efficiently?

Answer 1

+6 A:

Take a look at the source code of strlen in the standard libc. Functions in standard libraries are generally highly optimized. Check it out here (coded in assembly) - this is from the GNU libc.

size_t
DEFUN(strlen, (str), CONST char *str)
{
  int cnt;

  asm("cld\n"                   /* Search forward.  */
      /* Some old versions of gas need `repne' instead of `repnz'.  */
      "repnz\n"                 /* Look for a zero byte.  */
      "scasb" /* %0, %1, %3 */ :
      "=c" (cnt) : "D" (str), "0" (-1), "a" (0));

  return -2 - cnt;
}

Sudhanshu 2010-01-15 09:39:01

Answer 2

+6 A:

strlen(). Odds are, if somebody had found a better, faster generic method, strlen would have been replaced with that.

aib 2010-01-15 09:39:41

Answer 3

+2 A:

The easiest way is to call strlen(). Seriously. It's already optimized by your compiler and/or library vendors, to be as fast as possible for your architecture.

One common optimization is to remove the need to increase a counter, and compute the length from the pointer:

size_t my_strlen(const char *s)
{
  const char *anchor = s;

  while(*s)
   s++;

  return s - anchor;
}

unwind 2010-01-15 09:39:50

Answer 4

+11 A:

From the FreeBSD source code:

size_t
strlen(const char *str)
{
    const char *s;
    for (s = str; *s; ++s);
    return(s - str);
}

Compared to your code, this probably maps very nicely to an assembler instruction, which can explain a big performance difference.

Andomar 2010-01-15 09:40:32

The compiler should be able to optimise this fairly effectively, which means the code is still readable, and should still run fairly quickly.

Matthew Iselin 2010-01-15 09:54:54

Answer 5

A:

printf strlen("Hello World!");

See: Wikipedia strlen for further info

Tanner 2010-01-15 09:40:41

Answer 6

+6 A:

Take a look at GNU C library's strlen() source.

It uses a number of non-obvious tricks to gain speed without dropping to assembly, including:

getting to a character that's properly aligned
reading those aligned parts of the string into an int (or some larger datatype) to read several chars at a time
using bit twiddling tricks to check if one of the chars embedded in that block of chars is zero

etc.

Michael Burr 2010-01-15 09:44:05

Current FreeBSD one uses something similar, could come handy too: http://www.freebsd.org/cgi/cvsweb.cgi/src/lib/libc/string/strlen.c?rev=1.7.2.1.2.1;content-type=text%2Fplain

Xandy 2010-01-15 10:21:23

What do you mean "without dropping to assembly"? On i386, it does use assembly (see Sudhanshu's reply).

bortzmeyer 2010-01-15 12:51:30

Sudhanshu's is different than the one I linked to. It's certainly possible that when glibc is built for an x86 Sudhanshu's gets used (I'm honestly not sure); however, the example I pointed to is straight C code that can be used as an example of some possible optimizations.

Michael Burr 2010-01-15 16:24:06

Answer 7

+2 A:

C strings are intrinsically inefficient, there are two reasons for using the ASCIZ convention:

The standard C library uses it
The compiler uses it for literal string constants

The first of these is academic in this instance since you are not using the standard library, the second is easily overcome by creating functions or macros that provide conversions from C strings to a more efficient convention such as Pascal strings. The point is you need not be a slave to the C convention if you are not using the C library.

Clifford 2010-01-15 10:39:36

++ You're right, but sometimes we look for cycles in all the wrong places. I haven't seen any real code where the speed of *strlen* was even on the radar, considering the multitude of macro-ways that typically make software slow.

Mike Dunlavey 2010-01-15 15:07:59

@Mike: Couldn't agree more. Likely to be premature optimisation, but the article I linked gives a couple of real world examples where it has been critical. A strlen() function for a pascal string is both fast and deterministic.

Clifford 2010-01-15 17:03:08

Answer 8

+1 A:

Yet another way to speed up char counting is to use vectorization!

Here's an example of how to do this with respect to UTF8-encoded strings:

Even faster UTF-8 character counting,

http://www.daemonology.net/blog/2008-06-05-faster-utf8-strlen.html

qbitty 2010-01-15 12:04:52

Answer 9

A:

On i386 processors, libc often use an ultra-optimized version of strlen, often written in assembly language. The paper "String Length" explains how they work.

Here is one optimized version for OpenBSD. (They also have a portable version.) Here is the version for the GNU libc.

bortzmeyer 2010-01-15 12:55:40

ansaurus

tags:

views:

answers:

How to calculate the length of a string in C efficiently?

related questions