You can bit-twiddle all you want, but you probably won't beat this:
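int fast1(const char *s)
{
    /* Pointer version: test each byte in turn and advance the pointer. */
    if (!*s++) return 0;
    if (!*s++) return 1;
    if (!*s++) return 2;
    if (!*s++) return 3;
    if (!*s++) return 4;
    if (!*s++) return 5;
    if (!*s++) return 6;
    if (!*s++) return 7;
    if (!*s++) return 8;
    if (!*s++) return 9;
    if (!*s++) return 10;
    if (!*s++) return 11;
    if (!*s++) return 12;
    if (!*s++) return 13;
    if (!*s++) return 14;
    if (!*s++) return 15;
    /* Assumption: -1 signals "no terminator in the first 16 bytes".
       Without a return here, using the result is undefined behavior. */
    return -1;
}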
Alternatively, you can index into the array instead of advancing the pointer (whether this is faster depends on your processor and compiler):
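int fast2(const char *s)
{
    /* Array-indexing version: the pointer itself is never modified. */
    if (!s[0]) return 0;
    if (!s[1]) return 1;
    if (!s[2]) return 2;
    if (!s[3]) return 3;
    if (!s[4]) return 4;
    if (!s[5]) return 5;
    if (!s[6]) return 6;
    if (!s[7]) return 7;
    if (!s[8]) return 8;
    if (!s[9]) return 9;
    if (!s[10]) return 10;
    if (!s[11]) return 11;
    if (!s[12]) return 12;
    if (!s[13]) return 13;
    if (!s[14]) return 14;
    if (!s[15]) return 15;
    /* Assumption: -1 signals "no terminator in the first 16 bytes".
       Without a return here, using the result is undefined behavior. */
    return -1;
}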
Update:
I profiled both of these functions on my Core 2 Duo T7200 @ 2.0 GHz under Windows XP Pro, using Visual Studio 2008 with optimizations turned off. (With the optimizer on, VS notices that my timing loop produces no output and removes it entirely.)
I called each function in a loop 2^22 times, then took the average over 8 runs.
fast1 takes about 87.20 ns per function call.
fast2 takes about 45.46 ns per function call.
So on my CPU, the array indexing version is almost twice as fast as the pointer version.
I wasn't able to get any of the other functions posted here to work, so I couldn't compare them. The closest is the original poster's function, which compiles but doesn't always return the correct value. When it does, it executes in about 59 ns per function call.
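For anyone who wants to repeat the measurement, here is a minimal sketch of a timing loop like the one described above. It is not my original harness: `ns_per_call` and `ref_len` are hypothetical names made up for this example, and `clock()` is only a coarse timer. The `volatile` sink and argument are there so that an optimizer has a harder time deleting or hoisting the call, which is the problem I ran into; the indirect call through a function pointer adds some overhead of its own, so treat the numbers as relative rather than absolute.

```c
#include <time.h>

/* Hypothetical reference: a plain bounded loop, handy for checking that a
   candidate function returns the right length before timing it. */
int ref_len(const char *s)
{
    int n = 0;
    while (n < 16 && s[n] != '\0')
        n++;
    return n; /* 16 means no terminator was found in the first 16 bytes */
}

/* Hypothetical helper: average time per call of fn(s), in nanoseconds.
   iterations must be greater than zero. */
double ns_per_call(int (*fn)(const char *), const char *s,
                   unsigned long iterations)
{
    /* Reloaded on every iteration, so the call cannot be hoisted. */
    const char *volatile arg = s;
    /* Every result is written here, so the loop has an observable effect. */
    volatile unsigned long sink = 0;

    clock_t start = clock();
    for (unsigned long i = 0; i < iterations; i++)
        sink += (unsigned long)fn(arg);
    clock_t stop = clock();

    return (double)(stop - start) * 1e9 / CLOCKS_PER_SEC / (double)iterations;
}
```

Calling `ns_per_call(fast2, "hello", n)` for a large `n`, repeating that a few times and averaging, roughly mirrors what I did by hand.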
Update 2:
This function is pretty fast too, at about 60 ns per call. My guess is that the loads are handled by the address unit and the multiplications by the integer unit, so the two kinds of operation overlap in the pipeline. In my other examples, all the work is done by the address unit.
int fast5(const char *s)
{
    /* Branch-free: always reads all 16 bytes, and the sum is the length
       only if exactly one of s[0]..s[15] is zero. */
    return /* 0 * (s[0] == 0) + don't need to test 1st byte */
            1 * (s[1] == 0) +
            2 * (s[2] == 0) +
            3 * (s[3] == 0) +
            4 * (s[4] == 0) +
            5 * (s[5] == 0) +
            6 * (s[6] == 0) +
            7 * (s[7] == 0) +
            8 * (s[8] == 0) +
            9 * (s[9] == 0) +
           10 * (s[10] == 0) +
           11 * (s[11] == 0) +
           12 * (s[12] == 0) +
           13 * (s[13] == 0) +
           14 * (s[14] == 0) +
           15 * (s[15] == 0);
}
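One thing to watch with this version: every zero byte in the 16-byte window adds its index to the sum, so the result is only a length when the window contains a single zero. The sketch below shows the difference. `fast5_loop` is just the same sum written as a loop, and `fast5_demo` is a hypothetical helper for this example; neither was part of the timings above.

```c
#include <string.h>

/* The same sum as fast5, written as a loop for brevity. */
int fast5_loop(const char *s)
{
    int len = 0;
    for (int i = 1; i < 16; i++)
        len += i * (s[i] == 0); /* each zero byte contributes its index */
    return len;
}

/* Hypothetical demo: put "abc" in a 16-byte buffer and fill the bytes after
   the terminator either with zeros or with a non-zero filler character. */
int fast5_demo(int zero_padded)
{
    char buf[16];
    memset(buf, zero_padded ? 0 : 'x', sizeof buf);
    memcpy(buf, "abc", 3);
    buf[3] = '\0';
    return fast5_loop(buf);
}
```

With the non-zero filler, `fast5_demo(0)` returns 3 as expected. With zero padding, `fast5_demo(1)` returns 117, which is 3 + 4 + ... + 15, so this trick is only usable when you know the bytes after the terminator are non-zero.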