ansaurus

Question

Answer 1

+6 A:

I would suggest taking a look at GNU libc's source. As for most functions, it will contain both a generic optimized C version of the function, and optimized assembly language versions for as many supported architectures as possible, taking advantage of machine specific tricks.

Chris Young 2009-02-08 03:56:04

Thanks, except this is LGPL code and D's standard library is supposed to be permissively licensed. I don't want that to be an issue.

dsimcha 2009-02-08 03:59:50

Well, I was suggesting you look at it for technique inspiration, rather than for copying the source.

Chris Young 2009-02-08 04:03:10

It is roughly 150 lines of code, about half or more of which are comments, so it explains the optimizations in a fair amount of detail.

Chris Young 2009-02-08 04:04:15

http://stackoverflow.com/questions/405208/gpl-code-what-counts-as-a-derivative-work I'm the OP on this question.

dsimcha 2009-02-08 04:04:53

I've read that question before, and it's a good question. The answer is that you can't copyright techniques, only the code itself. Also, the FSF are vehemently proposed to software patents or the idea that algorithms and ideas can be effectively copyrighted and their use restricted.

Chris Young 2009-02-08 04:12:32

Answer 2

+5 A:

This implementation of memchr from newlib is one example of someone's optimizing memchr: it's reading and testing 4 bytes at a time (apart from memchr, other functions in the newlib library are here).

Incidentally, most of the the source code for the MSVC run-time library is available, as an optional part of the MSVC installation (so, you could look at that).

ChrisW 2009-02-08 03:59:58

i was going to answer with the newlib code of memchr - until i clicked your link and saw it's also about newlib :)

Johannes Schaub - litb 2009-02-08 07:01:14

if you like, you can link them to this: http://sourceware.org/cgi-bin/cvsweb.cgi/src/newlib/libc/string/?cvsroot=src , cvs directory containing all the sweet fast string functions of newlib, including memchr.c

Johannes Schaub - litb 2009-02-08 07:06:06

Answer 3

+1 A:

Here is FreeBSD's (BSD-licensed) memchr() from memchr.c. FreeBSD's online source code browser is a good reference for time-tested, BSD-licensed code examples.

void *
memchr(s, c, n)
    const void *s;
    unsigned char c;
    size_t n;
{
    if (n != 0) {
     const unsigned char *p = s;

     do {
      if (*p++ == c)
       return ((void *)(p - 1));
     } while (--n != 0);
    }
    return (NULL);
}

cpeterso 2009-02-08 04:09:04

Yeah, I found this too. Nothing fancy here, though, that would explain the ridiculous speed difference.

dsimcha 2009-02-08 04:10:11

Answer 4

+2 A:

memchr like memset and memcpy generally reduce to fairly small amount of machine code. You are unlikely to be able to reproduce that kind of speed without inlining similar assembly code. One major issue to consider in an implementation is data alignment.

One generic technique you may be able to use is to insert a sentinel at the end of the string being searched, which guarantees that you will find it. It allows you to move the test for end of string from inside the loop, to after the loop.

EvilTeach 2009-02-08 05:15:06

Answer 5

+3 A:

On the x86 platform, the faster implementations usually compile down to the rep scasb (REPeated SCAn String for Byte) instruction. (Some compilers may be able to optimize C loops to that, some not.)

CyberShadow 2009-02-08 09:11:21

Do you think that'll be faster than the newlib implemention which reads 4 bytes at a time? I don't know but those 8088/8086-era opcodes aren't necessarily the fastest anymore.

ChrisW 2009-02-08 17:26:57

Well, given that here we use one instruction (with a prefix) to search an entire string - versus a hand-written loop with a whole bunch of instructions every 4 bytes - I'd think that, intuitively, the one-opcode variant would be faster :)

CyberShadow 2009-02-09 09:47:37

ansaurus

tags:

views:

answers:

How does memchr() work under the hood?

related questions