views: 171
answers: 5

Is there any faster way to parse text than by walking each byte of the text?

I wonder if there is any special CPU (x86/x64) instruction for string operations, used by the string library, that could somehow be used to optimize the parsing routine.

For example, an instruction for finding a token in a string that could be executed by the hardware instead of looping over each byte until the token is found.

*Edit: I'm asking more about algorithms than about the CPU architecture, so my real question is: is there any special algorithm or specific technique that could optimize string-manipulation routines given the current CPU architecture?*

+5  A: 

The x86 had a few string instructions, but they fell out of favor on modern processors, because they became slower than more primitive instructions which do the same thing.

The processor world is moving more and more towards RISC, i.e., simpler instruction sets.

Quote from Wikipedia (emphasis mine):

The first highly (or tightly) pipelined x86 implementations, the 486 designs from Intel, AMD, Cyrix, and IBM, supported every instruction that their predecessors did, but achieved maximum efficiency only on a fairly simple x86 subset that resembled only a little more than a typical RISC instruction set (i.e. without typical RISC load-store limitations).

This is still true on today's x86 processors.

You could get marginally better performance processing four bytes at a time, assuming each "token" in the text was four-byte-aligned. Obviously this isn't true for most text... so better to stick with byte-by-byte scanning.

zildjohn01
Having said that, `SSE4.2` on Nehalem (Core i7 *et al*) has the PCMPxxx family of string comparison instructions.
Paul R
Agree with Paul R -- vector compares are the most reasonable way to improve the throughput of compares. Unfortunately I still don't have an SSE4.2 CPU to test on strings, but I know my numeric compares are much faster when I vectorize them.
codekaizen
+3  A: 

Yes, there are special CPU instructions, and the run-time library, which implements functions like strchr, might be written in assembly.

One technique that can be faster than walking bytes is to walk double-words, i.e. to process data 32 bits at a time.
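
A rough sketch of what a word-at-a-time strchr looks like (illustrative only: it assumes `s` is 4-byte aligned and ignores the strict-aliasing and read-past-the-terminator details a real C library has to deal with):

```c
#include <stdint.h>

/* Nonzero iff any byte of v is zero -- the classic "has zero byte" trick. */
static int has_zero_byte(uint32_t v)
{
    return ((v - 0x01010101u) & ~v & 0x80808080u) != 0;
}

char *strchr_wordwise(const char *s, int c)
{
    uint32_t pattern = 0x01010101u * (uint8_t)c;   /* broadcast c into all four bytes */
    const uint32_t *w = (const uint32_t *)s;       /* assumes s is 4-byte aligned */

    for (;;) {
        uint32_t v = *w;
        /* One test covers four bytes: stop if this word holds the target byte or the NUL. */
        if (has_zero_byte(v) || has_zero_byte(v ^ pattern)) {
            const char *p = (const char *)w;
            for (int i = 0; i < 4; i++, p++) {
                if (*p == (char)c) return (char *)p;
                if (*p == '\0')    return NULL;
            }
        }
        w++;
    }
}
```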


> The problem with walking bigger-than-the-smallest-addressable-memory-unit chunks in the context of strings is one of alignment.

You add code at the beginning and end of your function (before and after your loop) to handle the uneven/unaligned byte[s]. (shrug) It makes your code faster, not simpler. The following, for example, is source code which claims to be an improved version of strchr. It uses special CPU instructions, but it's not simple (and has extra code for the unaligned bytes):

PATCH: Optimize strchr/strrchr with SSE4.2 -- "This patch adds SSE4.2 optimized strchr/strrchr. It can speed up strchr/strrchr by up to 2X on Intel Core i7"
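
To give a flavour of the technique, here is a much simplified sketch built on the SSE4.2 PCMPISTRI instruction (via the `_mm_cmpistri` intrinsic). It is not the patch's code: it behaves like strpbrk rather than strchr, and it assumes it is safe to read 16 bytes at a time from both strings, which a production routine must not assume near page boundaries:

```c
#include <nmmintrin.h>   /* SSE4.2 intrinsics; compile with e.g. gcc -msse4.2 */

/* Find the first byte of s that occurs in accept (like strpbrk).
   Sketch only: assumes 16-byte over-reads of both strings are safe. */
const char *find_any_sse42(const char *s, const char *accept)
{
    __m128i set = _mm_loadu_si128((const __m128i *)accept);

    for (;;) {
        __m128i chunk = _mm_loadu_si128((const __m128i *)s);

        /* PCMPISTRI: index of the first byte of chunk that equals any byte
           of set, or 16 if there is no match in these 16 bytes.  Bytes at
           or after either operand's NUL terminator never match. */
        int idx = _mm_cmpistri(set, chunk,
                               _SIDD_UBYTE_OPS | _SIDD_CMP_EQUAL_ANY |
                               _SIDD_LEAST_SIGNIFICANT);
        /* The same instruction also reports whether chunk contains the NUL
           terminator; _mm_cmpistrz exposes that flag. */
        int end = _mm_cmpistrz(set, chunk,
                               _SIDD_UBYTE_OPS | _SIDD_CMP_EQUAL_ANY |
                               _SIDD_LEAST_SIGNIFICANT);

        if (idx < 16) return s + idx;   /* match found before the terminator */
        if (end)      return NULL;      /* end of string reached, no match */
        s += 16;
    }
}
```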

ChrisW
The problem with walking bigger-than-the-smallest-addressable-memory-unit chunks in the context of strings is one of alignment. Certainly I can compare 4 or 8 bytes at a time, but what if the sequence in the target string has a different alignment than the sequence in the test string? There may be architectures that deal with this issue, but I don't know of them.
dmckee
@dmckee - I replied to your comment by editing my answer
ChrisW
Egads! The patch notes say you can get up to a factor of 2 out of it, but you'd have to need the speed badly...
dmckee
@dmckee - Yes; the last time I looked into the source code for a compiler's run-time library was back in the '90s, and even back then it was implemented to operate on multi-byte data using assembly-language cleverness. The normal advice is to use the optimized, low-level, standard library functions instead of reimplementing them yourself in application-level C code. The best implementations of the various algorithms might easily be CPU-specific; [Intel has a compiler](http://software.intel.com/en-us/intel-compilers/), for example, that's reputed to emit code that's tighter than Microsoft's or GNU's.
ChrisW
+2  A: 

While (some) processors do have string instructions, they're of little use in producing faster code. First of all, as @zildjohn01 noted, they're often slower than other instructions on current processors. More importantly, it rarely makes much difference anyway -- if you're scanning much text at all, the bottleneck will usually be the bandwidth from memory to the CPU, so essentially nothing you do by changing instructions is likely to make a significant difference in any case.

That said, especially if the token you're looking for is long, a better algorithm may be useful. A Boyer-Moore search (or variant) can avoid looking at some of the text, which can give a substantial improvement.
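
For instance, a minimal Boyer-Moore-Horspool search (a simplified variant of Boyer-Moore) might look like the sketch below; the bad-character table lets the search jump up to the needle's length at a time, so most of the haystack is never touched when the needle is long:

```c
#include <stddef.h>
#include <string.h>

/* Return a pointer to the first occurrence of needle in hay, or NULL. */
const char *horspool_search(const char *hay, size_t hlen,
                            const char *needle, size_t nlen)
{
    size_t skip[256];

    if (nlen == 0 || hlen < nlen)
        return NULL;

    /* Default shift is the full needle length. */
    for (size_t i = 0; i < 256; i++)
        skip[i] = nlen;
    /* Characters occurring in the needle (except its last byte) get a
       smaller shift that realigns their last occurrence with the text. */
    for (size_t i = 0; i + 1 < nlen; i++)
        skip[(unsigned char)needle[i]] = nlen - 1 - i;

    for (size_t pos = 0; pos + nlen <= hlen; ) {
        if (memcmp(hay + pos, needle, nlen) == 0)
            return hay + pos;
        /* Shift by the table entry for the text byte under the needle's end. */
        pos += skip[(unsigned char)hay[pos + nlen - 1]];
    }
    return NULL;
}
```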

Jerry Coffin
+1  A: 

Well, you have to look at everything to know everything about the text, at some level. Arguably, you could have some sort of structured text, which gives you further information about where to walk at each point in a sort of n-ary space partition. But then, the "parsing" has partially been done back when the file was created. Starting from 0 information, you will need to touch every byte to know everything.

Paul Nathan
A: 

Yes. As your edit specifically asks for algorithms, I'd like to add an example of that.

You're going to know the language that you are parsing, and you can use that knowledge when building a parser. Take for instance a language in which every token must be at least two characters, but any length of whitespace can occur between tokens. Now when you're scanning through whitespace for the next token, you can skip every second character. Only when you hit the first non-whitespace do you need to back up one character.

Example:

01234567890123456789
FOO        BAR    

When scanning for the next token, you'd probe 4,6,8,10 and 12. Only when you see the A do you look back to 11 to find B. You never looked at 3,5,7 and 9.

This is a special case of the Boyer–Moore string search algorithm.
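
A minimal sketch of that scan in C, assuming the rules above (every token is at least two characters long, tokens are separated by spaces):

```c
#include <ctype.h>
#include <stddef.h>

/* Return the index of the first character of the next token at or after
   pos, or len if there is none.  Relies on the stated rule that every
   token is at least two characters long. */
size_t next_token(const char *text, size_t len, size_t pos)
{
    while (pos < len && isspace((unsigned char)text[pos]))
        pos += 2;                           /* probe every second character */

    if (pos >= len)
        return len;                         /* ran off the end: no more tokens */

    /* We landed on a non-space.  It may be the token's second character,
       in which case the character before it is also non-space: back up one. */
    if (pos > 0 && !isspace((unsigned char)text[pos - 1]))
        pos--;
    return pos;
}
```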

MSalters