Hi,
I once saw a programming idiom (not a design pattern) for implementing a fast buffer copy. It interleaved a loop with a switch statement. It copied 4 bytes at a time for most of the buffer; only the last few bytes were copied using smaller data types.
Can someone tell me its name? It's named after a person. It's written in C, and the compiler output is nearly optimal.
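
To make the description concrete, here is a rough sketch of the kind of code I mean, reconstructed from memory rather than copied from the original. The name `fast_copy`, the 4x unrolling, and the use of `memcpy` for each 4-byte chunk (to sidestep alignment problems) are my own assumptions, so the real thing may well look different.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (name and layout are assumptions, not the original code).
 * Copies n bytes from src to dst: 4-byte chunks for the bulk of the buffer,
 * single bytes for the 0..3 leftover bytes at the end. */
void fast_copy(void *dst, const void *src, size_t n)
{
    assert(n == 0 || (dst != NULL && src != NULL));

    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t words = n / 4;   /* number of full 4-byte chunks */
    size_t tail  = n % 4;   /* leftover bytes after the chunks */

    if (words > 0) {
        /* The loop body is unrolled 4 times; the switch jumps into the
         * middle of the first pass so that exactly `words` chunks are copied. */
        size_t passes = (words + 3) / 4;
        switch (words % 4) {
        case 0: do { memcpy(d, s, 4); d += 4; s += 4; /* fall through */
        case 3:      memcpy(d, s, 4); d += 4; s += 4; /* fall through */
        case 2:      memcpy(d, s, 4); d += 4; s += 4; /* fall through */
        case 1:      memcpy(d, s, 4); d += 4; s += 4;
                } while (--passes > 0);
        }
    }

    /* Copy the remaining 0..3 bytes one at a time. */
    switch (tail) {
    case 3: *d++ = *s++; /* fall through */
    case 2: *d++ = *s++; /* fall through */
    case 1: *d++ = *s++; /* fall through */
    case 0: break;
    }
}
```

It would be called like `fast_copy(dst, src, len)`, with the same arguments as `memcpy` but without the return value. The buffers must not overlap.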