I am facing a problem today where I need to fill memory with a certain pattern like 0x11223344, so that the whole memory looks like this (in hex):

1122334411223344112233441122334411223344112233441122334411223344...

I can't figure out how to do it with memset() because it takes only a single byte, not 4 bytes.

Any ideas?

Thanks, Boda Cydo.

+2  A: 

You could set up the sequence somewhere then copy it using memcpy() to where you need it.
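
For instance, a minimal sketch of that idea (the function name and parameters here are just placeholders): keep one copy of the 4-byte pattern and memcpy() it block by block into the destination, handling any trailing partial block.

#include <string.h>
#include <stddef.h>

static void fill_pattern4(void *dest, size_t len)
{
    const unsigned char pattern[4] = { 0x11, 0x22, 0x33, 0x44 };
    unsigned char *p = dest;

    /* copy the pattern one block at a time; source and destination never overlap */
    while (len >= sizeof pattern) {
        memcpy(p, pattern, sizeof pattern);
        p   += sizeof pattern;
        len -= sizeof pattern;
    }
    memcpy(p, pattern, len);   /* trailing 0-3 bytes, if any */
}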

James
+3  A: 

On OS X, one uses memset_pattern4( ) for this; I would expect other platforms to have similar APIs.

I don't know of a simple portable solution, other than just filling in the buffer with a loop (which is pretty darn simple).
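
For example, a short sketch of both options (the wrapper names are just illustrative; memset_pattern4() is Apple-specific and declared in <string.h> on OS X):

#include <string.h>
#include <stddef.h>

/* OS X only */
void fill_osx(void *buf, size_t len)
{
    static const unsigned char pattern[4] = { 0x11, 0x22, 0x33, 0x44 };
    memset_pattern4(buf, pattern, len);          /* len is in bytes */
}

/* portable: plain byte-by-byte loop */
void fill_portable(unsigned char *buf, size_t len)
{
    static const unsigned char pattern[4] = { 0x11, 0x22, 0x33, 0x44 };
    size_t i;
    for (i = 0; i < len; ++i)
        buf[i] = pattern[i % 4];
}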

Stephen Canon
I forgot I could use a loop. Thanks for reminding me. Trying it with a loop now.
bodacydo
@bodacydo: lol. literally. happens to all of us :(
aib
+1  A: 

Well, the normal method of doing that is to manually set up the first four bytes, and then memcpy(ptr + 4, ptr, len - 4).

This copies the first four bytes into the second four bytes, then copies the second four bytes into the third, and so on.

Note that this "usually" works, but is not guaranteed to, depending on your CPU architecture and your C run-time library.

James Curran
The behavior of `memcpy` is undefined if the source and destination buffers overlap. This will probably work on some platforms, but it will certainly not work on many others.
Stephen Canon
This will overwrite the first four bytes with whatever is in the second four bytes. Further, `memcpy` should not be used with overlapping ranges.
bstpierre
Are you commenting on my original or edited message? memcpy is (dest, src, len), which I have correct now. (I had it backward initially, but I thought I had it fixed before your comment.)
James Curran
I was commenting on the original -- see that you fixed the order now. (I'd remove the downvote but you'd need to edit to unlock it)
bstpierre
Saying that this "usually" works is a little misleading; I expect this to fail for sufficiently large buffers on any modern platform that has a 64-bit or wider load/store path and a reasonably optimized `memcpy` implementation, and to fail on many platforms with a 32-bit wide load/store path as well.
Stephen Canon
If you are just doing this for 4 bytes, you'd be better off speed-wise with a straight *iPtr++ = dwordValue than a call to memcpy. memcpy is only faster once you get above N bytes, N being more than 4. I can't remember the value, but I did some tests on it a few years back.
Stephen Kellett
@Stephen K: That really depends on the compiler/library. Many modern compilers will automatically inline the memcpy, so you'd merely be doing explicitly what the compiler would be doing implicitly.
James Curran
@James Curran. I was thinking of the Microsoft implementation. The last time I looked, it's in assembler as a discrete, non-inlineable function. Even if it were inline-able, my statement would still stand. Take a look at the code: the first thing it does is determine whether the copy is DWORD aligned; if not, it copies individual bytes until it is DWORD aligned, then does a DWORD mov instruction with a repeat directive. I haven't looked at the x64 implementation, but I wouldn't be at all surprised to find extra code in there to handle QWORD copying (with extra delay due to extra comparisons).
Stephen Kellett
@Stephen: `memcpy` is listed among the functions MSC++ will inline (http://msdn.microsoft.com/en-us/library/tzkfha43(v=VS.71).aspx). I don't recall when they started doing that, but I'm pretty sure it goes back to the mid-90's, at least.
James Curran
@James. memcpy inlining is supported when memcpy is defined as an intrinsic (I didn't know that, thank you). Intrinsics are something MS added when they started supporting x64 (and as a by-product they removed the inline assembler :-(), so at a guess this goes back to VS2005. Anyway, take a look at the memcpy implementation: it spends so much time up front working out what to do that for a small number of bytes you are faster copying yourself. We did the timings and added a check so that we only call memcpy for a number of bytes exceeding N (I can't remember what N is).
Stephen Kellett
+1  A: 

Copy the memory in doubling steps, using the area you have already filled as the template for each iteration (O(log(N))):

int fillLen = ...;   // total number of bytes to fill (must be >= blockSize)
int blockSize = 4;   // size of your pattern

// dest points to the buffer to fill, srcPattern to the 4-byte pattern
memmove(dest, srcPattern, blockSize);   // seed the first copy of the pattern
char * start   = dest;
char * current = dest + blockSize;
char * end     = start + fillLen;

// each pass copies everything filled so far, doubling the filled region
while (current + blockSize < end) {
    memmove(current, start, blockSize);
    current   += blockSize;
    blockSize *= 2;
}
// fill the rest (fewer than blockSize bytes remain)
memmove(current, start, end - current);

[EDIT] What I mean by "O(log(N))" is that the runtime will be much faster than if you fill the memory manually, since memmove() usually uses special, hand-optimized assembler loops that are blazing fast.

Aaron Digulla
It's O(log(n)) calls to `memmove`; the actual complexity is still O(n).
Stephen Canon
+2  A: 

If your pattern fits in a wchar_t, you can use wmemset() as you would have used memset().
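
A rough sketch of that, assuming wchar_t is 4 bytes on your platform (true with glibc on Linux; it is only 2 bytes on Windows), and keeping in mind that the in-memory byte order follows the platform's endianness:

#include <stdio.h>
#include <wchar.h>

int main(void)
{
    wchar_t buf[64];
    wmemset(buf, (wchar_t)0x11223344, sizeof buf / sizeof buf[0]);

    /* on a little-endian machine the bytes appear in memory as 44 33 22 11 */
    unsigned char *bytes = (unsigned char *)buf;
    printf("%02x %02x %02x %02x\n", bytes[0], bytes[1], bytes[2], bytes[3]);
    return 0;
}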

Laurent Parenteau
+2  A: 

An efficient way would be to cast the pointer to a pointer to an integer type of the pattern's size (e.g. uint32_t for 4 bytes) and fill with integers. It's a little ugly, though.

#include <stdint.h>   /* for uint32_t */

char buf[256] = { 0 };
uint32_t *p = (uint32_t *) buf;   /* assumes buf is suitably aligned */
size_t i;

for (i = 0; i < sizeof(buf) / sizeof(*p); ++i) {
    p[i] = 0x11223344;
}

Not tested!

jkramer
The one thing to be aware of is that `buf` might not satisfy the alignment requirements for a `uint32_t` on your platform. If `buf` is the result of a `malloc`, you don't need to worry about this, but if it's (say) passed in as an argument by code you don't control, you'll need to check the alignment before you write to it in this fashion, or else this will result in invalid accesses on some platforms.
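
A rough sketch of such a check (the helper name is made up; uintptr_t comes from <stdint.h>):

#include <stdint.h>

/* returns non-zero if p can safely be accessed through a uint32_t* */
static int is_aligned4(const void *p)
{
    return ((uintptr_t)p % 4) == 0;
}

If the check fails, fall back to a plain byte-by-byte loop.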
Stephen Canon
Another thing to watch for is endianness, if this is run on a little-endian computer and the filling and reading are done using types of different sizes (i.e., filling with an int but reading with chars).
Laurent Parenteau
This is not very efficient; using `memmove()` as in my example is much, much faster because it uses special assembler ops and hand-optimized code.
Aaron Digulla
You can still optimize the code. For example, you can use pointer arithmetic instead of the loop variable. I chose the example above to make the actual point clearer.
jkramer
@Aaron Digulla: that claim depends on a lot of things: for small buffers, for example, you're going to get slaughtered by function call overhead making repeated small calls to `memmove( )`. For "typical" buffers, your solution will probably be faster on most platforms with a well-optimized library, but for truly huge buffers, your solution will asymptotically take twice as many page faults and be very nearly 2x slower on most platforms.
Stephen Canon
@Stephen: It's simple to limit the block size to a few pages. And for small buffers, you can just use the loop above.
Aaron Digulla
@Aaron: Of course, but the code you posted in your answer doesn't do those things, and so the claim that it is "much, much faster" can't be justified without some additional assumptions. If one is willing to trade complexity for performance, one would either use a platform-specific API like `memset_pattern4( )` or write a dedicated implementation themselves.
Stephen Canon