Assume an ideal situation: nothing is paged out, the code is well written and fits in cache, the scheduler never interrupts you, and so on. Can a single core in a multi-core CPU generate enough write traffic to saturate the memory bus to the DIMMs?
In more concrete terms: if I launched a program that does a 16 GB memset in one thread, would it run any slower than a pair of non-overlapping 8 GB memsets, each in its own thread? (The size is chosen to be large enough to reach steady state.)
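
To make the comparison concrete, here is a minimal sketch of how it could be measured, assuming a POSIX system with pthreads. The names `fill_seconds`, `fill_slice`, `struct slice` and `MAX_THREADS` are made up for illustration. The buffer is touched once before timing so that page faults are not counted, and the fill value is arbitrary.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define MAX_THREADS 64  /* arbitrary cap for this sketch */

/* One worker's non-overlapping slice of the buffer. */
struct slice {
    unsigned char *start;
    size_t len;
};

static void *fill_slice(void *arg)
{
    struct slice *s = arg;
    memset(s->start, 0xA5, s->len);
    return NULL;
}

/* Fill `total` bytes using `nthreads` threads, each writing its own slice.
 * Returns elapsed wall-clock seconds, or -1.0 on bad arguments or failure. */
double fill_seconds(size_t total, int nthreads)
{
    if (total == 0 || nthreads < 1 || nthreads > MAX_THREADS)
        return -1.0;

    unsigned char *buf = malloc(total);
    if (buf == NULL)
        return -1.0;

    /* Touch every page with a nonzero value before timing, so the timed
     * region measures memory writes rather than first-touch page faults. */
    memset(buf, 1, total);

    pthread_t tid[MAX_THREADS];
    struct slice sl[MAX_THREADS];
    size_t chunk = total / (size_t)nthreads;
    struct timespec t0, t1;
    int started = 0;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < nthreads; i++) {
        sl[i].start = buf + (size_t)i * chunk;
        /* The last slice also takes any remainder bytes. */
        sl[i].len = (i == nthreads - 1) ? total - (size_t)i * chunk : chunk;
        if (pthread_create(&tid[i], NULL, fill_slice, &sl[i]) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    /* Read one byte back so the writes are observably used. */
    volatile unsigned char sink = buf[total - 1];
    (void)sink;
    free(buf);

    if (started != nthreads)
        return -1.0;
    return (double)(t1.tv_sec - t0.tv_sec)
         + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

Calling `fill_seconds((size_t)16 << 30, 1)` and `fill_seconds((size_t)16 << 30, 2)` from a `main` and comparing the two times would answer the question for one particular machine (build with something like `cc -O2 -pthread`). The sketch does not pin threads to cores or control NUMA placement, so the scheduler is still free to interfere with the result.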