From what I remember from when I got a peek at the Windows source code, StretchBlt was written by a summer intern in the mid-1980s and never got fixed. The same inefficient code has been carried through to all versions of Windows. On a few Windows Mobile devices, StretchBlt takes advantage of 2D hardware acceleration, but not on desktop Windows. On desktop Windows, use DirectDraw to get 2D hardware acceleration and a fast blit. In my desktop video game products, I have my own stretch-blit code that stretches the bits in a memory buffer and then calls BitBlt to copy them to the display; that is considerably faster than simply calling StretchBlt.
What I believe is the main cause of StretchBlt's failings is that the code is totally oblivious to the speed of cached versus uncached memory, so it thrashes the cache as it stretches the image.
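
To give an idea of the "stretch in a memory buffer first" approach, here is a minimal sketch in portable C. It is not my production code and not what StretchBlt does internally. The function name stretch_nearest is made up for illustration, and it assumes 32-bit pixels, tightly packed rows (no padding), nearest-neighbour sampling, and source dimensions below 65536 so the 16.16 fixed-point math cannot overflow. The point is that both buffers are walked row by row, so reads and writes stay sequential in ordinary cached memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: nearest-neighbour stretch of a 32-bit image.
   src is sw x sh pixels, dst is dw x dh pixels, both row-major with
   no row padding (an assumption made for this sketch). */
void stretch_nearest(const uint32_t *src, int sw, int sh,
                     uint32_t *dst, int dw, int dh)
{
    if (sw <= 0 || sh <= 0 || dw <= 0 || dh <= 0)
        return;

    /* 16.16 fixed point needs the source size to fit in 16 bits. */
    assert(sw < 65536 && sh < 65536);

    /* Source step per destination pixel, in 16.16 fixed point. */
    uint32_t xstep = ((uint32_t)sw << 16) / (uint32_t)dw;
    uint32_t ystep = ((uint32_t)sh << 16) / (uint32_t)dh;

    uint32_t sy = 0;
    for (int y = 0; y < dh; ++y, sy += ystep) {
        /* Pick the source row once, then scan it left to right. */
        const uint32_t *srow = src + (size_t)(sy >> 16) * (size_t)sw;
        uint32_t *drow = dst + (size_t)y * (size_t)dw;

        uint32_t sx = 0;
        for (int x = 0; x < dw; ++x, sx += xstep)
            drow[x] = srow[sx >> 16];
    }
}
```

On Windows, one way to get the result on screen would be to have dst point at the bits of a DIB section selected into a memory DC and then BitBlt that DC to the window at 1:1, so GDI never has to scale anything. That wiring is left out here to keep the sketch self-contained.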