Fraps is the archetypal example of doing this sort of thing to a fullscreen DirectX application from a third-party app. It works by hooking a few system calls and inserting itself into the call chain between the app and DirectX. There is some performance cost, but in general it's minimal.
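To make "inserting itself into the call chain" concrete, here is a minimal, portable C++ sketch of the interception pattern. It does not touch DirectX at all: MockDevice, MockDeviceVtbl, RealPresent, HookedPresent and InstallHook are hypothetical names, and the function-pointer table only loosely mimics how a COM interface such as IDirect3DDevice9 dispatches through its vtable. The hook saves the original pointer, swaps in its own function, does its extra work, and then forwards to the original.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical mock of a COM-style object: the "device" holds a pointer to a
// table of function pointers, loosely like a COM vtable. Not real DirectX.
struct MockDevice;
using PresentFn = long (*)(MockDevice*);

struct MockDeviceVtbl {
    PresentFn Present;
};

struct MockDevice {
    MockDeviceVtbl* vtbl;
    std::vector<std::string> log;  // records calls so the order is visible
};

// The implementation the application thinks it is calling.
long RealPresent(MockDevice* dev) {
    dev->log.push_back("present");
    return 0;  // 0 stands in for S_OK
}

// Saved original entry, so the hook can forward to it.
static PresentFn g_originalPresent = nullptr;

// The hook: do the overlay work first, then call the original Present.
long HookedPresent(MockDevice* dev) {
    dev->log.push_back("overlay");  // an FPS overlay would draw here
    return g_originalPresent(dev);
}

// Install the hook by saving the original pointer and swapping in ours.
void InstallHook(MockDeviceVtbl* vtbl) {
    assert(vtbl != nullptr && vtbl->Present != &HookedPresent);  // avoid hooking twice
    g_originalPresent = vtbl->Present;
    vtbl->Present = &HookedPresent;
}
```

Calling Present() once before InstallHook() and once after leaves the log as present, overlay, present. A real hook on Windows would also have to get its DLL into the target process, make the vtable memory writable (e.g. with VirtualProtect), and cope with multiple threads; none of that is shown here.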
This page seems to have some details and sample code on how to hook the app in this way.
If I recall correctly from other forum discussions (I can't find the link at the moment; search for things like "how does fraps work", it's a popular question), Fraps hooks a few functions to force the app to load its DLL, then hooks Present(). Before calling the real Present(), it calls device->Clear() with a list of small rectangles to fill with a different color, and together those rectangles spell out the FPS number it displays. This has a minimal performance impact and is compatible with pretty much whatever rendering the app is doing.

Overlaying a bitmap would be more complicated, since that isn't as easy to do at Present() time. If you hooked EndScene() instead, you could probably do more, but you would have to be careful not to change the device state.
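The rectangle trick can be sketched without DirectX as a function that turns an FPS value into a list of seven-segment rectangles. The digit shapes are an assumption, not Fraps' actual font, and the names Rect, kDigitSegments, kSegmentBoxes and BuildFpsRects are hypothetical; Rect just copies the x1/y1/x2/y2 field layout of Direct3D 9's D3DRECT so the code compiles anywhere.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Same field layout as Direct3D 9's D3DRECT, redefined so this sketch
// compiles without the DirectX headers.
struct Rect {
    long x1, y1, x2, y2;
};

// Hypothetical seven-segment font (Fraps' real glyphs may differ).
// Bit 0 = a (top), 1 = b (upper right), 2 = c (lower right), 3 = d (bottom),
// 4 = e (lower left), 5 = f (upper left), 6 = g (middle).
constexpr std::array<unsigned char, 10> kDigitSegments = {
    0x3F, 0x06, 0x5B, 0x4F, 0x66, 0x6D, 0x7D, 0x07, 0x7F, 0x6F};

// Segment boxes in units of the stroke width s, within a 3s x 5s digit cell:
// {left, top, right, bottom}.
constexpr std::array<std::array<long, 4>, 7> kSegmentBoxes = {{
    {0, 0, 3, 1},  // a: top
    {2, 0, 3, 3},  // b: upper right
    {2, 2, 3, 5},  // c: lower right
    {0, 4, 3, 5},  // d: bottom
    {0, 2, 1, 5},  // e: lower left
    {0, 0, 1, 3},  // f: upper left
    {0, 2, 3, 3},  // g: middle
}};

// Builds the rectangles that spell out `fps`, with the first digit's top-left
// corner at (x, y) and strokes s pixels thick.
std::vector<Rect> BuildFpsRects(unsigned fps, long x, long y, long s) {
    assert(s > 0);
    std::vector<Rect> rects;
    const std::string digits = std::to_string(fps);
    for (std::size_t i = 0; i < digits.size(); ++i) {
        const unsigned char mask =
            kDigitSegments[static_cast<std::size_t>(digits[i] - '0')];
        // Each digit is 3s wide, followed by a 1s gap.
        const long originX = x + static_cast<long>(i) * 4 * s;
        for (int seg = 0; seg < 7; ++seg) {
            if (mask & (1u << seg)) {
                const auto& box = kSegmentBoxes[static_cast<std::size_t>(seg)];
                rects.push_back({originX + box[0] * s, y + box[1] * s,
                                 originX + box[2] * s, y + box[3] * s});
            }
        }
    }
    return rects;
}
```

In a hooked Present(), the same coordinates would be stored in D3DRECT values and passed to IDirect3DDevice9::Clear(Count, pRects, D3DCLEAR_TARGET, color, 1.0f, 0) before forwarding to the original Present(). Because Clear() only fills rectangles with a solid color, it needs no textures, shaders or vertex buffers, which fits the compatibility point above.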
PIX has privileged access to the DirectX driver, so I wouldn't expect to be able to emulate its approach.
If the target app is running in windowed mode, hooking DirectX still works, but you could also just use GDI instead.
Edit: I think this is the link I was originally thinking of.