In general, I don't think this is possible. It works for DRAM and the pagefile since that is an OS managed resource, cache is managed by the CPU itself.
The OS could do a tight timing loop of a memory read and try to see if it completes fast enough to be in the cache or if it had to go out to main memory - this would be very error prone.
On multi-core/multi-proc systems, there are cache coherency protocols that are used between processors to determine when to they need to invalidate each other's caches, I suppose you could have a custom device that would snoop this protocol that the OS would query.
What are you trying to do? If you want to force something into memory, current x86 processors support prefetching memory into the cache in a non-blocking way, for instance with Visual C++ you could use _mm_prefetch
to fetch a line into the cache.
EDIT:
I haven't done this myself, so use at your own risk. To determine cache misses for profiling, you may be able to use some architecture-specific registers. http://download.intel.com/design/processor/manuals/253669.pdf, Appendix A gives "Performance Tuning Events". This can't be used to determine if an individual address is in the cache or when it is loaded in the cache, but can be used for overall stats. I believe this is what vTune (a phenomenal profiler for this level) uses.