This might not be very useful to you, but you could write your own malloc wrapper. In our special "diagnostic" builds it keeps a table of all outstanding allocations (including the file name and line number where the allocation occurred) and prints out anything that was still outstanding at exit time. It also uses canary words (to check for buffer overflows) and a combination of memory re-writing and block checksumming after free and before reallocation (to check for use-after-free).
If your product is sufficiently large it might be annoying to have to find-replace your entire source, hoping for the best. Also, the development time for your own malloc wrapper is probably not negligible. Doing lots of heavyweight stuff like what I mentioned above probably won't help out your speed problem, either. Writing your own wrapper would allow the most flexibility, though.