As far as I know, the output cache is held entirely in memory, which means that if the app pool is recycled, the image will need to be generated again.
I've had to do something similar in the past, and I actually implemented a two-tiered system that used the HTTP cache primarily, and used the filesystem as a fallback. If something didn't exist, I generated the image and saved it to disk AND put it in the cache. That way if it gets pushed out of the cache or the app pool recycles, I only have to load it off the disk (it appears you've done the same).
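The two-tiered lookup described above might be sketched roughly like this. This is not the code I actually used, just a minimal illustration: the `ProductImageCache` class, the `App_Data\GeneratedImages` folder, the `.png` extension, and the `generate` delegate are all hypothetical names, and it assumes `key` is already safe to use as a file name.

```csharp
using System;
using System.IO;
using System.Web;
using System.Web.Caching;

public static class ProductImageCache
{
    // Hypothetical location for generated images; adjust for your app.
    private static readonly string DiskRoot =
        Path.Combine(HttpRuntime.AppDomainAppPath, @"App_Data\GeneratedImages");

    // Returns the image bytes for 'key', generating them only if they are
    // in neither the in-memory cache nor on disk.
    public static byte[] GetOrCreate(string key, Func<byte[]> generate)
    {
        Cache cache = HttpRuntime.Cache;

        // Tier 1: the in-memory cache.
        var cached = cache[key] as byte[];
        if (cached != null)
        {
            return cached;
        }

        // Tier 2: the file system, which survives app pool recycles.
        string path = Path.Combine(DiskRoot, key + ".png");
        byte[] bytes;
        if (File.Exists(path))
        {
            bytes = File.ReadAllBytes(path);
        }
        else
        {
            // Not found anywhere: generate the image and save it to disk.
            bytes = generate();
            Directory.CreateDirectory(DiskRoot);
            File.WriteAllBytes(path, bytes);
        }

        // Put it (back) in the memory cache for the next request.
        cache.Insert(key, bytes);
        return bytes;
    }
}
```

Note that this sketch does no locking, so two simultaneous requests for the same missing image could both generate it. For a handful of images that is probably acceptable, but it is worth knowing about.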
As for "too much memory": if you explicitly use HttpContext.Cache instead of [OutputCache], you can control the priority of each item in the cache. You can then tweak the settings on your app pool to control how much memory it uses overall, but I'm not sure there's a whole lot to be done other than that. A couple of images for each of 12 products doesn't seem like it would take up much memory to me, though.
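For example, setting the priority when inserting into the cache could look something like the following. The `ImageCacheHelper` class and the 30-minute sliding expiration are just placeholders for illustration; `CacheItemPriority.Low` marks the item as one of the first candidates for eviction when the server needs to free memory.

```csharp
using System;
using System.Web;
using System.Web.Caching;

public static class ImageCacheHelper
{
    // Caches image bytes at low priority so they are evicted before
    // more important items when memory gets tight.
    public static void CacheImage(string key, byte[] imageBytes)
    {
        HttpRuntime.Cache.Insert(
            key,
            imageBytes,
            null,                        // no cache dependency
            Cache.NoAbsoluteExpiration,  // rely on sliding expiration only
            TimeSpan.FromMinutes(30),    // illustrative value, tune as needed
            CacheItemPriority.Low,       // evicted ahead of higher priorities
            null);                       // no removal callback
    }
}
```

`HttpRuntime.Cache` and `HttpContext.Current.Cache` refer to the same application cache, so either works here.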
Without knowing anything else about your application, it sounds to me like you could get away with just using [OutputCache]. However, if you need something more robust and scalable, I'd use the two-tiered system I described. Then again, if you've already got that implemented and working, "if it ain't broke..."