There are a number of programs that compare a set of files to find identical ones. FDUPES is a good one: Link. A million files shouldn't be a problem, depending on the exact nature of the input. I think FDUPES requires Linux, but there are similar programs for other platforms.
I tried to write a faster program myself, but except for special cases, FDUPES was faster.
Anyway, the general idea is to start by checking the sizes of the files. Files that have different sizes can't be equal, so you only need to look at groups of files with the same size. Then it gets more complicated if you want optimal performance: if the files are likely to be different, you should compare small parts of the files, in the hope of finding differences early, so you don't have to read the rest of them. If the files are likely to be identical, though, it can be faster to read through each file to calculate a checksum, because then you can read sequentially from the disk instead of jumping back and forth between two or more files. (This assumes normal spinning disks; SSDs may behave differently.)
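To make the first steps concrete, here is a minimal C sketch of the size check followed by a comparison of a small leading block. The function names and the 4 KiB probe size are my own choices for illustration, not anything taken from FDUPES:

```c
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

#define PROBE_BYTES 4096   /* small leading block read from each file */

/* Return the size of a file, or -1 on error. */
static long long file_size(const char *path)
{
    struct stat st;
    return stat(path, &st) == 0 ? (long long)st.st_size : -1;
}

/* Return 1 if the files could still be duplicates, 0 if they are
 * provably different. Different sizes, or a mismatch within the
 * first few KiB, rules a pair out without reading the rest. */
static int quick_probe(const char *a, const char *b)
{
    if (file_size(a) != file_size(b))
        return 0;                      /* different sizes: not equal */

    unsigned char buf_a[PROBE_BYTES], buf_b[PROBE_BYTES];
    FILE *fa = fopen(a, "rb");
    FILE *fb = fopen(b, "rb");
    int same = 0;
    if (fa && fb) {
        size_t na = fread(buf_a, 1, PROBE_BYTES, fa);
        size_t nb = fread(buf_b, 1, PROBE_BYTES, fb);
        same = na == nb && memcmp(buf_a, buf_b, na) == 0;
    }
    if (fa) fclose(fa);
    if (fb) fclose(fb);
    return same;
}

int main(int argc, char **argv)
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s FILE1 FILE2\n", argv[0]);
        return 2;
    }
    puts(quick_probe(argv[1], argv[2])
         ? "possibly equal (probe matched)"
         : "different");
    return 0;
}
```

A full deduplicator would of course bucket the files by size first, so each size is read only once and only files within the same bucket are ever probed against each other.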
In my benchmarks while trying to write a faster program, it (somewhat to my surprise) turned out to be faster to first read through each file to calculate a checksum, and then, if the checksums were equal, compare the files directly by reading blocks alternately from each file, than to just read blocks alternately without the preceding checksum pass! The reason is that when calculating the checksums, Linux cached both files in main memory, since each file was read sequentially, and the second round of reads was then served from the cache very quickly. When starting with alternating reads, the files were not (physically) read sequentially.
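Here is a rough sketch of that two-pass idea, assuming a Linux-style page cache. The additive checksum is only a placeholder (a real tool should use a proper hash), and all names are made up for this example; it is not the code I benchmarked:

```c
#include <stdio.h>
#include <string.h>

#define BLOCK 65536

/* Pass 1: read the whole file sequentially and compute a cheap
 * checksum. The sequential pass is what lets the kernel keep the
 * file in the page cache for pass 2. */
static unsigned long checksum(const char *path)
{
    unsigned long sum = 0;
    unsigned char buf[BLOCK];
    size_t n;
    FILE *f = fopen(path, "rb");
    if (!f)
        return 0;
    while ((n = fread(buf, 1, BLOCK, f)) > 0)
        for (size_t i = 0; i < n; i++)
            sum = sum * 31 + buf[i];
    fclose(f);
    return sum;
}

/* Pass 2: byte-for-byte comparison, reading a block alternately from
 * each file. After pass 1 these reads usually hit the cache, so the
 * alternating pattern no longer causes disk seeks. */
static int same_contents(const char *a, const char *b)
{
    unsigned char ba[BLOCK], bb[BLOCK];
    FILE *fa = fopen(a, "rb"), *fb = fopen(b, "rb");
    int same = fa && fb;
    while (same) {
        size_t na = fread(ba, 1, BLOCK, fa);
        size_t nb = fread(bb, 1, BLOCK, fb);
        if (na != nb || memcmp(ba, bb, na) != 0)
            same = 0;
        if (na < BLOCK)
            break;                     /* end of file reached */
    }
    if (fa) fclose(fa);
    if (fb) fclose(fb);
    return same;
}

int main(int argc, char **argv)
{
    if (argc != 3)
        return 2;
    /* The expensive alternating compare only runs when the cheap
     * checksums agree. */
    puts(checksum(argv[1]) == checksum(argv[2])
             && same_contents(argv[1], argv[2])
         ? "duplicates" : "different");
    return 0;
}
```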
EDIT:
Some people have expressed surprise and even doubt that it could be faster to read the files twice than to read them just once. Perhaps I didn't manage to explain clearly enough what I was doing: I am talking about cache pre-loading, i.e. reading the files once so that they are already in the disk cache when they are later accessed in a pattern that would be slow on the physical disk drive. Here is a web page where I have tried to explain it in more detail, with pictures, C code and measurements.
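In case it helps, the pre-loading step on its own is nothing more exotic than this toy sketch (not the measured code from the linked page): read each file straight through once, throw the data away, and let the kernel keep the pages cached, memory permitting:

```c
#include <stdio.h>

/* One sequential pass over the file; the data itself is discarded,
 * the point is the side effect of filling the disk cache. */
static void preload(const char *path)
{
    unsigned char buf[1 << 16];
    FILE *f = fopen(path, "rb");
    if (!f)
        return;
    while (fread(buf, 1, sizeof buf, f) > 0)
        ;
    fclose(f);
}

int main(int argc, char **argv)
{
    for (int i = 1; i < argc; i++)
        preload(argv[i]);
    return 0;
}
```

On Linux one could presumably also hint the kernel with posix_fadvise(fd, 0, 0, POSIX_FADV_WILLNEED) instead of reading the data explicitly.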
However, this has (at best) marginal relevance to the original question.