For instance, suppose you have an application that processes files sent by different clients. The clients send tons of files every day, and you load the contents of those files into your system. All the files share the same format. The only constraint is that you are not allowed to run the same file twice.
To check whether you have already run a particular file, you create a checksum of it and store that checksum in another file. When a new file arrives, you compute its checksum and compare it against the checksums of the files you have already run and stored.
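To make the setup concrete, here is a minimal sketch of the current approach in Python. The file name `checksums.txt` and the choice of SHA-256 are my assumptions for illustration, not details of the actual system:

```python
import hashlib
from pathlib import Path

# Hypothetical flat file holding one hex checksum per line.
CHECKSUM_DB = Path("checksums.txt")

def checksum(path: Path) -> str:
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def already_processed(path: Path) -> bool:
    """Linear scan of the checksum file -- the part that slows down
    as the file grows."""
    digest = checksum(path)
    if CHECKSUM_DB.exists():
        with CHECKSUM_DB.open() as db:
            for line in db:
                if line.strip() == digest:
                    return True
    # Not seen before: record the checksum for future runs.
    with CHECKSUM_DB.open("a") as db:
        db.write(digest + "\n")
    return False
```

The linear scan in `already_processed` is the bottleneck: every lookup reads the whole checksum file from the start.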
Now, the file that contains the checksums of all the files you have run so far has grown really, really huge. Searching and comparing takes too much time, and users get frustrated when they click a button and don't see a response from the server within a few seconds.
NOTE: The application uses flat files as its database. Please do not suggest using an RDBMS or the like; it is simply not possible at the moment.
Do you think there could be another way to check for duplicate files?