I have two very large files, neither of which fits in memory. Each file has one string per line; the strings contain no spaces and are 99, 100, or 101 characters long.
Update: The strings are not in any sorted order.
Update2: I am working with Java on Windows.
Now I want to figure out the best way to find all the strings that occur in both files.
I have been thinking about using an external merge sort to sort both files and then comparing them in a single pass, but I am not sure that is the best approach. Since the strings are all roughly the same length, I was also wondering whether computing some kind of hash for each string would be a good idea, since that should make comparisons between strings cheaper. The catch is that I would then have to store the hashes of the strings I have seen so far so that they can be compared against later strings, and that set of hashes may itself not fit in memory. I am not able to pin down which approach would be best, and I am looking for your suggestions.
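For reference, the comparison step I have in mind after sorting would look roughly like the sketch below. It assumes both files have already been sorted externally in the same order that `String.compareTo` uses; the class name `SortedIntersect` and the method `intersect` are made up for illustration, and the external sort itself is not shown.

```java
import java.util.Iterator;
import java.util.function.Consumer;

// Hypothetical helper: intersects two streams of lines that are ALREADY sorted
// in String.compareTo order. Only one line from each input is held in memory.
final class SortedIntersect {

    // Returns the next line, or null when the input is exhausted.
    private static String next(Iterator<String> it) {
        return it.hasNext() ? it.next() : null;
    }

    // Passes each string that occurs in both inputs to 'out', once per distinct string.
    static void intersect(Iterator<String> a, Iterator<String> b, Consumer<String> out) {
        String x = next(a);
        String y = next(b);
        while (x != null && y != null) {
            int c = x.compareTo(y);
            if (c < 0) {
                x = next(a);          // x is smaller, so it cannot appear in b
            } else if (c > 0) {
                y = next(b);          // y is smaller, so it cannot appear in a
            } else {
                out.accept(x);        // common string found
                String match = x;
                // Skip duplicates of the matched string on both sides.
                while (x != null && x.equals(match)) x = next(a);
                while (y != null && y.equals(match)) y = next(b);
            }
        }
    }
}
```

With real files, the iterators could come from `Files.newBufferedReader(path).lines().iterator()`, and `out` could write each match to an output file. Because the output is itself sorted, it could presumably be fed back in as one input against a third sorted file, which is part of why I am asking about the more-than-two-files case.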
When you suggest a solution, please also state whether it would still work if there were more than two files and I had to find the strings that occur in all of them.