If you can sort the file containing the strings, then reading the sorted list and counting duplicates is easy, since all duplicates end up adjacent. (You can retain the original file and create a new file of sorted strings.) Sorting large files efficiently is old technology; you should be able to find an external-sort utility for that.
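A minimal sketch of the counting pass, assuming the strings have already been written one per line to a hypothetical file `sorted.txt` by whatever sort utility you use:

    import itertools

    # In a sorted file, all duplicates are adjacent, so one pass suffices.
    with open("sorted.txt", "r") as f:
        lines = (line.rstrip("\n") for line in f)
        for string, group in itertools.groupby(lines):
            count = sum(1 for _ in group)
            if count > 1:
                print(f"{count}x {string}")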
If you can't sort, then consider digesting the strings. MD5 may be overkill for your purpose; you can cobble something up. For billions of strings, 8-byte digests are enough. Use a tree (probably a BST) of digests. For each digest, store the file offsets of the unique strings that produce that digest.
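One possible way to get an 8-byte digest in Python (the choice of hash is an assumption; any well-mixed 64-bit hash would do):

    import hashlib

    def digest8(s: str) -> bytes:
        # BLAKE2b truncated to 8 bytes; cheap and collision-resistant
        # enough for billions of strings, since we verify matches anyway.
        return hashlib.blake2b(s.encode("utf-8"), digest_size=8).digest()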
When you read a string, compute its digest and look it up. If you don't find the digest, you know the string is unique; store it in the tree. If you do find the digest, check each associated string for a match and handle accordingly.
To compare strings, you will need to go back to the file, since all you've stored is their offsets (see the sketch below).
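A sketch of the whole digest-and-verify pass, with a Python dict standing in for the BST of digests (each value is the list of file offsets of the unique strings seen so far with that digest):

    import hashlib

    def digest8(s: str) -> bytes:
        return hashlib.blake2b(s.encode("utf-8"), digest_size=8).digest()

    def count_duplicates(path: str) -> int:
        index: dict[bytes, list[int]] = {}
        duplicates = 0
        with open(path, "rb") as f:
            while True:
                offset = f.tell()          # offset of the line we're about to read
                line = f.readline()
                if not line:
                    break
                s = line.rstrip(b"\n")
                d = digest8(s.decode("utf-8"))
                matched = False
                for seen_offset in index.get(d, []):
                    # Same digest: either a true duplicate or a collision.
                    # Re-read the stored string from the file to find out.
                    pos = f.tell()
                    f.seek(seen_offset)
                    stored = f.readline().rstrip(b"\n")
                    f.seek(pos)
                    if stored == s:
                        duplicates += 1
                        matched = True
                        break
                if not matched:
                    index.setdefault(d, []).append(offset)
        return duplicates

The seek-back-and-restore dance is only needed on a digest hit, which is why the method is cheap when duplicates (and collisions) are rare.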
What's important to remember is that if two digests are different, the strings that produced them must be different. If the digests are the same, the strings may still differ, so you need to check. This algorithm is more efficient when there are fewer duplicate strings, since every digest hit forces a trip back to the file to compare.