I have a sorted list of 1,000,000 protein names, each a string of at most 256 characters with an associated ID. I also have an unsorted list of 4,000,000,000 words taken from articles, each also a string of at most 256 characters with its own ID.
I want to find all matches between the protein names and the article words. Which algorithm should I use? Is there a prebuilt library or API I should use instead?
Ideally, the algorithm should run on an ordinary PC without special hardware.
Estimates of the running time would be welcome but are not required.
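
To make the task concrete, here is a minimal sketch on toy data of the kind of output being asked for. It assumes that a "match" means exact string equality, which the description above does not actually specify, and it uses a plain in-memory hash lookup only as a naive baseline, not as a recommendation. The function name `find_matches`, the `(id, string)` tuple layout, and the sample data are all hypothetical.

```python
from collections import defaultdict


def find_matches(proteins, words):
    """Yield (protein_id, word_id) for every exact string match.

    proteins: iterable of (protein_id, name) pairs (hypothetical layout)
    words:    iterable of (word_id, word) pairs (hypothetical layout)
    Assumption: a match is exact, case-sensitive string equality.
    """
    # Index the smaller list (protein names) by string.
    name_to_ids = defaultdict(list)
    for protein_id, name in proteins:
        name_to_ids[name].append(protein_id)

    # Stream over the larger list once; it never has to fit in memory.
    for word_id, word in words:
        for protein_id in name_to_ids.get(word, ()):
            yield protein_id, word_id


# Toy data standing in for the real lists.
proteins = [(1, "actin"), (2, "insulin"), (3, "myosin")]
words = [(10, "the"), (11, "insulin"), (12, "binds"), (13, "actin"), (14, "insulin")]

matches = list(find_matches(proteins, words))
print(matches)
```

In this sketch only the protein list is held in memory, while the article words are read one at a time; whether that scales to the real data sizes on an ordinary PC is exactly the open question.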