Hello,
I have a file containing roughly all the words in English (~60k words, ~500k characters). I want to test whether a certain word I receive as input is "in English" (i.e. if this exact word is in the list).
What would be the most efficient way to do this in Python?
The trivial solution is to load the file into a list and check whether the word is in that list. The list can be sorted, which I believe will shrink the complexity to O(logn). However I'm not sure about how Python implements searching through lists, and whether there's a performance penalty if such a large list is in memory. Can I "abuse" the fact I can put a cap on the length of words? (e.g. say the longest one is 15 characters long).
Please note I run the application on a machine with lots of memory, so I care less for memory consumption than for speed and CPU utilization.
Thanks