I'm trying to filter names out of text blobs. Currently I'm just generating a word list and filtering it by hand, but I've got ~8k words to go, so I'm looking for a better way. I could grab a dictionary and filter out every word in it, but that would also cull names like Smith and Cliff.
What I need is either of the following:
- a list of common names (I'd need at least the 5k most common)
- a list of names that also happen to be words
I figure between them, I can do a combined blacklist/whitelist to get what I need.
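To make the combined blacklist/whitelist idea concrete, here's a rough sketch of what I have in mind. The function name `classify_words` and the two sets `common_names` and `dictionary_words` are placeholders I made up; I don't have real lists yet. The name list acts as the whitelist and wins over the dictionary, which acts as the blacklist. Anything in neither list is set aside for a manual look.

```python
import re

def classify_words(text, common_names, dictionary_words):
    """Split the distinct words of `text` into likely names and unknowns.

    common_names and dictionary_words are hypothetical lowercase sets
    (whitelist and blacklist respectively); the whitelist takes priority.
    """
    names, unknown = [], []
    seen = set()
    for word in re.findall(r"[A-Za-z]+", text):
        key = word.lower()
        if key in seen:
            continue  # only classify each distinct word once
        seen.add(key)
        if key in common_names:
            names.append(key)      # whitelisted, even if also a dictionary word
        elif key in dictionary_words:
            continue               # blacklisted: an ordinary word, drop it
        else:
            unknown.append(key)    # in neither list: still needs a manual check
    return names, unknown

# Toy data, for illustration only
text = "Cliff Smith met Alice near the cliff and the river with Zorblax"
common_names = {"alice", "smith", "cliff"}
dictionary_words = {"met", "near", "the", "cliff", "and", "river", "with", "smith"}
names, unknown = classify_words(text, common_names, dictionary_words)
```

Because matching is case-insensitive, the lowercase "cliff" in the toy text is treated as a name too. That ambiguity is exactly why a list of names that are also words would help: those could be checked by capitalization or by hand instead of being accepted blindly.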