I'm trying to filter names out of text blobs. Currently I'm generating a word list and filtering it by hand, but I have ~8k words to go, so I'm looking for a better way. I could grab a dictionary and filter out every word in it, but that would also cull names like Smith and Cliff.

What I need is either of the following:

  • a list of common names (I'd need at least the 5k most common names)
  • a list of names that also happen to be words

I figure between them, I can do a combined blacklist/whitelist to get what I need.
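One way the combined blacklist/whitelist could look is sketched below. It is a minimal sketch, assuming you already have a set of common names (the blacklist) and a set of dictionary words (the whitelist). Names win over dictionary words, so "smith" and "cliff" are still caught, and anything in neither set is left for manual review. The function names `tokenize` and `classify` are hypothetical.

```python
import re


def tokenize(text: str) -> list[str]:
    """Split a text blob into lowercase word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def classify(words, common_names, dictionary):
    """Sort words into (names, ordinary words, unknown).

    common_names is the blacklist: anything in it counts as a name, even if
    it is also a dictionary word (e.g. "smith", "cliff").
    dictionary is the whitelist of ordinary words.
    Words in neither set are returned for manual review.
    """
    names, ordinary, unknown = set(), set(), set()
    for w in set(words):
        if w in common_names:
            names.add(w)
        elif w in dictionary:
            ordinary.add(w)
        else:
            unknown.add(w)
    return names, ordinary, unknown


if __name__ == "__main__":
    text = "Cliff Smith walked past the cliff near Zorblax."
    names, ordinary, unknown = classify(
        tokenize(text),
        common_names={"cliff", "smith"},
        dictionary={"cliff", "smith", "walked", "past", "the", "near"},
    )
    print(sorted(names), sorted(ordinary), sorted(unknown))
```

Lowercasing throws away capitalization, which is itself a useful hint for names; keep the original tokens around if you want to use it when reviewing the "unknown" bucket.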

A:

US Census name list: http://www.census.gov/genealogy/names/

That should get you one angle on the problem, anyway.
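To turn a downloaded name file into the name set used for filtering, something like the sketch below may work. It assumes the whitespace-separated layout of the 1990 Census name files (name, frequency, cumulative frequency, rank, e.g. `SMITH  1.006  1.006  1`). The function name `load_census_names` and the file name in the usage note are assumptions, so check them against the files you actually download.

```python
def load_census_names(lines, limit=None):
    """Parse lines like 'SMITH  1.006  1.006  1' into a lowercase name set.

    Assumes column 1 is the name and the last column is the integer rank.
    When limit is given, only names with rank <= limit are kept.
    """
    names = set()
    for line in lines:
        parts = line.split()
        # Skip blank or malformed lines (no integer rank in the last column).
        if len(parts) < 2 or not parts[-1].isdigit():
            continue
        name, rank = parts[0], int(parts[-1])
        if limit is None or rank <= limit:
            names.add(name.lower())
    return names
```

Usage, assuming a surname file saved locally as `dist.all.last`: `with open("dist.all.last") as f: surnames = load_census_names(f, limit=5000)`. The first-name files can be loaded the same way and unioned with the surnames.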

fennec
That should do it.
BCS