hi,

What would be the easiest way to get all articles about people from Wikipedia? I know I can download a dump of all the pages, but how do I then filter it down to only the ones about people? I need as many as I can get (preferably more than a million), so using any sort of API is probably not an option.

A: 

If you are going to roll your own, what you basically need to focus on is the "infobox data" in the XML dump.

Reference: http://code.google.com/p/infobox2rdf/
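A minimal sketch of the infobox approach using only the Python standard library. It assumes a `pages-articles` XML dump (one revision per page) and streams it with `iterparse` so the whole file never sits in memory. The infobox names in `PERSON_INFOBOX` are an illustrative assumption, not a complete list of biography infoboxes, and the helper names are hypothetical. Articles with no infobox at all are missed entirely.

```python
import io
import re
import xml.etree.ElementTree as ET

# Illustrative, NOT exhaustive: Wikipedia has many biography infoboxes.
PERSON_INFOBOX = re.compile(
    r"\{\{\s*Infobox[ _](?:person|officeholder|scientist|football biography)\b",
    re.IGNORECASE,
)

def local_name(tag):
    # Dump tags carry a versioned namespace, e.g. "{http://www.mediawiki.org/xml/export-0.10/}page".
    return tag.rsplit("}", 1)[-1]

def person_titles(dump):
    """Yield titles of main-namespace pages whose wikitext has a person-like infobox."""
    context = ET.iterparse(dump, events=("start", "end"))
    _, root = next(context)  # first event is the start of the <mediawiki> root
    title = ns = None
    for event, elem in context:
        if event != "end":
            continue
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "ns":
            ns = elem.text
        elif name == "text":
            # Only articles (namespace 0), not templates, talk pages, etc.
            if ns == "0" and elem.text and PERSON_INFOBOX.search(elem.text):
                yield title
        elif name == "page":
            root.clear()  # drop finished pages so memory stays bounded
            title = ns = None

# Tiny hand-made sample in the dump's shape, for illustration only.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
<page><title>Ada Lovelace</title><ns>0</ns><revision><text>{{Infobox person
| name = Ada Lovelace</text></revision></page>
<page><title>Paris</title><ns>0</ns><revision><text>{{Infobox settlement}}</text></revision></page>
<page><title>Template:Infobox person</title><ns>10</ns><revision><text>{{Infobox person}}</text></revision></page>
</mediawiki>"""

print(list(person_titles(io.BytesIO(SAMPLE))))  # ['Ada Lovelace']
```

For a real dump you would pass the path of the decompressed XML file (or an opened `bz2` file object) instead of the `BytesIO` sample.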

Alternatively, you can check out Freebase (http://www.freebase.com) or DBpedia (http://dbpedia.org).
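If you take the DBpedia route, its downloads include N-Triples files that map each resource to its ontology types, so filtering for `dbo:Person` can stand in for the infobox matching. The sketch below assumes such a file, already decompressed, that lists `http://dbpedia.org/ontology/Person` explicitly for every person (not only a more specific subclass); exact file names differ between releases, and `dbpedia_people` is a hypothetical helper.

```python
import io

RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
DBO_PERSON = "<http://dbpedia.org/ontology/Person>"

def dbpedia_people(lines):
    """Yield resource URIs typed as dbo:Person in an N-Triples stream."""
    for line in lines:
        # IRIs contain no spaces, so subject, predicate and object split cleanly.
        parts = line.split(" ", 3)
        if len(parts) >= 3 and parts[1] == RDF_TYPE and parts[2] == DBO_PERSON:
            yield parts[0].strip("<>")

# Two hand-made triples for illustration only.
SAMPLE = (
    "<http://dbpedia.org/resource/Ada_Lovelace> "
    "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> "
    "<http://dbpedia.org/ontology/Person> .\n"
    "<http://dbpedia.org/resource/Paris> "
    "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> "
    "<http://dbpedia.org/ontology/City> .\n"
)

print(list(dbpedia_people(io.StringIO(SAMPLE))))
# ['http://dbpedia.org/resource/Ada_Lovelace']
```

Because the file is read line by line, this scales to millions of triples without loading them all at once.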

tszming