Hey guys,
I am looking for a free tagged corpus to train a Named Entity Recognition system on. Most of the ones I find (like the New York Times one) are expensive and not open. Can anyone help?
DBpedia is open and free.
DBpedia is built from Wikipedia and it is a very large corpus. Build a Lucene index over the triples involving rdfs:label
in the DBpedia titles dump.
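
To make the indexing idea concrete, here is a rough sketch in Python. Note that it does not use Lucene: it swaps in a plain in-memory dictionary as a stand-in for the index, just to show the shape of the approach. It assumes the labels dump is in N-Triples format (one triple per line) and that you only want English labels. The function names `build_label_index` and `find_entities` are made up for this example, and escape sequences inside labels are left undecoded for brevity.

```python
import re
from collections import defaultdict

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Matches one N-Triples line whose object is a literal, e.g.
# <subject> <predicate> "label"@en .
TRIPLE_RE = re.compile(
    r'^<([^>]+)>\s+<([^>]+)>\s+"((?:[^"\\]|\\.)*)"(?:@([A-Za-z0-9-]+))?\s*\.\s*$'
)


def build_label_index(lines, lang="en"):
    """Map each lowercased rdfs:label to the set of resource URIs that carry it."""
    index = defaultdict(set)
    for line in lines:
        match = TRIPLE_RE.match(line.strip())
        if not match:
            continue  # skip comments, blank lines, and non-literal triples
        subject, predicate, label, label_lang = match.groups()
        if predicate != RDFS_LABEL or label_lang != lang:
            continue
        index[label.lower()].add(subject)
    return index


def find_entities(text, index, max_len=4):
    """Greedy longest-match lookup of whitespace-separated token spans in the index."""
    tokens = text.split()
    matches = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n]).strip(".,;:!?")
            if phrase.lower() in index:
                matches.append((phrase, sorted(index[phrase.lower()])))
                i += n
                break
        else:
            i += 1  # no span starting at this token matched
    return matches
```

You would feed `build_label_index` the lines of the labels file (for example, by iterating over an open file handle) and then run `find_entities` over your raw text to get candidate entity mentions. For the full dump, a real Lucene index is the better fit, since the dictionary has to sit in memory.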