Hi. I'm working on a Java project in which I have to process a Wikipedia dump file, and I'm looking for a library that extracts keywords from Wikipedia articles. Basically, I want to read every page tag in the Wikipedia XML dump, compare it against a list of topics and categories, and add it to my results if it matches. I'm not asking how to read the dump or write out the results; I only want to know about a library that lets me search by topic in the title and text of a Wikipedia article. For example, if the input is "dog", I want the Wikipedia articles about dogs and, if possible, any page under the dog categories.

It doesn't matter if the library is general-purpose rather than specific to Wikipedia. I need to pass in the wikitext as an argument and receive a list of keywords, including categories. I've found some Wikipedia libraries that work fine, such as Wikipedia-Miner and the Java Wikipedia Library, but the first requires MySQL to be installed, and I want to analyze the text without saving it to a database.

Any help or suggestions are welcome. :)

+1  A: 

It looks like this is your best bet: Java Wikipedia Library
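
If you would rather avoid a database altogether, here is a minimal sketch that uses only the JDK. It does not use the Java Wikipedia Library API. It just pulls category names and internal link targets out of a wikitext string with regular expressions and checks them against a topic. The class name `WikitextKeywords` and its methods are hypothetical names made up for this example, and the patterns only cover the simple `[[Category:Name]]` and `[[Target|label]]` forms, so nested links and templates are not handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of any Wikipedia library.
public class WikitextKeywords {

    // Matches [[Category:Name]] and [[Category:Name|sort key]].
    private static final Pattern CATEGORY = Pattern.compile(
            "\\[\\[\\s*Category\\s*:([^\\]|]+)(?:\\|[^\\]]*)?\\]\\]",
            Pattern.CASE_INSENSITIVE);

    // Matches [[Target]] and [[Target|label]] (no nesting support).
    private static final Pattern LINK = Pattern.compile(
            "\\[\\[([^\\]|]+)(?:\\|[^\\]]*)?\\]\\]");

    // Returns the category names found in the wikitext, in order of appearance.
    public static List<String> categories(String wikitext) {
        List<String> out = new ArrayList<>();
        Matcher m = CATEGORY.matcher(wikitext);
        while (m.find()) {
            out.add(m.group(1).trim());
        }
        return out;
    }

    // Returns internal link targets, skipping namespaced links such as
    // Category: or File: and dropping any #section anchor.
    public static List<String> linkTargets(String wikitext) {
        List<String> out = new ArrayList<>();
        Matcher m = LINK.matcher(wikitext);
        while (m.find()) {
            String target = m.group(1).trim();
            if (target.contains(":")) {
                continue;
            }
            int hash = target.indexOf('#');
            if (hash >= 0) {
                target = target.substring(0, hash).trim();
            }
            if (!target.isEmpty()) {
                out.add(target);
            }
        }
        return out;
    }

    // True if the topic occurs (case-insensitively) in the title
    // or in any category name of the page.
    public static boolean matchesTopic(String title, String wikitext, String topic) {
        String t = topic.toLowerCase(Locale.ROOT);
        if (title.toLowerCase(Locale.ROOT).contains(t)) {
            return true;
        }
        for (String category : categories(wikitext)) {
            if (category.toLowerCase(Locale.ROOT).contains(t)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String text = "The '''dog''' descends from the [[wolf|gray wolf]].\n"
                + "[[Category:Dogs]]";
        System.out.println(categories(text));
        System.out.println(linkTargets(text));
        System.out.println(matchesTopic("Dog", text, "dog"));
    }
}
```

You would call `matchesTopic` once per page while streaming through the dump, so nothing has to be stored. Substring matching is crude ("dog" also matches "Dogma"), so treat this as a starting point rather than a real keyword extractor.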

chotchki