ansaurus

Question

Details on the following Natural Language Processing terms?

Answer 1

+6 A:

There are actually plenty of freely available open-source natural language processing packages out there. Here's a brief list, organized by what language the toolkit is implemented in:

Python: Natural Language Toolkit (NLTK)
Java: OpenNLP, GATE, and Stanford's JavaNLP
.NET: SharpNLP

If you're uncertain which one to go with, I would recommend starting with NLTK. The package is reasonably easy to use and has great documentation online, including a free book.

You should be able to use NLTK to easily accomplish the NLP tasks you've listed, e.g. named entity recognition (NER), extracting tags for documents, and document categorization.
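
To give a feel for the document categorization case, here is a minimal sketch using NLTK's Naive Bayes classifier. The training sentences, the two labels, and the helper names word_features and categorize are made up for illustration; a real setup would train on a proper labeled corpus and probably use better tokenization than a whitespace split.

```python
import nltk


def word_features(text):
    """Bag-of-words features: one boolean feature per lowercased token."""
    return {f"contains({word})": True for word in text.lower().split()}


# Tiny hand-made training set (hypothetical data, for illustration only).
TRAINING_DOCS = [
    ("the team won the match in extra time", "sports"),
    ("the striker scored a late goal", "sports"),
    ("the coach praised the players after the game", "sports"),
    ("the central bank raised interest rates", "finance"),
    ("shares fell as the market reacted to earnings", "finance"),
    ("investors bought bonds and stocks", "finance"),
]

# NLTK classifiers train on (feature_dict, label) pairs.
train_set = [(word_features(text), label) for text, label in TRAINING_DOCS]
classifier = nltk.NaiveBayesClassifier.train(train_set)


def categorize(text):
    """Return the most likely label for a new piece of text."""
    return classifier.classify(word_features(text))


if __name__ == "__main__":
    print(categorize("the striker scored a goal in the match"))
    print(categorize("the bank raised rates and the market fell"))
```

This particular classifier needs no extra NLTK data downloads. Named entity recognition and part-of-speech tagging go through other NLTK functions that rely on pretrained models fetched with nltk.download, so they are left out of this sketch.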

What the Alchemy people call structured data extraction looks like it's just HTML scraping that is robust against changes to the underlying HTML as long as the page still visually renders the same way. So, it's not really an NLP task.

For the extraction of text from HTML, just use boilerpipe. It's fast, good, and free.
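
boilerpipe itself is a Java library, and it does more than strip tags: it tries to separate the main content from navigation and other page boilerplate. As a rough illustration of the simpler half of the problem only, the sketch below pulls visible text out of HTML with Python's standard html.parser module. It is not boilerpipe's algorithm, and the names TextExtractor and extract_text are hypothetical.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped element
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped elements.
        stripped = data.strip()
        if self._skip_depth == 0 and stripped:
            self._chunks.append(stripped)

    def text(self):
        return " ".join(self._chunks)


def extract_text(html):
    """Return the visible text of an HTML string, joined by single spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


if __name__ == "__main__":
    page = "<html><body><p>Hello <b>world</b></p><script>var x = 1;</script></body></html>"
    print(extract_text(page))
```

A stripper like this keeps menus, footers, and ads along with the article text, which is exactly the part a tool such as boilerpipe is meant to handle.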

dmcer 2010-04-21 01:34:52

omg this is the answer i was looking for ! YOU SIR ARE AWESOME

wefwgeweg 2010-04-21 06:01:08

If the task at hand is boilerpipe, there's no need to finish an argument about training data.

bmargulies 2010-04-21 11:58:23

Answer 2

A:

The Apache UIMA project was originally created by IBM and provides an NLP framework much like GATE. There are various annotators out there that are built for UIMA.

Thien 2010-04-22 13:32:32
