Are there any good APIs and public datasets (dictionaries, phrases) for working w/ natural languages?
Specifically, do any good ones exist for working on translation between English and Korean?
For English I use OpenNLP.
Unfortunately, I've never seen anything Korean-related except the Google Language Detection and Translation APIs, which are quite easy to use.
WordNet is a classic data resource for English, with semantic relationships.
MontyLingua might come in handy for an intermediate layer between English and Korean.
The Natural Language Toolkit (NLTK) is an excellent resource if you're considering Python as a language. It incorporates lots of the stuff you'd expect in a text processing/NLP environment, like parsers, stemmers and part-of-speech taggers. Documentation on it is pretty good too.
As for datasets, NLTK comes with a variety of annotated corpora and textual data sets for experimenting with.
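To give a feel for the toolkit, here is a minimal sketch of tokenizing and stemming English text. It assumes NLTK is installed (`pip install nltk`). The helper name `stem_tokens` is made up for this example, and I picked `RegexpTokenizer` and `PorterStemmer` because they work without downloading any corpus data. This only covers English; it is not a Korean pipeline.

```python
from nltk.stem import PorterStemmer
from nltk.tokenize import RegexpTokenizer


def stem_tokens(text):
    """Split text into word tokens and reduce each one to its Porter stem.

    Hypothetical helper for illustration; not part of NLTK itself.
    """
    tokenizer = RegexpTokenizer(r"\w+")  # keep runs of word characters, drop punctuation
    stemmer = PorterStemmer()            # rule-based English stemmer, needs no corpus data
    return [stemmer.stem(token) for token in tokenizer.tokenize(text)]


if __name__ == "__main__":
    print(stem_tokens("The parsers were tagging translated sentences."))
```

The annotated corpora and the part-of-speech tagger need an extra `nltk.download(...)` step first, so I left them out of the sketch.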
Hope it helps, B.