As far as I know, finding a problem to solve (debugging, coming up with a theme for an article, whatever) is the most creative, interesting, and difficult part of any problem-solving work. Or at least the most difficult.

But I have no idea what's going on in programming-related linguistics. I love languages, and I love grammatical structures that are simple enough for babies yet neither fully understood nor programmable, but I have no idea what to do with that love.

So, are there any interesting linguistics-related projects or problems on the bleeding edge?

(I've checked the site; there are some questions about the "most interesting projects" in other areas, so I hope mine has the right to exist.)

+1  A: 
  • Random name generator (for RPGs, roguelikes, etc.) - see the sketch after this list
  • Translator (the most difficult)
  • Localizer (a module that lets people translate their software into different languages, something like GNU gettext)
  • Perhaps some kind of chatbot?
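
For the name generator idea, a minimal sketch of one common approach, a character-level Markov chain. The training names and the context length below are invented for illustration; swap in any name corpus you like:

```python
import random
from collections import defaultdict

# Toy training data -- these names are made up; use any real corpus instead.
NAMES = ["alaric", "belmor", "corwin", "davric", "elandra", "faelwen",
         "gorim", "haldor", "ilyana", "jorveth", "kaelith", "morwen"]
ORDER = 2  # characters of context; an arbitrary choice, tune to taste

def build_model(names, order):
    """Map each character context to the list of characters that follow it."""
    model = defaultdict(list)
    for name in names:
        padded = "^" * order + name + "$"   # ^ pads the start, $ marks the end
        for i in range(len(padded) - order):
            model[padded[i:i + order]].append(padded[i + order])
    return model

def generate(model, order, max_len=12):
    """Walk the chain from the start context until the end marker appears."""
    context, out = "^" * order, []
    while len(out) < max_len:
        ch = random.choice(model[context])
        if ch == "$":
            break
        out.append(ch)
        context = context[1:] + ch
    return "".join(out)

model = build_model(NAMES, ORDER)
for _ in range(5):
    print(generate(model, ORDER).capitalize())
```

Higher orders copy the training names more faithfully; lower orders invent more (and stranger) names.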
PiotrK
A language interpreter, a maze solver... and there are so many more projects!
thephpdeveloper
A maze solver and linguistics? Please tell me more!
valya
+3  A: 

Jurafsky & Martin, Speech and Language Processing, is a great (and the standard) introductory textbook on NLP, but don't let "introductory" fool you: it also has many good problems to solve.

Also check out http://www.nltk.org/Home, a good open-source Python project that implements many NLP problems and algorithms. They also have their own ideas page at http://www.nltk.org/projects, which should be well worth a read.
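
As a quick taste of NLTK (assuming a recent version; the model names below match recent releases and the sentence is just an example):

```python
import nltk

# One-time downloads of the tokenizer and POS-tagger models.
nltk.download("punkt")
nltk.download("averaged_perceptron_tagger")

tokens = nltk.word_tokenize("Colorless green ideas sleep furiously.")
print(nltk.pos_tag(tokens))   # list of (word, Penn Treebank tag) pairs
```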

If you are looking to roll your own, one idea is to experiment with Prolog and write up some sentence rules, etc. It's fun and educational, and not too hard: http://www.csupomona.edu/~jrfisher/www/prolog_tutorial/contents.html
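
Prolog's DCGs are the classic vehicle for this; to keep the examples on this page in one language, a rough Python analogue is NLTK's context-free grammar support. The toy rules and lexicon below are made up for illustration:

```python
import nltk

# A tiny invented grammar: sentence rules plus a four-word lexicon.
grammar = nltk.CFG.fromstring("""
    S  -> NP VP
    NP -> Det N
    VP -> V NP
    Det -> 'the' | 'a'
    N  -> 'dog' | 'man'
    V  -> 'saw' | 'bit'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("the dog bit a man".split()):
    print(tree)  # (S (NP (Det the) (N dog)) (VP (V bit) (NP (Det a) (N man))))
```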

johanbev
Oh, many thanks for both the Prolog idea and the NLTK link! Maybe I'll thank you for the book too, but it has yet to be read :)
valya
+1  A: 

First, NLP is a very large area with many valid subdisciplines. They include syntactic and semantic interpretation of existing text, text generation, document classification, theory of language, translation, etc. You should decide whether you wish to do research in this area (e.g. create new methods or resources) or apply known resources to subdomains of interest. There are many conferences (e.g. those organised by the Association for Computational Linguistics (ACL)), and their web page will give you an idea of the range.

To give an idea, we work in a subdomain of natural language processing (scientific discourse) which involves (at least):

  1. Entity recognition (e.g. proper names, technical terms - in our case chemistry). We compile dictionaries (ontologies) of specific terms with their meanings.
  2. Shallow parsing: adding part-of-speech (POS) tags to tokens in sentences and trying to chunk the result into phrases (see the sketch after this list).
  3. Recognition of entities and phrases through machine learning (e.g. maximum entropy).
  4. Similarity of documents through co-occurrence of terms (a minimal sketch follows further below).
  5. Disambiguation.

and several other areas.
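
To make point 2 concrete, a minimal shallow-parsing sketch with NLTK: POS-tag a sentence, then chunk noun phrases with a deliberately crude pattern. The pattern and the sentence are assumptions for illustration, not what this group actually uses:

```python
import nltk  # assumes the tokenizer/tagger models are already downloaded

sentence = "The reaction produced a colourless crystalline solid."
tagged = nltk.pos_tag(nltk.word_tokenize(sentence))

# Crude NP pattern: optional determiner, any adjectives, one or more nouns.
chunker = nltk.RegexpParser("NP: {<DT>?<JJ>*<NN.*>+}")
print(chunker.parse(tagged))  # an nltk.Tree with NP chunks grouped together
```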

The NLTK gives a good set of tools to experiment with. But be warned that what looks simple often isn't. Natural language cannot be parsed like a formal language.
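
And for point 4 above, a bare-bones document-similarity sketch: treat each document as a bag of term counts and compare with cosine similarity. A real system would at least weight terms (e.g. tf-idf) rather than use raw counts:

```python
import math
from collections import Counter

def cosine_similarity(doc_a, doc_b):
    """Cosine of the angle between two raw term-count vectors."""
    a, b = Counter(doc_a.lower().split()), Counter(doc_b.lower().split())
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values()))
    norm *= math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

print(cosine_similarity("the cat sat on the mat",
                        "the dog sat on the log"))   # 0.75
```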

If you love language, then it's an interesting area, though hard work.

peter.murray.rust
+1  A: 

I think one of the most interesting areas right now is unsupervised parsing. There are lots of reasons to parse: part-of-speech tagging, named entity recognition, and text summarization, to name just a few.

Most of the techniques that perform well use a training set of data that has already been tagged in order to "train" a model; the model can then be used to tag/parse other data. However, this is a brittle, domain-dependent technique. There are many unsupervised parsing techniques out there but, to date, they haven't performed as well. That is just beginning to change.
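
To illustrate the supervised "train a model on tagged data, then tag new data" pipeline described above, a minimal sketch using NLTK's POS-tagged Penn Treebank sample (the train/test split point is arbitrary):

```python
import nltk
nltk.download("treebank")   # a small POS-tagged sample of the Penn Treebank

sents = nltk.corpus.treebank.tagged_sents()
train, test = sents[:3000], sents[3000:]   # arbitrary split

# A unigram tagger just memorises each word's most frequent training tag --
# simple, but it shows the train-then-tag pipeline and why it is domain-bound:
# words unseen in the training domain get no tag at all (None).
tagger = nltk.UnigramTagger(train)
print(tagger.evaluate(test))                          # held-out accuracy
print(tagger.tag("the chemist heated the flask".split()))
```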

Scott Frye
Thanks! I'll take a look
valya