Hello all,

For use in a language-learning web application, do you know of data structures and an underlying database schema/layout that would allow efficient storage, processing and querying of sentences, verbs, nouns etc. for different natural languages? For example, I would like to store each verb only once and link sentences to a verb object.
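To make the "store each verb only once" idea concrete, here is a minimal sketch of one possible relational layout, using Python's built-in `sqlite3` module. All table and column names (`language`, `word`, `sentence`, `sentence_word`, `surface_form`) and the helper functions are hypothetical, chosen only for illustration. Each word is stored once per language as a lemma, and a link table records where it occurs in a sentence and in which inflected form.

```python
import sqlite3

# Hypothetical schema: one row per lemma, linked to sentences through a join table.
SCHEMA = """
CREATE TABLE language (
    id   INTEGER PRIMARY KEY,
    code TEXT NOT NULL UNIQUE
);
CREATE TABLE word (
    id             INTEGER PRIMARY KEY,
    language_id    INTEGER NOT NULL REFERENCES language(id),
    lemma          TEXT NOT NULL,
    part_of_speech TEXT NOT NULL,
    UNIQUE (language_id, lemma, part_of_speech)  -- each verb/noun stored only once
);
CREATE TABLE sentence (
    id          INTEGER PRIMARY KEY,
    language_id INTEGER NOT NULL REFERENCES language(id),
    text        TEXT NOT NULL
);
CREATE TABLE sentence_word (
    sentence_id  INTEGER NOT NULL REFERENCES sentence(id),
    word_id      INTEGER NOT NULL REFERENCES word(id),
    position     INTEGER NOT NULL,  -- zero-based token index in the sentence
    surface_form TEXT NOT NULL,     -- the inflected form as it appears
    PRIMARY KEY (sentence_id, position)
);
"""


def build_demo():
    """Create an in-memory database with one verb shared by two sentences."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO language (id, code) VALUES (1, 'en')")
    conn.execute(
        "INSERT INTO word (id, language_id, lemma, part_of_speech) "
        "VALUES (1, 1, 'eat', 'verb')"
    )
    examples = [
        (1, "She eats an apple.", 1, "eats"),
        (2, "They ate lunch.", 1, "ate"),
    ]
    for sentence_id, text, position, surface in examples:
        conn.execute(
            "INSERT INTO sentence (id, language_id, text) VALUES (?, 1, ?)",
            (sentence_id, text),
        )
        conn.execute(
            "INSERT INTO sentence_word (sentence_id, word_id, position, surface_form) "
            "VALUES (?, 1, ?, ?)",
            (sentence_id, position, surface),
        )
    return conn


def sentences_using(conn, lemma):
    """Return the text of every sentence linked to the given lemma."""
    rows = conn.execute(
        "SELECT s.text FROM sentence s "
        "JOIN sentence_word sw ON sw.sentence_id = s.id "
        "JOIN word w ON w.id = sw.word_id "
        "WHERE w.lemma = ? ORDER BY s.id",
        (lemma,),
    )
    return [row[0] for row in rows]
```

Calling `sentences_using(build_demo(), "eat")` returns both example sentences, even though the verb itself is stored in a single `word` row.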

I came across concrete syntax trees and I am thinking of using an abstract Node class and deriving a Noun class from it, and so on. Would a syntax tree structure be too restrictive?
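A rough sketch of what such a class hierarchy might look like is below. The names `Word`, `Verb`, `leaves`, and the fields `label`, `text` and `lemma` are assumptions made for this illustration; only `Node` and `Noun` come from the idea above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Generic syntax tree node; inner nodes (e.g. phrases) just carry a label."""
    label: str
    children: List["Node"] = field(default_factory=list)

    def leaves(self):
        """Return the leaf nodes under this node in left-to-right order."""
        if not self.children:
            return [self]
        found = []
        for child in self.children:
            found.extend(child.leaves())
        return found


@dataclass
class Word(Node):
    """Leaf node holding the form used in the sentence and its dictionary form."""
    text: str = ""
    lemma: str = ""


class Noun(Word):
    pass


class Verb(Word):
    pass


# "Dogs chase cats" as a small tree: S -> NP VP, VP -> V NP
sentence = Node("S", [
    Node("NP", [Noun("N", text="Dogs", lemma="dog")]),
    Node("VP", [
        Verb("V", text="chase", lemma="chase"),
        Node("NP", [Noun("N", text="cats", lemma="cat")]),
    ]),
])

words = [leaf.text for leaf in sentence.leaves()]
verb_lemmas = [leaf.lemma for leaf in sentence.leaves() if isinstance(leaf, Verb)]
```

Here `words` holds the sentence tokens in order and `verb_lemmas` holds the dictionary forms of the verbs, which is the kind of key that could link a tree back to a single stored verb.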

I realise this is quite a broad question and I do not expect you to do my 'homework' but if you could point me to any resources you know of that may help me get started that would be greatly appreciated.

Thank you

Martijn

+1  A: 

Your example looks pretty solid for manipulating natural-language sentences.

As for other options: for text search and storage, you might take a look at the Patricia tree. There's a Java implementation of it on Google Code.
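To give a feel for the kind of lookup this enables, here is a minimal sketch of a plain, uncompressed trie. This is not a Patricia tree itself: a Patricia (radix) tree additionally merges chains of single-child nodes to save space, but prefix lookup works along the same lines. The class and method names are hypothetical.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # maps a character to the next TrieNode
        self.is_word = False  # True if a stored word ends at this node


class Trie:
    """Plain character trie (a Patricia tree would compress single-child chains)."""

    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def with_prefix(self, prefix):
        """Return all stored words starting with prefix, sorted alphabetically."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        results = []
        stack = [(node, prefix)]
        while stack:
            current, text = stack.pop()
            if current.is_word:
                results.append(text)
            for ch, child in current.children.items():
                stack.append((child, text + ch))
        return sorted(results)


trie = Trie()
for verb in ["lopen", "loop", "loopt", "lezen"]:
    trie.insert(verb)

matches = trie.with_prefix("loo")
```

With the Dutch verb forms above, `matches` contains the stored words that begin with "loo".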

Also, have you considered using one of the existing solutions, like Hunspell, Lucene or Sphinx?

jimmy_keen
@jimmy_keen, thanks! I will have a look at these links. Looks very promising.
martijn_himself