I am trying to find information (and hopefully C# source code) on creating a basic AI tool that can understand English words, grammar and context.

The idea is to train the AI on as many written documents as possible and then, based on these documents, have the AI produce its own creative writing in proper English that makes sense to a human.

While the idea is simple, I do realise that the hurdles are huge. Any starting points or good resources will be appreciated.

+3  A: 

A basic AI tool that you can use to do something like this is a Markov Chain. It's actually not too tricky to write!

See: http://pscode.com/vb/scripts/ShowCode.asp?txtCodeId=2031&lngWId=10

If that's not enough, you might be able to store WordNet synsets in your Markov chain instead of just words. This gives you some sense of the meaning of the words.
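To make the word-level Markov chain idea concrete, here is a minimal sketch in Python (not C#, and not the code behind the link above). The names `build_chain` and `generate`, the `order` parameter and the toy corpus are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def build_chain(text, order=1):
    """Map each run of `order` consecutive words to the words seen right after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        chain[state].append(words[i + order])
    return dict(chain)


def generate(chain, length=20, seed=None):
    """Random-walk the chain until `length` words are produced or a dead end is hit."""
    rng = random.Random(seed)
    state = rng.choice(sorted(chain))  # start from a random known state
    output = list(state)
    while len(output) < length:
        followers = chain.get(state)
        if not followers:  # this state only occurred at the very end of the text
            break
        output.append(rng.choice(followers))
        state = tuple(output[-len(state):])
    return " ".join(output)


# Toy corpus (hypothetical); real use would feed in many documents.
corpus = "the cat sat on the mat and the dog sat on the rug"
chain = build_chain(corpus, order=1)
print(generate(chain, length=10, seed=1))
```

Because followers are stored with repeats, frequent continuations are picked more often. Raising `order` tends to give more fluent output, at the cost of copying longer stretches of the source text.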

kibibu
A: 

One thing, though not quite what you need, would be a Markov chain of words. Here's a link I found by a quick search: http://blog.figmentengine.com/2008/10/markov-chain-code.html, but you can find much more information by searching for it.

rslite
+1  A: 

Some good references and reading at this Natural Language article.

nik
+1  A: 

As others said, a Markov chain seems to be the most suitable tool for such a task. A nice description of implementing a Markov chain can be found in Kernighan & Pike, The Practice of Programming, section 3.1. A nice description of text generation is also present in Programming Pearls.

piotrsz
A: 

Take a look at http://www.nltk.org/ (Natural Language Toolkit), lots of powerful tools there. They use Python (not C#) but Python is easy enough to pick up. Much easier to pick up than the breadth and depth of natural language processing, at least.

ScottD
+1  A: 

To be able to recompose a document, you are going to need a way to filter out the bad results.

Which means:

  1. You are going to have to write a program that can evaluate whether the output is valid (grammatical and syntactic validity is the best you can check reliably). This would need NLP.
  2. You would need lots of training data and test data
  3. You would need to watch out for overtraining (take a look at ROC curves)
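As a rough illustration of keeping training data and test data apart, the sketch below holds out a fraction of the documents. The function name `split_documents`, the 20% default and the fixed seed are assumptions, not something prescribed above.

```python
import random


def split_documents(documents, test_fraction=0.2, seed=0):
    """Shuffle the documents and hold out a fraction of them as test data."""
    docs = list(documents)
    random.Random(seed).shuffle(docs)  # fixed seed keeps the split reproducible
    n_test = max(1, int(len(docs) * test_fraction))
    return docs[n_test:], docs[:n_test]  # (train, test)


# Hypothetical document names, only to show the call.
train_docs, test_docs = split_documents([f"doc{i}.txt" for i in range(10)])
print(len(train_docs), len(test_docs))
```

The model would be trained only on `train_docs`. If its output scores well against the training documents but poorly against `test_docs`, that is a hint of overtraining.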

Instead of writing a tool you could:

  1. Manually score the output (it will take a long time to properly train the algorithm)
    1. With this using the Amazon Mechanical Turk might be a good idea

The irony of this: the computer would have a difficult time "creatively" composing something new. All of its worth will be based on its previous experiences [training data].

monksy
A: 

I agree that you will have trouble creating something creative. You could possibly also use a keyword spinner on certain words. You might also want to implement a stop word filter to remove anything colloquial.
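A stop word filter of this kind can be sketched in a few lines of Python. The word list here is a tiny hypothetical sample, and `remove_stop_words` is a name made up for the example; a real list would be much longer and tuned to your documents.

```python
# Hypothetical, deliberately tiny list of colloquial words to drop.
STOP_WORDS = {"gonna", "kinda", "like", "um", "yeah"}


def remove_stop_words(text, stop_words=STOP_WORDS):
    """Drop words whose lowercased, punctuation-stripped form is a stop word."""
    kept = [
        word for word in text.split()
        if word.lower().strip(".,!?;:") not in stop_words
    ]
    return " ".join(kept)


print(remove_stop_words("Yeah, the cat is kinda asleep"))
```

Applying the filter to the training documents before building the model keeps those words out of the generated text entirely.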

Laykes