natural-language

Elegant command-parsing in an OOP-based text game

I'm playing with writing a MUD/text adventure (please don't laugh) in Ruby. Can anyone give me any pointers towards an elegant, OOP-based solution to parsing input text? We're talking about nothing more complex than "put wand on table" here. But everything needs to be soft; I want to extend the command set painlessly, later. M...
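One common shape for this is a registry of verb handlers (the Command pattern in its simplest form) in front of a small parser. The parser strips articles and splits the input at the first preposition. The asker works in Ruby, but the sketch below is in Python to keep one language across these examples. It is a minimal sketch under assumptions: the `command` decorator, the article and preposition lists, and the `put`/`look` handlers are all hypothetical.

```python
import re
from typing import Callable, Dict, Optional, Tuple

Parsed = Tuple[str, str, Optional[str], Optional[str]]  # verb, object, preposition, target

# Verb -> handler. New commands register themselves; the parser never changes.
COMMANDS: Dict[str, Callable[[str, Optional[str], Optional[str]], str]] = {}

# Hypothetical word lists; extend them as the game grows.
ARTICLES = {"a", "an", "the"}
PREPOSITIONS = {"on", "in", "into", "under", "with", "to", "from"}


def command(*verbs: str):
    """Decorator that registers a handler under one or more verbs (synonyms)."""
    def register(handler):
        for verb in verbs:
            COMMANDS[verb] = handler
        return handler
    return register


def parse(line: str) -> Optional[Parsed]:
    """'put the wand on the table' -> ('put', 'wand', 'on', 'table')."""
    words = [w for w in re.findall(r"[a-z]+", line.lower()) if w not in ARTICLES]
    if not words:
        return None
    verb, rest = words[0], words[1:]
    # Everything before the first preposition is the object, everything after is the target.
    for i, word in enumerate(rest):
        if word in PREPOSITIONS:
            return verb, " ".join(rest[:i]), word, " ".join(rest[i + 1:])
    return verb, " ".join(rest), None, None


def dispatch(line: str) -> str:
    parsed = parse(line)
    if parsed is None:
        return "Say something."
    verb, obj, prep, target = parsed
    handler = COMMANDS.get(verb)
    if handler is None:
        return f"I don't know how to '{verb}'."
    return handler(obj, prep, target)


@command("put", "place")
def put(obj: str, prep: Optional[str], target: Optional[str]) -> str:
    if not obj:
        return "Put what?"
    if not prep or not target:
        return f"Put the {obj} where?"
    return f"You put the {obj} {prep} the {target}."


@command("look", "l")
def look(obj: str, prep: Optional[str], target: Optional[str]) -> str:
    return f"You look at the {obj}." if obj else "You look around."
```

Adding a verb means writing one more decorated function, and synonyms are just extra arguments to `command`. In Ruby the same design could be, for example, a Hash from verb strings to command objects.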

translate by replacing words inside existing text

What are common approaches for translating certain words (or expressions) inside a given text, when the text must be reconstructed (with punctuation and everything)? The translation comes from a lookup table and covers words, collocations, and emoticons like L33t, CUL8R, :-), etc. Simple string search-and-replace is not enough since...
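One common approach is to compile every lookup-table key into a single regular expression and substitute all matches in one pass. Keys are tried longest first, so a collocation beats the single words inside it. The text between matches, including punctuation and spacing, is copied through untouched. A minimal sketch; the table entries are made-up examples.

```python
import re
from typing import Callable, Dict


def build_translator(table: Dict[str, str]) -> Callable[[str], str]:
    if not table:
        return lambda text: text
    # Longest keys first so "by the way" wins over the "way" inside it.
    keys = sorted(table, key=len, reverse=True)
    # (?<!\w) and (?!\w) act like word boundaries that also work for keys made of
    # punctuation, such as ":-)", where \b would fail.
    pattern = re.compile(
        r"(?<!\w)(?:" + "|".join(re.escape(k) for k in keys) + r")(?!\w)",
        re.IGNORECASE,
    )
    lookup = {k.lower(): v for k, v in table.items()}

    def translate(text: str) -> str:
        # Single left-to-right pass: replacement text is never re-scanned,
        # so a translation cannot itself be translated again.
        return pattern.sub(lambda m: lookup[m.group(0).lower()], text)

    return translate
```

One limitation is that, because of the boundary guards, an emoticon glued directly to a word (as in "hi:-)") is not matched. That may or may not be what a given lookup table wants.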

Machine Learning and Natural Language Processing

Assume you know a student who wants to study Machine Learning and Natural Language Processing. What introductory subjects would you recommend? Example: I'm guessing that knowing Prolog and Matlab might help him. He also might want to study Discrete Structures*, Calculus, and Statistics. *Graphs and trees. Functions: properties, recur...

How to automatically determine text quality?

A lot of Natural Language Processing (NLP) algorithms and libraries have a hard time working with random texts from the web, usually because they are presupposing clean, articulate writing. I can understand why that would be easier than parsing YouTube comments. My question is: given a random piece of text, is there a process to determi...

Inter-rater agreement (Fleiss' Kappa, Krippendorff's Alpha etc) Java API?

I am working on building a Question Classification/Answering corpus as a part of my masters thesis. I'm looking at evaluating my expected answer type taxonomy with respect to inter-rater agreement/reliability, and I was wondering: Does anybody know of any decent (preferably free) Java API(s) that can do this? I'm reasonably certain all ...
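The question asks for a Java API, and this sketch does not provide one. It only shows what Fleiss' kappa computes, using the standard formulas: the mean per-subject pairwise agreement compared against chance agreement from the overall category proportions. That makes it possible to check a candidate library's output, or to port the function to Java directly. Krippendorff's alpha is not covered.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """counts[i][j] = number of raters who assigned subject i to category j.
    Every subject must be rated by the same number of raters."""
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("each subject needs the same number of ratings")
    if n_raters < 2:
        raise ValueError("need at least two raters per subject")
    n_categories = len(counts[0])
    total = n_subjects * n_raters
    # p_j: share of all assignments that went to category j.
    p = [sum(row[j] for row in counts) / total for j in range(n_categories)]
    # P_i: fraction of rater pairs that agree on subject i.
    per_subject = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(per_subject) / n_subjects  # observed agreement
    p_e = sum(pj * pj for pj in p)         # agreement expected by chance
    if p_e == 1.0:
        raise ValueError("kappa is undefined when every rating is in one category")
    return (p_bar - p_e) / (1 - p_e)
```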

Best Java Open Source Text Mining Framework

Hello everyone, I want to know what the best open-source, Java-based framework for text mining is, one that supports both machine learning and dictionary methods. I'm using Mallet, but there isn't much documentation and I don't know whether it will fit all my requirements. Thanks in advance. Best regards, ukrania ...

How to create endless stories for telenovela or tv-series programmatically?

Do you know any (web) sources that describe ways to continually produce stories from an initial base of parameters/objects and some given relationships? I'm interested in:
- theory and algorithms
- real projects where it was done
- how to measure redundancy in such a system
- (fun) sites with examples of something comparable ...

Looking for a database of n-grams taken from wikipedia

I am effectively trying to solve the same problem as this question: http://stackoverflow.com/questions/610399/finding-related-words-specifically-physical-objects-to-a-specific-word minus the requirement that words represent physical objects. The answers and edited question seem to indicate that a good start is building a list of frequ...
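If no ready-made database turns up, one option is to count n-grams yourself over the article text of a Wikipedia dump. A minimal sketch, assuming the wiki markup has already been stripped to plain text (that extraction step is not shown). The tokenizer is deliberately naive and only handles ASCII letters and digits.

```python
import re
from collections import Counter
from typing import Iterable


def ngram_counts(texts: Iterable[str], n: int = 2) -> Counter:
    """Count word n-grams (as tuples) across plain-text documents."""
    counts: Counter = Counter()
    for text in texts:
        # Naive tokenizer: lowercase runs of ASCII letters, digits and apostrophes.
        tokens = re.findall(r"[a-z0-9']+", text.lower())
        # Zipping n staggered copies of the token list yields consecutive n-grams.
        # These n-grams can span sentence boundaries.
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return counts
```

Something like `ngram_counts(articles, 2).most_common(50)` then lists the most frequent word pairs. Co-occurrence with a given word can be filtered from the same counter.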

Text similarity algorithm

I have two subtitle files. I need a function that tells whether they represent the same text or similar text. Sometimes there are comments like "The wind is blowing... the music is playing" in one file only. But 80% of the contents will be the same. The function must return TRUE (files represent the same text). And sometime...
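One way to build such a function has three steps. First, drop the subtitle numbering and timing lines. Next, compare the remaining word sequences with `difflib.SequenceMatcher`. Finally, treat a ratio above a threshold as "the same text". Comments that appear in only one file then lower the ratio a little instead of breaking the comparison. A minimal sketch, assuming SRT-style files. The 0.8 threshold is an assumption to tune on real file pairs.

```python
import difflib
import re
from typing import List


def subtitle_words(srt_text: str) -> List[str]:
    """Lowercased words of an SRT file, without cue numbers and timing lines."""
    words: List[str] = []
    for line in srt_text.splitlines():
        line = line.strip()
        # Skip blank lines, cue numbers, and "00:00:01,000 --> 00:00:04,000" lines.
        if not line or line.isdigit() or "-->" in line:
            continue
        words.extend(re.findall(r"\w+", line.lower()))
    return words


def similarity(srt_a: str, srt_b: str) -> float:
    # autojunk=False: otherwise, for long inputs, frequent words such as "the"
    # would be treated as junk and skew the ratio.
    matcher = difflib.SequenceMatcher(
        None, subtitle_words(srt_a), subtitle_words(srt_b), autojunk=False
    )
    return matcher.ratio()


def same_text(srt_a: str, srt_b: str, threshold: float = 0.8) -> bool:
    return similarity(srt_a, srt_b) >= threshold
```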

How do I tell what language a plain-text file is written in?

Suppose we have a text file with the content "Je suis un beau homme ...", another with "I am a brave man", and a third with a text in German: "Guten morgen. Wie geht's ?" How do we write a function that would tell us, with some probability, that the text in the first file is in French, in the second English, etc.? Links to books / ou...
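A toy illustration of the idea: count how many of the text's words are common function words of each candidate language, then normalize the counts into scores that sum to 1. The three stopword lists are hypothetical and far too small for real use. Practical detectors typically compare character n-gram profiles trained on large corpora instead. The scores are smoothed relative counts, not calibrated probabilities.

```python
import re
from typing import Dict, Set

# Hypothetical, tiny function-word lists, for illustration only.
STOPWORDS: Dict[str, Set[str]] = {
    "english": {"the", "a", "an", "i", "am", "is", "are", "and", "of", "to", "in", "you", "it"},
    "french": {"le", "la", "les", "un", "une", "je", "suis", "est", "et", "de", "vous", "il"},
    "german": {"der", "die", "das", "ein", "eine", "ich", "bin", "ist", "und", "nicht", "wie", "es", "zu"},
}


def language_scores(text: str) -> Dict[str, float]:
    """Per-language share of stopword hits, add-one smoothed so the scores sum to 1."""
    # [^\W\d_]+ matches runs of Unicode letters only.
    words = re.findall(r"[^\W\d_]+", text.lower())
    hits = {lang: sum(w in stops for w in words) for lang, stops in STOPWORDS.items()}
    total = sum(h + 1 for h in hits.values())
    return {lang: (h + 1) / total for lang, h in hits.items()}


def most_likely_language(text: str) -> str:
    scores = language_scores(text)
    return max(scores, key=scores.get)
```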

How to make concept representation with the help of bag of words

Hi all, thanks for stopping to read my question :) This is a very sweet place full of GREAT people! I have a question about "creating sentences with words". No, no, it is not about English grammar :) Let me explain: if I have a bag of words like "person apple apple person person a eat person will apple eat hungry apple hungry" and it can...

Natural Language Processing Package

I have started working on a project which requires natural language processing. We have to do spell checking as well as map sentences to phrases and their synonyms. I first thought of using GATE, but I am confused about what to use. I found an interesting post here which got me even more confused. http://lordpimpington.com/codespeaks/...
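For the spell-checking half, a small edit-distance corrector in the style of Peter Norvig's well-known spelling-corrector essay is often enough to start with. It generates every string one edit away (then two), keeps those that appear in a word-frequency table, and picks the most frequent. A minimal sketch. The training text here is a placeholder; in practice it would be a large corpus from the project's domain. Synonym mapping is not covered.

```python
import re
from collections import Counter
from typing import List, Set

LETTERS = "abcdefghijklmnopqrstuvwxyz"


def train(text: str) -> Counter:
    """Word-frequency table from raw text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def edits1(word: str) -> Set[str]:
    """All strings one delete, transpose, replace, or insert away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:] for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in LETTERS]
    inserts = [left + c + right for left, right in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def correct(word: str, counts: Counter) -> str:
    word = word.lower()
    if word in counts:
        return word
    candidates: List[str] = [w for w in edits1(word) if w in counts]
    if not candidates:
        # Fall back to two edits; this is slower (tens of thousands of strings).
        candidates = [w2 for w1 in edits1(word) for w2 in edits1(w1) if w2 in counts]
    # Most frequent known candidate wins; unknown words are returned unchanged.
    return max(candidates, key=counts.get) if candidates else word
```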

Classification of relationships in words?

Hi, I'm not sure what the best algorithm is for classifying relationships between words. For example, in a sentence such as "The yellow sun" there is a relationship between yellow and sun. The machine learning techniques I have considered so far are Bayesian statistics, rough sets, fuzzy logic, hidden Markov model ...

Dependency parsing

Hi, I particularly like the transduce feature offered by AGFL in their EP4IR: http://www.agfl.cs.ru.nl/EP4IR/english.html The download page is here: http://www.agfl.cs.ru.nl/download.html Is there any way I can make use of this in a C# program? Do I need to convert classes to C#? Thanks :) ...

NLP - Word Alignment

I am looking for word alignment tools and algorithms. I am dealing with bilingual English-Hindi text and am currently working with DTW (Dynamic Time Warping), CLA (Competitive Linking Algorithm), NATools and Giza++. Could you please suggest any other algorithm/tool which is language independent and which could achieve Statistical w...
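GIZA++ trains the IBM alignment models, and the first of them, IBM Model 1, is simple enough to sketch. It learns word-translation probabilities t(target | source) from sentence-aligned text by expectation-maximization. It uses no language-specific knowledge, which fits the language-independence requirement. A minimal sketch, without the NULL source word and without the distortion and fertility terms of the later models. The tiny German-English corpus in the tests is made up; an English-Hindi corpus would be passed in the same way.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Corpus = List[Tuple[List[str], List[str]]]  # (source tokens, target tokens) per sentence pair


def ibm_model1(corpus: Corpus, iterations: int = 10) -> Dict[Tuple[str, str], float]:
    """EM estimate of t[(target_word, source_word)] = P(target_word | source_word)."""
    target_vocab = {e for _, tgt in corpus for e in tgt}
    uniform = 1.0 / len(target_vocab)
    t: Dict[Tuple[str, str], float] = defaultdict(lambda: uniform)
    for _ in range(iterations):
        count: Dict[Tuple[str, str], float] = defaultdict(float)
        total: Dict[str, float] = defaultdict(float)
        # E-step: split each target word's unit of mass over the source words
        # of its sentence, in proportion to the current t values.
        for src, tgt in corpus:
            for e in tgt:
                norm = sum(t[(e, f)] for f in src)
                for f in src:
                    share = t[(e, f)] / norm
                    count[(e, f)] += share
                    total[f] += share
        # M-step: renormalize the expected counts per source word.
        t = {(e, f): c / total[f] for (e, f), c in count.items()}
    return t


def best_translation(t: Dict[Tuple[str, str], float], source_word: str) -> str:
    candidates = {e: p for (e, f), p in t.items() if f == source_word}
    return max(candidates, key=candidates.get)
```

Word alignments then follow by linking each target word to the source word with the highest t value in its sentence pair.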

Data clean up: are there libraries of common permutations that we can use? Or is there a better approach?

We are working on clean-up and analysis of a lot of human-entered customer data. We need to decide programmatically whether 2 addresses (for example) are the same, even though the data was entered with slight variations. Right now we run each address through fairly simplistic string replacement (replacing avenue with ave, for example)...
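Plain string replacement can be refined in two ways. First, canonicalize token by token, so "avenue" is only rewritten when it is a whole word. Second, fall back to a fuzzy similarity ratio to catch typos. A minimal sketch. The abbreviation table is a small hypothetical sample; the USPS, for one, publishes a much fuller list of standard street-suffix abbreviations.

```python
import difflib
import re

# Small hypothetical sample of canonical forms; real tables are much longer.
ABBREVIATIONS = {
    "avenue": "ave", "av": "ave",
    "street": "st", "str": "st",
    "road": "rd",
    "apartment": "apt",
    "north": "n", "south": "s", "east": "e", "west": "w",
}


def normalize_address(address: str) -> str:
    """Lowercase, drop punctuation, and map each whole token to its canonical form."""
    tokens = re.findall(r"[a-z0-9]+", address.lower())
    return " ".join(ABBREVIATIONS.get(tok, tok) for tok in tokens)


def address_similarity(a: str, b: str) -> float:
    """1.0 for identical normalized addresses, lower for typos and variations."""
    return difflib.SequenceMatcher(None, normalize_address(a), normalize_address(b)).ratio()
```

A character ratio alone rates "12 Oak Ave" and "14 Oak Ave" as very similar. A real matcher would probably require house numbers and unit numbers to match exactly, and apply the fuzzy ratio only to the remaining tokens. Any threshold is a tuning choice, not a known-good value.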

Mapping words to numbers with respect to definition

As part of a larger project, I need to read in text and represent each word as a number. For example, if the program reads in "Every good boy deserves fruit", then I would get a table that converts 'every' to '1742', 'good' to '977513', etc. Now, obviously I can just use a hashing algorithm to get these numbers. However, it would be mo...

Recognizing language of a short text? - Python

Hi folks, I have a list of articles; each article has its own title and description. Unfortunately, from the sources I am using, there is no way to know what language they are written in. Also, the text is not entirely written in one language; almost always English words are present. I reckon I would need dictionary databases stored on my...

Defining the context of a word - Python

Hi folks, I think this is an interesting question, at least for me. I have a list of words, let's say: photo, free, search, image, css3, css, tutorials, webdesign, tutorial, google, china, censorship, politics, internet; and I have a list of contexts: Programming, World news, Technology, Web Design. I need to try and match wo...

Hierarchy of meaning

I am looking for a method to build a hierarchy of words. Background: I am an "amateur" natural language processing enthusiast, and right now one of the problems that I am interested in is determining the hierarchy of word semantics from a group of words. For example, if I have the set which contains a "super" representation of others, i...