linguistics

Training Hidden Markov Models without Tagged Corpus Data

For a linguistics course we implemented part-of-speech (POS) tagging using a hidden Markov model, where the hidden variables were the parts of speech. We trained the system on some tagged data, then tested it and compared our results with the gold data. Would it have been possible to train the HMM without the tagged training set? ...

Ruby Linguistics gem

I am trying to convert a number to words, but I have a problem: >> (91.80).en.numwords => "ninety-one point eight" I want it to be "ninety-one point eighty". I use the Linguistics gem. Do you know a solution for this (preferably with Linguistics)? ...

How can I get the possessive form of a noun?

Here's an algorithm for adding an apostrophe to a given input noun. How would you construct a string to show ownership? /** * apostrophizes the string properly * <pre> * curtis = curtis' * shaun = shaun's * </pre> * * @param input string to apostrophize * @return apostrophized string or empty string if the input was empty or nul...
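The two examples in the doc comment above (curtis = curtis', shaun = shaun's) suggest a simple rule: a noun ending in "s" takes a bare apostrophe, anything else takes "'s". A minimal sketch of that rule follows, written in Python for illustration only; the function name `apostrophize` is hypothetical, and matching the final "s" case-insensitively is an assumption, since the excerpt does not say.

```python
def apostrophize(noun):
    """Return a possessive form of noun using the two rules shown in the excerpt."""
    # Empty or missing input yields an empty string, as the doc comment describes.
    if not noun:
        return ""
    # Assumption: the check for a trailing "s" ignores case.
    if noun.lower().endswith("s"):
        return noun + "'"
    # Every other noun takes an apostrophe followed by "s".
    return noun + "'s"


print(apostrophize("curtis"))  # curtis'
print(apostrophize("shaun"))   # shaun's
```

This covers only the cases named in the excerpt; it does not attempt irregular possessives or style-guide variants such as "curtis's".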

What does a single sentence consist of? How to name it?

Hi, I'm designing the architecture of a text parser. Example sentence: Content here, content here. The whole sentence is a... sentence, that's obvious. The, quick etc. are words; , and . are punctuation marks. But what are words and punctuation marks all together called in general? Are they just symbols? I simply don't know how to name what a singl...

Part of Speech Tagging - where to start?

Hello, I would like to know how to implement a solution to the following task: there is a 500 MB file of plain English text. I'd like to collect statistics on the frequency of words, while also making sure that each word (or at least the majority of words) is recognized correctly. That is, 'cry' in the sentence "she gave a loud CRY" ...

Build a natural language model that fixes misspellings.

What books explain how to build a natural language parsing program like this: input: I got to TALL you output: I got to TELL you input: Big RAT box output: Big RED box in: hoo un thum zend three out: one thousand three It must have a language model that allows it to predict which words are misspelled! What are the best books on ...

Monitor brands with common words

Let's say you should monitor the brand "ONE" online. What algorithms can be used to separate pages about the brand ONE from pages containing the common word ONE? I'm thinking maybe Bayes could work, but are there other ways to do this? ...

Algorithm to choose random letters for word search game that allows many words to be spelled

I'm making a boggle-like word game. The user is given a grid of letters like this: O V Z W X S T A C K Y R F L Q The user picks out a word using any adjacent chains of letters, like the word "STACK" across the middle line. The letters used are then replaced by the machine e.g. (new letters in lowercase): O V Z W X z e x o p Y R F L Q...
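To make the "adjacent chains of letters" rule concrete, here is a minimal sketch of a check for whether a word can be traced through the grid shown above. It is not the letter-selection algorithm the question asks for, only the validity test such an algorithm would likely rely on. The function name `can_spell`, treating diagonal neighbours as adjacent, and allowing each cell only once per word are all assumptions; the excerpt does not state them.

```python
GRID = [
    list("OVZWX"),
    list("STACK"),
    list("YRFLQ"),
]


def can_spell(grid, word):
    """Return True if word can be traced through adjacent cells of grid."""
    rows, cols = len(grid), len(grid[0])

    def search(r, c, index, used):
        # The current cell must match the current letter of the word.
        if grid[r][c] != word[index]:
            return False
        if index == len(word) - 1:
            return True
        used.add((r, c))
        # Assumption: all eight neighbours, diagonals included, are adjacent.
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (dr or dc) and 0 <= nr < rows and 0 <= nc < cols \
                        and (nr, nc) not in used:
                    if search(nr, nc, index + 1, used):
                        return True
        # Backtrack so the cell is free for other candidate paths.
        used.discard((r, c))
        return False

    if not word:
        return False
    return any(search(r, c, 0, set())
               for r in range(rows) for c in range(cols))


print(can_spell(GRID, "STACK"))  # True
```

A generator of replacement letters could, under these assumptions, score each candidate fill by how many dictionary words pass this check.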

How to build a concept representation with the help of a bag of words

Hi all, thanks for stopping to read my question :) This is a very sweet place full of GREAT people! I have a question about "creating sentences with words". NO NO, it is not about English grammar :) Let me explain: if I have a bag of words like "person apple apple person person a eat person will apple eat hungry apple hungry" and it can...

Classification of relationships in words?

Hi, I'm not sure what's the best algorithm to use for the classification of relationships in words. For example, in the case of a sentence such as "The yellow sun" there is a relationship between yellow and sun. The machine learning techniques I have considered so far are Bayesian statistics, rough sets, fuzzy logic, hidden Markov model ...

NLP - Word Alignment

I am looking for word alignment tools and algorithms. I am dealing with bilingual English-Hindi text and am currently working with the DTW (Dynamic Time Warping) algorithm, CLA (Competitive Linking Algorithm), NATools, and GIZA++. Could you please suggest any other algorithm or tool that is language independent and that could achieve Statistical w...

How to conjugate English words in Java?

Hello. Say I have the base form of a word and a tag from the Penn Treebank tag set. How can I get the conjugated form? For example, for "do" and "VBN", how can I get "done"? I think this task is already implemented in some NLP library, so I'd rather not reinvent the wheel. Does something like that exist? ...

Algorithm for Negating Sentences

I was wondering if anyone was familiar with any attempts at algorithmic sentence negation. For example, given a sentence like "This book is good" provide any number of alternative sentences meaning the opposite like "This book is not good" or even "This book is bad". Obviously, accomplishing this with a high degree of accuracy would pr...

Where can I find a list of English phrases?

I'm tasked with searching for the use of cliches and common phrases in text. The phrases are similar to the ones you might see in the phrase puzzles on Wheel of Fortune. Here are a few examples: "Easy Come Easy Go", "Too Good To Be True", "Winning Isn't Everything". However, I cannot find a list of such phrases. Does anybody know of such a list...

How to get logical parts of a sentence with Java?

Hello. Let's say there is a sentence: On March 1, he was born. Changing it to He was born on March 1. doesn't break the sense of the sentence and it is still valid. Shuffling words in any other way would produce weird to invalid sentences. So basically, I'm talking about parts of the sentence, which make the information more speci...

RDF of sentences

Hi, I need to represent sentences in RDF format. In other words, "John likes coke" would be automatically represented as Subject: John, Predicate: Likes, Object: Coke. Does anyone know where I should start? Are there any programs which can do this automatically, or would I need to do everything from scratch? Any help would be appreci...

Resources for character and text processing (encoding, regular expressions, NLP)

I'd like to learn the foundations of encodings, characters and text. Understanding these is important for dealing with a large body of text, whether that is log files or source text for building algorithms for collective intelligence. My current knowledge is pretty basic: something like "As long as I use UTF-8, I'm okay." I don't say I need ...

Get the correct word from a misspelled word in PHP

Hi, I want to know how to get the correct word from a wrong one. For example, the string is "sstring" but the correct word is "string". Is there any algorithm for this in PHP? Thanks in advance ...
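One common approach to this kind of correction is to compare the misspelled word against a list of known words and pick the closest match. The sketch below illustrates the idea in Python, not PHP, using the standard library's `difflib.get_close_matches`. The `VOCABULARY` list and the `suggest` function are hypothetical; a real system would need a full dictionary.

```python
import difflib

# Hypothetical vocabulary for illustration; a real dictionary would be far larger.
VOCABULARY = ["string", "strong", "spring", "integer", "array"]


def suggest(word, vocabulary=VOCABULARY):
    """Return the closest known word, or None if nothing is similar enough."""
    # cutoff=0.6 is difflib's default similarity threshold; n=1 keeps the best match.
    matches = difflib.get_close_matches(word.lower(), vocabulary, n=1, cutoff=0.6)
    return matches[0] if matches else None


print(suggest("sstring"))  # string
```

PHP ships comparable string-distance built-ins such as `levenshtein()` and `similar_text()`, so the same nearest-word idea could likely be expressed there.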

Natural language grammar and user-entered names

Some languages, particularly Slavic languages, change the endings of people's names according to the grammatical context. (For those of you who know grammar or studied languages that do this to words, such as German or Russian, and to help with search keywords, I'm talking about noun declension.) This is probably easiest with a set of e...

Dual-line bilingual paragraph in LaTeX

An interlinear gloss can be used to lay out a translation of a document. http://en.wikipedia.org/wiki/Interlinear_gloss Usually this is done word-by-word or morpheme-by-morpheme. However, I would like to do this in a different way, translating entire paragraphs at a time. The following link and image are an example of what I want done,...