nlp

Human name comparison: ways to approach this task

I'm not a Natural Language Processing student, yet I know it's not a trivial strcmp(n1,n2). Here's what I've learned so far: comparing personal names can't be solved 100%, but there are ways to achieve a certain degree of accuracy. The answer will be locale-specific; that's OK. I'm not looking for spelling alternatives! The assumption is...

N-grams: Explanation + 2 applications

Hello! I want to implement some applications with n-grams (preferably in PHP). Which type of n-gram is more suitable for most purposes: a word-level or a character-level n-gram? How could you implement an n-gram tokenizer in PHP? First, I would like to know what exactly n-grams are. Is this correct? It's how I understand n-grams...
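
To make the word-level versus character-level distinction concrete, here is a minimal sketch. It is written in Python rather than the PHP the question asks about, and the function names `char_ngrams` and `word_ngrams` are made up for illustration. It assumes that splitting on whitespace is an acceptable way to tokenize words, which will not hold for every language or text.

```python
def char_ngrams(text, n):
    """Return every contiguous run of n characters in text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def word_ngrams(text, n):
    """Return every contiguous run of n whitespace-separated words."""
    words = text.split()  # assumption: whitespace tokenization is good enough
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


if __name__ == "__main__":
    print(char_ngrams("hello", 2))
    print(word_ngrams("the quick brown fox", 2))
```

Character n-grams tend to be used for tasks such as fuzzy matching, while word n-grams are more common in language modelling; which one is "more suitable" depends on the application.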

Algorithms to recognize misspelled names in texts

I need to develop an application that will index several texts, and I need to search for people’s names inside these texts. The problem is that, while a person’s correct name is “Gregory Jackson Junior”, inside the text the name might be written as:
- Greg Jackson Jr
- Gegory Jackson Jr
- Gregory Jackson
- Gregory J. Junior
I plan t...
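
One possible starting point for ranking such variants is a plain string-similarity score. The sketch below uses `difflib.SequenceMatcher` from the Python standard library; the helper names `name_similarity` and `rank_variants` are hypothetical, and this is not necessarily the approach the asker had in mind (the excerpt is cut off). It assumes that case can be ignored and that no nickname or abbreviation table is available.

```python
from difflib import SequenceMatcher


def name_similarity(a, b):
    """Similarity between two names in [0, 1], ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def rank_variants(canonical, candidates):
    """Sort candidate strings from most to least similar to the canonical name."""
    return sorted(candidates,
                  key=lambda c: name_similarity(canonical, c),
                  reverse=True)


if __name__ == "__main__":
    canonical = "Gregory Jackson Junior"
    variants = ["Greg Jackson Jr", "Gegory Jackson Jr",
                "Gregory Jackson", "Gregory J. Junior", "John Smith"]
    for v in rank_variants(canonical, variants):
        print(round(name_similarity(canonical, v), 2), v)
```

A pure character-level score like this handles typos such as "Gegory" reasonably well, but it knows nothing about "Jr" meaning "Junior"; an expansion table for such abbreviations would have to be added separately.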

Looking for any free tagged English corpora

Does anyone know of any free (licensed free for commercial use) tagged English corpora that can be used to train a part-of-speech (POS) tagger? The only ones I have seen online seem to start in the thousands for commercial use. Any help would be appreciated, thanks. ...

Limit CPU / Stack for Java method call?

I am using an NLP library (Stanford NER) that throws OOM errors for rare input documents. I plan to eventually isolate these documents and figure out what about them causes the errors, but this is hard to do (I'm running in Hadoop, so I just know the error occurs 17% through split 379/500 or something like that). As an interim solution,...

Simple Sentiment Analysis

It appears that the simplest, most naive way to do basic sentiment analysis is with a Bayesian classifier (confirmed by what I'm finding here on SO). Any counter-arguments or other suggestions? ...

LingPipe Text Processing API

Hello everyone. This question is for those who have already used LingPipe. My question is how to load the GENIA corpus for part-of-speech tagging. When I start parsing it, I get an error saying that I ran out of heap memory. Thanks. ...

Getting into sentiment analysis

I have a requirement to determine whether an entered sentence is positive or negative. At first I thought it had something to do with social network analysis, and later I realised that it is sentiment analysis. My first question is: what is the difference between these two? I think SNA itself uses SA. Please correct me if I am wrong. ...

Identifying geographical locations in text

What kind of work has been done to determine whether a specific string pertains to a geographical location? For example: 'troy, ny', 'austin, texas', 'hotels in las vegas, nv'. I guess what I'm sort of expecting is a statistical approach that gives a degree of confidence that the first two are locations. The last one would probably req...

Different levels in speech recognition software

There are the phonetic, syntactic, semantic, phonological, acoustic, linguistic, and language levels. Are there any other levels? What's the order from the bottom up? And what are they really about? ...

How to use WordNet in SQL

How can I use WordNet in an SQL database? Does this exist anywhere? Can someone give me a step-by-step procedure? ...

Processing English Statements

Any recommendations for languages/libraries to convert a sentence like: "X bumped Y, who in turn kicked Z." to X: Bumped Y: Was bumped, kicked Z ...

How to correct user input (a kind of Google "Did you mean?")

I have the following requirement: I have many (say 1 million) values (names). The user will type a search string. I don't expect the user to spell the names correctly. So, I want to build something like Google's "Did you mean". This will list all the possible values from my datastore. There is a similar but not identical question here. This did no...
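
As a rough illustration of the "Did you mean" idea, the sketch below uses `difflib.get_close_matches` from the Python standard library. The wrapper name `did_you_mean` and the default `limit` and `cutoff` values are assumptions chosen for the example, not anything stated in the question.

```python
from difflib import get_close_matches


def did_you_mean(query, known_names, limit=3, cutoff=0.6):
    """Return up to `limit` known names that closely resemble the query."""
    # Compare in lowercase, but hand back the original spelling.
    by_lower = {name.lower(): name for name in known_names}
    hits = get_close_matches(query.lower(), list(by_lower),
                             n=limit, cutoff=cutoff)
    return [by_lower[h] for h in hits]


if __name__ == "__main__":
    names = ["John Smith", "Jane Doe", "Gregory Jackson"]
    print(did_you_mean("jonh smith", names))
```

Note that `get_close_matches` compares the query against every candidate, so with a million names this linear scan is likely too slow for interactive use; an index (for example over character n-grams) to pre-filter candidates would probably be needed first.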

How can I correctly prefix a word with "a" or "an"?

I have a .NET application where, given a noun, I want it to correctly prefix that word with "a" or "an". How would I do that? Before you think the answer is to simply check if the first letter is a vowel, consider phrases like:
- an honest mistake
- a used car
...
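
The two counterexamples show that the choice depends on the first sound, not the first letter. A minimal sketch of a letter rule plus an exception list is given below, in Python rather than .NET. The word sets `AN_WORDS` and `A_WORDS` and the function names are hypothetical and deliberately tiny; a real solution would need a much larger list or a pronunciation dictionary.

```python
# Illustrative, incomplete exception lists (assumptions for this sketch).
AN_WORDS = {"honest", "hour", "heir", "honor", "honour"}         # silent "h"
A_WORDS = {"used", "user", "unique", "university", "european"}   # "yoo" sound


def indefinite_article(phrase):
    """Pick "a" or "an" based on the first word of the phrase."""
    words = phrase.split()
    if not words:
        raise ValueError("phrase must contain at least one word")
    first = words[0].lower()
    if first in AN_WORDS:
        return "an"
    if first in A_WORDS:
        return "a"
    # Fallback: the naive first-letter rule.
    return "an" if first[0] in "aeiou" else "a"


def with_article(phrase):
    """Return the phrase prefixed with its indefinite article."""
    return f"{indefinite_article(phrase)} {phrase}"


if __name__ == "__main__":
    for p in ["honest mistake", "used car", "apple", "banana"]:
        print(with_article(p))
```

Anything not in the exception sets falls back to the first-letter rule, so words with an unexpected pronunciation that are missing from the lists will still come out wrong.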

How to determine subject, object and other words?

I'm trying to implement an application that can determine the meaning of a sentence by dividing it into smaller pieces. So I need to know which words are the subject, object, etc., so that my program knows how to handle the sentence. ...

How to determine subject, object and other words in a Context

Hi, I'm trying to implement NLP in my project. I need to tag words as Person, Location, Organisation, etc. If anybody knows the logic, please let me know. Regards, Stack ...

RapidMiner and sentiment analysis

Hi, has anyone out there used RapidMiner for sentiment analysis? Is this the right combination? If not, how do I get started with a simple sentiment analysis application? ...

How to find Title case phrases in a passage or a bunch of paragraphs

How do I parse sentence case phrases from a passage? For example, from this passage: Conan Doyle said that the character of Holmes was inspired by Dr. Joseph Bell, for whom Doyle had worked as a clerk at the Edinburgh Royal Infirmary. Like Holmes, Bell was noted for drawing large conclusions from the smallest observations.[1] Michael Har...
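
If the goal is to pull out runs of capitalized words such as "Conan Doyle", a regular expression is one simple way to start. The sketch below is in Python; the pattern and the name `capitalized_phrases` are assumptions for illustration, and it only handles plain ASCII letters.

```python
import re

# Two or more consecutive words that each start with an uppercase letter.
CAPITALIZED_RUN = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)+\b")


def capitalized_phrases(text):
    """Return every run of two or more capitalized words found in text."""
    return CAPITALIZED_RUN.findall(text)


if __name__ == "__main__":
    passage = ("Conan Doyle said that the character of Holmes was inspired "
               "by Dr. Joseph Bell, for whom Doyle had worked as a clerk at "
               "the Edinburgh Royal Infirmary.")
    print(capitalized_phrases(passage))
```

This misses single-word names such as "Holmes" and will also pick up a capitalized sentence-initial word followed by a name (for example "Like Holmes"), so some filtering is still needed afterwards.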

Natural language processing / text structure analysis starting point

I need to parse and process a big set of semi-structured texts (basically, legal documents: law texts, addenda to them, treaties, judges' decisions, ...). The most fundamental thing I'm trying to do is extract information on how subparts are structured (chapters, articles, subheadings, ...) plus some metadata. My question is if anyone ca...

The best way to get started with sentiment analysis

Hi, can someone please give me some starting points for getting started with sentiment analysis? It would be great if you could point to some open source tools that can be used for this. Currently I am looking at GATE (http://gate.ac.uk) and RapidMiner (http://rapid-i.com/), but I think I am in the middle of nowhere and I lack the basics...