nlp

How to analyze simple English sentences

Is there a library that can be used to analyze (NLP) simple English text? For example, it would be perfect if it could do this: Input: "I am going" Output: I, go, present continuous tense ...

Pulling stats out of a text

I'd like to know what the most recurrent words are in a given text or group of texts (pulled from a database), in Ruby. Does anyone know what the best practices are? ...
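One common approach is plain word-frequency counting: lowercase the text, tokenize it, optionally drop stopwords, and tally. The sketch below is in Python rather than Ruby; the tokenizing regex and the `top_words` helper are assumptions for illustration, not an established API.

```python
import re
from collections import Counter

def top_words(texts, n=10, stopwords=frozenset()):
    """Return the n most frequent words across an iterable of strings."""
    counts = Counter()
    for text in texts:
        # Tokens are runs of lowercase letters, digits and apostrophes,
        # so "don't" stays a single word.
        words = re.findall(r"[a-z0-9']+", text.lower())
        counts.update(w for w in words if w not in stopwords)
    return counts.most_common(n)

print(top_words(["the cat sat", "the cat ran"], n=1, stopwords={"the"}))
# [('cat', 2)]
```

In Ruby the same idea maps onto `String#scan` with a regex and `Enumerable#tally` (Ruby 2.7+).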

TDD and the Bayesian Spam Filter problem

It's well known that Bayesian classifiers are an effective way to filter spam. These can be fairly concise (ours is only a few hundred LoC), but all the core code needs to be written up front before you get any results at all. However, the TDD approach mandates that only the minimum amount of code to pass a test can be written, so given t...
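A naive Bayes classifier can be grown one small, separately testable method at a time (counting during training, scoring, then classifying), which is one way to reconcile it with TDD. Below is a minimal Python sketch of a multinomial naive Bayes with add-one smoothing and whitespace tokenization; the class and method names are hypothetical, and it assumes both labels have seen at least one training message.

```python
import math
from collections import Counter

class NaiveBayes:
    LABELS = ("spam", "ham")

    def __init__(self):
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = Counter()

    def train(self, text, label):
        # Count one document and its (lowercased, whitespace-split) words.
        self.doc_counts[label] += 1
        self.word_counts[label].update(text.lower().split())

    def _log_score(self, words, label):
        counts = self.word_counts[label]
        total = sum(counts.values())
        vocab = len(set(self.word_counts["spam"]) | set(self.word_counts["ham"]))
        # Log prior: fraction of training documents with this label.
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for w in words:
            # Add-one (Laplace) smoothing so unseen words don't zero the score.
            score += math.log((counts[w] + 1) / (total + vocab))
        return score

    def classify(self, text):
        words = text.lower().split()
        return max(self.LABELS, key=lambda label: self._log_score(words, label))
```

A first test might only assert that `train` updates `word_counts`; later tests can pin down `classify` on a tiny hand-built corpus.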

The lines that stand out in a file, but aren't exact duplicates

I'm combing a webapp's log file for statements that stand out. Most of the lines are similar and uninteresting. I'd pass them through Unix uniq; however, that filters nothing, as all the lines are slightly different: they all have a different timestamp, similar statements might print a different user ID, etc. What's a way and/or tool to...
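One way to approximate "uniq for nearly identical lines" is to mask the variable parts of each line (timestamps, hex IDs, numbers) so that similar statements collapse to the same template, then report lines whose template is rare. A Python sketch; the masking patterns are assumptions and would need tuning to the actual log format.

```python
import re
from collections import Counter

# Illustrative masking rules, applied in order: timestamps first, so their
# digits are not masked piecemeal by the generic number rule.
MASKS = [
    (re.compile(r"\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}(?:[.,]\d+)?"), "<TS>"),
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<HEX>"),
    (re.compile(r"\d+"), "<N>"),
]

def template(line):
    """Replace the variable parts of a log line with placeholders."""
    for pattern, placeholder in MASKS:
        line = pattern.sub(placeholder, line)
    return line.strip()

def rare_lines(lines, max_count=1):
    """Return the lines whose template occurs at most max_count times."""
    templates = [template(line) for line in lines]
    counts = Counter(templates)
    return [line for line, t in zip(lines, templates) if counts[t] <= max_count]
```

A rough shell approximation is `sed -E 's/[0-9]+/N/g' app.log | sort | uniq -c | sort -n`, which lists templates from rarest to most common.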

How do I do word Stemming or Lemmatization?

I've tried PorterStemmer and Snowball, but neither works on all words; both miss some very common ones. My test words are: "cats running ran cactus cactuses community communities", and both get fewer than half right. Ideally the class/function would be in PHP, but I can port it if it's in another language. See also: Stemming algorith...

How to turn plural words singular?

I'm preparing some table names for an ORM, and I want to turn plural table names into singular entity names. My only problem is finding an algorithm that does it reliably. Here's what I'm doing right now: if a word ends with -ies, I replace the ending with -y; if a word ends with -es, I remove this ending. This doesn't always work, however...
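Ordering the rules from most to least specific and checking exception tables first avoids the most common failures of the two rules described (for example, stripping -es turns "horses" into "hors"). A Python sketch; the rule list and exception tables are illustrative assumptions, and any rule-based approach will still miss irregular words the tables don't list.

```python
import re

# Illustrative exception tables; extend them with the schema's own words.
IRREGULAR = {"people": "person", "children": "child", "men": "man", "mice": "mouse"}
UNCHANGED = {"series", "species", "news", "status"}

# (pattern, replacement) rules, most specific first; the first match wins.
RULES = [
    (re.compile(r"ies$", re.IGNORECASE), "y"),               # categories -> category
    (re.compile(r"(ss|x|ch|sh)es$", re.IGNORECASE), r"\1"),  # boxes -> box, addresses -> address
    (re.compile(r"([^s])s$", re.IGNORECASE), r"\1"),         # users -> user, horses -> horse; "class" untouched
]

def singularize(word):
    lower = word.lower()
    if lower in UNCHANGED:
        return word
    if lower in IRREGULAR:
        return IRREGULAR[lower]
    for pattern, replacement in RULES:
        if pattern.search(word):
            return pattern.sub(replacement, word, count=1)
    return word
```

Remaining gaps are easy to find: "movies" still becomes "movy" unless it is added to `IRREGULAR`.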

Algorithm for analyzing text of words

I want an algorithm that would create all possible phrases in a block of text. For example, in the text: "My username is click upvote. I have 4k rep on stackoverflow" it would create the following combinations: "My username" "My Username is" "username is click" "is click" "is click upvote" "click upvote" "i have" "i have 4k" "have 4...
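What the example describes are word n-grams: every contiguous run of two or more words, without crossing a sentence boundary. A Python sketch; splitting sentences on `.`, `!` and `?` is a simplifying assumption (it will also split on decimals such as "4.5").

```python
import re

def phrases(text, min_len=2, max_len=None):
    """All contiguous word runs of min_len..max_len words within each sentence."""
    result = []
    # Split on sentence-ending punctuation so phrases never cross sentences.
    for sentence in re.split(r"[.!?]+", text):
        words = sentence.split()
        upper = len(words) if max_len is None else min(max_len, len(words))
        for n in range(min_len, upper + 1):
            for i in range(len(words) - n + 1):
                result.append(" ".join(words[i:i + n]))
    return result
```

A sentence of w words yields (w - 1) + (w - 2) + ... + 1 phrases, so `max_len` is worth setting for long texts.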

How do you parse a paragraph of text into sentences? (preferably in Ruby)

How do you take a paragraph or a large amount of text and break it into sentences (preferably using Ruby), taking into account cases such as Mr., Dr. and U.S.A.? (Assume you just put the sentences into an array of arrays.) UPDATE: One possible solution I thought of involves using a parts-of-speech tagger (POST) and a classifier to determ...
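Instead of a POS tagger plus classifier, a simpler rule-based heuristic often gets most of the way: treat `.`, `!` or `?` followed by whitespace and a capital letter (or quote) as a candidate boundary, and reject candidates whose last token is a known abbreviation. A Python sketch (the regex ports directly to Ruby); the abbreviation list is a hypothetical starting point.

```python
import re

# Hypothetical abbreviation list; a real one would be much longer.
ABBREVIATIONS = {"mr.", "mrs.", "dr.", "st.", "e.g.", "i.e.", "u.s.a."}

# Candidate boundary: terminal punctuation followed by whitespace and a capital or quote.
BOUNDARY = re.compile(r"[.!?]+(?=\s+[A-Z\"'])")

def split_sentences(text):
    sentences, start = [], 0
    for match in BOUNDARY.finditer(text):
        end = match.end()
        last_token = text[start:end].split()[-1].lower()
        if last_token in ABBREVIATIONS:
            continue  # "Dr. Jones": not a sentence end
        sentences.append(text[start:end].strip())
        start = end
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences
```

Abbreviations that can also end a sentence (such as "U.S.A." as the last word) stay ambiguous under this rule; that is where a trained model helps.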

Java: Is there a good natural language processing library?

I need to implement some NLP in my current module. I am looking for a good library that can help me here. I came across 'LingPipe' but could not completely follow how to use it. Basically, we need to implement a feature where the application can decipher customer instructions (delivery instructions) typed in plain English. Eg: Wi...

Stack Overflow related questions algorithm

The related questions that appear after entering the title, and those in the right sidebar when viewing a question, seem to suggest very apt questions. Stack Overflow only does a SQL search for it and uses no special algorithms, said Spolsky in a talk. What algorithms exist to give good answers in such a case? How do you do datab...
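A standard baseline for "related" titles is TF-IDF weighting plus cosine similarity: words shared with the query count for more when they are rare across all titles. A self-contained Python sketch; this is a generic technique, not what Stack Overflow actually runs, and the helper names are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def tfidf_vectors(docs):
    tokenized = [tokenize(d) for d in docs]
    # Document frequency: in how many docs each word appears.
    df = Counter(word for words in tokenized for word in set(words))
    n = len(docs)
    vectors = []
    for words in tokenized:
        tf = Counter(words)
        # Smoothed idf: words present in every document still get a small weight.
        vectors.append({w: c * (math.log((1 + n) / (1 + df[w])) + 1) for w, c in tf.items()})
    return vectors

def cosine(u, v):
    dot = sum(weight * v.get(w, 0.0) for w, weight in u.items())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def related(query, titles, k=3):
    """Up to k titles most similar to query, skipping titles with no overlap."""
    vectors = tfidf_vectors([query] + titles)
    scored = [(cosine(vectors[0], vec), title) for vec, title in zip(vectors[1:], titles)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [title for score, title in scored[:k] if score > 0]
```

Recomputing IDF on every query is fine for a sketch; a real site would precompute title vectors and keep them in an inverted index.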

Which NLP toolkit to use in Java?

Hello there, I'm working on a project that consists of a website that connects to the NCBI (National Center for Biotechnology Information) and searches for articles there. The thing is, I have to do some text mining on all the results. I'm using Java for the text mining and AJAX with ICEfaces for the development of the website. ...

Is it possible to guess a user's mood based on the structure of text?

I assume a natural language processor would need to be used to parse the text itself, but what suggestions do you have for an algorithm to detect a user's mood based on text that they have written? I doubt it would be very accurate, but I'm still interested nonetheless. EDIT: I am by no means an expert on linguistics or natural language...

English translation of the STTS tagset

The most common part-of-speech tagset for German is the STTS tagset. I need an English translation of the explanations for each tag. Not being a linguist, I don't feel comfortable (let alone qualified) translating this myself. Google turned up nothing, so any help is appreciated. ...

Extract small relevant bits of text (as Google does) from full-text search results

I have implemented a full-text search in a discussion forum database, and I want to display the search results the way Google does. Even for a very long HTML page, only two or three lines of text are displayed in a search result list. Usually these are the lines that contain the search terms. What would be a good algorithm for how to...
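A simple approach is a sliding window: score each fixed-width window of words by how many distinct search terms it contains and show the best one, with ellipses where text was cut. A Python sketch, assuming HTML has already been stripped to plain text; `width` (in words) is an arbitrary parameter.

```python
import re

def best_snippet(text, terms, width=20):
    """Return the width-word window of text containing the most distinct search terms."""
    words = text.split()
    wanted = {t.lower() for t in terms}

    def normalize(word):
        # Lowercase and drop punctuation so "results," matches "results".
        return re.sub(r"\W+", "", word.lower())

    best_start, best_score = 0, -1
    # max(1, ...) still yields one window when the text is shorter than width.
    for start in range(max(1, len(words) - width + 1)):
        window = words[start:start + width]
        score = len(wanted & {normalize(w) for w in window})
        if score > best_score:  # strict '>' keeps the earliest best window
            best_start, best_score = start, score
    snippet = " ".join(words[best_start:best_start + width])
    prefix = "... " if best_start > 0 else ""
    suffix = " ..." if best_start + width < len(words) else ""
    return prefix + snippet + suffix
```

This rescans each window, which is O(words x width); for long pages a running count of terms in the window avoids the rescans.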

Produce a sentence from a grammar with a given number of terminals

Say you've got a toy grammar like this (updated so the output looks more natural):
    S -> ${NP} ${VP} | ${S} and ${S} | ${S}, after which ${S}
    NP -> the ${N} | the ${A} ${N} | the ${A} ${A} ${N}
    VP -> ${V} ${NP}
    N -> dog | fish | bird | wizard
    V -> kicks | meets | marries
    A -> red | striped | spotted
e.g., "the dog kicks the red wizard...
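One approach is to first compute, with memoization, whether a sequence of symbols can derive exactly n terminals, and then expand randomly while only choosing splits that remain feasible. A Python sketch encoding the toy grammar above; it counts the comma as a terminal (it is one) and assumes no empty rules and no cycles of single-nonterminal rules, which holds for this grammar. The function names are hypothetical.

```python
import random
from functools import lru_cache

# The toy grammar from the question: nonterminal -> tuple of alternatives.
# Any symbol that is not a key here is a terminal (including ",").
GRAMMAR = {
    "S": [("NP", "VP"), ("S", "and", "S"), ("S", ",", "after", "which", "S")],
    "NP": [("the", "N"), ("the", "A", "N"), ("the", "A", "A", "N")],
    "VP": [("V", "NP")],
    "N": [("dog",), ("fish",), ("bird",), ("wizard",)],
    "V": [("kicks",), ("meets",), ("marries",)],
    "A": [("red",), ("striped",), ("spotted",)],
}

@lru_cache(maxsize=None)
def can_derive(symbols, n):
    """True if the tuple of symbols can derive exactly n terminals."""
    if not symbols:
        return n == 0
    if n < len(symbols):  # no empty rules: every symbol yields at least one terminal
        return False
    first, rest = symbols[0], symbols[1:]
    if first not in GRAMMAR:  # a terminal accounts for exactly one
        return can_derive(rest, n - 1)
    return any(can_derive(alt, k) and can_derive(rest, n - k)
               for alt in GRAMMAR[first]
               for k in range(1, n - len(rest) + 1))

def expand(symbols, n, rng):
    """Randomly expand symbols into exactly n terminals; assumes can_derive(symbols, n)."""
    if not symbols:
        return []
    first, rest = symbols[0], symbols[1:]
    if first not in GRAMMAR:
        return [first] + expand(rest, n - 1, rng)
    # Keep only (alternative, split) choices that can still reach exactly n.
    options = [(alt, k) for alt in GRAMMAR[first]
               for k in range(1, n - len(rest) + 1)
               if can_derive(alt, k) and can_derive(rest, n - k)]
    alt, k = rng.choice(options)
    return expand(alt, k, rng) + expand(rest, n - k, rng)

def sentence(n, rng=random):
    """A random sentence with exactly n terminals, or None if none exists."""
    if not can_derive(("S",), n):
        return None
    return " ".join(expand(("S",), n, rng)).replace(" ,", ",")
```

With this grammar `sentence(4)` returns `None`, since the shortest derivable sentence ("the dog kicks the fish") has five terminals.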

Parsing text into sentences?

I am trying to parse text from a PDF page into sentences, but it is much more difficult than I had anticipated. There are a whole lot of special cases to consider, such as initials, decimals, quotations, etc., which contain periods but do not necessarily end the sentence. I was curious if anyone here was familiar with an NLP library for ...

Finding bigrams in a location index

I have a table which indexes the locations of words in a bunch of documents. I want to identify the most common bigrams in the set. How would you do this in MSSQL 2008? The table has the following structure: LocationID -> DocID -> WordID -> Location. I have thought about trying to do some kind of complicated join... and I'm just doing...
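Since each row records a word's position, a bigram is a pair of rows in the same document whose `Location` values differ by one, so a self-join grouped on the two `WordID`s counts them. The sketch runs the query on SQLite from Python so it can be tried locally; the table name `WordLocations` is an assumption, and in SQL Server 2008 you would write `SELECT TOP (10) ...` instead of `LIMIT`.

```python
import sqlite3

# Self-join: row b holds the word immediately after row a in the same document.
BIGRAM_SQL = """
SELECT a.WordID AS First, b.WordID AS Second, COUNT(*) AS Freq
FROM WordLocations AS a
JOIN WordLocations AS b
  ON b.DocID = a.DocID AND b.Location = a.Location + 1
GROUP BY a.WordID, b.WordID
ORDER BY Freq DESC
LIMIT ?
"""

def top_bigrams(conn, n=10):
    """Return up to n (first WordID, second WordID, count) rows, most frequent first."""
    return conn.execute(BIGRAM_SQL, (n,)).fetchall()
```

An index on `(DocID, Location)` lets the database resolve the join condition without scanning the whole table for each row.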

Natural Language Processing in Ruby

I'm looking to do some sentence analysis (mostly for twitter apps) and infer some general characteristics. Are there any good natural language processing libraries for this sort of thing in Ruby? Similar to http://stackoverflow.com/questions/870460/java-is-there-a-good-natural-language-processing-library but for Ruby. I'd prefer somethi...

Is there a natural language parser for date/times in javascript?

Is there a natural language parser for date/times in javascript? ...

Is there a natural language parser for dates/times in ColdFusion?

Is there a natural language parser for date/times in ColdFusion? ...