nlp

How to get parent node in Stanford's JavaNLP?

Hello. Suppose I have such chunk of a sentence: (NP (NP (DT A) (JJ single) (NN page)) (PP (IN in) (NP (DT a) (NN wiki) (NN website)))) At a certain moment of time I have a reference to (JJ single) and I want to get the NP node binding A single page. If I get it right, that NP is the parent of the node, A and page a...

RDF of sentences

Hi, I need to classify sentences as a RDF format. In other words "John likes coke" would be automatically represented as Subject : John Predicate : Likes Object : Coke does nyone know where I should start? Are there any programs which can do this automatically or would I need to do everything from scratch? Any help would be appreci...

Stanford Parser - Traversing the typed dependencies graph

Hello! Basically I want to find a path between two NP tokens in the dependencies graph. However, I can't seem to find a good way to do this in the Stanford Parser. Any help? Thank You Very Much ...

How to estimate the quality of a web page?

Hello, I'm doing a university project, that must gather and combine data on a user provided topic. The problem I've encountered is that Google search results for many terms are polluted with low quality autogenerated pages and if I use them, I can end up with wrong facts. How is it possible to estimate the quality/trustworthiness of a pa...

entity set expansion python

Do you know of any existing implementation in any language (preferably python) of any entity set expansion algorithms, such that the one from Google sets ? ( http://labs.google.com/sets ) I couldn't find any library implementing such algorithms and I'd like to play with some of those to see how they would perform on some specific task I...

tag generation from a small text content (such as tweets)

Hello, I have already asked a similar question earlier but I have notcied that I have big constrain: I am working on small text sets suchs as user Tweets to generate tags(keywords). And it seems like the accepted suggestion ( point-wise mutual information algorithm) is meant to work on bigger documents. With this constrain(working on ...

Naive Bayesian for Topic detection using "Bag of Words" approach

I am trying to implement a naive bayseian approach to find the topic of a given document or stream of words. Is there are Naive Bayesian approach that i might be able to look up for this ? Also, i am trying to improve my dictionary as i go along. Initially, i have a bunch of words that map to a topics (hard-coded). Depending on the occ...

Text mining with PHP

Hi, I'm doing a project for a college class I'm taking. I'm using PHP to build a simple web app that classify tweets as "positive" (or happy) and "negative" (or sad) based on a set of dictionaries. The algorithm I'm thinking of right now is Naive Bayes classifier or decision tree. However, I can't find any PHP library that helps me do...

NLP with greatly contrained input and abilities

Hat in hand here. I'm a seasoned developer and I would be grateful for a bit of help. I don't have time to read or digest long intricate discussions on theoretical concepts around NLP (or go get my PHD). That said, I have read a few and it's a damn interesting field. The problem is I need real world solutions, for real world products, in...

latin bases language segmentation gramatical rules

Hi folks, I am working on one feature i.e. to apply language segmentation rules ( grammatical ) for Latin based language ( English currently ). Currently I am in phase of breaking sentences of user input. e.g.: "I am working in language translation". "I have used Google MT API for this" In above example i will break above sentence ...

Java text classification problem

Hello, I have a set of Books objects, classs Book is defined as following : Class Book{ String title; ArrayList<tags> taglist; } Where title is the title of the book, example : Javascript for dummies. and taglist is a list of tags for our example : Javascript, jquery, "web dev", .. As I said a have a set of books talking about di...

Given a document, select a relevant snippet.

When I ask a question here, the tool tips for the question returned by the auto search given the first little bit of the question, but a decent percentage of them don't give any text that is any more useful for understanding the question than the title. Does anyone have an idea about how to make a filter to trim out useless bits of a que...

machine learning and code generator from strings

The problem: Given a set of hand categorized strings (or a set of ordered vectors of strings) generate a categorize function to categorize more input. In my case, that data (or most of it) is not natural language. The question: are there any tools out there that will do that? I'm thinking of some kind of reasonably polished, download, ...

Sentiment analysis with NLTK python for sentences using sample data or webservice?

I am embarking upon a NLP project for sentiment analysis. I have successfully installed NLTK for python (seems like a great piece of software for this). However,I am having trouble understanding how it can be used to accomplish my task. Here is my task: I start with one long piece of data (lets say several hundred tweets on the subje...

Adjective Nominalization in Python NLTK

Hi, Is there a way to obtain Wordnet adjective nominalizations using NLTK? For example, for 'happy' the desired output would be 'happiness'. I tried to dig around, but couldn't find anything. Thanks! ...

Corpus/data set of English words with syllabic stress information?

I know this is a long shot, but does anyone know of a dataset of English words that has stress information by syllable? Something as simple as the following would be fantastic: AARD vark A ble a BOUT ac COUNT AC id ad DIC tion ad VERT ise ment ... Thanks in advance! ...

Java or Python distributed compute job (on a student budget)?

I have a large dataset (c. 40G) that I want to use for some NLP (largely embarrassingly parallel) over a couple of computers in the lab, to which i do not have root access, and only 1G of user space. I experimented with hadoop, but of course this was dead in the water-- the data is stored on an external usb hard drive, and i cant load it...

How to identify ideas and concepts in a given text

I'm working on a project at the moment where it would be really useful to be able to detect when a certain topic/idea is mentioned in a body of text. For instance, if the text contained: Maybe if you tell me a little more about who Mr Jones is, that would help. It would also be useful if I could have a description of his appearance, ...

Keyword sorting algorithm

I have over 1000 surveys, many of which contains open-ended replies. I would like to be able to 'parse' in all the words and get a ranking of the most used words (disregarding common words) to spot a trend. How can I do this? Is there a program I can use? EDIT If a 3rd party solution is not available, it would be great if we can keep...

Indexing and Searching Over Word Level Annotation Layers in Lucene

I have a data set with multiple layers of annotation over the underlying text, such as part-of-tags, chunks from a shallow parser, name entities, and others from various natural language processing (NLP) tools. For a sentence like The man went to the store, the annotations might look like: Word POS Chunk NER ==== === ===== ...