nlp

Searching text for geonames

Hi, which part of the huge NLTK package must I study and use if I need to mark geonames in text? ...
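
NLTK's `nltk.ne_chunk` can label GPE (geo-political entity) chunks, but it needs pretrained models downloaded first. A simpler alternative is a gazetteer lookup. Below is a minimal sketch, assuming you already have a set of place names (for example, extracted from a GeoNames dump). The function name `tag_places` and the `<PLACE>` markup are hypothetical choices, and matching is case-sensitive.

```python
import re

def tag_places(text, gazetteer):
    """Wrap each place name from `gazetteer` found in `text` in <PLACE>...</PLACE>.

    Hypothetical helper: plain dictionary (gazetteer) lookup, not NLTK's chunker.
    """
    if not gazetteer:
        return text
    # Python's regex alternation tries alternatives in order, so listing the
    # longest names first makes "New York City" win over "York" at one position.
    names = sorted(gazetteer, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, names)) + r")\b")
    return pattern.sub(lambda m: f"<PLACE>{m.group(0)}</PLACE>", text)

print(tag_places("I flew from New York City to Paris.",
                 {"New York", "New York City", "Paris", "York"}))
```

A lookup like this cannot disambiguate names such as "Paris" the person versus the city; that is where a statistical tagger helps.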

Starting out NLP - Python + large data set

Hi, I've been wanting to learn Python and do some NLP, so I have finally gotten round to starting. I downloaded the English Wikipedia mirror as a nice chunky dataset to start on and have been playing around a bit; at this stage I'm just getting some of it into an SQLite DB (unfortunately, I haven't worked with databases before). But I'm guessing SQLite ...

Natural language processing - Ideas for beginner's projects

Hi guys, I am a beginner in NLP and NLTK. I am very interested in NLP and hence joined a weekend course on AI at a local institution, which requires me to do a project to complete the course, and I decided to do it in NLP. The problem is, the instructor is not good at all for this course (in my opinion, she is just a charlatan)...

Dictionary of English Words for a J2ME app

I intend to develop a J2ME application that should be able to read words from an English dictionary. How do I interface with and store a dictionary? Will I have to create the dictionary myself by inserting words, or is there a third-party dictionary available with APIs? ...

Are there any C# libraries for Named Entity Recognition?

I am looking for any free libraries for Named Entity Recognition in C# or any other .NET language. ...

Natural language parsing of an appointment?

I'm looking for a Java library to help parse user-entered text that represents an 'appointment' for a calendar application. For instance: "Lunch with Mike at 11:30 on Tuesday" or "5pm Happy hour on Friday". I've found some promising leads like https://jchronic.dev.java.net/ and http://www.datejs.com/, which can parse dates, but I also n...
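
To show what such parsing involves, here is a minimal regex-based sketch (in Python rather than Java, and not any of the libraries linked above). It assumes the input looks like the two examples: at most one clock time, at most one weekday, and everything else is the title. `Appointment` and `parse_appointment` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

class Appointment(NamedTuple):
    title: str
    day: Optional[str]
    time: Optional[str]  # "HH:MM" on a 24-hour clock

# Optional "at", hour, optional ":MM", optional am/pm suffix.
TIME_RE = re.compile(r"\b(?:at\s+)?(\d{1,2})(?::(\d{2}))?(?:\s*(am|pm))?\b", re.IGNORECASE)
# Optional "on" followed by a weekday name.
DAY_RE = re.compile(r"\b(?:on\s+)?(monday|tuesday|wednesday|thursday|friday|saturday|sunday)\b",
                    re.IGNORECASE)

def parse_appointment(text: str) -> Appointment:
    """Hypothetical toy parser: pull out one time and one weekday, keep the rest as the title."""
    time = None
    m = TIME_RE.search(text)
    if m:
        hour, minute = int(m.group(1)), int(m.group(2) or 0)
        meridiem = (m.group(3) or "").lower()
        if meridiem == "pm" and hour < 12:
            hour += 12
        elif meridiem == "am" and hour == 12:
            hour = 0
        time = f"{hour:02d}:{minute:02d}"
        text = text[:m.start()] + " " + text[m.end():]  # cut the time phrase out
    day = None
    d = DAY_RE.search(text)
    if d:
        day = d.group(1).capitalize()
        text = text[:d.start()] + " " + text[d.end():]  # cut the day phrase out
    return Appointment(" ".join(text.split()), day, time)

print(parse_appointment("Lunch with Mike at 11:30 on Tuesday"))
print(parse_appointment("5pm Happy hour on Friday"))
```

A time without am/pm, such as "11:30", is taken literally; resolving that ambiguity (and relative dates like "tomorrow") is exactly what dedicated date parsers add.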

Building dictionary of words from large text

I have a text file containing posts in English/Italian. I would like to read the posts into a data matrix so that each row represents a post and each column a word. The cells in the matrix are the counts of how many times each word appears in the post. The dictionary should consist of all the words in the whole file or a non-exhaustive E...
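
The matrix described here is usually called a document-term (bag-of-words) count matrix. A minimal sketch, assuming the posts are already loaded as a list of strings and that lowercased `\w+` runs count as words; in Python 3, `\w` is Unicode-aware, so Italian accented letters stay inside words. The name `count_matrix` is hypothetical.

```python
import re
from collections import Counter

def count_matrix(posts):
    """Return (vocabulary, rows) where rows[i][j] counts vocabulary[j] in posts[i]."""
    tokenized = [re.findall(r"\w+", post.lower()) for post in posts]
    # The dictionary: every distinct word in the whole file, in sorted order.
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    column = {word: j for j, word in enumerate(vocabulary)}
    rows = []
    for tokens in tokenized:
        row = [0] * len(vocabulary)
        for word, n in Counter(tokens).items():
            row[column[word]] = n
        rows.append(row)
    return vocabulary, rows

vocab, rows = count_matrix(["the cat sat", "The dog saw the cat"])
print(vocab)  # ['cat', 'dog', 'sat', 'saw', 'the']
print(rows)   # [[1, 0, 1, 0, 1], [1, 1, 0, 1, 2]]
```

For a large file most cells are zero, so a sparse representation (for example, keeping only the per-post `Counter`) saves a lot of memory.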

Calling Stanford POS Tagger maxentTagger from a Java program

Hi. I am new to the Stanford POS tagger. I need to call the tagger from my Java program and direct the output to a text file. I have extracted the source files from Stanford-postagger and tried calling the maxentTagger, but all I get is errors and warnings. Can somebody explain from scratch how to call maxentTagger in my program, ...

How to conjugate English words in Java?

Hello. Say I have the base form of a word and a tag from the Penn Treebank tag set. How can I get the conjugated form? For example, for "do" and "VBN", how can I get "done"? I think this task is already implemented in some NLP library, so I'd rather not reinvent the wheel. Does something like that exist? ...
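
The task splits into an irregular-form lookup plus suffix rules for regular verbs. A minimal sketch in Python (not Java, and not any existing library's API), assuming a tiny hypothetical irregular table; real coverage needs a full lexicon of irregular verbs, and consonant doubling ("stop" to "stopping") is not handled.

```python
# Hypothetical, deliberately tiny irregular table; a real system needs a full lexicon.
IRREGULAR = {
    "do": {"VBD": "did", "VBN": "done", "VBZ": "does"},
    "go": {"VBD": "went", "VBN": "gone", "VBZ": "goes"},
    "have": {"VBD": "had", "VBN": "had", "VBZ": "has"},
    "see": {"VBD": "saw", "VBN": "seen"},
}
VOWELS = "aeiou"

def conjugate(base, tag):
    """Inflect a base verb for a Penn Treebank verb tag (toy rules only)."""
    if tag in IRREGULAR.get(base, {}):
        return IRREGULAR[base][tag]
    consonant_y = base.endswith("y") and len(base) > 1 and base[-2] not in VOWELS
    if tag in ("VB", "VBP"):  # base form / non-3rd-person present
        return base
    if tag == "VBZ":  # 3rd-person singular present
        if base.endswith(("s", "x", "z", "ch", "sh")):
            return base + "es"
        return base[:-1] + "ies" if consonant_y else base + "s"
    if tag == "VBG":  # gerund / present participle
        if base.endswith("ie"):
            return base[:-2] + "ying"
        if base.endswith("e") and not base.endswith("ee"):
            return base[:-1] + "ing"
        return base + "ing"
    if tag in ("VBD", "VBN"):  # past tense / past participle
        if base.endswith("e"):
            return base + "d"
        return base[:-1] + "ied" if consonant_y else base + "ed"
    raise ValueError(f"not a Penn Treebank verb tag: {tag}")

print(conjugate("do", "VBN"))  # done
```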

Algorithm for Negating Sentences

I was wondering if anyone was familiar with any attempts at algorithmic sentence negation. For example, given a sentence like "This book is good", provide any number of alternative sentences meaning the opposite, like "This book is not good" or even "This book is bad". Obviously, accomplishing this with a high degree of accuracy would pr...
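
The two example outputs correspond to two simple rules: insert "not" after an auxiliary verb, and swap an adjective for its antonym. A minimal rule-based sketch, assuming whitespace tokenization, no punctuation, and a hypothetical hand-made antonym table (a real system would draw antonyms from a lexical resource and use a parser to find the main verb).

```python
# Be-forms and modals after which "not" can be inserted directly.
AUXILIARIES = {"am", "is", "are", "was", "were", "can", "could", "will",
               "would", "shall", "should", "may", "might", "must"}
# Hypothetical antonym table for illustration only.
ANTONYMS = {"good": "bad", "bad": "good", "happy": "sad", "sad": "happy",
            "hot": "cold", "cold": "hot"}

def negate(sentence):
    """Return naive negations of `sentence` (toy rules, no punctuation handling)."""
    words = sentence.split()
    alternatives = []
    # Rule 1: insert "not" after the first auxiliary or modal verb.
    for i, word in enumerate(words):
        if word.lower() in AUXILIARIES:
            alternatives.append(" ".join(words[: i + 1] + ["not"] + words[i + 1 :]))
            break
    # Rule 2: replace each known adjective with its antonym, one at a time.
    for i, word in enumerate(words):
        antonym = ANTONYMS.get(word.lower())
        if antonym:
            alternatives.append(" ".join(words[:i] + [antonym] + words[i + 1 :]))
    return alternatives

print(negate("This book is good"))  # ['This book is not good', 'This book is bad']
```

Sentences without an auxiliary ("Birds fly") need do-support ("Birds do not fly"), which these rules do not attempt.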

Which is better? OpenCyc or ConceptNet?

Hi, I'm doing an NLP project where I need to recognise concepts in sentences to find other similar concepts. I do this to infer word valences from a list I already have. I started using WordNet, but it gave many contradictory results. By contradictory results, I mean word expansions that had contradictory valences. So now I'm looking into...

How to perform FST (Finite State Transducer) composition

Consider the following FSTs:
T1:
0 1 a : b
0 2 b : b
2 3 b : b
0 0 a : a
1 3 b : a
T2:
0 1 b : a
1 2 b : a
1 1 a : d
1 2 a : c
How do I perform the composition operation on these two FSTs (i.e. T1 o T2)? I saw some algorithms but couldn't understand much. If anyone could explain it in an easy way, it would be a major help. Please not...
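
The core of composition is a product construction: states of T1 o T2 are pairs (q1, q2), and an arc q1 -a:b-> p1 in T1 combines with an arc q2 -b:c-> p2 in T2 into (q1, q2) -a:c-> (p1, p2), because T1's output must be T2's input. A minimal sketch for epsilon-free transducers, assuming each listed line means "source destination input : output" and that state 0 is the start state of both. The excerpt gives no final states, so they are omitted; in general, (p1, p2) is final when p1 is final in T1 and p2 is final in T2.

```python
from collections import deque

def compose(t1, t2, start=(0, 0)):
    """Compose two epsilon-free FSTs given as lists of (src, dst, input, output) arcs.

    Only pair states reachable from `start` are explored (breadth-first).
    """
    # Index each transducer's arcs by source state.
    arcs_from1, arcs_from2 = {}, {}
    for src, dst, i, o in t1:
        arcs_from1.setdefault(src, []).append((dst, i, o))
    for src, dst, i, o in t2:
        arcs_from2.setdefault(src, []).append((dst, i, o))

    composed = []
    seen = {start}
    queue = deque([start])
    while queue:
        q1, q2 = queue.popleft()
        for p1, a, b in arcs_from1.get(q1, []):
            for p2, b2, c in arcs_from2.get(q2, []):
                if b != b2:  # T1's output symbol must equal T2's input symbol
                    continue
                target = (p1, p2)
                composed.append(((q1, q2), target, a, c))
                if target not in seen:
                    seen.add(target)
                    queue.append(target)
    return composed

T1 = [(0, 1, "a", "b"), (0, 2, "b", "b"), (2, 3, "b", "b"), (0, 0, "a", "a"), (1, 3, "b", "a")]
T2 = [(0, 1, "b", "a"), (1, 2, "b", "a"), (1, 1, "a", "d"), (1, 2, "a", "c")]

for arc in compose(T1, T2):
    print(arc)
```

Read this way, the sketch finds five reachable arcs: (0,0) -a:a-> (1,1), (0,0) -b:a-> (2,1), (1,1) -b:d-> (3,1), (1,1) -b:c-> (3,2) and (2,1) -b:a-> (3,2). The self-loop 0 -a:a-> 0 in T1 never fires, because T2's start state has no arc reading "a". Transducers with epsilon arcs need an extra filter to avoid duplicate paths, which this sketch does not implement.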

Tag generation from text content

Hello, I am curious whether an algorithm/method exists to generate keywords/tags from a given text by using some weight calculations, occurrence ratios or other tools. Additionally, I would be grateful if you could point me to any Python-based solution/library for this. Thanks ...
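
One standard weighting for this is TF-IDF: a word's frequency within a text (the occurrence ratio), multiplied by the log of how rare the word is across a collection of texts. A minimal sketch, assuming you have several documents to compare against and that lowercase letter runs count as words; `tfidf_tags` is a hypothetical name, and no stop-word list is used (IDF only downweights words that are common across the collection).

```python
import math
import re
from collections import Counter

def tfidf_tags(documents, k=3):
    """Return the k highest-scoring TF-IDF terms for each document."""
    tokenized = [re.findall(r"[a-z]+", doc.lower()) for doc in documents]
    n_docs = len(documents)
    # Document frequency: in how many documents each word appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    tags = []
    for tokens in tokenized:
        tf = Counter(tokens)
        total = len(tokens)
        # Occurrence ratio times log inverse document frequency.
        scores = {w: (c / total) * math.log(n_docs / df[w]) for w, c in tf.items()}
        # Highest score first; ties broken alphabetically for stable output.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        tags.append(ranked[:k])
    return tags

docs = ["python code python snakes", "java code java coffee", "code review"]
print(tfidf_tags(docs, k=2))
```

A word that appears in every document (here "code") scores zero, so it never outranks distinctive words.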

English dictionary as txt or xml file with support of synonyms

Can someone point me to where I can download an English dictionary as a txt or xml file? I am building a simple app for myself and am looking for something that I could start using immediately without learning a complex API. Support for synonyms would be great, that is, it should be easy to retrieve all the synonyms for a particular word. It wo...

Off the shelf discriminative reranking software

Is there existing software for discriminative reranking, such as that used by the Charniak NLP parser, Shen, Sarkar, and Och's parser or Shen and Joshi's techniques? I'd like something that I can easily adapt for my own uses, which are similar to parse reranking. ...

Details on the following Natural Language Processing terms?

Named Entity Extraction (extract people, cities, organizations), Content Tagging (extract topic tags by scanning a doc), Structured Data Extraction, Topic Categorization (taxonomy classification by scanning a doc... Bayesian), Text Extraction (HTML page cleaning): are there libraries that I can use to do any of the above NLP functions? Don't ...

Get sentence splitter annotation set offsets through the GATE API

I am using GATE with the ANNIE sentence splitter. I would like to get each sentence's start and end offsets through the GATE API. Does anyone know how I can access this annotation set? Sorry for the poor grammar, thanks. ...

Looking for a good semantic parser for the Russian language.

Does anyone know of a semantic parser for the Russian language? I've attempted to configure the link-parser available from the link-grammar site, but to no avail. I'm hoping for a system that can run on the Mac and generate either a Prolog or Lisp-like representation of the parse tree (but XML output is fine as well). Thank you kindly in ad...

I want a machine to learn to categorize short texts

Hello, I have a ton of short stories, each about 500 words long, and I want to categorize each one into one of, let's say, 20 categories: Entertainment, Food, Music, etc. I can hand-classify a bunch of them, but I want to implement machine learning to guess the categories eventually. What's the best way to approach this? Is there a standard appro...
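
A common baseline for this setup (a hand-labeled sample, then automatic guesses) is a multinomial Naive Bayes classifier over word counts; scikit-learn ships one as `sklearn.naive_bayes.MultinomialNB`. To show the mechanics, here is a minimal from-scratch sketch with add-one (Laplace) smoothing, assuming lowercase letter runs as tokens; the class name and the three toy training texts are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word -> count
        self.doc_counts = Counter()              # category -> number of training docs
        self.vocabulary = set()

    @staticmethod
    def tokenize(text):
        return re.findall(r"[a-z']+", text.lower())

    def train(self, text, category):
        tokens = self.tokenize(text)
        self.word_counts[category].update(tokens)
        self.doc_counts[category] += 1
        self.vocabulary.update(tokens)

    def classify(self, text):
        total_docs = sum(self.doc_counts.values())
        v = len(self.vocabulary)
        best, best_score = None, -math.inf
        for category, n_docs in self.doc_counts.items():
            counts = self.word_counts[category]
            total_words = sum(counts.values())
            # Log prior plus sum of smoothed log likelihoods (logs avoid underflow).
            score = math.log(n_docs / total_docs)
            for token in self.tokenize(text):
                score += math.log((counts[token] + 1) / (total_words + v))
            if score > best_score:
                best, best_score = category, score
        return best

nb = NaiveBayes()
nb.train("pizza pasta recipe delicious", "Food")
nb.train("guitar concert album band", "Music")
nb.train("movie actor premiere film", "Entertainment")
print(nb.classify("a delicious pasta dinner"))  # Food
```

With real data, hold back part of the hand-classified stories to measure accuracy before trusting the guesses.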

How to get logical parts of a sentence with Java?

Hello. Let's say there is a sentence: "On March 1, he was born." Changing it to "He was born on March 1." doesn't break the sense of the sentence, and it is still valid. Shuffling the words in any other way would produce sentences ranging from weird to invalid. So basically, I'm talking about parts of the sentence which make the information more speci...