nlp

Python - pyparsing unicode characters

Hi :) I tried using w = Word(printables), but it isn't working. How should I give the spec for this? 'w' is meant to process Hindi characters (UTF-8). The code specifies the grammar and parses accordingly. 671.assess :: अहसास ::2 x=number + "." + src + "::" + w + "::" + number + "." + number If there are only English characters it ...
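A likely cause, offered as a sketch rather than a confirmed diagnosis: pyparsing's `printables` contains only ASCII characters, so `Word(printables)` cannot match Devanagari text. The example below uses the standard-library `re` module instead of pyparsing to show the idea of spelling out the allowed characters explicitly. It assumes the Hindi words fall in the Devanagari Unicode block (U+0900 to U+097F); `DEVANAGARI`, `ENTRY` and `parse_entry` are hypothetical names, and the pattern only covers the visible start of the sample line.

```python
import re

# Assumption: Hindi words use the Devanagari block U+0900..U+097F.
# A string like this could also be passed to pyparsing's Word(...)
# in place of `printables`.
DEVANAGARI = "".join(chr(cp) for cp in range(0x0900, 0x0980))

# Hypothetical pattern for the start of an entry: number "." src "::" word "::"
ENTRY = re.compile(r"(\d+)\.(\w+)\s*::\s*([\u0900-\u097F]+)\s*::")

def parse_entry(line):
    """Return (number, src, hindi_word), or None if the line does not match."""
    match = ENTRY.match(line)
    return match.groups() if match else None

if __name__ == "__main__":
    print(parse_entry("671.assess :: अहसास ::2"))
```

The line must be a Unicode `str` (decoded from UTF-8) before matching, otherwise the character range is compared against raw bytes.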

Transformation-Based Part-of-Speech Tagging (Brill Tagging)

What are the weaknesses and strengths of the Brill Tagger? Can you suggest some possible improvements for the tagger? ...

NLP project, Python or C++

We are working on an Arabic Natural Language Processing project, and we have limited our choices to writing the code in either Python or C++ (with the Boost library). We are thinking of these points: Python: slower than C++ (there is ongoing work to make Python faster); better UTF-8 support; faster for writing tests and trying different algorithms. C++...

Text mining, fact extraction, semantic analysis using .NET

I'm looking for any free tools/components/libraries that allow me to take advantage of text mining, fact extraction and semantic analysis in my .NET application. The GATE project is what I need, but it is written in Java. Is there something like GATE in the .NET world? My challenge is to extract certain facts out of website text conten...

POS tagger in SharpNLP

I am using SharpNLP for my POS tagging: EnglishMaximumEntropyPosTagger posTagger = new EnglishMaximumEntropyPosTagger(mModelPath); String tagSentence = posTagger.TagSentence(question); I only have 3 tags. How can I load a set of Penn Treebank tags, or some other treebank's tags, to use? Thanks :) ...

Computational Linguistics project idea using Hadoop MapReduce

I need to do a project for a Computational Linguistics course. Is there any interesting "linguistic" problem which is data-intensive enough to work on using Hadoop MapReduce? The solution or algorithm should analyse the data and provide some insight in the "linguistic" domain. However, it should be applicable to large datasets so that I can use hadoo...

How to build a concept representation with the help of a bag of words

Hi all, thanks for stopping to read my question :) This is a very sweet place full of great people! I have a question about "creating sentences with words". No, it is not about English grammar :) Let me explain. If I have a bag of words like "person apple apple person person a eat person will apple eat hungry apple hungry" and it can...

Natural Language Processing Package

I have started working on a project which requires Natural Language Processing. We have to do spell checking as well as mapping sentences to phrases and their synonyms. I first thought of using GATE, but I am confused about what to use. I found an interesting post here which got me even more confused. http://lordpimpington.com/codespeaks/...

I have a list of names, some of them fake; I need to use NLP and Python 3.1 to keep the real names and throw out the fake ones.

I have no clue where to start on this. I've never done any NLP and have only programmed in Python 3.1, which I have to use. I'm looking at the site http://www.linkedin.com and I have to gather all of the public profiles. Some of them have very fake names, like 'aaaaaa k dudujjek', and I've been told I can use NLP to find the real names, ...

Classification of relationships in words?

Hi, I'm not sure what the best algorithm is for classifying relationships between words. For example, in a sentence such as "The yellow sun" there is a relationship between yellow and sun. The machine learning techniques I have considered so far are Bayesian statistics, rough sets, fuzzy logic, hidden Markov models ...

Dependency parsing

Hi, I particularly like the transduce feature offered by AGFL in their EP4IR: http://www.agfl.cs.ru.nl/EP4IR/english.html The download page is here: http://www.agfl.cs.ru.nl/download.html Is there any way I can make use of this in a C# program? Do I need to convert the classes to C#? Thanks :) ...

NLP - Word Alignment

I am looking for word alignment tools and algorithms. I am dealing with bilingual English-Hindi text, and am currently working on: the DTW (Dynamic Time Warping) algorithm, CLA (Competitive Linking Algorithm), NATools, GIZA++. Could you please suggest any other algorithm/tool which is language independent and which could achieve Statistical w...

NLP: any easy and good methods to find semantic similarity between words?

I don't know whether Stack Overflow covers NLP, so I am going to give this a shot. I am interested in finding the semantic relatedness of two words from a specific domain, e.g. "image quality" and "noise". I am doing some research to determine if reviews of cameras are positive or negative for a particular attribute of the camera. (like ima...

Simple NLP: How to use n-grams to do word similarity?

Dear everyone, I hear that Google uses up to 7-grams for their semantic-similarity comparison. I am interested in finding words that are similar in context (e.g. cat and dog) and I was wondering how I can compute the similarity of two words with an n-gram model, given that n > 2. So basically given a text, like "hello my name is blah blah. I...
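One common way to compare words by context, sketched here under assumptions and not claimed to be what Google does: treat every n-gram as a small window, count which other words share a window with each word, and compare the resulting count vectors with cosine similarity. The function names `context_vectors` and `cosine` and the toy sentence are hypothetical.

```python
from collections import Counter, defaultdict
from math import sqrt

def context_vectors(tokens, n=3):
    """For each word, count the other words that share an n-gram with it."""
    vectors = defaultdict(Counter)
    for i in range(len(tokens) - n + 1):
        gram = tokens[i:i + n]
        for j, word in enumerate(gram):
            for k, other in enumerate(gram):
                if j != k:
                    vectors[word][other] += 1
    return vectors

def cosine(a, b):
    """Cosine similarity between two Counter-based sparse vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

if __name__ == "__main__":
    # Toy corpus, made up for illustration only.
    tokens = "the cat sat on the mat the dog sat on the rug".split()
    vecs = context_vectors(tokens, n=3)
    print(cosine(vecs["cat"], vecs["dog"]), cosine(vecs["cat"], vecs["mat"]))
```

On a real corpus the raw counts would usually be reweighted (for example with PMI or TF-IDF) so that very frequent words such as "the" do not dominate the vectors.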

How to extract common / significant phrases from a series of text entries

I have a series of text items: raw HTML from a MySQL database. I want to find the most common phrases in these entries (not the single most common phrase, and ideally, not enforcing word-for-word matching). My example is any review on Yelp.com, that shows 3 snippets from hundreds of reviews of a given restaurant, in the format: "Try ...
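A minimal baseline for this, assuming exact word-for-word matching (which the question would ideally relax): strip the HTML tags, split each entry into lowercase words, and count every n-word sequence across all entries. The function name `common_phrases`, the regex-based tag stripping and the sample entries are hypothetical; a real HTML parser would be more robust than the regular expression used here.

```python
import re
from collections import Counter

def common_phrases(entries, n=3, top=5):
    """Return the `top` most frequent n-word phrases across all entries."""
    counts = Counter()
    for html in entries:
        text = re.sub(r"<[^>]+>", " ", html)           # crude tag removal
        words = re.findall(r"[a-z0-9']+", text.lower())
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1
    return counts.most_common(top)

if __name__ == "__main__":
    # Made-up review snippets for illustration only.
    reviews = [
        "<p>Try the fish tacos</p>",
        "the fish tacos are great",
        "I love the fish tacos",
    ]
    print(common_phrases(reviews, n=3, top=1))
```

Running it for several values of `n` and filtering out phrases made up only of stop words is one way to move from raw counts towards "significant" phrases.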

Mapping words to numbers with respect to definition

As part of a larger project, I need to read in text and represent each word as a number. For example, if the program reads in "Every good boy deserves fruit", then I would get a table that converts 'every' to '1742', 'good' to '977513', etc. Now, obviously I can just use a hashing algorithm to get these numbers. However, it would be mo...
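The plain hashing approach mentioned above can be sketched as follows. It assumes case-insensitive matching and a fixed number range, and it uses `hashlib` because Python's built-in `hash()` for strings varies between interpreter runs by default. `word_to_number` and `buckets` are hypothetical names, and the numbers produced carry no relation to word meaning.

```python
import hashlib

def word_to_number(word, buckets=1_000_000):
    """Map a word to a stable integer in [0, buckets) via an MD5 digest."""
    digest = hashlib.md5(word.lower().encode("utf-8")).hexdigest()
    return int(digest, 16) % buckets

if __name__ == "__main__":
    sentence = "Every good boy deserves fruit"
    print({w.lower(): word_to_number(w) for w in sentence.split()})
```

Distinct words can collide in the same bucket; a dictionary that assigns the next free integer to each new word avoids collisions at the cost of storing the vocabulary.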

Defining the context of a word - Python

Hi folks, I think this is an interesting question, at least for me. I have a list of words, let's say: photo, free, search, image, css3, css, tutorials, webdesign, tutorial, google, china, censorship, politics, internet and I have a list of contexts: Programming, World news, Technology, Web Design. I need to try and match wo...

Hierarchy of meaning

I am looking for a method to build a hierarchy of words. Background: I am an "amateur" natural language processing enthusiast and right now one of the problems that I am interested in is determining the hierarchy of word semantics from a group of words. For example, if I have the set which contains a "super" representation of others, i...

Open Source Library for Linguistic Inquiry and Word Count (LIWC)

Hi, I am looking for an open source library for Linguistic Inquiry and Word Count (LIWC). Something in Java or Python would be good, though I am open to using other languages. Does anyone know where I can get one? Cheers, ...

How to program a system for reading-comprehension questions in English

Hi, I have to do a study on reading comprehension in English. My work is going OK, but there is a part from the natural language processing (NLP) area that I have to use. I would like some help with QA systems. Thank you ...