nlp

Input CNF for sat4j solver

Hi, I am totally new to the sat4j solver. It says a CNF file should be given as input. Is there any way to give a rule as input and find out whether it is satisfiable or not? My rule will be of the kind [...] Can someone help me solve this using the sat4j solver? ...

Detecting syllables in a word containing non-alphabetical characters

I'm implementing a readability test and have implemented a simple algorithm for detecting syllables. I detect sequences of vowels and count them in each word; for example, the word "should" contains one sequence of vowels, which is 'ou'. Before counting them I remove suffixes like -les, -e, -ed (for example word "like" contains one syllable b...
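A minimal sketch of the approach described in this question, assuming the suffix list from the excerpt and treating 'y' as a vowel (that last choice is an assumption, not something the question states). Non-alphabetical characters are simply dropped before counting, which is one possible way to handle the case in the title. The name `count_syllables` is hypothetical.

```python
import re

# Suffixes named in the question; at most one is stripped before counting.
SUFFIXES = ("les", "ed", "e")

# Assumption: 'y' counts as a vowel.
VOWEL_RUN = re.compile(r"[aeiouy]+")


def count_syllables(word):
    """Estimate syllables as the number of vowel runs in the word."""
    # Keep ASCII letters only, so "don't" or "well-known" do not break the count.
    letters = re.sub(r"[^a-z]", "", word.lower())
    for suffix in SUFFIXES:
        # Strip only if more than one letter remains afterwards.
        if letters.endswith(suffix) and len(letters) > len(suffix) + 1:
            letters = letters[: -len(suffix)]
            break
    # Report at least one syllable, even when no vowel run is left.
    return max(1, len(VOWEL_RUN.findall(letters)))
```

This is only a heuristic: stripping -les, for instance, undercounts words such as "tables", so the suffix rules would need tuning against real text.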

Word coloring and syntax analyzing

Hey! I want to colorize the words in a text according to their classification (category, declension, etc.). I have a fully working dictionary, but the problem is that there is a lot of ambiguity. "foedere", for instance, can be a form of either the verb "fornicate" or the noun "treaty". What are the general strategies for solving these ambiguit...

Need some explanation of the Earley algorithm

I would be very glad if someone could clarify for me the example given on Wikipedia: http://en.wikipedia.org/wiki/Earley_algorithm Consider the grammar:
    P → S          # the start rule
    S → S + M | M
    M → M * T | T
    T → number
and the input: 2 + 3 * 4. The Earley algorithm works like this: (state no.) Production (Origin) # Comment ----...
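For readers following the same Wikipedia example, here is a rough sketch of an Earley recognizer for the grammar quoted in the question. It only answers yes or no and builds no parse tree. The names `matches` and `earley_recognize` are hypothetical, the input is assumed to be split on whitespace, and the sketch assumes a grammar without empty productions.

```python
# Grammar from the question: non-terminals map to lists of right-hand sides.
GRAMMAR = {
    "P": [["S"]],
    "S": [["S", "+", "M"], ["M"]],
    "M": [["M", "*", "T"], ["T"]],
    "T": [["number"]],
}


def matches(terminal, token):
    """Assumption: the terminal 'number' matches any token made of digits."""
    if terminal == "number":
        return token.isdigit()
    return terminal == token


def earley_recognize(tokens, grammar=GRAMMAR, start="P"):
    # A state is (lhs, rhs, dot position, origin index).
    chart = [[] for _ in range(len(tokens) + 1)]

    def add(k, state):
        if state not in chart[k]:
            chart[k].append(state)

    for rhs in grammar[start]:
        add(0, (start, tuple(rhs), 0, 0))

    for k in range(len(tokens) + 1):
        i = 0
        while i < len(chart[k]):  # chart[k] can grow while it is processed
            lhs, rhs, dot, origin = chart[k][i]
            i += 1
            if dot < len(rhs):
                symbol = rhs[dot]
                if symbol in grammar:
                    # Predictor: expand the non-terminal after the dot.
                    for prod in grammar[symbol]:
                        add(k, (symbol, tuple(prod), 0, k))
                elif k < len(tokens) and matches(symbol, tokens[k]):
                    # Scanner: consume one token and advance the dot.
                    add(k + 1, (lhs, rhs, dot + 1, origin))
            else:
                # Completer: advance every state that was waiting for lhs.
                for lhs2, rhs2, dot2, origin2 in chart[origin]:
                    if dot2 < len(rhs2) and rhs2[dot2] == lhs:
                        add(k, (lhs2, rhs2, dot2 + 1, origin2))

    # Accept if a start rule is complete and spans the whole input.
    return any(
        lhs == start and dot == len(rhs) and origin == 0
        for lhs, rhs, dot, origin in chart[-1]
    )
```

Calling `earley_recognize("2 + 3 * 4".split())` runs the predictor, scanner and completer steps over five chart positions, one before each token and one after the last.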

NLTK installation

Hi, I want to set up Python's NLTK library, including WordNet, in such a way that it can be easily copied from a development system to a production server, without having to download WordNet separately. Any suggestions would be helpful. ...

WordNet relations

How can I generate the more general, less general and equivalence relations from WordNet? WordNet similarity in RitaWordnet gives a number like -1.0, 0.222 or 1.0, but how do I arrive at the more general and less general relations between words? Which tool would be ideal for that? Please help me. I get java.lang.NullPointerException, after it pr...

Ritawordnet - ignoring compound words

RiWordnet wordnet = new RiWordnet(); System.out.println(wordnet.isIgnoringCompoundWords()); gives me true as output, but I have to find the similarity between compound words too. How can I do that in WordNet? ...

Any interesting OCR/NLP related projects for CS final year project?

I am a final year CS student, and very interested in OCR and NLP. The problem is that I don't know anything about OCR yet and my project lasts only 5 months. I would like to know what OCR and NLP work is viable for my project. Is writing a (simple) OCR engine for a single language too hard for my project? What about addi...

Grammar production class implementation in C#

A grammar by definition contains productions. An example of a very simple grammar: E -> E + E; E -> n. I want to implement a Grammar class in C#, but I'm not sure how to store productions, for example how to distinguish between terminal and non-terminal symbols. I was thinking about: struct Production { String Left; // for example E...
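One possible way to store productions is to make each symbol carry a flag saying whether it is a terminal. The sketch below uses Python rather than C#, and every name in it (`Symbol`, `Production`, `EXPR_GRAMMAR`, `terminals`) is hypothetical. It illustrates the idea and does not continue the truncated struct from the question.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Symbol:
    name: str
    is_terminal: bool  # the flag that separates terminals from non-terminals


@dataclass(frozen=True)
class Production:
    left: Symbol               # always a non-terminal in a context-free grammar
    right: Tuple[Symbol, ...]  # ordered sequence of symbols on the right side


# The grammar from the question: E -> E + E and E -> n
E = Symbol("E", is_terminal=False)
PLUS = Symbol("+", is_terminal=True)
N = Symbol("n", is_terminal=True)

EXPR_GRAMMAR = [
    Production(E, (E, PLUS, E)),
    Production(E, (N,)),
]


def terminals(productions):
    """Collect the names of all terminal symbols used on right-hand sides."""
    return {s.name for p in productions for s in p.right if s.is_terminal}
```

An alternative design keeps symbols as plain strings and treats anything that never appears on a left-hand side as a terminal. That is more compact, but a typo in a non-terminal's name then passes silently as a new terminal.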

Artificial Intelligence and Chat Filters

Are there any chat filters that work depending on the context? I'm talking about the use of new technologies like Artificial Intelligence and Natural Language Processing to determine, for example, whether a word is rude or not, depending on the context. ...

An NLP project feedback

Hello, I am new to Natural Language Processing and I want to learn more by creating a simple project. NLTK was suggested as a popular NLP library, so I will use it in my project. Here is what I would like to do: I want to scan our company's intranet pages, approximately 3K pages. I would like to parse and categorize the content of these pa...

How to build a conceptual search engine?

I would like to build an internal search engine (I have a very large collection of thousands of XML files) that is able to map queries to concepts. For example, if I search for "big cats", I would want highly ranked results to return documents with "large cats" as well. But I may also be interested in having it return "huge animals", a...

Is there any stemmer available for Indian languages?

Hello, are there any implementations of stemmers available for Indian languages like Hindi or Telugu? ...

implementing a dictionary

Hi, I ran across an interview question about implementing a dictionary that supports features such as auto-completion, auto-correction, spell check, etc. I wanted to know which data structure is best for implementing a dictionary and how one approaches the features above. Any links that guide me on this...
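A trie (prefix tree) is one data structure commonly used for this kind of dictionary. The sketch below covers only lookup and auto-completion, under the assumption that words are plain strings. Auto-correction would need something extra, such as an edit-distance search, which is not shown. The class and method names are hypothetical.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # maps a character to the next TrieNode
        self.is_word = False  # True if a stored word ends at this node


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def _find(self, text):
        """Return the node reached by following text, or None."""
        node = self.root
        for ch in text:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def contains(self, word):
        """Spell-check style lookup: is this exact word stored?"""
        node = self._find(word)
        return node is not None and node.is_word

    def complete(self, prefix):
        """Auto-completion: all stored words starting with prefix, sorted."""
        node = self._find(prefix)
        if node is None:
            return []
        results = []
        stack = [(node, prefix)]
        while stack:
            current, text = stack.pop()
            if current.is_word:
                results.append(text)
            for ch, child in current.children.items():
                stack.append((child, text + ch))
        return sorted(results)
```

Lookup cost depends on the length of the word rather than the size of the dictionary, which is the main reason a trie suits prefix queries.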

Medical information extraction using Python

Hello there, I am a nurse and I know Python, but I am not an expert; I have just used it to process DNA sequences. We have hospital records written in human language, and I am supposed to insert this data into a database or CSV file, but there are more than 5000 lines and this can be very hard. All the data is written in a consistent format; let me s...

Word Base/Stem Dictionary

It seems my Google-fu is failing me. Does anyone know of a freely available word base dictionary that just contains bases of words? So, for something like strawberries, it would have strawberry. But does NOT contain abbreviations or misspellings or alternate spellings (like UK versus US)? Anything quickly usable in Java would be good bu...

How to extract words from text as per the context

Hello, I want to extract relevant words from a text statement provided by the user. E.g., for the question "How many sides are there in a rectangle?" the words should be 'rectangles', 'sides', 'many', 'how'. We've discovered that what I'm actually aiming to build is an NLP question-answering system. But right now I want to only extract the requi...

Ideas for NLP Project

Hi, I've been working on an NLP project, trying to define an intermediate POS tagging system and wrappers for migrating known POS tagging systems to mine. My question is: what is the best POS tagging system you've seen? Do not recommend a system because you like it, but because it is extensible and descriptive. For thos...

SQL word root matching

Hi all, I'm wondering whether major SQL engines out there (MS SQL, Oracle, MySQL) have the ability to understand that 2 words are related because they share the same root. We know it's easy to match "networking" when searching for "network" because the latter is a substring of the former. But do SQL engines have functions that can mat...

What is a proper tokenization algorithm? & Error: TypeError: coercing to Unicode: need string or buffer, list found

Hello, I'm doing an Information Retrieval task. As part of pre-processing I want to do stopword removal, tokenization, and stemming (Porter Stemmer). Initially, I skipped tokenization. As a result I got terms like this: broker broker' broker, broker. broker/deal broker/dealer' broker/dealer, broker/dealer. broker/dealer; broker/deale...
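As a rough illustration of why tokenization matters here, the sketch below tokenizes with a simple regular expression and then removes stopwords, so that variants such as "broker," and "broker." collapse into one term. The stopword list is a tiny hypothetical placeholder, and the function names are made up for the example. Stemming is left out and would run on the resulting tokens. Both functions take a single string. The TypeError in the title suggests a list was passed where a string was expected, though the excerpt does not show where.

```python
import re

# Hypothetical, deliberately tiny stopword list; a real task would use a fuller one.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in"}

# A token is a run of ASCII letters or digits; punctuation acts as a separator.
TOKEN = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return TOKEN.findall(text.lower())


def preprocess(text):
    """Tokenize, then drop stopwords; stemming would be applied after this."""
    return [token for token in tokenize(text) if token not in STOPWORDS]
```

Note that this choice splits "broker/dealer" into two terms. If such compounds should stay together, the pattern would need to allow "/" inside a token.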