I'm writing a text tag parser and I'm currently using this recursive method to create tags of n words. Is there a way that it can be done non-recursively or at least be optimized? Assume that $this->dataArray could be a very large array.
/**
* A recursive function to add phrases to the tagTracker array
* @param string $data
* @param ...
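Below is a minimal non-recursive sketch of the same idea, written in Python rather than PHP because the original method is cut off here. It assumes the goal is to count every phrase of 1 to n consecutive words from the data array. `count_phrases` and `tag_tracker` are hypothetical names standing in for the truncated method and the `tagTracker` array.

```python
from collections import Counter
from typing import List


def count_phrases(words: List[str], max_n: int) -> Counter:
    """Count every phrase of 1..max_n consecutive words using two plain loops
    instead of recursion, producing at most len(words) * max_n phrases."""
    tag_tracker: Counter = Counter()
    for start in range(len(words)):
        # Grow the phrase one word at a time from this start position,
        # stopping at max_n words or at the end of the array.
        for end in range(start + 1, min(start + max_n, len(words)) + 1):
            tag_tracker[" ".join(words[start:end])] += 1
    return tag_tracker
```

Each start position is visited once and its phrase is extended at most `max_n` words, so for a fixed n the work grows linearly with the size of the data array, and there is no recursion depth to run out of.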
OpenCalais lets you submit a string (via its REST API), and it will analyze that string and break it down into named entities, relationships, keywords, etc.
Are there better tools than OpenCalais (both free and commercial)?
...
I'm building a spelling corrector for search engine queries by implementing the method described in "Spelling correction as an iterative process that exploits the collective knowledge of web users".
The high-level approach is as follows: for a given query, come up with possible correction candidates (words in the query log within a c...
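Here is a minimal sketch of the candidate-generation step. Because the description is cut off, it assumes that candidates are query-log words within a fixed edit distance of each query term. `log_counts` (word frequencies from the query log) and `max_distance` are hypothetical, and plain unit-cost Levenshtein distance stands in for whatever distance the paper actually specifies.

```python
from typing import Dict, List


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance with unit costs."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution (free if equal)
        prev = cur
    return prev[-1]


def candidates(term: str, log_counts: Dict[str, int], max_distance: int = 2) -> List[str]:
    """Query-log words within max_distance edits of term, most frequent first."""
    close = [w for w in log_counts if levenshtein(term, w) <= max_distance]
    return sorted(close, key=lambda w: (-log_counts[w], w))
```

Comparing against every logged word is acceptable for a sketch. For a large query log you would index the vocabulary (for example with a trie or a BK-tree) instead.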
Hello,
I was wondering if anyone could help me with a code snippet that demonstrates how to train a Naive Bayes classifier using feature frequency as opposed to feature presence.
I presume the code below, as shown in Chapter 6, creates a feature set using Feature Presence (FP):
def document_features(document):
...
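Here is a self-contained sketch of the difference. It uses a multinomial Naive Bayes model written from scratch rather than NLTK's classifier. Presence features set each word's value to 1, frequency features use its count, and the model multiplies each word's log-probability by that value. All names are hypothetical. My understanding is that NLTK's `NaiveBayesClassifier` treats every feature value as a discrete category, so raw counts passed to it become unrelated values rather than weighting a word more heavily.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List


def presence_features(tokens: List[str]) -> Dict[str, int]:
    """Feature Presence: 1 for every word that occurs, however often."""
    return {w: 1 for w in set(tokens)}


def frequency_features(tokens: List[str]) -> Dict[str, int]:
    """Feature Frequency: each word maps to its count in the document."""
    return dict(Counter(tokens))


class MultinomialNB:
    """Naive Bayes over count-valued features with add-one (Laplace) smoothing."""

    def fit(self, docs: List[Dict[str, int]], labels: List[str]) -> "MultinomialNB":
        self.class_docs = Counter(labels)  # label -> number of documents
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)
        for feats, label in zip(docs, labels):
            self.word_counts[label].update(feats)  # adds the feature values
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def log_scores(self, feats: Dict[str, int]) -> Dict[str, float]:
        n_docs = sum(self.class_docs.values())
        scores = {}
        for label, counts in self.word_counts.items():
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(self.class_docs[label] / n_docs)
            for w, value in feats.items():
                if w in self.vocab:  # skip words never seen in training
                    # A word seen n times contributes n * log P(word | label).
                    score += value * math.log((counts[w] + 1) / denom)
            scores[label] = score
        return scores

    def classify(self, feats: Dict[str, int]) -> str:
        scores = self.log_scores(feats)
        return max(scores, key=scores.get)
```

Training the same model on `presence_features` output gives the binarized variant, so you can compare the two on your own data.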
Hello,
I am slightly confused about what "feature selection", "feature extractor" and "feature weights" mean, and how they differ. As I read the literature I sometimes feel lost, because the terms are used quite loosely. My primary concerns are:
When people talk of Feature Frequency and Feature Presence, is that feature selection?
When people talk ...
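Here is one common way to separate the three terms, shown as a small sketch. The function names are hypothetical, and the literature does not use these terms perfectly consistently. Extraction decides how a document becomes features and values, weighting rescales those values, and selection decides which features to keep at all.

```python
import math
from collections import Counter
from typing import Dict, List


def extract(doc: str) -> Dict[str, int]:
    """Feature extraction: turn raw text into named features (here, word counts)."""
    return dict(Counter(doc.lower().split()))


def tfidf_weights(docs: List[Dict[str, int]]) -> List[Dict[str, float]]:
    """Feature weighting: rescale each count by how rare the word is across docs."""
    df = Counter(w for d in docs for w in d)  # number of docs containing each word
    n = len(docs)
    return [{w: tf * math.log(n / df[w]) for w, tf in d.items()} for d in docs]


def select_top_k(docs: List[Dict[str, int]], k: int) -> List[str]:
    """Feature selection: choose which features to keep at all (here, the k
    most frequent words overall); the rest are dropped before training."""
    totals: Counter = Counter()
    for d in docs:
        totals.update(d)
    return [w for w, _ in totals.most_common(k)]
```

In this framing, choosing between Feature Presence and Feature Frequency is a decision inside the extraction step (what value a feature gets), not feature selection.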
I have recently expanded the names corpus in NLTK and would like to know how I can turn the two files I have (male.txt, female.txt) into a corpus, so I can access them using the existing nltk.corpus methods. Does anyone have any suggestions?
Many thanks,
James.
...
Hi, I have built some plugin components for GATE, and in combination with the ANNIE tools I'm running a pipeline on the GATE platform.
Does anyone know how I can run a pipeline from the console? I want to build a web application in Tomcat that will take plain text from the web page, pass it to the GATE pipeline I have built, and do so...
Does anybody know of an English verb inflector that I can use on a lexicon of verbs (in their base form) to get the other inflected forms of those verbs?
For example:
I give it    I get
=========    ======================================
run          ran, running, runs
sing         sang, singing, sings
play         played, ...
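If no ready-made inflector fits, a rule-based sketch can handle regular verbs and use an exception table for irregular ones. The spelling rules below (consonant doubling for one-syllable verbs like "run", y-to-i, dropping a silent e) are common English heuristics, not a complete treatment. `IRREGULAR_PAST` is a deliberately tiny, hypothetical table that you would replace with a full list.

```python
import re
from typing import Tuple

VOWELS = "aeiou"

# Hypothetical, deliberately tiny table of irregular simple-past forms.
IRREGULAR_PAST = {"run": "ran", "sing": "sang", "give": "gave", "go": "went", "eat": "ate"}


def _is_consonant(ch: str) -> bool:
    return ch.isalpha() and ch not in VOWELS


def _doubles_final_consonant(verb: str) -> bool:
    """Heuristic: one-syllable verbs ending consonant-vowel-consonant (run, stop)."""
    if len(verb) < 3 or verb[-1] in "wxy":
        return False
    cvc = _is_consonant(verb[-3]) and verb[-2] in VOWELS and _is_consonant(verb[-1])
    return cvc and len(re.findall(r"[aeiou]+", verb)) == 1


def third_person(verb: str) -> str:
    if verb.endswith(("s", "sh", "ch", "x", "z", "o")):
        return verb + "es"
    if verb.endswith("y") and _is_consonant(verb[-2]):
        return verb[:-1] + "ies"
    return verb + "s"


def present_participle(verb: str) -> str:
    if verb.endswith("ie"):
        return verb[:-2] + "ying"
    if verb.endswith("e") and not verb.endswith(("ee", "ye", "oe")):
        return verb[:-1] + "ing"
    if _doubles_final_consonant(verb):
        return verb + verb[-1] + "ing"
    return verb + "ing"


def simple_past(verb: str) -> str:
    if verb in IRREGULAR_PAST:
        return IRREGULAR_PAST[verb]
    if verb.endswith("e"):
        return verb + "d"
    if verb.endswith("y") and _is_consonant(verb[-2]):
        return verb[:-1] + "ied"
    if _doubles_final_consonant(verb):
        return verb + verb[-1] + "ed"
    return verb + "ed"


def inflect(verb: str) -> Tuple[str, str, str]:
    """Return (simple past, present participle, third-person singular)."""
    verb = verb.lower()
    return simple_past(verb), present_participle(verb), third_person(verb)
```

These heuristics are known to fail on some cases. For example, stress-dependent doubling is not modeled, so "prefer" comes out as "prefered". A real lexicon of exceptions is where most of the effort would go.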
Hello, I would like to know how to implement a solution to the following task:
There's a 500 MB file of plain English text.
I'd like to collect statistics about word frequencies,
but also to make sure that each word (or at least the majority of words) is recognized correctly,
in the sense that 'cry' in the sentence "she gave a loud CRY" ...
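The counting half of the task can be done in one streaming pass, so the 500 MB file never has to fit in memory. This sketch assumes a simple tokenizer for lowercase letters and apostrophes (an illustrative choice, not a recommendation). It counts word forms only. Telling the noun "cry" from the verb "cry" would need a part-of-speech tagger on top.

```python
import re
from collections import Counter
from typing import Iterable

# Runs of letters with optional internal apostrophes ("didn't"); deliberately simple.
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)*")


def count_words(lines: Iterable[str]) -> Counter:
    """Count lowercase word frequencies one line at a time."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(WORD_RE.findall(line.lower()))
    return counts
```

An open file can be passed directly, because iterating over a file yields one line at a time: `with open(path, encoding="utf-8") as f: counts = count_words(f)`.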
Hi, I am using Python 3.1, but I can downgrade if needed.
I have an ASCII file containing a short story written in one of the languages whose alphabet can be represented with upper and/or lower ASCII. I wish to:
1) Detect the encoding as reliably as I can and get some sort of confidence metric (would vary depending on the len...
Assume you know a student who wants to study Machine Learning and Natural Language Processing.
What introductory subjects would you recommend?
For example, I'm guessing that knowing Prolog and MATLAB might help him. He might also want to study Discrete Structures*, Calculus, and Statistics.
*Graphs and trees. Functions: properties, recur...
What books are there on how to build a natural language parsing program like this:
input: I got to TALL you
output: I got to TELL you
input: Big RAT box
output: Big RED box
input: hoo un thum zend three
output: one thousand three
It must have a language model that can predict which words are misspelled!
What are the best books on ...
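Here is a toy sketch of the "language model" part. It uses a noisy-channel-style approach reduced to context only: candidates are vocabulary words within two edits of the suspect word, ranked by an add-one-smoothed bigram model of the words on either side. The class name and training sentences are hypothetical. A real system would train on a large corpus and also weight each candidate by how likely that particular typo is. Given suitable training text, this approach can pick TELL for TALL and RED for RAT. The number-word example needs more than spelling edits.

```python
from collections import Counter
from typing import List, Set

LETTERS = "abcdefghijklmnopqrstuvwxyz"


def edits1(word: str) -> Set[str]:
    """Every string one deletion, transposition, substitution or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {a + b[1:] for a, b in splits if b}
    transposes = {a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1}
    replaces = {a + c + b[1:] for a, b in splits if b for c in LETTERS}
    inserts = {a + c + b for a, b in splits for c in LETTERS}
    return deletes | transposes | replaces | inserts


class BigramCorrector:
    """Ranks spelling candidates by how well they fit between their neighbours."""

    def __init__(self, sentences: List[List[str]]):
        self.unigrams: Counter = Counter()
        self.bigrams: Counter = Counter()
        for tokens in sentences:
            padded = ["<s>"] + [t.lower() for t in tokens] + ["</s>"]
            self.unigrams.update(padded)
            self.bigrams.update(zip(padded, padded[1:]))
        self.vocab = set(self.unigrams) - {"<s>", "</s>"}

    def prob(self, prev: str, word: str) -> float:
        """Add-one smoothed P(word | prev)."""
        return (self.bigrams[(prev, word)] + 1) / (self.unigrams[prev] + len(self.unigrams))

    def candidates(self, word: str) -> Set[str]:
        one = edits1(word)
        two = {e2 for e1 in one for e2 in edits1(e1)}
        # Known words within two edits (including the word itself), else the word unchanged.
        return ({word} | one | two) & self.vocab or {word}

    def correct(self, tokens: List[str], i: int) -> str:
        """Best replacement for tokens[i] given the words on either side."""
        tokens = [t.lower() for t in tokens]
        prev = tokens[i - 1] if i > 0 else "<s>"
        nxt = tokens[i + 1] if i + 1 < len(tokens) else "</s>"
        return max(sorted(self.candidates(tokens[i])),
                   key=lambda w: self.prob(prev, w) * self.prob(w, nxt))
```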
A lot of Natural Language Processing (NLP) algorithms and libraries have a hard time working with random texts from the web, usually because they presuppose clean, articulate writing. I can understand why that would be easier than parsing YouTube comments.
My question is: given a random piece of text, is there a process to determi...
Hi:)
I have a Hindi script file like this:
3. भारत का इतिहास काफी समृद्ध एवं विस्तृत है।
I have to write a program which adds a position to each and every word in each sentence.
The numbering of word positions should start again at 1, in parentheses, on every line. The output should be something like this:
3. भारत...
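Here is a minimal sketch. Because the expected output is cut off, it assumes that each word should become word(n), that words are separated by whitespace, and that the leading sentence number such as "3." is kept but not counted. `add_positions` is a hypothetical name. Python 3 strings are Unicode, so Devanagari text needs no special handling beyond opening the file with the right encoding (for example `encoding="utf-8"`).

```python
import re

# A leading sentence number such as "3. " or "57. ".
LINE_NO = re.compile(r"^\s*(\d+\.)\s*")


def add_positions(line: str) -> str:
    """Append '(n)' to each whitespace-separated word; n restarts at 1 per line.
    A leading sentence number like '3.' is kept and not counted."""
    match = LINE_NO.match(line)
    prefix = match.group(1) + " " if match else ""
    body = line[match.end():] if match else line
    words = body.split()
    return prefix + " ".join(f"{w}({i})" for i, w in enumerate(words, start=1))
```

Punctuation such as the comma in "से," or the danda in "है।" stays attached to its word. Split it off first if it should not be numbered with the word.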
Start with this:
[G|C] * [T] *
Write a program that generates this:
Cat
Cut
Cute
City <-- NOTE: this one is wrong, because City has an "ESS" sound at the start.
Caught
...
Gate
Gotti
Gut
...
Kit
Kite
Kate
Kata
Katie
Another example. This:
[C] * [T] * [N]
Should produce this:
Cotton
Kitten
Where should I start my research as...
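The patterns describe sounds rather than letters, which is why City is excluded. One approach is therefore to match them against phoneme transcriptions from a pronunciation dictionary such as the CMU Pronouncing Dictionary. This sketch uses a small hand-written lexicon in ARPAbet style with stress marks omitted. It assumes that the pattern letter C means the hard K sound and that * matches zero or more phonemes. All names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Union

# Hypothetical toy lexicon in the style of the CMU Pronouncing Dictionary
# (ARPAbet phonemes, stress marks omitted). A real program would load a full dictionary.
LEXICON: Dict[str, str] = {
    "cat": "K AE T", "cut": "K AH T", "cute": "K Y UW T", "city": "S IH T IY",
    "caught": "K AO T", "gate": "G EY T", "gut": "G AH T", "kit": "K IH T",
    "kite": "K AY T", "kate": "K EY T", "katie": "K EY T IY",
    "cotton": "K AA T AH N", "kitten": "K IH T AH N", "dog": "D AO G",
}

# Assumption: pattern letters name sounds, and C means the hard K sound.
LETTER_TO_PHONEME = {"C": "K"}

Item = Union[str, FrozenSet[str]]  # "*" or a set of allowed phonemes


def parse(pattern: str) -> List[Item]:
    items: List[Item] = []
    for token in pattern.split():
        if token == "*":
            items.append("*")
        else:  # "[G|C]" -> frozenset({"G", "K"})
            letters = token.strip("[]").split("|")
            items.append(frozenset(LETTER_TO_PHONEME.get(x, x) for x in letters))
    return items


def matches(items: List[Item], phones: List[str]) -> bool:
    if not items:
        return not phones
    head, rest = items[0], items[1:]
    if head == "*":  # wildcard: try matching zero phonemes, else consume one
        return matches(rest, phones) or (bool(phones) and matches(items, phones[1:]))
    return bool(phones) and phones[0] in head and matches(rest, phones[1:])


def generate(pattern: str, lexicon: Dict[str, str]) -> List[str]:
    """All lexicon words whose pronunciation fits the pattern, alphabetically."""
    items = parse(pattern)
    return sorted(w for w, pron in lexicon.items() if matches(items, pron.split()))
```

Running `generate("[C] * [T] * [N]", LEXICON)` on this toy lexicon keeps only cotton and kitten. With a full dictionary, the same matcher would surface every word that fits the pattern.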
I'm looking for a good open source POS Tagger in Java. Here's what I have come up with so far.
LingPipe
Stanford
LBJ
FastTag
Anybody got any recommendations?
...
Hi :) I am not able to figure out what the error in the program is. Could you please help me out with it? Thank you :)
The input file contains the following:
3. भारत का इतिहास काफी समृद्ध एवं विस्तृत है।
57. जैसे आज के झारखंड प्रदेश से, उन दिनों, बहुत से लोग चाय बागानों में मजदूरी करने के उद्देश्य से असम आए।
(it's basically sample se...
Hi :) I have some code that appends word positions to the words from the source file,
but the output is not as desired:
The input file contains the following:
3. भारत का इतिहास काफी समृद्ध एवं विस्तृत है।
57. जैसे आज के झारखंड प्रदेश से, उन दिनों, बहुत से लोग चाय बागानों में मजदूरी करने के उद्देश्य से असम आए।
The original sourc...
I have two subtitles files.
I need a function that tells whether they represent the same or similar text.
Sometimes there are comments like "The wind is blowing... the music is playing" in one file only.
But 80% of the content will be the same, and the function must return TRUE (the files represent the same text).
And sometime...
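Here is one simple sketch that uses only the standard library's difflib. It strips the subtitle numbering and timing lines, compares the remaining words in order, and treats the files as the same text when the similarity ratio reaches a threshold. It assumes SubRip (.srt)-style files. The 0.8 default is a hypothetical starting point based on the 80% figure above and should be tuned on real pairs. Note that `SequenceMatcher.ratio()` is 2*M/T (matched words over the total in both files), which is not exactly "80% of the content".

```python
import re
from difflib import SequenceMatcher
from typing import List


def subtitle_words(text: str) -> List[str]:
    """Words of the dialogue, assuming SubRip-style blocks: a numeric index
    line, a 'start --> end' timing line, then one or more text lines."""
    words: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.isdigit() or "-->" in line:
            continue  # skip blank, index and timing lines
        words.extend(re.findall(r"\w+", line.lower()))
    return words


def same_text(a: str, b: str, threshold: float = 0.8) -> bool:
    """True if the two subtitle texts share enough words in the same order."""
    matcher = SequenceMatcher(None, subtitle_words(a), subtitle_words(b), autojunk=False)
    return matcher.ratio() >= threshold
```

Ignoring the timing lines means two files whose timestamps are shifted but whose dialogue is identical still compare as equal.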
Suppose we have a text file with the content:
"Je suis un beau homme ..."
another with:
"I am a brave man"
the third with a text in German:
"Guten morgen. Wie geht's ?"
How do we write a function that would tell us, with some probability, that the text in the first
file is in French, the text in the second is in English, etc.?
Links to books / ou...
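A common baseline for this is a character n-gram model per language. The sketch below builds smoothed character-trigram counts from a sample text for each language. It then turns the log-likelihoods into probabilities under a uniform prior. The three training snippets are hypothetical and far too small for real use. With so little data the probabilities come out extremely overconfident, so a real system would train on much larger samples of each language.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def trigrams(text: str) -> List[str]:
    """Character trigrams of the lowercased text, with runs of punctuation,
    digits and whitespace collapsed to one space and the ends padded."""
    cleaned = " " + re.sub(r"[\W\d_]+", " ", text.lower()).strip() + " "
    return [cleaned[i:i + 3] for i in range(len(cleaned) - 2)]


class LanguageGuesser:
    def __init__(self, samples: Dict[str, str]):
        self.models = {lang: Counter(trigrams(text)) for lang, text in samples.items()}
        self.vocab_size = len({g for counts in self.models.values() for g in counts})

    def probabilities(self, text: str) -> Dict[str, float]:
        log_likes = {}
        for lang, counts in self.models.items():
            denom = sum(counts.values()) + self.vocab_size  # add-one smoothing
            log_likes[lang] = sum(math.log((counts[g] + 1) / denom) for g in trigrams(text))
        # Uniform prior over languages, so the posterior is a softmax of the log-likelihoods.
        best = max(log_likes.values())
        weights = {lang: math.exp(ll - best) for lang, ll in log_likes.items()}
        total = sum(weights.values())
        return {lang: w / total for lang, w in weights.items()}
```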