Hello,
I was wondering if anyone could help me with a code snippet that demonstrates how to train a Naive Bayes classifier using feature frequency as opposed to feature presence.
I presume the code below, as shown in Chap 6, creates a featureset using Feature Presence (FP):
def document_features(document):
...
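A minimal sketch of the difference between the two featureset styles, in plain Python. The function names, the `contains(...)` / `count(...)` key format, and the toy vocabulary are assumptions for illustration, not the truncated `document_features` from the question.

```python
from collections import Counter

def presence_features(words, vocabulary):
    """Feature presence: one True/False value per vocabulary word."""
    present = set(words)
    return {"contains(%s)" % w: (w in present) for w in vocabulary}

def frequency_features(words, vocabulary):
    """Feature frequency: one occurrence count per vocabulary word."""
    counts = Counter(words)
    return {"count(%s)" % w: counts[w] for w in vocabulary}

# Hypothetical toy data, only to show the shape of each featureset.
vocabulary = ["good", "bad", "plot"]
document = "good plot good acting bad ending".split()

fp = presence_features(document, vocabulary)
ff = frequency_features(document, vocabulary)
```

Either dictionary can be paired with a label and passed to a classifier that accepts dict featuresets. One caveat, as far as I know: NLTK's Naive Bayes classifier treats each feature value as a discrete category, so raw counts are often binned (for example 0, 1, "2+") rather than used directly.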
Hello,
I am slightly confused about what "feature selection / extractor / weights" mean and how they differ. As I read the literature I sometimes feel lost, because the terms are used quite loosely. My primary concerns are:
When people talk of Feature Frequency or Feature Presence, is that feature selection?
When people talk ...
I have recently expanded the names corpus in nltk and would like to know how I can turn the two files I have (male.txt, female.txt) into a corpus so I can access them using the existing nltk.corpus methods. Does anyone have any suggestions?
Many thanks,
James.
...
I had a working installation of NLTK (py26-nltk) on my Mac (OS X 10.6.2). Then I installed numpy. Now when I try to import nltk, I get this:
>>> import nltk
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "nltk/__init__.py", line 83, in <module>
from collocations import *
File "nltk/collocations.py"...
I'm trying to run a Python script using exec() from within PHP. My command works fine when I run it directly using a cmd window, but it produces an error when I run it from exec() in PHP.
My Python script uses NLTK to find proper nouns. Example command:
"C:\Python25\python.exe" "C:\wamp\projects\python\trunk\tests\find_proper_nouns.py"...
I'd like to use the nltk toolkit on my machine, which runs Ubuntu 9.04. I installed Python 2.6.4 and several additional packages (numpy, scipy, matplotlib and of course nltk). I can import nltk, but calling a few methods gives various error messages, all of which contain "please install Tkinter library".
Googling around I discovered from http://wi...
I have a series of text items: raw HTML from a MySQL database. I want to find the most common phrases in these entries (not the single most common phrase, and ideally, not enforcing word-for-word matching).
My example is any review on Yelp.com, that shows 3 snippets from hundreds of reviews of a given restaurant, in the format:
"Try ...
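One simple starting point for "most common phrases" is to count word n-grams across all entries after stripping the markup. The sketch below assumes a crude regex is good enough to drop HTML tags and that exact n-gram matches are acceptable as a first pass; `top_phrases` and the sample reviews are hypothetical.

```python
import re
from collections import Counter

def top_phrases(texts, n=2, k=3):
    """Return the k most frequent n-word phrases across all texts."""
    counts = Counter()
    for text in texts:
        plain = re.sub(r"<[^>]+>", " ", text)       # crude tag stripping
        words = re.findall(r"[a-z']+", plain.lower())
        # Slide a window of n words over the token list.
        counts.update(
            " ".join(words[i:i + n]) for i in range(len(words) - n + 1)
        )
    return counts.most_common(k)

# Hypothetical review snippets, not taken from the question.
reviews = [
    "<p>Try the fish tacos</p>",
    "The fish tacos are great",
    "<b>Fish tacos</b> again",
]
best = top_phrases(reviews, n=2, k=1)
```

Relaxing the word-for-word requirement would need something beyond this, such as stemming or stop-word removal before counting.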
I have a web application that translates sentences into English; the user chooses options from drop-downs that basically provide the context. Now I want to turn the word and the context into an English sentence.
One case is that the user chooses 'who' and 'when', 'who' could be: I, you, you two, he, she, we, they. 'When' could be: 'did ...
Hi,
Which part of the huge nltk package should I study and use if I need to mark geonames in text?
...
Hi guys,
I am a beginner in NLP and NLTK. I am very interested in NLP and hence
joined a weekend course on AI in some local institution, which requires me
to do a project for completion of the course, and I decided to do it in NLP. The problem is, the instructor is not good at all for this course (according to me she
is just a charlatan)...
Hello,
I am curious whether an algorithm/method exists to generate keywords/tags from a given text, using weight calculations, occurrence ratios or other tools.
Additionally, I would be grateful if you could point me to any Python-based solution / library for this.
Thanks
...
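One common weighting scheme for this kind of keyword extraction is TF-IDF: a word scores highly when it is frequent in one document but rare across the collection. The following is a minimal sketch in plain Python; the function names, the regex tokenizer, and the toy documents are assumptions for illustration.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and keep alphabetic runs only."""
    return re.findall(r"[a-z]+", text.lower())

def tfidf_keywords(docs, index, k=3):
    """Return the k highest-scoring TF-IDF terms of docs[index]."""
    tokenized = [tokenize(d) for d in docs]
    # Document frequency: in how many documents each word appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    tf = Counter(tokenized[index])
    total = len(tokenized[index])
    n = len(docs)
    scores = {w: (c / total) * math.log(n / df[w]) for w, c in tf.items()}
    # Sort by descending score, breaking ties alphabetically.
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]

# Hypothetical toy collection.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the cat chased the dog",
]
keywords = tfidf_keywords(docs, 0, k=1)
```

Words that occur in every document (here "the") get a score of zero, which is what pushes them out of the keyword list.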
Hello,
I have already asked a similar question earlier, but I have noticed that I have a big constraint: I am working on small text sets such as user Tweets to generate tags (keywords).
And it seems like the accepted suggestion (the point-wise mutual information algorithm) is meant to work on bigger documents.
With this constraint (working on ...
Hi,
I'm doing a project for a college class I'm taking.
I'm using PHP to build a simple web app that classifies tweets as "positive" (or happy) and "negative" (or sad) based on a set of dictionaries. The algorithm I'm thinking of right now is a Naive Bayes classifier or a decision tree.
However, I can't find any PHP library that helps me do...
Hello all,
I am trying to import NLTK in my python code and I get this error:
Traceback (most recent call last):
File "/home/afs/NetBeansProjects/NER/getNE_followers.py", line 7, in
import nltk
ImportError: No module named nltk
I am using NetBeans: 6.7.1, Python 2.6 NLTK.
My NLTK module is installed in /usr/local/lib/python2.6/d...
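When an IDE reports `No module named ...` for a package that is installed, a useful first check is whether the interpreter the IDE launches can actually see the install directory. The sketch below is an assumption-laden diagnostic: it uses `importlib.util.find_spec`, which exists on Python 3.4+ and not on the Python 2.6 mentioned in the question, and `module_location` is a hypothetical helper name.

```python
import importlib.util
import sys

def module_location(name):
    """Return the file a top-level module would be loaded from, or None."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None

# Directories this interpreter searches for modules, in order.
search_path = list(sys.path)

# A standard-library module resolves; a missing one yields None.
found = module_location("json")
missing = module_location("no_such_module_xyz")
```

If the directory that holds the package does not appear in `search_path`, the IDE is probably running a different interpreter or a different path configuration than the shell where the package was installed.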
I am embarking upon a NLP project for sentiment analysis.
I have successfully installed NLTK for Python (seems like a great piece of software for this). However, I am having trouble understanding how it can be used to accomplish my task.
Here is my task:
I start with one long piece of data (let's say several hundred tweets on the subje...
I have a large dataset (c. 40G) that I want to use for some NLP (largely embarrassingly parallel) over a couple of computers in the lab, to which I do not have root access, and only 1G of user space.
I experimented with hadoop, but of course this was dead in the water: the data is stored on an external USB hard drive, and I can't load it...
I'm working on a project at the moment where it would be really useful to be able to detect when a certain topic/idea is mentioned in a body of text. For instance, if the text contained:
Maybe if you tell me a little more about who Mr Jones is, that would help. It would also be useful if I could have a description of his appearance, ...
I'm doing a project on mining blog contents and I need help deciding which tool to use. When do I use a parser, when do I use a tagger, and when do I need to use a NER tool?
For instance, I want to find out the most talked about topics/subjects between several blogs; do I use a part-of-speech tagger to grab the nouns and do a...
I've got about 300k documents stored in a Postgres database that are tagged with topic categories (there are about 150 categories in total). I have another 150k documents that don't yet have categories. I'm trying to find the best way to programmatically categorize them.
I've been exploring NLTK and its Naive Bayes Classifier. Seems li...
I'm trying to use TF-IDF to sort documents into categories. I've calculated the tf_idf for some documents, but now when I try to calculate the Cosine Similarity between two of these documents I get a traceback saying:
#len(u)==201, len(v)==246
cosine_distance(u, v)
ValueError: objects are not aligned
#this works though:
cosine_distan...
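The traceback suggests the two vectors have different lengths (201 vs 246), so a dot product between them is undefined. One way around this, sketched here under the assumption that each document's TF-IDF weights are kept as a `{term: weight}` dictionary, is to compute the cosine over the terms the two documents share. The function below is a hypothetical plain-Python helper, not NLTK's `cosine_distance`, and it returns a similarity rather than a distance.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity of two sparse vectors stored as {term: weight}."""
    # Only terms present in both documents contribute to the dot product.
    shared = set(u) & set(v)
    dot = sum(u[t] * v[t] for t in shared)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_u * norm_v)

# Hypothetical weights: the documents have different vocabularies.
doc_a = {"apple": 1.0, "banana": 1.0}
doc_b = {"banana": 1.0, "cherry": 1.0}
similarity = cosine_similarity(doc_a, doc_b)
```

The alternative is to build one shared vocabulary first and give every document a dense vector of that length, with zeros for absent terms; both vectors then align and array-based functions can be used.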