I'm trying to use reserved words in my grammar:
reserved = {
'if' : 'IF',
'then' : 'THEN',
'else' : 'ELSE',
'while' : 'WHILE',
}
tokens = [
'DEPT_CODE',
'COURSE_NUMBER',
'OR_CONJ',
'ID',
] + list(reserved.values())
t_DEPT_CODE = r'[A-Z]{2,}'
t_COURSE_NUMBER = r'[0-9]{4}'
t_OR_CONJ = r'or'
t_ignore = ' \t'
def t_ID(t...
I'm working on a project that is based on ontologies. I want to identify the semantics of text entered by the user.
Is there a way to accomplish this task with an ontology through Jena?
...
I'm using PLY to parse sentences like:
"CS 2310 or equivalent experience"
The desired output:
[[("CS", 2310)], ["equivalent experience"]]
YACC tokenizer symbols:
tokens = [
'DEPT_CODE',
'COURSE_NUMBER',
'OR_CONJ',
'MISC_TEXT',
]
t_DEPT_CODE = r'[A-Z]{2,}'
t_COURSE_NUMBER = r'[0-9]{4}'
t_OR_CONJ = r'or'
t_ign...
I think I'm making a mistake in how I call setResultsName():
from pyparsing import *
DEPT_CODE = Regex(r'[A-Z]{2,}').setResultsName("Dept Code")
COURSE_NUMBER = Regex(r'[0-9]{4}').setResultsName("Course Number")
COURSE_NUMBER.setParseAction(lambda s, l, toks : int(toks[0]))
course = DEPT_CODE + COURSE_NUMBER
course.setResultsName("c...
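One thing worth checking, assuming the truncated last line calls `setResultsName()` without keeping its return value: in pyparsing, `setResultsName()` returns a copy of the expression instead of modifying it in place, so the name is lost unless the result is assigned. A minimal sketch (the result names `dept`, `number`, and `course`, and the use of `Group`, are illustrative assumptions, not taken from the original code):

```python
from pyparsing import Group, Regex

DEPT_CODE = Regex(r'[A-Z]{2,}').setResultsName("dept")
COURSE_NUMBER = Regex(r'[0-9]{4}').setResultsName("number")
# setParseAction() modifies the expression in place, so no assignment is needed.
COURSE_NUMBER.setParseAction(lambda s, l, toks: int(toks[0]))

course = DEPT_CODE + COURSE_NUMBER

# The named copy returned here is thrown away, so `course` stays unnamed.
course.setResultsName("course")
unnamed = course.parseString("CS 2110")

# Keeping the returned expression (wrapped in Group so both tokens stay
# together under the one name) makes the name show up in the results.
named_course = Group(course).setResultsName("course")
named = named_course.parseString("CS 2110")

print("course" in unnamed)        # name was never attached
print(named["course"].asList())   # tokens grouped under "course"
```

The same applies to the shortcut form `course("course")`, which also returns a new expression.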
What is the difference between:
foo = TOKEN1 + TOKEN2
and
foo = Combine(TOKEN1 + TOKEN2)
Thanks.
UPDATE: Based on my experimentation, it seems like Combine() is for terminals, where you're trying to build an expression to match on, whereas plain + is for non-terminals. But I'm not sure.
...
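A small sketch of the difference, assuming `TOKEN1` is a word of letters and `TOKEN2` a word of digits (both definitions are made up for illustration). Plain `+` matches the two pieces in sequence, skips whitespace between them, and returns them as separate tokens. `Combine()` joins the matched pieces into a single string token and, by default, requires them to be adjacent in the input.

```python
from pyparsing import Combine, ParseException, Word, alphas, nums

TOKEN1 = Word(alphas)  # hypothetical definition
TOKEN2 = Word(nums)    # hypothetical definition

plain = TOKEN1 + TOKEN2
combined = Combine(TOKEN1 + TOKEN2)

# Plain '+': whitespace between the pieces is skipped, two tokens come back.
plain_tokens = plain.parseString("CS 2110").asList()

# Combine: the pieces are concatenated into one token.
combined_tokens = combined.parseString("CS2110").asList()

# Combine defaults to adjacent=True, so intervening whitespace is rejected.
try:
    combined.parseString("CS 2110")
    combined_allows_space = True
except ParseException:
    combined_allows_space = False

print(plain_tokens, combined_tokens, combined_allows_space)
```

So `Combine()` is typically used for low-level pieces that should come out as one string (for example, a number built from an integer part, a dot, and a fraction), which fits the "terminals" intuition in the update.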
I'm parsing sentences like "CS 2110 or INFO 3300". I would like to output a format like:
[[("CS", 2110)], [("INFO", 3300)]]
To do this, I thought I could use setParseAction(). However, the print statements in statementParse() suggest that only the last tokens are actually passed:
>>> statement.parseString("CS 2110 or INFO 3300")
Match...
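The cause of the behaviour cannot be read off the truncated output, but here is one possible way to get the desired structure: attach the tuple-building parse action to the single-course expression and wrap each course in `Group()`. This is a sketch under assumptions; the grammar below (`DEPT_CODE`, `COURSE_NUMBER`, `course`, `statement`) is a guess at the original and may differ from it.

```python
from pyparsing import Group, Regex, Suppress, ZeroOrMore

DEPT_CODE = Regex(r'[A-Z]{2,}')
COURSE_NUMBER = Regex(r'[0-9]{4}').setParseAction(lambda s, l, toks: int(toks[0]))

# Turn the two tokens of one course into a single (dept, number) tuple.
course = DEPT_CODE + COURSE_NUMBER
course.setParseAction(lambda s, l, toks: [(toks[0], toks[1])])

# Group() keeps each alternative in its own sub-list; "or" itself is dropped.
statement = Group(course) + ZeroOrMore(Suppress("or") + Group(course))

result = statement.parseString("CS 2110 or INFO 3300").asList()
print(result)
```

Putting the parse action on `course` instead of on the whole statement means it runs once per course, so each call only ever sees the tokens of that one course.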
I have strings like this:
"MSE 2110, 3030, 4102"
I would like to output:
[("MSE", 2110), ("MSE", 3030), ("MSE", 4102)]
This is my way of going about it, although I haven't quite gotten it yet:
def makeCourseList(str, location, tokens):
print "before: %s" % tokens
for index, course_number in enumerate(tokens[1:]):
...
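A minimal sketch of one way to do this with a parse action, assuming a grammar of a department code followed by comma-separated course numbers (the grammar and the name `make_course_list` are illustrative, since the original code is cut off). The parse action returns a new list that replaces the matched tokens instead of modifying them in a loop.

```python
from pyparsing import Regex, Suppress, ZeroOrMore

DEPT_CODE = Regex(r'[A-Z]{2,}')
COURSE_NUMBER = Regex(r'[0-9]{4}').setParseAction(lambda s, l, toks: int(toks[0]))

def make_course_list(s, location, tokens):
    # tokens[0] is the department code, the rest are course numbers.
    dept = tokens[0]
    return [(dept, number) for number in tokens[1:]]

course_list = DEPT_CODE + COURSE_NUMBER + ZeroOrMore(Suppress(",") + COURSE_NUMBER)
course_list.setParseAction(make_course_list)

courses = course_list.parseString("MSE 2110, 3030, 4102").asList()
print(courses)
```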
Ok, so I've asked a bunch of smaller questions about this project, but I still don't have much confidence in the designs I'm coming up with, so I'm going to ask a question on a broader scale.
I am parsing pre-requisite descriptions for a course catalog. The descriptions almost always follow a certain form, which makes me think I can par...
Hey guys, my requirements are pretty similar to this:
Requirements
http://stackoverflow.com/questions/90580/word-frequency-algorithm-for-natural-language-processing
Using Solr
While the answer to that question is excellent, I was wondering if I could make use of all the time I spent getting to know Solr for my NLP.
I thought of SOL...
I've been searching around this morning, trying to find whether anyone has put together a JS lib that does sentiment analysis (or even full-on NLP), but I haven't lucked into anything other than the Java lib that seems to be the standard and some Ruby bits and bobs. Wondered if anyone has come across anything in JS that does more th...
Does anybody know where I can find documentation on how to write annotation schemas for Callisto? I'm looking to write something a little more complicated than I can generate from a DTD -- that only gives me the ability to tag different kinds of text mentions. I'm looking to create a schema that represents a single type of relationship ...
I am trying to use IDF scores to find interesting phrases in my pretty huge corpus of documents.
I basically need something like Amazon's Statistically Improbable Phrases, i.e. phrases that distinguish a document from all the others.
The problem that I am running into is that some (3,4)-grams in my data which have super-high IDF actually ...
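For reference, a minimal sketch of the usual IDF definition, idf(g) = log(N / df(g)), applied to n-grams. The toy corpus, the whitespace tokenizer, and the function names are assumptions for illustration only. Note that any n-gram that occurs in exactly one document gets the maximum possible score log(N), whether it is a meaningful phrase or noise, so very high IDF on its own does not separate the two.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def idf_scores(documents, n):
    """Map each n-gram to log(N / df), where df counts documents containing it."""
    doc_freq = Counter()
    for doc in documents:
        # set() so that an n-gram is counted once per document.
        doc_freq.update(set(ngrams(doc.lower().split(), n)))
    total = len(documents)
    return {gram: math.log(total / df) for gram, df in doc_freq.items()}

# Hypothetical toy corpus.
docs = ["the quick brown fox", "the quick brown dog", "a slow green turtle"]
scores = idf_scores(docs, 2)
print(scores[("the", "quick")], scores[("brown", "fox")])
```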
Bit of a random one: I want to have a play with some NLP stuff, and I would like to:
Get all the text that will be displayed to the user in a browser from HTML.
My ideal output would not have any tags in it and would only have full stops (and any other punctuation used) and newline characters, though I can tolerate a fairly reason...
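A rough sketch of one approach using only the standard library's `html.parser`: collect text nodes, skip the contents of `script`, `style`, and `head`, and start a new line at block-level tags. The class name, the tag sets, and the sample markup are assumptions for illustration. This only approximates what a browser shows; it does not account for CSS-hidden elements or content generated by JavaScript.

```python
from html.parser import HTMLParser

class VisibleTextExtractor(HTMLParser):
    SKIPPED = {"script", "style", "head"}          # contents never displayed
    BLOCK = {"p", "br", "div", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0   # > 0 while inside a skipped element
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag in self.BLOCK:
            self._chunks.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self):
        # Collapse runs of whitespace and drop empty lines.
        lines = (" ".join(line.split()) for line in "".join(self._chunks).splitlines())
        return "\n".join(line for line in lines if line)

def visible_text(html):
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()

sample = ("<html><head><title>T</title><style>p{}</style></head><body>"
          "<p>Hello, <b>world</b>.</p><script>var x = 1;</script>"
          "<p>Second line.</p></body></html>")
print(visible_text(sample))
```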
Many job sites have broken searches that don't let you narrow down jobs by experience level. Even when they do, it's usually wrong. This requires you to wade through hundreds of postings that you can't apply for before finding a relevant one, which is quite tedious. Since I'd rather focus on writing cover letters etc., I want to write a program t...
Given an arbitrary string, for example ("I'm going to play croquet next Friday" or "Gadzooks, is it 17th June already?"), how would you go about extracting the dates from there?
If this is looking like a good candidate for the too-hard basket, perhaps you could suggest an alternative. I want to be able to parse Twitter messages for date...
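As a starting point only, here is a regex sketch that finds two kinds of date mention: explicit day-plus-month phrases such as "17th June" and relative weekday phrases such as "next Friday". The pattern and the function name `find_date_mentions` are assumptions for illustration. It returns the matched text and does not resolve it to a calendar date, and it will miss many other formats.

```python
import re

MONTHS = ("January|February|March|April|May|June|July|August|"
          "September|October|November|December")
WEEKDAYS = "Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday"

DATE_PATTERN = re.compile(
    r"\b(?:"
    r"\d{1,2}(?:st|nd|rd|th)?\s+(?:" + MONTHS + r")"    # e.g. 17th June
    r"|(?:next|this|last)\s+(?:" + WEEKDAYS + r")"      # e.g. next Friday
    r")\b",
    re.IGNORECASE,
)

def find_date_mentions(text):
    """Return the substrings of text that look like date mentions."""
    return [match.group(0) for match in DATE_PATTERN.finditer(text)]

print(find_date_mentions("I'm going to play croquet next Friday"))
print(find_date_mentions("Gadzooks, is it 17th June already?"))
```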
Are there any packages in the Java world that are as good as Python's NLTK?
...
Dear Everyone,
I am interested in automatically tagging bodies of text. I am pretty new to NLP, so I would like to hear about methods you are familiar with in this context.
Any recommendations will be appreciated.
...
I would like to create exclamations for a particular sentence using a Java API.
e.g. It's surprising == Isn't it surprising!
e.g. It's cold == Isn't it cold!
Are there any vendors or tools which help you generate exclamations, provided you give a sentence (i.e. the left hand side in the above example). Note: The sentences will be p...
Hi,
I am looking for natural language query processing libraries to convert plain English queries to SQL-like statements. For example, "show the list of employees whose age is 30" should be converted to select * from employees where age = 30.
Can you provide pointers/references?
Thanks,
Mani
...
I've got about 300k documents stored in a Postgres database that are tagged with topic categories (there are about 150 categories in total). I have another 150k documents that don't yet have categories. I'm trying to find the best way to programmatically categorize them.
I've been exploring NLTK and its Naive Bayes Classifier. Seems li...