nlp

Python/YACC Lexer: Token priority?

I'm trying to use reserved words in my grammar: reserved = { 'if' : 'IF', 'then' : 'THEN', 'else' : 'ELSE', 'while' : 'WHILE', } tokens = [ 'DEPT_CODE', 'COURSE_NUMBER', 'OR_CONJ', 'ID', ] + list(reserved.values()) t_DEPT_CODE = r'[A-Z]{2,}' t_COURSE_NUMBER = r'[0-9]{4}' t_OR_CONJ = r'or' t_ignore = ' \t' def t_ID(t...
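One common way to keep reserved words from colliding with a generic identifier rule is to match every word with a single identifier pattern and then look the value up in the `reserved` dictionary. The sketch below shows that idea with the standard `re` module rather than PLY itself; the identifier pattern and the `tokenize_words` helper are assumptions for illustration, since the excerpt's own `t_ID` rule is cut off.

```python
import re

# Reserved words copied from the excerpt above.
reserved = {'if': 'IF', 'then': 'THEN', 'else': 'ELSE', 'while': 'WHILE'}

# Assumption: identifiers are letters, digits and underscores. The excerpt's
# own t_ID pattern is truncated, so this pattern is only illustrative.
ID_PATTERN = re.compile(r'[A-Za-z_][A-Za-z_0-9]*')


def tokenize_words(text):
    """Yield (token_type, value) pairs for each identifier-like word."""
    for match in ID_PATTERN.finditer(text):
        value = match.group()
        # A reserved word takes its own token type; anything else is an ID.
        yield reserved.get(value, 'ID'), value


if __name__ == '__main__':
    print(list(tokenize_words('if x then y else z')))
```

The dictionary lookup means no separate regular expression per keyword is needed, so rule ordering between keywords and identifiers stops mattering.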

Ontology with Java (Jena)

I'm working on a project that is based on an ontology. I want to identify the semantics of text entered by the user. Is there any way to accomplish this with an ontology through Jena? ...

Python: Trouble with YACC

I'm using PLY to parse sentences like: "CS 2310 or equivalent experience" The desired output: [[("CS", 2310)], ["equivalent experience"]] YACC tokenizer symbols: tokens = [ 'DEPT_CODE', 'COURSE_NUMBER', 'OR_CONJ', 'MISC_TEXT', ] t_DEPT_CODE = r'[A-Z]{2,}' t_COURSE_NUMBER = r'[0-9]{4}' t_OR_CONJ = r'or' t_ign...
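For comparison, the desired output for this one sentence shape can be produced without a parser generator at all. The sketch below uses the standard `re` module instead of PLY; the `parse_alternatives` helper is hypothetical, and it assumes alternatives are separated by the word "or" and that anything that is not a department code plus a four-digit number counts as free text.

```python
import re

# Same shapes as t_DEPT_CODE and t_COURSE_NUMBER in the excerpt above.
COURSE = re.compile(r'([A-Z]{2,})\s+([0-9]{4})$')


def parse_alternatives(sentence):
    """Split on the word 'or' and classify each alternative."""
    result = []
    for part in re.split(r'\s+or\s+', sentence.strip()):
        match = COURSE.match(part)
        if match:
            # A course: keep the code as text and the number as an int.
            result.append([(match.group(1), int(match.group(2)))])
        else:
            # Assumption: everything else is miscellaneous free text.
            result.append([part])
    return result


if __name__ == '__main__':
    print(parse_alternatives('CS 2310 or equivalent experience'))
```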

Python/PyParsing: Difficulty with setResultsName

I think I'm making a mistake in how I call setResultsName(): from pyparsing import * DEPT_CODE = Regex(r'[A-Z]{2,}').setResultsName("Dept Code") COURSE_NUMBER = Regex(r'[0-9]{4}').setResultsName("Course Number") COURSE_NUMBER.setParseAction(lambda s, l, toks : int(toks[0])) course = DEPT_CODE + COURSE_NUMBER course.setResultsName("c...

PyParsing: What does Combine() do?

What is the difference between: foo = TOKEN1 + TOKEN2 and foo = Combine(TOKEN1 + TOKEN2) Thanks. UPDATE: Based on my experimentation, it seems like Combine() is for terminals, where you're trying to build an expression to match on, whereas plain + is for non-terminals. But I'm not sure. ...

PyParsing: Not all tokens passed to setParseAction()

I'm parsing sentences like "CS 2110 or INFO 3300". I would like to output a format like: [[("CS", 2110)], [("INFO", 3300)]] To do this, I thought I could use setParseAction(). However, the print statements in statementParse() suggest that only the last tokens are actually passed: >>> statement.parseString("CS 2110 or INFO 3300") Match...

PyParsing: Is this correct use of setParseAction()?

I have strings like this: "MSE 2110, 3030, 4102" I would like to output: [("MSE", 2110), ("MSE", 3030), ("MSE", 4102)] This is my way of going about it, although I haven't quite gotten it yet: def makeCourseList(str, location, tokens): print "before: %s" % tokens for index, course_number in enumerate(tokens[1:]): ...
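The transformation being asked for (one department code distributed over every course number that follows it) can be sketched with the standard `re` module. This is not the pyparsing parse action from the question, only an illustration of the target behaviour; `make_course_list` is a hypothetical name, and the sketch assumes the string starts with the department code.

```python
import re


def make_course_list(text):
    """Pair a leading department code with every 4-digit number after it."""
    match = re.match(r'\s*([A-Z]{2,})\s+(.*)', text)
    if match is None:
        # Assumption: input without a leading department code yields nothing.
        return []
    dept, rest = match.groups()
    return [(dept, int(number)) for number in re.findall(r'\b[0-9]{4}\b', rest)]


if __name__ == '__main__':
    print(make_course_list('MSE 2110, 3030, 4102'))
```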

Python: How best to parse a simple grammar?

Ok, so I've asked a bunch of smaller questions about this project, but I still don't have much confidence in the designs I'm coming up with, so I'm going to ask a question on a broader scale. I am parsing pre-requisite descriptions for a course catalog. The descriptions almost always follow a certain form, which makes me think I can par...

SOLR and Natural Language Parsing - Can I use it?

hey guys, my requirements are pretty similar to this: Requirements http://stackoverflow.com/questions/90580/word-frequency-algorithm-for-natural-language-processing Using Solr While the answer for that question is excellent, I was wondering if I could make use of all the time I spent getting to know SOLR for my NLP. I thought of SOL...

Is there a JavaScript lib/toolkit that does sentiment analysis or NLP?

I've been searching around this morning, trying to find whether anyone has put together a JS lib that does either sentiment analysis (or even full on NLP) but haven't lucked into anything other than the Java lib that seems to be the standard and some ruby bits and bobs. Wondered if anyone has come across anything in JS that does more th...

Writing annotation schemas for Callisto

Does anybody know where I can find documentation on how to write annotation schemas for Callisto? I'm looking to write something a little more complicated than I can generate from a DTD -- that only gives me the ability to tag different kinds of text mentions. I'm looking to create a schema that represents a single type of relationship ...

Ngram IDF smoothing

I am trying to use IDF scores to find interesting phrases in my pretty huge corpus of documents. I basically need something like Amazon's Statistically Improbable Phrases, i.e. phrases that distinguish a document from all the others. The problem that I am running into is that some (3,4)-grams in my data which have super-high IDF actually ...
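As a point of reference, here is a minimal sketch of n-gram IDF with add-one smoothing, idf = ln((1 + N) / (1 + df)) + 1, where N is the number of documents and df is the number of documents containing the n-gram. The function names are hypothetical, documents are assumed to be pre-tokenized lists of words, and this particular smoothing is only one possible choice; the excerpt is cut off before it says which behaviour needs fixing.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def smoothed_idf(documents, n):
    """Map each n-gram to ln((1 + N) / (1 + df)) + 1."""
    doc_freq = Counter()
    for tokens in documents:
        # Count each n-gram at most once per document.
        doc_freq.update(set(ngrams(tokens, n)))
    total = len(documents)
    return {gram: math.log((1 + total) / (1 + df)) + 1
            for gram, df in doc_freq.items()}


if __name__ == '__main__':
    docs = [['a', 'b', 'c'], ['a', 'b', 'd']]
    for gram, score in sorted(smoothed_idf(docs, 2).items()):
        print(gram, round(score, 3))
```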

Getting the text that will be displayed to the user from HTML

Bit of a random one: I want to have a play with some NLP stuff, and I would like to get all the text that will be displayed to the user in a browser from HTML. My ideal output would not have any tags in it and would only have full stops (and any other punctuation used) and newline characters, though I can tolerate a fairly reason...

Online job-searching is tedious. Help me automate it.

Many job sites have broken searches that don't let you narrow down jobs by experience level. Even when they do, it's usually wrong. This requires you to wade through hundreds of postings that you can't apply for before finding a relevant one, which is quite tedious. Since I'd rather focus on writing cover letters etc., I want to write a program t...

Parsing a string for dates in PHP

Given an arbitrary string, for example ("I'm going to play croquet next Friday" or "Gadzooks, is it 17th June already?"), how would you go about extracting the dates from there? If this is looking like a good candidate for the too-hard basket, perhaps you could suggest an alternative. I want to be able to parse Twitter messages for date...

Natural Language Processing Solution in Java?

Are there any packages in the Java world as good as Python's NLTK? ...

Can we brainstorm for an automated tagging system?

Dear Everyone, I am interested in automatic tagging of bodies of text. I am pretty new to NLP, so I would like to hear about some methods you are familiar with in this context. Any recommendations will be appreciated. ...

How to create exclamations for a particular sentence

I would like to create exclamations for a particular sentence using a Java API. e.g. It's surprising == Isn't it surprising! e.g. It's cold == Isn't it cold! Are there any vendors or tools which help you generate exclamations, provided you give a sentence (i.e. the left hand side in the above example). Note: The sentences will be p...

Natural language query processing libraries

Hi, I am looking for natural language query processing libraries to convert plain English queries to SQL-like statements. For example, "show the list of employees whose age is 30" should be converted to select * from employees where age = 30. Can you provide pointers/references? Thanks, Mani ...

Classifying Documents into Categories

I've got about 300k documents stored in a Postgres database that are tagged with topic categories (there are about 150 categories in total). I have another 150k documents that don't yet have categories. I'm trying to find the best way to programmatically categorize them. I've been exploring NLTK and its Naive Bayes Classifier. Seems li...