natural-language

RDF of sentences

Hi, I need to convert sentences into an RDF-style representation. In other words, "John likes coke" would be automatically represented as Subject: John, Predicate: Likes, Object: Coke. Does anyone know where I should start? Are there any programs which can do this automatically, or would I need to do everything from scratch? Any help would be appreci...

List of uninteresting words

[Caveat] This is not directly a programming question, but it is something that comes up so often in language processing that I'm sure it's of some use to the community. Does anyone have a good list of uninteresting (English) words that has been tested by more than a casual look? This would include all prepositions, conjunctions, etc... ...

Stanford Parser - Traversing the typed dependencies graph

Hello! Basically I want to find a path between two NP tokens in the dependencies graph. However, I can't seem to find a good way to do this in the Stanford Parser. Any help? Thank You Very Much ...

How to estimate the quality of a web page?

Hello, I'm doing a university project that must gather and combine data on a user-provided topic. The problem I've encountered is that Google search results for many terms are polluted with low-quality autogenerated pages, and if I use them, I can end up with wrong facts. How is it possible to estimate the quality/trustworthiness of a pa...

is it better to use a "natural" language to write code?

I recently saw a programming language called Supernova, and its web page says: The Supernova Programming language is a modern scripting language and the First one presents the concept of programming with direct Fiction Description using Clear subset of pure Human Language. And you can write code like: i w...

entity set expansion python

Do you know of any existing implementation in any language (preferably Python) of any entity set expansion algorithm, such as the one from Google Sets (http://labs.google.com/sets)? I couldn't find any library implementing such algorithms and I'd like to play with some of those to see how they would perform on some specific task I...

Naive Bayesian for Topic detection using "Bag of Words" approach

I am trying to implement a naive Bayesian approach to find the topic of a given document or stream of words. Is there a Naive Bayesian approach that I might be able to look up for this? Also, I am trying to improve my dictionary as I go along. Initially, I have a bunch of words that map to topics (hard-coded). Depending on the occ...
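
For readers skimming this listing, here is a minimal sketch of the kind of approach the question above describes: a multinomial Naive Bayes classifier over a bag of words, with add-one (Laplace) smoothing. The class name `NaiveBayesTopics`, the whitespace tokenization, and the sample topics are assumptions made for illustration; nothing here comes from the original question. Calling `train` again with new text is one simple way to grow the dictionary over time.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesTopics:
    """Hypothetical bag-of-words topic classifier (illustrative only)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # topic -> word -> count
        self.doc_counts = Counter()              # topic -> number of training texts
        self.vocab = set()                       # every word seen in training

    def train(self, topic, text):
        # Assumption: lowercase + whitespace split is a good enough tokenizer.
        words = text.lower().split()
        self.word_counts[topic].update(words)
        self.doc_counts[topic] += 1
        self.vocab.update(words)

    def classify(self, text):
        words = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        best_topic, best_score = None, -math.inf
        for topic in self.doc_counts:
            # Log prior: share of training texts that belong to this topic.
            score = math.log(self.doc_counts[topic] / total_docs)
            total_words = sum(self.word_counts[topic].values())
            for word in words:
                # Add-one smoothing keeps unseen words from zeroing the score.
                count = self.word_counts[topic][word]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_topic, best_score = topic, score
        return best_topic


clf = NaiveBayesTopics()
clf.train("sports", "football match goal team")
clf.train("cooking", "recipe oven bake flour")
print(clf.classify("the team scored a goal"))
```

Working in log space avoids numeric underflow when documents are long, which is why the scores are summed rather than multiplied.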

For the iPhone, can you program for different languages?

For the iPhone, is it possible to program applications to translate words from a base language to any of several languages of various users? If so, how? ...

Latin-based language segmentation grammatical rules

Hi folks, I am working on a feature that applies (grammatical) language segmentation rules to Latin-based languages (currently English). I am currently at the stage of breaking user input into sentences, e.g.: "I am working in language translation". "I have used Google MT API for this" In the above example I will break the above sentence ...

How to honor/inherit user's language settings in WinForm app

I have worked with globalization settings in the past but not within the .NET environment, which is the topic of this question. What I am seeing is most certainly due to knowledge I have yet to learn so I would appreciate illumination on the following. Setup: My default language setting is English (en-us specifically). I added a secon...

Given a document, select a relevant snippet.

When I ask a question here, the tool tips for the questions returned by the auto search give the first little bit of each question, but a decent percentage of them don't give any text that is any more useful for understanding the question than the title. Does anyone have an idea about how to make a filter to trim out useless bits of a que...

Corpus/data set of English words with syllabic stress information?

I know this is a long shot, but does anyone know of a dataset of English words that has stress information by syllable? Something as simple as the following would be fantastic: AARD vark, A ble, a BOUT, ac COUNT, AC id, ad DIC tion, ad VERT ise ment ... Thanks in advance! ...

Is there a list of language only character regions for UTF-8 somewhere?

I'm trying to analyze some UTF-8 encoded documents in a way that recognizes different language characters. For my approach to work I need to ignore non-language characters, such as control characters, mathematical symbols etc. Just trying to dissect the basic Latin section of the UTF standard has resulted in multiple regions, with charac...

Natural language grammar and user-entered names

Some languages, particularly Slavic languages, change the endings of people's names according to the grammatical context. (For those of you who know grammar or studied languages that do this to words, such as German or Russian, and to help with search keywords, I'm talking about noun declension.) This is probably easiest with a set of e...

Algorithm for sentence analysis and tokenization

I need to analyze a document and compile statistics as to how many times each sequence of words is used (so the analysis is not of single words but of batches of recurring words). I read that compression algorithms do something similar to what I want - creating dictionaries of blocks of text with a piece of information reporting its fre...
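
Counting recurring word sequences as described above is commonly done with word n-grams. The following is a minimal sketch, assuming lowercase whitespace tokenization and a fixed sequence length `n`; the function name `count_ngrams` and the sample sentence are made up for illustration.

```python
from collections import Counter


def count_ngrams(text, n=2):
    """Count every run of n consecutive words in the text."""
    # Assumption: lowercase + whitespace split stands in for real tokenization.
    words = text.lower().split()
    # zip over n staggered views of the word list to form sliding windows.
    windows = zip(*(words[i:] for i in range(n)))
    return Counter(" ".join(window) for window in windows)


counts = count_ngrams("the cat sat on the mat and the cat slept", n=2)
print(counts.most_common(1))  # prints [('the cat', 2)]
```

Running the function for several values of `n` and merging the counters would give statistics for sequences of different lengths.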

Database structure for versioning and multiple languages

How can I solve the issue of content existing in multiple versions and multiple languages? My current structure: Each content item can only have one active version in each language, and that is what I'm curious how best to solve. Right now I have a column of the contentversions table, which means for each change of active version I have ...

A PHP Library / Class to Count Words in Various Languages?

Some time in the near future I will need to implement a cross-language word count, or if that is not possible, a cross-language character count. By word count I mean an accurate count of the words contained within the given text, taking into account the language of the text. The language of the text is set by a user, and will be assumed to be correc...

Are there any well known algorithms to detect the presence of names?

For example, given a string: "Bob went fishing with his friend Jim Smith." Bob and Jim Smith are both names, but bob and smith are both words. If it weren't for them being capitalized, there would be less indication of this outside of our knowledge of the sentence. Are there any well known algorithms for detecting the presence of names, at lea...

SOLR and Natural Language Parsing - Can I use it?

Hey guys, my requirements are pretty similar to this: Requirements http://stackoverflow.com/questions/90580/word-frequency-algorithm-for-natural-language-processing Using Solr While the answer for that question is excellent, I was wondering if I could make use of all the time I spent getting to know SOLR for my NLP. I thought of SOL...

How do you think the "Quick Add" feature in Google Calendar works?

I am thinking about a project which might use functionality similar to how "Quick Add" parses natural language into something that can be understood with some level of semantics. I'm interested in understanding this better and wondered what your thoughts were on how this might be implemented. If you're unfamiliar with what "Qui...