I have comments enabled on my site, and I require users to enter at least 30 characters to publish a comment (just to get some value, because they usually just submitted "I like it").
But some users now use a simple technique to get around this and enter, e.g.:
"I like it. asdsdf dfdsfsdf tt erretrt re"
As you can see the rest of the text...
I am looking for research (published) on AI techniques for reading cookbook recipes. Recipes are a very limited domain that might be doable in a natural language recognition engine with some degree of accuracy.
I have in mind writing a program that would allow copy/pasting a recipe from a web browser into the AI and having it determine ...
Hi
We have a requirement to change the words or phrases in a sentence while keeping its meaning intact. This application is going to provide suggestions to users who are involved in copywriting.
I don't know where I should start... we have not yet finalized the technology but would like to do it in a Python o...
Hey guys,
I'm trying to use topic modeling with Mallet but have a question.
How do I know when I need to rebuild the model? For instance, I have a set of documents I crawled from the web. Using the topic modeling provided by Mallet, I might be able to create the models and run inference on documents with them. But over time, with new data that I...
Stack Overflow implemented its "Related Questions" feature by taking the title of the current question being asked and removing from it the 10,000 most common English words according to Google. The remaining words are then submitted as a fulltext search to find related questions.
How do I get such a list of the most common English words?...
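The filtering step described above can be sketched roughly as follows. This is only an illustration, not Stack Overflow's actual code: `COMMON_WORDS` is a tiny hypothetical stand-in for the 10,000-word list, and `search_terms` is a made-up helper name.

```python
import re

# Hypothetical stand-in for the 10,000 most common English words.
# A real list would be loaded from a file and would be far larger.
COMMON_WORDS = {"how", "do", "i", "get", "a", "the", "of", "to", "in", "most", "is", "and"}

def search_terms(title, common_words=COMMON_WORDS):
    """Return the words of the title that are not in the common-word list."""
    words = re.findall(r"[a-z0-9']+", title.lower())
    return [w for w in words if w not in common_words]

# The remaining words would then be passed to a fulltext search.
print(search_terms("How do I get a list of the most common English words?"))
```

With a full 10,000-word list, far fewer words would survive the filter than in this toy run.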
I am trying to build questions based on information available for about 10 variables, e.g. shape (square, circle, rectangle, parallelogram), length, width, circumference, area, diagonal length, etc.
For example, if I want to set a question to calculate area based on shape, length and width, the question gets created stating: calculate area of 'rectang...
How do I generate n-grams of a string?
For example:
String Input="This is my car."
I want to generate n-grams of this input:
Input Ngram size = 3
The output should be:
This
is
my
car
This is
is my
my car
This is my
is my car
Please give me some idea of how to implement this in Java, or tell me whether a library is available for it.
I am trying to use this NGramTokenizer ...
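For reference, the expected output listed above (all word n-grams from size 1 up to the given size, with punctuation dropped) can be produced with two nested loops. The sketch below is in Python rather than Java, and `ngrams_up_to` is a hypothetical helper name. It assumes words are runs of `\w` characters, and the same loop structure carries over to Java directly.

```python
import re

def ngrams_up_to(text, max_n):
    """Return all word n-grams of sizes 1..max_n, shortest sizes first."""
    tokens = re.findall(r"\w+", text)  # drops punctuation such as the final "."
    result = []
    for n in range(1, max_n + 1):
        # Slide a window of n tokens across the token list.
        for i in range(len(tokens) - n + 1):
            result.append(" ".join(tokens[i:i + n]))
    return result

for gram in ngrams_up_to("This is my car.", 3):
    print(gram)
```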
Possible Duplicate:
PHP - How to split a paragraph into sentences.
I have a block of text that I would like to separate into sentences. What would be the best way of doing this? I thought of looking for '.', '!', '?' characters, but I realized there were some problems with this, such as when people use acronyms, or end a sentenc...
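To make the problem concrete, here is a minimal sketch (in Python rather than PHP) of the punctuation-based approach with a small exception list for abbreviations. `ABBREVIATIONS` and `split_sentences` are hypothetical names, and the list is far from complete, so this only illustrates the idea and its limits.

```python
# Hypothetical, incomplete list of tokens that end in "." without ending a sentence.
ABBREVIATIONS = {"mr.", "mrs.", "dr.", "e.g.", "i.e.", "etc."}

def split_sentences(text):
    """Split on whitespace-separated tokens ending in '.', '!' or '?',
    unless the token is a known abbreviation."""
    sentences, current = [], []
    for token in text.split():
        current.append(token)
        if token[-1] in ".!?" and token.lower() not in ABBREVIATIONS:
            sentences.append(" ".join(current))
            current = []
    if current:  # trailing text without closing punctuation
        sentences.append(" ".join(current))
    return sentences

print(split_sentences("Dr. Smith arrived. Was he late? No!"))
```

An abbreviation that really does end a sentence (for example a sentence ending in "etc.") is still handled wrongly, which is the kind of case the question is worried about.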
What would be the best definition of an English word?
What cases of an English word are there other than just \w+?
Some may include \w+-\w+ or \w+'\w+; some may exclude cases like \b[0-9]+\b. But I haven't seen
any general consensus on those cases.
Do we have a formal definition of this?
Can any of you clarify?
(Edit: broaden the questi...
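As an illustration of one possible convention (not a formal definition), the cases mentioned above can be combined into a single pattern: runs of letters, optionally joined by internal hyphens or apostrophes, with pure digit runs excluded. The names `WORD` and `words` are hypothetical.

```python
import re

# One possible convention: letters only, with optional internal "-" or "'" joins.
# Digit-only tokens such as "2010" contain no letters and therefore never match.
WORD = re.compile(r"[A-Za-z]+(?:['-][A-Za-z]+)*")

def words(text):
    """Return all substrings of text that match the WORD pattern."""
    return WORD.findall(text)

print(words("The well-known rule doesn't cover 2010."))
```

Other choices (allowing digits inside words, non-ASCII letters, or trailing apostrophes) are equally defensible, which is why the question has no single agreed answer.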
Hi,
I want to use a Wikipedia dump for my project. The information below is required for my project.
For a Wikipedia entry, I want to know which other languages contain the page.
I want downloadable data in CSV or another common format.
Is there a way to get this data?
Thanks
Bala
...
Hi,
Is there a partition of English words into high-level categories such as sports, basketball, etc.? It's required for my project.
Is this data available somewhere? I am okay with overlapping of words across categories.
Thank you
Bala
...
Hi,
I want to get a list of all the Wikipedia categories. I can find them here: http://en.wikipedia.org/wiki/Special:Categories Is there a way to download all of them in XML/CSV format?
Thank you
Bala
...
Hi,
One simple question (but I haven't quite found an obvious answer in the NLP stuff I've been reading, which I'm very new to):
I want to classify emails with a probability along certain dimensions of mood. Is there an NLP package out there specifically dealing with this? Is there an obvious starting point in the literature I start re...
I have thousands of sentences in a file. I want to find only the valid, useful English words. Is this possible with natural language processing?
Sample Sentence:
~@^.^@~ tic but sometimes world good famous tac Zorooooooooooo
I just want to extract only English Words like
tic world good famous
Any advice on how I can achieve this? Th...
As part of a contact management system I have a large database of names. People frequently edit this and as a result we run into issues of the same person existing in different forms (John Smith and Jonathan Smith). I looked into word similarity but it's easy to think of name variations which are not similar at all (Richard vs Dick). I w...
Basically I need some text like:
I have an ice cream cone.
You are in trouble.
You need a bath.
And change it from 1st or 2nd person to 3rd person.
He has an ice cream cone.
He is in trouble.
He needs a bath.
I've started a js app, but it's super simple at the moment.
Before I waste time reinventing the wheel, I figured I'd ask:...
Hi, the aim is to parse a sizeable corpus like Wikipedia to generate the most probable parse trees and perform named entity recognition. Which is the best library to achieve this in terms of performance and accuracy? Has anyone used more than one of the above libraries?
...
Hi,
I am currently doing a project on person name disambiguation. The idea behind the project is that it will be able to identify the correct person when there are multiple people with the same name. I have used Wikipedia for this. I want to evaluate my project on some standard data. I am looking for some testing data. I am not familiar ...
Hi
I would like to build a very simple application: an automated FAQ. I searched the internet and found some information about different approaches, but there is no .NET-specific example. Do you have some experience of building such an application, or maybe know some .NET-specific examples? It would be very interesting to take a look at one.
H...
Background
Looking to automate creating Domains in JasperServer. Domains are a "view" of data for creating ad hoc reports. The names of the columns must be presented to the user in a human readable fashion.
Problem
There are over 2,000 possible pieces of data that the organization could theoretically want to include on a report....