Hello,
I work with contexts for a Mobile Wiki Software. Contexts are used to serve the right information for a specific situation out of a large pool of information units.
For example: When you are at the customer, the system checks your location and presents you with location-based information.
Another example: You are at the customer and t...
For finding trending topics, I use the Standard score in combination with a moving average:
z-score = ([current trend] - [average historic trends]) / [standard deviation of historic trends]
(Thank you very much, Nixuz)
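The formula above can be sketched in a few lines of Python. This is only an illustration: the function name `z_score`, the sample numbers, and the choice of the population standard deviation (`statistics.pstdev`) are assumptions, not something stated in the question.

```python
from statistics import mean, pstdev


def z_score(current, history):
    """Standard score of the current value against historic values.

    Assumption: the population standard deviation is used; the question
    does not say which variant is meant.
    """
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        # All historic values are identical, so the score is undefined;
        # returning 0.0 here is an arbitrary choice for this sketch.
        return 0.0
    return (current - mu) / sigma


# Hypothetical trend counts: four historic windows, then the current one.
historic_trends = [10, 12, 8, 10]
current_trend = 14
print(z_score(current_trend, historic_trends))
```

A large positive score means the current value is well above its historic average, which is what makes the topic look like it is trending.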
So far, I have done it as follows:
Whatever the time is, for the historic trends I simply go back 24h. Assuming we have...
I'd like to retrieve data from a specific webpage using the urllib library.
The problem is that, in order to open this page, some data must first be sent to
the server. If I do it with IE, I first need to update some checkboxes and
then press the "display data" button, which opens the desired page.
Looking into the source code, I see that ...
I know web-based scripts can be used to identify the characteristics of visitors (display resolution, Java version, OS, architecture, render engine, etc.).
But is there anything that could give me the amount of system memory resident on the visitor's PC?
...
Hi,
Pretty common situation, I'd wager. You have a blog or news site and you have plenty of articles or blags or whatever you call them, and you want to, at the bottom of each, suggest others that seem to be related.
Let's assume very little metadata about each item. That is, no tags, categories. Treat as one big blob of text, includi...
Hello
I have two questions:
1. What is the "11pt average precision metric"?
2. How is it used in information retrieval?
Thanks
...
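For the 11-point question above, the usual reading is "11-point interpolated average precision": take the recall levels 0.0, 0.1, ..., 1.0, use at each level the highest precision reached at that recall or any higher recall, and average the 11 values. A minimal sketch, assuming a single ranked result list with known relevance judgements (the function and parameter names are made up for illustration):

```python
def eleven_point_average_precision(ranked_relevance, total_relevant):
    """11-point interpolated average precision for one ranked result list.

    ranked_relevance: list of booleans, True where the result at that rank
                      is relevant (hypothetical input format).
    total_relevant:   number of relevant documents that exist for the query.
    """
    # Recall and precision after each rank position.
    points = []
    hits = 0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
        points.append((hits / total_relevant, hits / rank))

    # Interpolated precision at recall level r is the best precision
    # observed at any recall >= r (0.0 if that recall is never reached).
    levels = [i / 10 for i in range(11)]
    interpolated = [
        max((p for r, p in points if r >= level), default=0.0)
        for level in levels
    ]
    return sum(interpolated) / len(levels)


# Hypothetical ranking: relevant documents at ranks 1 and 3, two relevant in total.
print(eleven_point_average_precision([True, False, True, False], 2))
```

In information retrieval evaluation the value is typically computed per query and then averaged over all test queries to compare systems.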
I have the following requirement:
I have many (say 1 million) values (names).
The user will type a search string.
I don't expect the user to spell the names correctly.
So, I want to build something like Google's "Did you mean". This will list all the possible values from my datastore. There is a similar, but not identical, question here. This did no...
Here's the problem -- I have a few thousand small text snippets, anywhere from a few words to a few sentences - the largest snippet is about 2k on disk. I want to be able to compare each to each, and calculate a relatedness factor so that I can show users related information.
What are some good ways to do this? Are there known algorit...
My aim is to build an aggregator of news feeds and blog feeds so as to make searching/tracking of entities in it easy. I have been looking at many solutions out there like Terrier, Lucene, SWISH-E, etc.
Basically, I could find only 2 sources of comparison studies done on these engines and one of them is kinda outdated. Basically I w...
We extract various information from e-mails - flights, car rentals, hotels and more. The method is to extract the body of the mail, usually in HTML form but sometimes it's text or we use the information in a PDF/Word/RTF attachment. We then apply regular expressions (sometimes in several steps) in order to get information, which is provid...
I want to create a big inverted index of around 10^6 terms. What method would you suggest? I'm thinking of fast binary key store DBs like Tokyo Cabinet, Voldemort, etc. Edit: I've tried MySQL in the past for storing a table of two integers to represent the inverted index, but even with the first column having a db index, queries were very...
I would like to implement a data classification project easily, so I'm looking for a language that provides a library for that. Could you suggest a suitable language?
...
I have successfully uploaded files into my SQL Server database. I can bring back the information into a GridView. I am unable to figure out how to create a hyperlink to actually open the file.
...
Is Latent Semantic Indexing (LSI) a Statistical Classification algorithm? Why or why not?
Basically, I'm trying to figure out why the Wikipedia page for Statistical Classification does not mention LSI. I'm just getting into this stuff and I'm trying to see how all the different approaches for classifying something relate to one another...
For use in analyzing documents on the Internet!
...
I have a problem with Lucene's scoring function that I can't figure out. So far, I've been able to write this code to reproduce it.
package lucenebug;
import java.util.Arrays;
import java.util.List;
import org.apache.lucene.analysis.SimpleAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
im...
I need a program to get all the web pages under a website. The website is in Chinese, and I want to extract all the English words from it. Then I can extract all the information I need. Any ideas for this? Is there any software for this purpose?
If not, I would like to write one. Any suggestions?
Thanks much.
...
http://en.wikipedia.org/wiki/Cosine%5Fsimilarity
Can you show the vectors here (in a list or something),
and then do the math, so we can see how it works?
I'm a beginner.
...
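For the cosine similarity question above, here is a small worked sketch. The two sentences, the raw word-count vectors, and all names are made up for illustration; real systems usually weight the terms (for example with tf-idf) rather than using raw counts.

```python
import math

# Two hypothetical documents.
words1 = "the cat sat on the mat".split()
words2 = "the cat ate the fish".split()

# One vector dimension per distinct word, in a fixed (sorted) order.
vocab = sorted(set(words1) | set(words2))
# vocab is ['ate', 'cat', 'fish', 'mat', 'on', 'sat', 'the']

# Each document becomes a list of word counts over that vocabulary.
v1 = [words1.count(w) for w in vocab]  # [0, 1, 0, 1, 1, 1, 2]
v2 = [words2.count(w) for w in vocab]  # [1, 1, 1, 0, 0, 0, 2]


def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen for this sketch: empty vector matches nothing
    return dot / (norm_a * norm_b)


# dot = 1*1 + 2*2 = 5, |v1| = sqrt(8), |v2| = sqrt(7)
# similarity = 5 / sqrt(56), roughly 0.668
print(cosine_similarity(v1, v2))
```

Only the words both documents share ("cat" and "the") contribute to the dot product; identical count vectors give 1.0, and documents with no words in common give 0.0.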
My teammates and I have a very challenging new project to do, and we are supposed to submit it next week. We don't have a single clue about how to do it, and really need help. We are undergraduate students, new to Information Retrieval and AI, and really need your ideas.
The project is roughly:
When an expert is cited in a document,...
Hello everyone,
I'm a software developer interested in information retrieval. Currently I'm working on my 3rd search engine project and am VERY frustrated about the amount of boilerplate code that is written again and again, with the same bugs, etc.
A basic search engine is a very simple beast that could be described in a formal language ...