information-retrieval

Mobile Contexts for Information Retrieval

Hello, I work with contexts for mobile wiki software. Contexts are used to serve the right information for a specific situation out of a large pool of information units. For example: When you are at the customer, the system checks your location and presents you with location-based information. Another example: You are at the customer and t...

Time frames for Standard score

To find trending topics, I use the standard score in combination with a moving average: z-score = ([current trend] - [average historic trends]) / [standard deviation of historic trends] (Thank you very much, Nixuz). So far, I do it as follows: whatever the time is, for the historic trends I simply go back 24h. Assuming we have...
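A minimal sketch of the z-score formula quoted above, in Python. The function name `z_score`, the sample counts, and the choice of the population standard deviation are assumptions made for illustration; the question does not say which variant of the standard deviation is used or how the 24h history is bucketed.

```python
from statistics import mean, pstdev


def z_score(current, history):
    """Standard score of `current` against a list of historic values.

    Assumption: `history` holds one trend value per time bucket
    (for example, hourly counts from the last 24h).
    """
    mu = mean(history)
    # Assumption: population standard deviation; the source does not specify.
    sigma = pstdev(history)
    if sigma == 0:
        # All historic values are identical, so the score is undefined;
        # returning 0.0 is an arbitrary convention for this sketch.
        return 0.0
    return (current - mu) / sigma


# Hypothetical hourly mention counts for one topic.
history = [8, 12]
print(z_score(14, history))  # mean is 10, standard deviation is 2
```

A large positive score means the current value is well above its historic average, which is what a trending topic would look like under this measure.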

How to analyze IE activity when opening a specific web page

I'd like to retrieve data from a specific webpage using the urllib library. The problem is that, in order to open this page, some data must first be sent to the server. If I do it with IE, I first need to set some checkboxes and then press the "display data" button, which opens the desired page. Looking into the source code, I see that ...

Web based script to determine system information

I know web-based scripts can be used to identify the characteristics of visitors (display resolution, Java version, OS, architecture, render engine, etc.). But is there anything that could give me the amount of system memory installed on the visitor's PC? ...

What tried and true algorithms for suggesting related articles are out there?

Hi, Pretty common situation, I'd wager. You have a blog or news site and you have plenty of articles or blags or whatever you call them, and you want to, at the bottom of each, suggest others that seem to be related. Let's assume very little metadata about each item. That is, no tags, categories. Treat as one big blob of text, includi...

What is the 11pt average precision metric?

Hello, I have two questions: 1. What is the "11pt average precision metric"? 2. How is it used in information retrieval? Thanks ...
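For orientation, here is a rough Python sketch of the usual textbook reading of this metric: 11-point interpolated average precision for a single ranked result list. The function name, its arguments, and the sample ranking are hypothetical; in an evaluation the value is normally averaged over many queries.

```python
def eleven_point_average_precision(ranked_relevance, total_relevant):
    """Average interpolated precision at recall 0.0, 0.1, ..., 1.0.

    ranked_relevance: booleans, one per retrieved document in rank order.
    total_relevant:   number of relevant documents that exist for the query.
    """
    if total_relevant <= 0:
        raise ValueError("total_relevant must be positive")

    # (recall, precision) after each rank position.
    points = []
    hits = 0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
        points.append((hits / total_relevant, hits / rank))

    interpolated = []
    for i in range(11):
        level = i / 10
        # Interpolated precision: the best precision observed at any
        # recall greater than or equal to this level (0.0 if none).
        candidates = [p for r, p in points if r >= level - 1e-12]
        interpolated.append(max(candidates, default=0.0))
    return sum(interpolated) / 11


# Hypothetical ranking: relevant documents at ranks 1 and 3, two relevant overall.
print(eleven_point_average_precision([True, False, True, False], 2))
```

The single number summarizes the precision/recall curve of one ranking, so two retrieval systems can be compared on the same set of queries.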

How to correct the user input (kind of like Google's "Did you mean?")

I have the following requirement: I have many (say 1 million) values (names). The user will type a search string. I don't expect the user to spell the names correctly. So, I want to build something like Google's "Did you mean". This will list all the possible values from my datastore. There is a similar but not identical question here. This did no...

What are some good methods to find the "relatedness" of two bodies of text?

Here's the problem -- I have a few thousand small text snippets, anywhere from a few words to a few sentences - the largest snippet is about 2k on disk. I want to be able to compare each to each, and calculate a relatedness factor so that I can show users related information. What are some good ways to do this? Are there known algorit...

Which open-source search engine should be used?

My aim is to build an aggregator of news feeds and blog feeds so as to make searching/tracking of entities in it easy. I have been looking at many solutions out there like Terrier, Lucene, SWISH-E, etc. Basically, I could find only 2 sources of comparison studies done on these engines and one of them is kinda outdated. Basically I w...

Looking for an information retrieval / text mining application or library

We extract various information from e-mails: flights, car rentals, hotels and more. The method is to extract the body of the mail, usually in HTML form, but sometimes it's text, or we use the information in a PDF/Word/RTF attachment. We then apply regular expressions (sometimes in several steps) in order to get information, which is provid...

Ways to create a huge inverted index

I want to create a big inverted index of around 10^6 terms. What method would you suggest? I'm thinking of fast binary key store DBs like Tokyo Cabinet, Voldemort, etc. Edit: I've tried MySQL in the past for storing a table of two integers to represent the inverted index, but even with the first column having a db index, queries were very...

Which programming language is appropriate for a data classification project?

I would like to easily implement a data classification project, so I'm looking for a language that provides a library for that. Could you suggest a suitable language? ...

How do I retrieve a file from a SQL Server database?

I have successfully uploaded files into my SQL Server database. I can bring back the information into a GridView. I am unable to figure out how to create a hyperlink to actually open the file. ...

Is Latent Semantic Indexing (LSI) a Statistical Classification algorithm?

Is Latent Semantic Indexing (LSI) a Statistical Classification algorithm? Why or why not? Basically, I'm trying to figure out why the Wikipedia page for Statistical Classification does not mention LSI. I'm just getting into this stuff and I'm trying to see how all the different approaches for classifying something relate to one another...

Besides NLTK, what is the best information retrieval library for Python?

For use in analyzing documents on the Internet! ...

Problem with Lucene scoring

I have a problem with Lucene's scoring function that I can't figure out. So far, I've been able to write this code to reproduce it. package lucenebug; import java.util.Arrays; import java.util.List; import org.apache.lucene.analysis.SimpleAnalyzer; import org.apache.lucene.document.Document; import org.apache.lucene.document.Field; im...

Get every page under a website

I need a program to get all the web pages under a website. The website is in Chinese, and I want to extract all the English words. Then I can extract all the information I need. Any ideas for this? Is there any software for this purpose? If not, I would like to write one. Any suggestions? Thanks much. ...

Can someone give an example of cosine similarity, in very simple, graphical way?

http://en.wikipedia.org/wiki/Cosine%5Fsimilarity Can you show the vectors here (in a list or something), then do the math, and let us see how it works? I'm a beginner. ...
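As a small illustration of the kind of worked example being asked for, the Python sketch below turns two short texts into word-count vectors and computes their cosine similarity step by step. The sample sentences and the function name are made up for this sketch; real systems usually weight the terms (for example with tf-idf) rather than using raw counts.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-count vectors."""
    # Dot product: only terms present in both vectors contribute.
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen for empty vectors in this sketch
    return dot / (norm_a * norm_b)


# Hypothetical documents. Over the vocabulary [the, cat, sat, ran]
# the vectors are [1, 1, 1, 0] and [1, 1, 0, 1].
doc_a = Counter("the cat sat".split())
doc_b = Counter("the cat ran".split())

# dot = 1*1 + 1*1 = 2, each norm = sqrt(3), so cosine = 2 / 3
print(cosine_similarity(doc_a, doc_b))
```

Identical count vectors give a similarity of 1, and texts with no words in common give 0.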

A question about a design

My teammates and I have a very challenging new project to do, and we are supposed to submit it next week. We don't have a single clue about how to do it, and really need help. We are undergraduate students, new to Information Retrieval and AI, and really need your ideas. The project is roughly: When an expert is cited in a document,...

How to learn about formal top-down approach to software architecture?

Hello everyone, I'm a software developer interested in information retrieval. Currently I'm working on my 3rd search engine project and am VERY frustrated about the amount of boilerplate code that is written again and again, with the same bugs, etc. A basic search engine is a very simple beast that could be described in a formal language ...