Here's what I have on my list so far. I'd like to know of others in the same vein, perhaps more technical, perhaps less.

Blown to Bits: Your Life, Liberty, and Happiness After the Digital Explosion - Abelson, Ledeen, and Lewis
Glut: Mastering Information Through the Ages - Wright
Information Rules - Varian and Shapiro
Web Dragons: Inside the Myths of Search Engine Technology - Witten, Gori, and Numerico

There are a few I've seen on text mining, including:
Web Data Mining - Liu
Modern Information Retrieval - Baeza-Yates, Ribeiro-Neto

Also looking for blog recs like
http://www.stat.columbia.edu/~gelman/blog/
http://www.econlib.org/library/Enc/Information.html

or papers like
The Discovery of Structural Form

+2  A: 

Managing Gigabytes - Witten, Moffat, and Bell: a quite detailed look at some of the technologies behind information retrieval, text and image compression. (Disclaimer: my university supervisor is the second author.)

You should also know about ACM's SIGIR, which organises an annual conference on information retrieval, and has a mailing list as well.

TimB
Thanks. I was going to put that book on my original list. I appreciate the links.
+2  A: 

Introduction to Information Retrieval seems to be the recommended text these days for the underlying technology; it was released in 2008 and I haven't read it yet. (The full text is free online.) Managing Gigabytes, as TimB recommended, is my favorite older book; it's much better written than Modern Information Retrieval, though that's also worth a look. There's more you can find with the obvious search.

Darius Bacon
A: 

As a book, Introduction to Information Retrieval, as already mentioned.

I think the best advanced information is in the publications found on academic sites and in conference papers (SIGIR, CIKM, SPIRE, WWW009, ...).

bill
A: 

Hi,

  • "SIGIR" - the conference
  • "TREC" - the conference
  • Baeza-Yates, Ribeiro-Neto, "Modern Information Retrieval" (1999)
  • Witten, Moffat, Bell, "Managing Gigabytes" (1999)
  • van Rijsbergen, "Information Retrieval" (1979)

are the obvious "bibles" (as mentioned above).

  • Büttcher, Clarke, Cormack, "Information Retrieval: Implementing and Evaluating Search Engines" (2010)

is an interesting new student-level textbook, full of bibliographic references. It contains a good explanation of parallel retrieval algorithms (sample chapter).

  • Croft, Metzler, Strohman, "Search Engines: Information Retrieval in Practice" (2009)

has good reviews; I didn't like it too much (read the sample chapters on Croft's homepage).

  • Voorhees, Harman, "TREC: Experiment and Evaluation in Information Retrieval" (2009)

is a good introduction to the TREC approach in evaluating IR.

  • Langville, Meyer, "Google's PageRank and Beyond: The Science of Search Engine Rankings" (2006)

explains how to efficiently compute PageRank.

fubra