+9 A:

The Anatomy of a Large-Scale Hypertextual Web Search Engine

Brian Campbell 2009-04-01 05:04:21

+2 A:

PageRank is a link analysis algorithm used by Google for its search engine, but the patent was assigned to Stanford University.

TStamper 2009-04-01 05:04:47

The patent was actually registered by Stanford. The dirty little secret of PageRank is that Google doesn't own it -- Stanford does.

Deane 2009-04-01 05:40:36

yeah i just noticed..it states in the link that it was assigned to Stanford

TStamper 2009-04-01 05:53:47

A:

An inverted index and MapReduce are the basics of most search engines (I believe). You create an index on the content and run queries against that index to find relevant pages. Google, however, does much more than keep a simple index of where each word occurs: it also tracks how many times the word appears, where it appears, where it appears in relation to other words, the ordering, etc. Another simple concept that's used is "stop words", which may include things like "and", "the", and so on (basically "simple" words that occur often and are generally not the focus of a query). In addition, they employ things like PageRank (mentioned by TStamper) to order pages by relevance and importance.
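To make the "more than a simple index" idea concrete, here is a minimal Python sketch of an index that records where each word occurs and skips stop words. The stop-word list, the document ids, and the function name are made up for illustration; this is not how Google actually stores its index.

```python
from collections import defaultdict

# Hypothetical stop-word list; real engines use larger, tuned lists.
STOP_WORDS = {"and", "the", "a", "of", "is"}

def build_positional_index(docs):
    """Map each term to {doc_id: [word positions]}, skipping stop words."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for position, word in enumerate(text.lower().split()):
            if word in STOP_WORDS:
                continue
            index[word].setdefault(doc_id, []).append(position)
    return index

docs = {
    "d1": "the quick fox and the lazy dog",
    "d2": "the dog chased the fox and the fox ran",
}
index = build_positional_index(docs)

# Positions give both "where" and "how many times" for each document.
print(index["fox"])
print(len(index["fox"]["d2"]))
```

Keeping positions (rather than just a yes/no per document) is what makes things like phrase matching and "how close are these two words" possible later on.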

MapReduce basically takes one job, divides it into smaller jobs, and lets those smaller jobs run on many systems (in part for scalability and in part for speed). If I recall correctly, Google was able to distribute jobs to "average" computers instead of server-grade computers. Since the processing capability of a single computer is reaching a peak, many technologies are heading towards cloud computing, where a job is done by many physical machines.
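As a rough illustration of the split-then-combine idea, here is a single-process Python sketch of a MapReduce-style word count. The function names (map_phase, shuffle, reduce_phase) are my own; a real framework would run the map and reduce calls on different machines and handle the grouping step itself.

```python
from collections import defaultdict
from itertools import chain

def map_phase(chunk):
    """Emit a (word, 1) pair for every word in one chunk of input."""
    return [(word, 1) for word in chunk.lower().split()]

def shuffle(pairs):
    """Group values by key, as the framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)

# Each chunk stands in for the slice of work handed to one machine.
chunks = ["google indexes the web", "the web is large", "google ranks the web"]
mapped = chain.from_iterable(map_phase(c) for c in chunks)
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)
```

Because each map call only looks at its own chunk and each reduce call only looks at one key, both phases can be spread across as many machines as you have.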

I'm not sure how much of what Google does is searching; it's more accurately crawling. The difference is that they just start at specific points, crawl to anything reachable, and repeat until they hit some sort of dead end.
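The "start somewhere and follow everything reachable" idea can be sketched as a breadth-first traversal. This Python example uses a made-up in-memory link graph (LINKS) instead of fetching real pages, so the names and URLs are assumptions for illustration only.

```python
from collections import deque

# Hypothetical link graph standing in for the real web.
LINKS = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example", "d.example"],
    "c.example": ["a.example"],
    "d.example": [],                 # dead end: no outgoing links
    "e.example": ["a.example"],      # nothing links here, so it is never reached
}

def crawl(seeds):
    """Visit every page reachable from the seed pages, breadth first."""
    visited = []
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        page = queue.popleft()
        visited.append(page)
        for target in LINKS.get(page, []):
            if target not in seen:   # skip pages that are already queued or visited
                seen.add(target)
                queue.append(target)
    return visited

order = crawl(["a.example"])
print(order)
```

Note that a page nobody links to is invisible to this kind of crawl unless it is one of the starting points.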

nevets1219 2009-04-01 05:18:45

+3 A:

Google's patented PigeonRank™

Wow, they initially posted this 7 years ago from Wednesday ...

Bratch 2009-04-01 05:21:21

I believe this was a hoax : http://en.wikipedia.org/wiki/PigeonRank#2002:_Pigeon_Rank

TStamper 2009-04-01 05:25:24

I think the wikipedia article is fake, PigeonRank(tm) is real!

CVertex 2009-04-01 05:27:15

confirmed...it was an april fools joke- http://www.april-fools.us/google-pigeonrank.htm

TStamper 2009-04-01 05:32:53

That link is also fake.

CVertex 2009-04-01 05:35:54

From the original link (at the bottom): "Note: This page was posted for April Fool's Day - 2002." I should have placed, "Note: This page was posted for April Fool's Day - 2009" and waited 2 hours (PDT).

Bratch 2009-04-01 05:37:49

lol..ok how about this forum - http://forums.digitalpoint.com/showthread.php?t=1674

TStamper 2009-04-01 05:37:57

That link and any other link posted from here on is fake.

CVertex 2009-04-01 05:39:45

It's april 1st where i am

CVertex 2009-04-01 05:40:33

how about Jon S. come confirm this

TStamper 2009-04-01 05:45:00

J.S. confirmed it before it happened.

Bratch 2009-04-01 05:48:42

when..cause i want to know for myself

TStamper 2009-04-01 05:54:33

http://stackoverflow.com/users/209/cvertex <- this link is a fake

Pete Kirkham 2009-04-01 11:36:52

PigeonRank is not a real algorithm name

TStamper 2009-04-01 13:02:02

+5 A:

Indexing

If you want to get down to basics:

Google uses an inverted index of the Internet. What this means is that Google has an index of all pages it's crawled based on the terms in each page. For instance the term Google maps to this page, the Google home page, and the Wikipedia article for Google, amongst others.

Thus, when you go to Google and type "Google" into the search box, Google checks its index of all terms available on the Internet and finds the entry for the term "Google" and with it the list of all pages that have that term referenced in it.
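A bare-bones version of that lookup might look like the following Python sketch. The page URLs and texts are invented, and requiring every query term to match is just one simple choice, not a claim about what Google does.

```python
from collections import defaultdict

def build_index(pages):
    """Map each term to the set of pages that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for term in text.lower().split():
            index[term].add(url)
    return index

def search(index, query):
    """Return the pages that contain every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results

# Hypothetical pages used only for this example.
pages = {
    "google.example/home": "google search",
    "wiki.example/google": "google is a search company",
    "maps.example": "maps and directions",
}
index = build_index(pages)
hits = search(index, "Google")
print(hits)
```

The point of the inverted index is that answering the query is a dictionary lookup per term, not a scan over every page.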

For veteran users:

Google's index goes beyond your simple inverted index, however. This is why Google is the best. Google's crawlers (spiders) are smart. Very smart. Beyond just keeping track of the terms that are on any given web page, they also keep track of words that are on related pages and link those to the given document.

In other words, if a page has the term Google in it and the page has a link to or is linked from another web page, the other page may be referenced in the index under the term Google as well. All this and more go into why a given page is returned for a given query.
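One way to picture this is indexing the text of a link under the page it points to. The Python sketch below does only that; the page data and field names are hypothetical, and it covers just one direction (crediting the linked-to page), which is an assumption on my part rather than a description of Google's actual rules.

```python
from collections import defaultdict

# Hypothetical pages: own text plus outgoing links as (anchor_text, target) pairs.
PAGES = {
    "blog.example": {
        "text": "my favourite tools",
        "links": [("google search", "engine.example")],
    },
    "engine.example": {"text": "enter your query", "links": []},
}

def build_index_with_anchors(pages):
    """Index each page under its own terms and under the text of links pointing to it."""
    index = defaultdict(set)
    for url, page in pages.items():
        for term in page["text"].lower().split():
            index[term].add(url)
        for anchor_text, target in page["links"]:
            for term in anchor_text.lower().split():
                index[term].add(target)  # credit the linked-to page
    return index

index = build_index_with_anchors(PAGES)
print(index["google"])
```

Here engine.example never mentions "google" itself, but it still shows up under that term because another page links to it with that word.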

If you want to go into why pages are ordered the way they are in your search results, that gets into even more interesting stuff.

Ranking

To get down to basics:

Perhaps one of the most basic algorithms a search engine can use to sort your results is known as term frequency-inverse document frequency (tf-idf). Simply put, this means that your results will be ordered by the relative importance of your search terms in the document: a term counts for more the more often it appears in a document relative to that document's length, and for less the more documents it appears in overall. In other words, a document that has 10 pages and lists the word Google once is not nearly as relevant as a document that has 1 page and lists the word Google ten times.
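Here is a small Python sketch of one common tf-idf variant (raw term frequency divided by document length, times the natural log of total documents over documents containing the term). There are many variants, and the corpus below is made up, so treat the exact formula as an assumption.

```python
import math

def tf_idf(term, doc, corpus):
    """Score one term for one document against a corpus of documents."""
    words = doc.lower().split()
    tf = words.count(term) / len(words)           # frequency relative to length
    containing = sum(1 for d in corpus if term in d.lower().split())
    if containing == 0:
        return 0.0
    idf = math.log(len(corpus) / containing)      # rarer terms weigh more
    return tf * idf

# Hypothetical three-document corpus.
corpus = [
    "google google google search",
    "a long page about many topics that mentions google once",
    "cooking recipes and kitchen tips",
]
scores = [tf_idf("google", d, corpus) for d in corpus]
print(scores)
```

The short document that repeats the term scores highest, the long document that mentions it once scores lower, and the document without the term scores zero.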

For veteran users:

Again, Google does quite a bit more than your basic search engine when it comes to ranking results. Google has implemented the aforementioned, patented PageRank algorithm. In short form, PageRank goes beyond tf-idf by taking into account the popularity/importance of a given page. At this point, popularity/importance may be judged by any number of factors that Google just won't tell us. However, at the most basic of levels, Google can tell that one page is more important than another because loads and loads of other pages link to it.
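The "lots of pages link to it" idea can be sketched with the textbook power-iteration form of PageRank. This Python example is a simplified illustration, not Google's production ranking; the damping factor of 0.85, the iteration count, and the tiny link graph are all assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook power iteration; every link target must also be a key in links."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outgoing in links.items():
            if not outgoing:
                # Dangling page: spread its rank evenly over all pages.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
            else:
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical graph: three pages link to "hub", nothing links to "c".
links = {
    "hub": ["a", "b"],
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub"],
}
ranks = pagerank(links)
print(max(ranks, key=ranks.get))
```

Each page passes its own score along its outgoing links, so a link from an important page is worth more than a link from an obscure one.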

dustyburwell 2009-04-01 05:39:02

A:

I think "The Anatomy of a Large-Scale Hypertextual Web Search Engine" is a little outdated. Here is a more recent talk about scalability: Challenges in Building Large-Scale Information Retrieval Systems

bill 2009-06-15 17:01:47

A:

While looking into the PageRank algorithm and similar topics, I was disturbed to discover that the introduction of personal search at the turn of the year (not widely commented on) seems to change quite a lot. See Failure of the Google Gold Standard and Google’s Personalized Results

mikej 2010-02-16 09:42:39


What searching algorithm/concept is used in Google?
