First, this question may not be strictly programming related, but I decided to post it here anyway since the line is not clearly defined. If you think it does not belong here, feel free to delete it.

I am curious how Google and other search engines determine keywords. Do they simply find keywords the way we do, by pressing CTRL+F and counting, or do they use more sophisticated semantic technologies to single out keywords?

The reason I ask is that when I check cached pages in search engines, they usually highlight the keywords in different colors, like this one:

http://74.125.153.132/search?q=cache%3AYKq3QHbl0RwJ%3Awww.autotrader.com/+car&cd=1&hl=en&ct=clnk&client=firefox-a

But it seems they do not count the keyword car when it appears inside a longer word like carpad, whereas, as you know, CTRL+F does count the car in carpad.
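To illustrate the difference I mean, here is a minimal sketch. It only contrasts plain substring counting with whole-word counting; the function names and the sample text are made up for illustration, and I am not claiming this is how any search engine actually tokenizes pages.

```python
import re

def substring_count(text, keyword):
    """Count every occurrence, including inside longer words (like CTRL+F)."""
    return text.lower().count(keyword.lower())

def whole_word_count(text, keyword):
    """Count only occurrences that stand alone between word boundaries."""
    pattern = r"\b" + re.escape(keyword) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))

# Made-up sample text, not taken from the cached page above.
sample = "Buy a car today. Carpad sells car mats; cardoctor fixes any car."

print(substring_count(sample, "car"))   # 5: also matches Carpad and cardoctor
print(whole_word_count(sample, "car"))  # 3: only the standalone word
```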

So my question is: if I put a company name like carpad or cardoctor on my site, does it help with the keyword car?

Disclaimer: car, carpad, and the URL I provided are just examples. I hope I made myself clear.
Thanks in advance!

+3  A: 

One of the basic techniques they use is to look at the anchor text of links pointing to a site. For example, if you link to an article about Obama's party crashers with the link text "Obama party crashers", Google can infer that the destination page is about that topic.
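As a rough sketch of the idea (not Google's actual method), you could collect the words used in the anchor text of inbound links and treat the most frequent ones as hints about the destination page. The link data, the example domains, and the `anchor_terms` helper below are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical crawl data: (source page, destination URL, anchor text).
links = [
    ("blog-a.example", "news.example/story", "Obama party crashers"),
    ("blog-b.example", "news.example/story", "party crashers at the White House"),
    ("blog-c.example", "news.example/story", "click here"),
]

def anchor_terms(links):
    """Map each destination URL to a Counter of words from inbound anchor text."""
    terms = defaultdict(Counter)
    for _source, dest, text in links:
        terms[dest].update(text.lower().split())
    return terms

profile = anchor_terms(links)
# Words that several independent links agree on rise to the top.
print(profile["news.example/story"].most_common(3))
```

Uninformative anchors such as "click here" contribute little because no other link repeats them.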

Next, they use recursive inference. If sites A and B are known to be about topic X, and they both link to site C, it is reasonable to assume that site C is also about topic X.
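A toy version of that inference might look like the following. It is only a sketch under my own assumptions: the `infer_topics` function, the `min_votes` threshold, and the tiny link graph are invented for illustration, and real systems weigh links far more carefully.

```python
from collections import Counter

def infer_topics(known, links, min_votes=2):
    """Propagate topic labels along links until nothing changes.

    known: dict mapping page -> topic for pages we already trust.
    links: list of (source, destination) pairs.
    A page gets a topic once at least `min_votes` labeled pages link to it
    with the same topic (an assumed, arbitrary threshold).
    """
    topics = dict(known)
    changed = True
    while changed:
        changed = False
        votes = {}
        for src, dst in links:
            if src in topics and dst not in topics:
                votes.setdefault(dst, Counter())[topics[src]] += 1
        for page, counter in votes.items():
            topic, count = counter.most_common(1)[0]
            if count >= min_votes:
                topics[page] = topic
                changed = True
    return topics

known = {"A": "X", "B": "X"}
links = [("A", "C"), ("B", "C"), ("C", "D"), ("A", "D"), ("A", "E")]
result = infer_topics(known, links)
print(result)
```

Here C is labeled first (A and B both link to it), which then lets D be labeled in the next round; E has only one inbound vote and stays unlabeled.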

Finally, there is actual text mining of the site's content. Techniques such as TF-IDF are used to determine the most relevant keywords from a given page's content.
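For reference, here is a minimal TF-IDF sketch using one common formulation (term frequency times the natural log of total documents over documents containing the term). The three sample documents and the helper names are made up, and production systems typically use smoothed or otherwise tuned variants.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def tf_idf(docs):
    """Return one dict per document mapping term -> tf * idf."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        total = len(tokens)
        scores.append(
            {t: (c / total) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return scores

# Made-up sample documents.
docs = [
    "used car prices and car reviews",
    "the weather and the news",
    "home insurance and the news",
]
scores = tf_idf(docs)
top = max(scores[0], key=scores[0].get)
print(top)  # car
```

A word like "and" that appears in every document scores zero, while "car", frequent in one document and absent from the others, scores highest there.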

Alex
A: 

There are several techniques search engines adopt to see if a page is about "cars":

  1. "Cars" are mentioned directly in the page

  2. External links have "cars" in the anchor text

  3. Either the page or the anchor text contains keywords semantically close to "cars", such as "vehicles"

  4. They also look at distinctive characteristics of pages and the distribution of unique keywords. For example, if search engines know that many pages about "cars" also contain "insurance" and "tires", they can guess that pages where "insurance" and "tires" are present are likely about "cars" as well, even if that keyword is not directly present. It is somewhat like pattern recognition: when most characteristics match, you guess that the whole matches with high probability.

And various other techniques...
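To make point 4 a bit more concrete, here is a small sketch of the co-occurrence idea. Everything in it is an assumption for illustration: the sample pages, the `cooccurring_terms` and `likely_about` helpers, and the 0.5 overlap threshold. Real engines use much richer statistical models.

```python
from collections import Counter

def cooccurring_terms(pages, keyword, top_n=2):
    """Return the terms that most often appear on pages containing `keyword`."""
    counts = Counter()
    for words in pages:
        if keyword in words:
            counts.update(w for w in words if w != keyword)
    return {term for term, _ in counts.most_common(top_n)}

def likely_about(page_words, signature, threshold=0.5):
    """Guess the topic matches if enough signature terms are on the page."""
    if not signature:
        return False
    overlap = len(signature & set(page_words)) / len(signature)
    return overlap >= threshold

# Made-up pages, each represented as a set of words.
pages = [
    {"cars", "insurance", "tires", "dealer"},
    {"cars", "insurance", "tires"},
    {"cars", "tires", "engine", "insurance"},
    {"recipes", "flour", "sugar"},
]
signature = cooccurring_terms(pages, "cars")

# This page never mentions "cars" but shares its typical vocabulary.
print(likely_about({"insurance", "tires", "quotes"}, signature))  # True
print(likely_about({"flour", "sugar"}, signature))                # False
```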

Developer Art