googlebot

Bot Web Quality

I am looking for a good open-source bot that checks some of the quality criteria often required for Google indexing: for example, finding duplicate titles, invalid links (JSpider does this, and I think many others do too), identical pages served under different URLs, and other checks that correspond to Google's quality requirements. ...
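JSpider and similar crawlers already handle link checking, but the duplicate checks are simple enough to sketch. The following is a minimal Python sketch, not taken from any existing bot, assuming the pages have already been fetched into a dict mapping URL to HTML (the name `find_duplicates` is hypothetical). It groups URLs by normalized `<title>` and by a SHA-256 hash of the raw HTML, so it only catches byte-identical pages; near-duplicates, such as pages differing only in a timestamp, would need a fuzzier comparison. Pulling the title out with a regular expression is also a simplification; a real HTML parser is more robust.

```python
import hashlib
import re
from collections import defaultdict

# Simplified title extraction; assumes reasonably well-formed HTML.
TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def find_duplicates(pages):
    """Group URLs that share a <title> or have byte-identical HTML.

    `pages` maps URL -> HTML string (assumed to be fetched already).
    Returns (duplicate_titles, duplicate_bodies); each is a dict whose
    values are sorted lists of two or more URLs.
    """
    by_title = defaultdict(list)
    by_hash = defaultdict(list)
    for url, html in pages.items():
        match = TITLE_RE.search(html)
        if match:
            # Collapse whitespace so "A  B" and "A B" count as the same title.
            title = " ".join(match.group(1).split())
            by_title[title].append(url)
        # Exact-content fingerprint: only identical bytes collide.
        digest = hashlib.sha256(html.encode("utf-8")).hexdigest()
        by_hash[digest].append(url)
    dup_titles = {t: sorted(u) for t, u in by_title.items() if len(u) > 1}
    dup_bodies = {h: sorted(u) for h, u in by_hash.items() if len(u) > 1}
    return dup_titles, dup_bodies
```

For example, two URLs that differ only by a tracking parameter but serve the same HTML show up in both results, while a page with a unique title and body shows up in neither.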

Why does Googlebot traverse a newly added site in ascending order of URL-length?

Googlebot (Googlebot/2.1) appears to crawl URLs on a newly added site in an order corresponding to the length of the URL:
.. GET /ivjwiej/ HTTP/1.1" 200 .. "Mozilla/5.0 (compatible; Googlebot/ ..
.. GET /voeoovo/ HTTP/1.1" 200 .. "Mozilla/5.0 (compatible; Googlebot/ ..
.. GET /zeooviee/ HTTP/1.1" 200 .. "Mozilla/5.0 (compatible; Googl...
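The logs alone cannot say why Googlebot picks this order, but they can confirm whether the pattern holds across a whole crawl rather than a handful of lines. A minimal sketch, assuming the server writes Apache/Nginx "combined" format lines (request in the first quoted field, User-Agent in the last); the function names are hypothetical:

```python
import re

# Assumes "combined" log format: the request is the first quoted field and
# the User-Agent is the last quoted field on the line.
REQUEST_RE = re.compile(r'"(?:GET|HEAD) (?P<path>\S+) HTTP/[\d.]+"')
LAST_QUOTED_RE = re.compile(r'"(?P<ua>[^"]*)"\s*$')


def googlebot_paths(lines):
    """Return the paths Googlebot requested, in the order they were logged."""
    paths = []
    for line in lines:
        request = REQUEST_RE.search(line)
        agent = LAST_QUOTED_RE.search(line)
        if request and agent and "Googlebot" in agent.group("ua"):
            paths.append(request.group("path"))
    return paths


def is_length_ordered(paths):
    """True if every path is at least as long as the one requested before it."""
    return all(len(a) <= len(b) for a, b in zip(paths, paths[1:]))
```

Running `is_length_ordered(googlebot_paths(open(log_file)))` over the first days of a new site's log gives a yes/no answer; printing `len(path)` alongside each path shows where the ordering breaks, if it does.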

Finding out when Google last crawled

I'd like to find out how current Google's cached copy of a large set of pages is. I think I need to look in the logs for IPs, filter for the user-agent "googlebot", then export a list showing each page and when it was last visited. I imagine this could be a cron job that runs weekly. If this is right, how would I write the s...
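The log-scanning approach described in the question can be sketched in Python as follows. It assumes "combined" log format and matches on the User-Agent string only, which anyone can spoof, so a stricter version would also verify the requesting IP. It reports when Googlebot last fetched each page, which is not necessarily when Google's cached copy was refreshed.

```python
import re
import sys
from datetime import datetime

# Assumes "combined" log format, e.g.
# 1.2.3.4 - - [10/Oct/2009:13:55:36 +0000] "GET /page HTTP/1.1" 200 512 "-" "UA"
LOG_RE = re.compile(
    r'\[(?P<ts>[^\]]+)\] "(?:GET|HEAD) (?P<path>\S+) [^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def last_googlebot_visits(lines):
    """Map each path to the latest time a Googlebot User-Agent requested it."""
    last_seen = {}
    for line in lines:
        m = LOG_RE.search(line)
        # User-Agent match only; the header can be spoofed.
        if not m or "googlebot" not in m.group("ua").lower():
            continue
        ts = datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
        path = m.group("path")
        if path not in last_seen or ts > last_seen[path]:
            last_seen[path] = ts
    return last_seen


def main(log_path):
    with open(log_path, encoding="utf-8", errors="replace") as log:
        visits = last_googlebot_visits(log)
    # Tab-separated output: path, then last visit as an ISO 8601 timestamp.
    for path, ts in sorted(visits.items()):
        print(f"{path}\t{ts.isoformat()}")


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

A weekly cron entry could then look like `0 3 * * 1 python3 /path/to/last_crawl.py /var/log/apache2/access.log > googlebot_last_seen.tsv`, where the script name and paths are hypothetical placeholders. If logs are rotated, the script needs to read every file covering the period of interest.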

How does Google know you are cloaking?

I can't seem to find any information on how Google determines whether you are cloaking your content. How, from a technical standpoint, do you think they determine this? Do they send in crawlers other than Googlebot and compare the results with Googlebot's? Do they have a team of human beings comparing? Or can they somehow tel...

ASP.NET MVC GoogleBot Issues

I wrote a site using ASP.NET MVC, and although it is not completely SEO-optimized at this point, I figured it was a good start. What I'm finding is that when I use Google's Webmaster Tools to fetch my site (to see what Googlebot sees), it sees this:
HTTP/1.1 200 OK
Cache-Control: public, max-age=1148
Content-Type: application/xhtml+xml; ...

Legitimacy of optimizing a site to load fast for Googlebot

The question I have is a bit of an ethical one. I read here that Google gives a little more influence to sites that are optimized to load quickly. Obviously this makes Google's job easier, since it uses fewer resources, and it is a better experience for everyone, so why not reward it? The actual process of finding bottlenecks and improving page ...

Google Crawling/Indexing Frequency Increasing?

Some time ago, Google used to update its index and backlinks every 3-4 months, in one big update. Recently I have noticed that the updates are far more frequent. Has anyone else noticed this sort of change in Google's crawling, indexing and backlink updates? ...

Is localization using Cookies search engine compatible?

I'm in the process of localizing a website. I was going to set a cookie with the preferred language and then display the page in that language. If no cookie was set, it would use the preferred-language (Accept-Language) header sent by the user's browser, and if that header was not set either, it would default to English. So, how does ...
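The fallback chain described above (cookie, then Accept-Language, then English) can be sketched as a single function. This is a minimal sketch, assuming a hypothetical set of supported languages and a cookie value that has already been read from the request; the Accept-Language parsing honours q-values but is simplified (no wildcard handling).

```python
# Hypothetical set of languages the site is translated into.
SUPPORTED = {"en", "de", "fr"}
DEFAULT = "en"


def parse_accept_language(header):
    """Return language tags from an Accept-Language header, highest q first."""
    prefs = []
    for index, part in enumerate(header.split(",")):
        pieces = part.strip().split(";")
        tag = pieces[0].strip().lower()
        if not tag:
            continue
        q = 1.0  # default weight when no q parameter is given
        for param in pieces[1:]:
            name, _, value = param.strip().partition("=")
            if name == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if q > 0:
            # Sort key: higher q first, then original order for ties.
            prefs.append((-q, index, tag))
    return [tag for _, _, tag in sorted(prefs)]


def choose_language(cookie_lang, accept_language):
    """Cookie first, then the browser header, then the English default."""
    if cookie_lang in SUPPORTED:
        return cookie_lang
    for tag in parse_accept_language(accept_language or ""):
        primary = tag.split("-")[0]  # "de-at" -> "de"
        if primary in SUPPORTED:
            return primary
    return DEFAULT
```

One consequence worth checking against the question: a client that sends neither a language cookie nor an Accept-Language header always falls through to the English default here, so under that assumption a crawler behaving that way would only ever see the English version of each URL.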

SEO: Does Googlebot see text in hidden divs?

I have login/signup popups on my site which are in hidden divs by default. According to http://stackoverflow.com/questions/1547426/google-seo-and-hidden-elements Googlebot should NOT see them. But Google Webmaster Tools says that "email" and "password" are the top keywords across the site. Why is that? Why does Googlebot see them? Should...

JS dynamic img change and SEO

Hi all, I've built a website using jQuery to make nice transitions between content. The code works this way: there are two images (body and footer), and when I click on a link, instead of going to another page, I fade out the two images and change their src attributes. When the new images have loaded, I fade them back in. I'm using SWFAddress ...

Will a news ticker using overflow:hidden cause Google to see my site as spam?

In the hope of tempting Googlebot with fresh content, I've implemented a homepage news ticker which displays the 20 most recent headlines on our site. The implementation I have chosen is a <ul> with each headline in an <li>. Initially none of the <li> elements are styled, but JavaScript kicks in on page load and gives all but one of the...

How do I return a proper 404 to Google while still providing user-friendly content?

I was torn between posting this here and on Super User, so please excuse me if you feel it does not belong here. I am observing the behavior described here: Googlebot is requesting random URLs on my site, like aecgeqfx.html or sutwjemebk.html. I am sure that I am not linking to these URLs from anywhere on my site. I suspect this may be ...
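Whatever the reason for the random requests, the general technique the title asks about is to send a real 404 status code together with a normal, helpful HTML body: the status line tells the crawler the page does not exist, and the body is what a person reads. A minimal sketch using Python's standard-library `http.server`, with a hypothetical `KNOWN_PAGES` table standing in for the site's real routing:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlsplit

# Hypothetical routing table standing in for the site's real pages.
KNOWN_PAGES = {"/": "<h1>Home</h1>"}

NOT_FOUND_HTML = (
    "<h1>Sorry, we couldn't find that page</h1>"
    '<p>Try the <a href="/">home page</a> instead.</p>'
)


def build_response(path):
    """Return (status, html): known paths get 200, anything else a real 404."""
    if path in KNOWN_PAGES:
        return 200, KNOWN_PAGES[path]
    return 404, NOT_FOUND_HTML


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Ignore the query string when looking the page up.
        status, html = build_response(urlsplit(self.path).path)
        body = html.encode("utf-8")
        # The status line carries the 404; the body is the friendly page.
        self.send_response(status)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Run the demo server on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), Handler).serve_forever()
```

After calling `serve()`, a request such as `curl -i http://127.0.0.1:8000/aecgeqfx.html` should show a 404 status line followed by the friendly page. The key design point is that the friendly page is served directly with the 404, rather than by redirecting to an error page that itself answers 200.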

Why is my ColdFusion page returning a blank page to search engines?

I've done plenty of ASP.NET and PHP development, but I'm less familiar with how to track this sort of thing down in ColdFusion (CF). My naive first line of attack was to search all of the source code for any reference to Google. No luck. I'm running the site on IIS7. Google, Bing and Yahoo all apparently "see" nothing on my site. Update: I ...

Where does Googlebot start crawling?

Say I register a domain and develop it into a complete website. Where and how does Googlebot find out that the new domain is up? Does it always start with the domain registry? If it starts with the registry, does that mean that anyone can have complete access to the registry's database? Thanks for any insight. ...

Anonymous user support vs. Googlebot

I have a User class in my web app that represents the currently logged-in user. Every time a user visits a page, a User instance is populated based on authentication data supplied in cookies. A User instance is created even for an anonymous user, and a corresponding new record is created in the User table in the database. This ap...
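If the concern is that crawlers, which typically arrive without the app's cookies, create a new anonymous User record on every visit, one option is to keep anonymous users in memory only when the request looks like a crawler. This is a minimal sketch under stated assumptions: `load_user`, `create_record`, and the token list are hypothetical, and User-Agent matching is a heuristic that a determined client can evade or spoof.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical, non-exhaustive crawler tokens (Google, Bing, Yahoo).
CRAWLER_TOKENS = ("googlebot", "bingbot", "slurp")


@dataclass
class User:
    id: Optional[int]  # None means "not persisted"
    anonymous: bool


def looks_like_crawler(user_agent):
    """Heuristic User-Agent check; can be spoofed."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in CRAWLER_TOKENS)


def load_user(auth_user_id, user_agent, create_record):
    """Build the request's User; create_record inserts a row and returns its id."""
    if auth_user_id is not None:
        return User(id=auth_user_id, anonymous=False)
    if looks_like_crawler(user_agent):
        # Crawler: keep an in-memory anonymous user, write nothing to the DB.
        return User(id=None, anonymous=True)
    # Regular anonymous visitor: persist a record as before.
    return User(id=create_record(), anonymous=True)
```

An alternative design is to defer the database insert for every anonymous visitor until they do something that actually needs persistence, which avoids depending on User-Agent strings at all.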

How can I verify Googlebot?

I'm going to block all bots except the big search engines. One of my blocking methods will be to check the Accept-Language header: if a request has no Accept-Language, the bot's IP address will be blocked until 2037. Googlebot does not send Accept-Language, so I want to verify it with a DNS lookup: <?php gethostbyaddr($_SERVER['REMOTE_ADDR']); ?...
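Google documents a two-step DNS check for verifying Googlebot: a reverse lookup on the requesting IP, a check that the resulting hostname is in googlebot.com or google.com, and then a forward lookup on that hostname to confirm it resolves back to the same IP. A `gethostbyaddr()` call covers only the first step; without the forward lookup, anyone who controls reverse DNS for their own IP range could claim a Googlebot hostname. A minimal Python sketch of both steps follows; the suffix list is an assumption to re-check against Google's current documentation, which may list additional domains.

```python
import ipaddress
import socket

# Hostname suffixes treated as Google's; an assumption to re-check against
# Google's current documentation.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def hostname_is_google(hostname):
    """True if the reverse-DNS name ends in one of the Google suffixes."""
    host = hostname.rstrip(".").lower()
    return host.endswith(GOOGLE_SUFFIXES)


def is_verified_googlebot(ip):
    """Reverse-resolve ip, check the domain, then forward-resolve to confirm."""
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        return False  # not an IP address at all; skip DNS entirely
    try:
        hostname, _aliases, _addrs = socket.gethostbyaddr(str(addr))  # 1: reverse
    except OSError:
        return False
    if not hostname_is_google(hostname):  # 2: domain check
        return False
    try:
        infos = socket.getaddrinfo(hostname, None)  # 3: forward lookup
    except OSError:
        return False
    forward = set()
    for info in infos:
        try:
            forward.add(ipaddress.ip_address(info[4][0]))
        except ValueError:
            continue
    # Genuine only if the forward lookup points back at the original IP.
    return addr in forward
```

Because each check costs two DNS round trips, it is worth running only for requests whose User-Agent claims to be Googlebot and caching the result per IP.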

Non-indexed file (?) still found in Google

How is it possible that my page /admin/login.asp is found in Google with the query "inurl:admin/login.asp" but not with the "site:www.domain.xx" query? I have these lines in my robots.txt:
User-agent: *
Disallow: /admin/
And this in the page's HTML: <meta name="robots" content="noindex, nofollow" /> Any ideas?...
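One likely explanation is the interaction between the two directives: the Disallow rule stops compliant crawlers from fetching anything under /admin/, so they never download login.asp and never see its noindex meta tag, while the bare URL can still be listed if it is linked from elsewhere. Google's documentation states that noindex only takes effect if the page is not blocked by robots.txt. Python's standard `urllib.robotparser` shows how a compliant crawler reads the robots.txt from the question:

```python
from urllib.robotparser import RobotFileParser

# The robots.txt rules quoted in the question.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A compliant crawler may not fetch anything under /admin/, so it never
# downloads login.asp and never sees the page's <meta name="robots"> tag.
print(parser.can_fetch("Googlebot", "http://www.domain.xx/admin/login.asp"))  # False
print(parser.can_fetch("Googlebot", "http://www.domain.xx/index.asp"))        # True
```

If that is the cause, the usual remedy is to let crawlers fetch the page (remove or narrow the Disallow rule) so they can read the noindex tag.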

Dynamic Content & SEO: Create 2 Separate Pages?

On a website, there are many pages with a component for users to leave comments. To reduce page load time and since few users use the commenting system, the commenting component is loaded via AJAX after the page is loaded. The issue: how can we get Google to index dynamic content that is loaded via AJAX on page load? Many other pages on...

Googlebot and rel="nofollow": how long until it stops following?

I just added rel="nofollow" to some links. Does anyone know how long it takes for Google to stop following a link after "nofollow" is added to it? I added it an hour ago and still see Googlebot crawling the "nofollow" links. ...

Can you deploy Watir on Heroku to generate HTML Snapshots? If so, how?

I would like to generate HTML snapshots using Watir, hosted on Heroku. Google's Full Specification for Making AJAX Applications Crawlable suggests using HtmlUnit (see "How do I create an HTML snapshot?", point #3). HtmlUnit is a Java-only headless browser emulator, and unfortunately JRuby is not an option on Heroku. So HtmlUn...