googlebot

How to set up a robots.txt which only allows the default page of a site

Say I have a site on http://website.com. I would really like to allow bots to see the home page, but every other page needs to be blocked, as it is pointless to spider. In other words, http://website.com and http://website.com/ should be allowed, but http://website.com/anything and http://website.com/someendpoint.ashx should be blocked. Further...
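One way to reason about the rule set this question asks for is to model how a crawler that supports the `$` end-of-path anchor would evaluate it. The sketch below is a simplified illustration, not Googlebot's actual code: it assumes the robots.txt contains `User-agent: *`, `Allow: /$` and `Disallow: /`, that the longest matching pattern wins, and that `Allow` wins a tie. Google documents support for `*` and `$`, but other crawlers may ignore them. The names `RULES`, `pattern_to_regex` and `is_allowed` are hypothetical.

```python
import re

# Hypothetical rule set for illustration: allow only the root URL.
# Equivalent robots.txt lines: "Allow: /$" and "Disallow: /".
RULES = [
    ("allow", "/$"),
    ("disallow", "/"),
]


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the end.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_allowed(path, rules=RULES):
    """Return True if `path` may be crawled under `rules`.

    Assumed precedence: the longest matching pattern wins, 'allow' wins
    a tie, and a path that matches no rule is allowed.
    """
    best_len, allowed = -1, True
    for kind, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            is_allow = kind == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len, allowed = len(pattern), is_allow
    return allowed


if __name__ == "__main__":
    for path in ("/", "/anything", "/someendpoint.ashx"):
        print(path, is_allowed(path))
```

Under these assumptions only the root path `/` is crawlable. A crawler that does not understand `$` would treat `/$` as a literal path prefix, and the home page would be blocked along with everything else.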

How to make Flex RIA contents accessible to search engines like Google?

How would you make the contents of Flex RIA applications accessible to Google, so that Google can index the content and show links to the right items in your Flex RIA? Consider an online shop, created in Flex, where the offered items should be indexed by Google. A link on Google should then open the corresponding product in the RIA. ...

How do I convince the Googlebot that two formerly aliased sites are now separate?

This will require a little setup. Trust me that this is for a good cause. The Background A friend of mine has run a non-profit public interest website for two years. The site is designed to counteract misinformation about a certain public person. Of course, over the last two years those of us who support what he is doing have relent...

Googlebots Ignoring robots.txt?

I have a site with the following robots.txt in the root: User-agent: * Disabled: / User-agent: Googlebot Disabled: / User-agent: Googlebot-Image Disallow: / And pages within this site are getting scanned by Googlebots all day long. Is there something wrong with my file or with Google? ...

Googlebot not respecting Robots.txt

For some reason, when I check Google Webmaster Tools' "Analyze robots.txt" to see which URLs are blocked by our robots.txt file, the result is not what I'm expecting. Here is a snippet from the beginning of our file: Sitemap: http://[omitted]/sitemap_index.xml User-agent: Mediapartners-Google Disallow: /scripts User-agent: * Disallow: /scrip...

Is this a blackhat SEO technique?

Hi, I have a site which has been developed completely in Flash. The site owners do not want to shift to a more text/HTML-based site, so I am planning to create an alternative HTML/text-based site which Googlebot will get redirected to (by checking the user agent). My question is: is this officially allowed by Google? If not t...

Will Googlebot index my site?

Hi, in my robots.txt file, I have the following lines: User-agent: Googlebot-Mobile Disallow: / User-agent:GoogleBot Disallow: / Sitemap: http://mydomain.com/sitemapindex.xml I know that if I put the first 4 lines, Googlebot won't index the site, but what if I put the last line Sitemap: http://mydomain.com/sitemapindex.xml, will go...

Should I be concerned if Googlebot is trying to index marketing URLs?

I have recently started using Google Webmaster Tools. I was quite surprised to see just how many links Google is trying to index. http://www.example.com/?c=123 http://www.example.com/?c=82 http://www.example.com/?c=234 http://www.example.com/?c=991 These are all campaigns that exist as links from partner sites. For right now they'...

Detecting well behaved / well known bots

I found this question very interesting: Programmatic Bot Detection. I have a very similar question, but I'm not bothered about 'badly behaved bots'. I am tracking (in addition to Google Analytics) the following per visit: Entry URL, Referer, UserAgent, AdWords (by means of query string), whether or not the user made a purchase, etc. The...
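For the well-known-bot case, a common approach is to confirm a claimed Googlebot visit with a reverse DNS lookup followed by a forward lookup, rather than trusting the User-Agent string alone. The sketch below is a minimal illustration under stated assumptions: the hostname suffixes in `GOOGLE_SUFFIXES` should be re-checked against Google's current guidance, and the function names are hypothetical. The resolver functions are passed in as parameters so the logic can be exercised without network access.

```python
import socket

# Assumed hostname suffixes for Google's crawlers; verify against
# Google's current documentation before relying on this list.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def looks_like_google_host(hostname):
    """Return True if the hostname ends with one of the assumed Google suffixes."""
    return hostname.lower().rstrip(".").endswith(GOOGLE_SUFFIXES)


def is_verified_googlebot(ip, reverse=socket.gethostbyaddr,
                          forward=socket.gethostbyname_ex):
    """Check a visitor IP with a reverse lookup plus a confirming forward lookup.

    `reverse` and `forward` default to the standard library resolvers
    (the forward one is IPv4 only); both return a tuple of
    (hostname, aliases, ip_addresses).
    """
    try:
        hostname = reverse(ip)[0]
        if not looks_like_google_host(hostname):
            return False
        # The forward lookup must map back to the same IP; otherwise the
        # reverse record could simply have been spoofed.
        return ip in forward(hostname)[2]
    except OSError:
        # socket.herror and socket.gaierror are subclasses of OSError.
        return False
```

Because each check costs two DNS lookups, it is usually worth caching the result per IP address and running the check only for requests whose User-Agent already claims to be Googlebot.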

Google Crawler Time Restriction

Does anyone know whether it is possible to set a property that tells Googlebot to crawl the site only during a specific day or time period (e.g. only during the weekend)? Thanks, ...

When does Google re-crawl a site?

When does Google re-crawl a site? And why does Google have two versions of the same page in its cache? http://forum.portal.edu.ro/index.php?showtopic=112733 The cached pages are: forum.portal.edu.ro/index.php?showtopic=112733&st=25/ forum.portal.edu.ro/index.php?showtopic=112733&st=50 ...

Why and how does Googlebot use my website's search engine?

Looking through my search logs from time to time, I notice that by far the biggest user of my search engine is Googlebot. What gives? Is it looking for content that might not be directly accessible through navigation? If so, how does it know which words and phrases to look for (they're surprisingly relevant)? Does it check the most ...

What should I add to my site to make Google index the subpages as well?

I am a beginner web developer and I have a site, JammuLinks.com, built on PHP. It is a local city listings search engine. Basically, I've written search pages which take in a parameter, fetch the records from the database and display them, so the content is generated dynamically. However, if you look at the bottom of the site, I have...

Why would Google (or Googlebot) index a page returning a 500 error?

Googlebot has been occasionally indexing one of our sites with a bad query string parameter. I am not sure how it is getting this query string parameter (there don't appear to be any sites linking to us with bad links, and nothing in our site is inserting the bad value). The bad parameter causes the site to throw a 500 error, as we expec...

How to prevent Googlebot from overwhelming site?

I'm running a site with a lot of content, but little traffic, on a middle-of-the-road dedicated server. Occasionally, Googlebot will stampede us, resulting in Apache maxing out its memory, and causing the server to crash. How can I avoid this? ...

Is there a way to tell when Googlebot/Bingbot/Yahoobot is crawling my site in ASP.NET 2005 IIS6?

I want to know when Google is crawling the site, preferably by sending myself an email. Is there any way to do this that won't adversely affect performance? ...

Possible for Google Bot to Execute PHP Script

Hello, I have a cron job PHP script that I set up not too long ago. However, I noticed that the PHP file was executed without the cron job activating. It appears that this happened when Googlebot crawled the file, because I noticed that the following engine visited my page: http://www.google.com/bot.html My question is: 1) Is ...

Will Googlebot read microformat data inserted via javascript?

I have already tried Google's microformat testing tool, but it's not clear to me that it works the same way as Googlebot -- it seems reasonable that Googlebot would have more features than a simple web-based testing tool. So, I'm wondering -- does anyone have any real-world experience in successfully getting Googlebot to parse microfor...

How can I prevent Googlebot from crawling Ajaxified links?

I've got a bunch of ajaxified links that do things like vote up, vote down, flag a post: standard community moderation stuff. The problem is that Googlebot crawls those links, and votes up, votes down, and flags items. Will adding this to robots.txt prevent Googlebot from crawling those links? Or is there something else I need to...

In which programming language is the Googlebot written (or any other efficient web-crawler)?

Does anyone know in which programming language Googlebot was written? Or, more generally, in which language efficient web crawlers are written? I've seen many written in Java, but it doesn't seem to me the most appropriate language for developing a web crawler because it creates far too much overhead (tried with Heritrix web-crawler, ...