robots

What's a good web crawler tool?

I need to index a large number of web pages. What good web crawler utilities are there? I'd prefer something that .NET can talk to, but that's not a showstopper. What I really need is something that I can give a site URL to, and it will follow every link and store the content for indexing. ...
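The "follow every link and store the content" behaviour described above can be sketched in a few lines. This is only an illustration in Python rather than .NET, using the standard library; the names `LinkCollector`, `crawl`, `fetch` and `max_pages` are hypothetical, and the page-download step is passed in as a function so the sketch makes no assumption about how pages are actually retrieved.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl of one host.

    `fetch` is a caller-supplied function (an assumption of this sketch)
    that takes a URL and returns the page HTML as a string.
    Returns a dict mapping each visited URL to its content.
    """
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        if url in pages:
            continue  # already stored
        html = fetch(url)
        pages[url] = html
        collector = LinkCollector()
        collector.feed(html)
        for href in collector.links:
            # Resolve relative links and drop "#fragment" parts.
            absolute, _fragment = urldefrag(urljoin(url, href))
            # Stay on the starting host and skip pages already stored.
            if urlparse(absolute).netloc == host and absolute not in pages:
                queue.append(absolute)
    return pages
```

A real crawler would also need politeness delays, robots.txt checks and error handling, none of which are shown here.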

Should I get rid of bots visiting my site?

I have noticed in my trackers that bots are visiting my site a lot. Should I edit my robots.txt or change something else? I'm not sure whether the visits are good because the bots are indexing the site, or not. ...

Googlebots Ignoring robots.txt?

I have a site with the following robots.txt in the root: User-agent: * Disabled: / User-agent: Googlebot Disabled: / User-agent: Googlebot-Image Disallow: / And pages within this site are getting scanned by Googlebots all day long. Is there something wrong with my file or with Google? ...
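One way to see how a parser reads a file like the one quoted above is Python's standard-library `urllib.robotparser`. This is a sketch under stated assumptions: the excerpt is flattened, so the blank lines between groups in `AS_POSTED` are a guess, the helper name `can_fetch` and the example URL are hypothetical, and Python's parser is not Googlebot's, so the result only shows how this particular parser treats the `Disabled:` lines.

```python
from urllib.robotparser import RobotFileParser


def can_fetch(robots_txt: str, agent: str, url: str) -> bool:
    """Parse robots.txt text and ask whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# The file from the question; the blank lines between groups are an assumption.
AS_POSTED = """\
User-agent: *
Disabled: /

User-agent: Googlebot
Disabled: /

User-agent: Googlebot-Image
Disallow: /
"""

# The same file with the unrecognised "Disabled" field spelled "Disallow".
CORRECTED = AS_POSTED.replace("Disabled:", "Disallow:")

if __name__ == "__main__":
    url = "http://example.com/page"
    for label, text in (("as posted", AS_POSTED), ("corrected", CORRECTED)):
        for agent in ("Googlebot", "Googlebot-Image"):
            print(label, agent, can_fetch(text, agent, url))
```

With this parser, the `Disabled:` lines are skipped as unknown fields, so only the `Googlebot-Image` group ends up carrying a rule.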

Robots.txt: allow only major SE

Is there a way to configure the robots.txt so that the site accepts visits ONLY from Google, Yahoo! and MSN spiders? ...
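A file of the kind asked about above could be generated and checked roughly as follows. This is a sketch, not a definitive answer: the user-agent tokens `Googlebot`, `Slurp` and `msnbot` are assumptions that should be checked against each search engine's own documentation, the helper name `build_robots_txt` is hypothetical, and robots.txt is advisory, so it only affects crawlers that choose to obey it.

```python
from urllib.robotparser import RobotFileParser

# Assumed user-agent tokens for Google, Yahoo! and MSN; verify before use.
ALLOWED_CRAWLERS = ["Googlebot", "Slurp", "msnbot"]


def build_robots_txt(allowed):
    """Allow the named crawlers everywhere and disallow everyone else."""
    # An empty "Disallow:" value means nothing is disallowed for that group.
    groups = [f"User-agent: {name}\nDisallow:\n" for name in allowed]
    # Catch-all group: every other crawler is asked to stay out.
    groups.append("User-agent: *\nDisallow: /\n")
    return "\n".join(groups)


if __name__ == "__main__":
    text = build_robots_txt(ALLOWED_CRAWLERS)
    print(text)

    # Check the generated file with Python's own parser.
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    for agent in ALLOWED_CRAWLERS + ["SomeOtherBot"]:
        print(agent, parser.can_fetch(agent, "http://example.com/page"))
```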

How do I stop search engines indexing a maintenance page?

I need to set up a maintenance page for a website I'm running, e.g. for display when I'm performing site maintenance (scheduled downtime) or if something really breaks and I need to put up a holding page. Is there anything special I need to do to ensure that search engine crawlers don't index it and think that it's my site? Or should I d...

Pathfinding Algorithm for Robot

I have a robot that uses an optical mouse as a position tracker. Basically, as the robot moves it is able to track change in X and Y directions using the mouse. The mouse also tracks which direction you are moving, i.e. negative X or positive X. These values are summed into separate X and Y registers. Now, the robot rotates in place and m...

Are there any CRobots style games that support robots written in more than one language?

Many years ago, just as I was getting started with programming, I ran into some programming games in the style of CRobots (I don't think it actually was CRobots, but a clone of sorts) which were pretty cool to play around with. Recently I've gotten a feeling of "programming is work, not play", which I would rather get rid of, so I figur...

How to Verify whether a Robot is Entering Information

Hi all, I have a web form which users fill in; the info is sent to the server and stored in a database. I am worried that robots might just fill in the form and I will end up with a database full of useless records. How can I prevent robots from filling in my forms? I am thinking maybe something like Stack Overflow's robot detection, where ...

How to track all website activity and filter out web robot data

I'm doing very rudimentary tracking of page views by logging URLs, referral codes, sessions, times, etc., but I'm finding the logs are getting bombarded by robots (Google, Yahoo, etc.). What is an effective way to filter out or not log these statistics? I've experimented with robot IP lists etc., but this isn't foolproof. Is there some k...
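As an alternative to the IP lists mentioned above, one common heuristic is to match the User-Agent header against a few crawler keywords before logging a page view. The sketch below is an assumption-laden illustration: the keyword list, the names `BOT_PATTERN` and `is_probable_robot`, and the choice to treat a missing User-Agent as a robot are all hypothetical, and the header can be spoofed, so this is no more foolproof than an IP list.

```python
import re

# Hypothetical keyword list; well-behaved crawlers usually identify
# themselves with one of these words in the User-Agent header.
BOT_PATTERN = re.compile(r"bot|crawler|spider|slurp", re.IGNORECASE)


def is_probable_robot(user_agent):
    """Return True if the User-Agent string looks like a crawler.

    A missing or empty User-Agent is treated as a robot here, which is
    a design choice of this sketch rather than an established rule.
    """
    if not user_agent:
        return True
    return BOT_PATTERN.search(user_agent) is not None


if __name__ == "__main__":
    samples = [
        "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/115.0",
    ]
    for ua in samples:
        print(is_probable_robot(ua), ua)
```

A tracking script could call `is_probable_robot` on each request and skip the logging step when it returns True.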

Detect Robot on Server?

Hi, this question sort of extends my other question on robots and CAPTCHA. I did what everyone recommended (thanks everyone!); however, is it at all possible to detect a robot on the server first? For example (once again, I will use Stack Overflow as a reference): sometimes when I ask a question, Stack Overflow comes back asking me to verify...

Do robots crawl iframes?

Is it common for robots to crawl inside iframes? And how do they do that? Do they index it instantly or do they just 'remember' the URL and continue sometime later? ...

What is Robot Army Testing?

What is Robot Army Testing? Where is it used? How can I learn it? ...

robots.txt syntax question.

Hello guys. I have a few doubts about this robots file. User-agent: * Disallow: /administrator/ Disallow: /css/ Disallow: /func/ Disallow: /images/ Disallow: /inc/ Disallow: /js/ Disallow: /login/ Disallow: /recover/ Disallow: /Scripts/ Disallow: /store/com-handler/ Disallow: /store/img/ Disallow: /store/theme/ Disallow: /store/StoreSys...

Microsoft Robotics: cheap but very extensible robot?

Is there any cheap and very extensible robot kit which can work with Microsoft Robotics? I want to have a great choice of cool parts for a robot to buy. :) If there is no such robot kit which can work with MS Robotics, is there any chance to buy a very extensible robot which just can be programmed, maybe even in assembler? ...

PHP Detecting bot-like behavior

I am attempting to build a system that only shows users a CAPTCHA when bot-like behavior is detected. Here are the behaviors that I am currently looking for when somebody is filling out a contact form: how quickly the form is submitted after the page loads (if it's 5 seconds or less, it's almost humanly impossible to fill out); how man...
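The first behavior listed above, the time between page load and form submission, can be sketched as a simple threshold check. The example is in Python for illustration (the same comparison ports directly to PHP); the names `MIN_FILL_SECONDS` and `looks_automated` are hypothetical, and how the page-load timestamp is stored (for instance in the server-side session) is left as an assumption.

```python
import time

# Threshold taken from the question: 5 seconds or less is treated as bot-like.
MIN_FILL_SECONDS = 5.0


def looks_automated(page_loaded_at, submitted_at, min_seconds=MIN_FILL_SECONDS):
    """Return True if the form came back too quickly to be filled by a person.

    Both arguments are timestamps in seconds, e.g. from time.time().
    The load time should be recorded server-side so a client cannot forge it.
    """
    return (submitted_at - page_loaded_at) <= min_seconds


if __name__ == "__main__":
    loaded = time.time()
    # A submission 2 seconds after load would trigger the CAPTCHA;
    # one 40 seconds after load would not.
    print(looks_automated(loaded, loaded + 2))
    print(looks_automated(loaded, loaded + 40))
```

On its own this check only covers one signal; the CAPTCHA decision in the question combines several.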

Is it legal to write a robot that regularly retrieves data from a website?

Hi all, I'd like to write a small program which 'visits' a specific website regularly (every minute, for example) and gets specific data from there. This data is stored in a database which is used by another program I'm planning to write. Is this legal or not? Do I need permission from the website owner? It's a complete open we...

How do I create a Google XML sitemap and robots.txt file?

Hi, I'm a bit of a beginner at SEO. Could anyone tell me how I create an XML sitemap and robots.txt file for my site? Is there some kind of generator for them? Thanks for your help. Regards, Judi ...

Prevent Google from indexing

Hi, what's the best way to prevent Google from showing a folder in its search results, e.g. www.example.com/support? What should I do if I want the support folder to disappear from Google? The first thing I did was place a 'robots.txt' file and include this code: User-agent: * Disallow: /support/etc but the result is a tot...

Allowing Google to bypass CAPTCHA verification - sensible or not?

My web site has a database lookup; filling out a CAPTCHA gives you 5 minutes of lookup time. There is also some custom code to detect any automated scripts. I do this as I don't want someone data mining my site. The problem is that Google does not see the lookup results when it crawls my site. If someone is searching for a string that i...

Procedural modeling of Robots?

Procedural techniques are common for texture synthesis, modeling plants, and modeling terrains. However, I've seen very little work on algorithmic construction of robots, which is a bit surprising given how mechanical these systems are. Anyone have a good resource on the algorithmic construction of robots / robotic humanoids? Thanks! ...