I need to index a large number of web pages. What good web crawler utilities are there? Ideally I'm after something that .NET can talk to, but that's not a showstopper.
What I really need is something I can give a site URL to, which will then follow every link and store the content for indexing.
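This is not a recommendation of a particular tool. It is a minimal Python sketch of what "follow every link and store the content" involves: a breadth-first queue, a set of already-seen URLs, and a link extractor, restricted to the starting host. It assumes plain HTML pages and deliberately omits robots.txt handling, politeness delays and JavaScript rendering. `crawl`, `fetch_url` and `max_pages` are illustrative names.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag fed to it."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_url(url):
    """Download a page and decode it (real network access, no retries)."""
    with urlopen(url, timeout=10) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def crawl(start_url, fetch=fetch_url, max_pages=100):
    """Breadth-first crawl restricted to start_url's host; returns {url: html}."""
    site = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except (OSError, ValueError):
            continue  # skip pages that fail to load
        pages[url] = html  # stored for the indexer
        extractor = LinkExtractor()
        extractor.feed(html)
        for href in extractor.links:
            # Resolve relative links and drop "#fragment" parts.
            link, _ = urldefrag(urljoin(url, href))
            if urlparse(link).netloc == site and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
```

Calling `crawl("http://www.example.com/")` returns a dictionary mapping each page URL to its HTML. The `fetch` parameter is injectable, so the link-following logic can be tested without network access.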
...
I've noticed in my trackers that bots are visiting my site a lot. Should I edit my robots.txt or change something else? I'm not sure whether that's a good thing, because they're indexing the site, or not?
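If the aim is to slow well-behaved crawlers down rather than block them, robots.txt has a non-standard `Crawl-delay` directive. The sketch below uses Python's standard `urllib.robotparser` to show how a parser reads such a file. The 10-second delay and the `/private/` folder are hypothetical. Googlebot ignores `Crawl-delay`, and robots.txt is only a request that polite bots choose to honor.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: ask every crawler to wait 10 s between requests
# and to stay out of /private/.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

print(parser.crawl_delay("SomeBot"))                                  # 10
print(parser.can_fetch("SomeBot", "http://example.com/index.html"))   # True
print(parser.can_fetch("SomeBot", "http://example.com/private/x"))    # False
```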
...
I have a site with the following robots.txt in the root:
User-agent: *
Disabled: /
User-agent: Googlebot
Disabled: /
User-agent: Googlebot-Image
Disallow: /
And pages within this site are getting crawled by Googlebot all day long. Is there something wrong with my file, or with Google?
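The directive the robots exclusion standard defines for blocking is `Disallow`. `Disabled` is not a recognized directive, so parsers skip those lines. The sketch below uses Python's standard `urllib.robotparser` to compare the two spellings. This is one parser, and Google's own implementation may differ in other details. The `allowed` helper is illustrative.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt, agent, url):
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


url = "http://example.com/page.html"
# "Disabled" is not a robots.txt directive, so the parser ignores the line.
print(allowed("User-agent: *\nDisabled: /\n", "Googlebot", url))  # True
# "Disallow" is the directive that actually asks crawlers to stay out.
print(allowed("User-agent: *\nDisallow: /\n", "Googlebot", url))  # False
```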
...
Is there a way to configure the robots.txt so that the site accepts visits ONLY from Google, Yahoo! and MSN spiders?
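One way to express "only these spiders" is to give each allowed crawler its own group with an empty `Disallow:`, which permits everything, and end with a catch-all group that disallows `/`. The user-agent tokens used here are assumptions: `Googlebot` (Google), `Slurp` (Yahoo!) and `msnbot` (MSN, whose successor crawler is `bingbot`). Check each engine's current documentation. The sketch checks the file with Python's `urllib.robotparser`. Bots can ignore robots.txt or fake their user agent, so this only restricts well-behaved crawlers.

```python
from urllib.robotparser import RobotFileParser

# Assumed tokens: Googlebot (Google), Slurp (Yahoo!), msnbot (MSN).
# An empty "Disallow:" means nothing is disallowed for that group.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: Slurp
Disallow:

User-agent: msnbot
Disallow:

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "http://example.com/index.html"
for agent in ("Googlebot", "Slurp", "msnbot", "SomeOtherBot"):
    print(agent, parser.can_fetch(agent, url))
```

The three named crawlers print `True` and `SomeOtherBot` prints `False`, because only the catch-all `*` group applies to it.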
...
I need to set up a maintenance page for a website I'm running, e.g. for display when I'm performing site maintenance (scheduled downtime) or if something really breaks and I need to put up a holding page.
Is there anything special I need to do to ensure that search engine crawlers don't index it and think that it's my site? Or should I d...
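A common way to signal a temporary outage to crawlers is to serve the holding page with HTTP status 503 Service Unavailable, optionally with a `Retry-After` header, rather than 200 OK. A 503 tells search engines that the outage is temporary, so they should not treat the holding page as the site's content. The WSGI sketch below assumes a one-hour `Retry-After` value and placeholder HTML. `maintenance_app` is an illustrative name.

```python
MAINTENANCE_HTML = b"<html><body><h1>Down for maintenance</h1></body></html>"


def maintenance_app(environ, start_response):
    """WSGI app that answers every request with 503 plus a Retry-After hint."""
    start_response("503 Service Unavailable", [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Retry-After", "3600"),  # assumed: ask clients to retry in an hour (seconds)
        ("Content-Length", str(len(MAINTENANCE_HTML))),
    ])
    return [MAINTENANCE_HTML]
```

To try it locally, run `wsgiref.simple_server.make_server("", 8000, maintenance_app).serve_forever()`. The same status and header can be configured in whatever web server actually fronts the site.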
I have a robot that uses an optical mouse as a position tracker. Basically, as the robot moves, it is able to track changes in the X and Y directions using the mouse. The mouse also tracks which direction you are moving, i.e. negative X or positive X. These values are summed into separate X and Y registers.
Now, the robot rotates in place and m...
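The accumulation described above, signed per-reading deltas summed into separate X and Y registers, can be sketched as follows. The class name and sample readings are hypothetical. An optical mouse reports motion in its own sensor frame, and this sketch makes no attempt to compensate for the robot's rotation.

```python
class MouseOdometer:
    """Sums signed mouse deltas into separate X and Y registers."""

    def __init__(self):
        self.x = 0
        self.y = 0

    def update(self, dx, dy):
        # dx/dy are signed counts: a negative value means motion towards -X / -Y.
        self.x += dx
        self.y += dy


odo = MouseOdometer()
for dx, dy in [(5, 0), (3, -2), (-4, 1)]:  # hypothetical sensor readings
    odo.update(dx, dy)
print(odo.x, odo.y)  # 4 -1
```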
Many years ago, just as I was getting started with programming, I ran into some programming games in the style of CRobots (I don't think it actually was CRobots, but a clone of sorts) which were pretty cool to play around with.
Recently I've gotten a feeling of "programming is work, not play", which I would rather get rid of, so I figur...
Hi All,
I have a web form which users fill in; the information is sent to the server and stored in a database. I am worried that robots might just fill in the form and I will end up with a database full of useless records. How can I prevent robots from filling in my forms? I am thinking maybe something like Stackoverflow's robot detection, where ...
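The following is not Stackoverflow's approach. It is a different and commonly used technique, a "honeypot" field, shown as a minimal server-side sketch. The form is assumed to contain an extra input, here a hypothetical field named `website`, hidden from humans with CSS. People leave it empty, while naive bots that fill in every input reveal themselves. It only stops unsophisticated bots and pairs well with a CAPTCHA.

```python
def looks_like_bot(form):
    """Honeypot check on submitted form data (a dict of field -> value).

    `website` is a hypothetical field hidden from humans via CSS, so a
    non-empty value suggests an automated submission.
    """
    return bool(form.get("website", "").strip())


print(looks_like_bot({"name": "Ann", "email": "ann@example.com", "website": ""}))          # False
print(looks_like_bot({"name": "x", "email": "x@spam.test", "website": "http://spam.test"}))  # True
```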
I'm doing very rudimentary tracking of page views by logging URLs, referral codes, sessions, times, etc., but I'm finding it's getting bombarded with robots (Google, Yahoo, etc.). I'm wondering what an effective way is to filter out these hits, or not log them at all.
I've experimented with robot IP lists, etc., but this isn't foolproof.
Is there some k...
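A common complement to IP lists is to check the `User-Agent` header before logging, because the major search engine crawlers identify themselves there. Like IP lists, this is a heuristic: bots can send a browser-like User-Agent, and substrings such as "bot" can occasionally match real browsers. The marker list below is a hypothetical, deliberately short example, and treating a missing header as a bot is an assumption.

```python
# Hypothetical, deliberately incomplete list of substrings seen in crawler User-Agents.
BOT_MARKERS = ("bot", "crawler", "spider", "slurp")


def is_probable_bot(user_agent):
    """Heuristic: True if the User-Agent header looks like a crawler."""
    if not user_agent:
        return True  # assumption: treat a missing User-Agent as a bot
    ua = user_agent.lower()
    return any(marker in ua for marker in BOT_MARKERS)


def should_log(user_agent):
    """Only log page views whose User-Agent does not look like a crawler."""
    return not is_probable_bot(user_agent)
```

For example, Googlebot's `Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)` is skipped, while an ordinary Firefox User-Agent is logged.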
Hi,
This question somewhat extends my other question on robots and CAPTCHA. I did what everyone recommended (thanks, everyone!); however, is it at all possible to detect a robot on the server first? For example (once again, I will use Stackoverflow as a reference): sometimes when I ask a question, Stackoverflow comes back asking me to verify...
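One common server-side signal, not necessarily the one Stackoverflow uses, is submission rate. If the same client submits more than a set number of times within a time window, the server demands a CAPTCHA before accepting the next post. The sketch below keeps a sliding window of timestamps per client in memory. The thresholds (3 per 60 s) and keying by IP address are hypothetical choices, and a multi-server setup would need shared storage.

```python
import time
from collections import defaultdict, deque


class SubmissionRateChecker:
    """Flags clients that submit more than `limit` times within `window` seconds.

    The default thresholds are hypothetical; tune them for real traffic.
    """

    def __init__(self, limit=3, window=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.history = defaultdict(deque)  # client id -> recent submission times

    def needs_captcha(self, client_id):
        """Record a submission and report whether this client should get a CAPTCHA."""
        now = self.clock()
        times = self.history[client_id]
        # Drop timestamps that have fallen out of the sliding window.
        while times and now - times[0] > self.window:
            times.popleft()
        times.append(now)
        return len(times) > self.limit
```

In a request handler, this might look like `if checker.needs_captcha(client_ip): show_captcha()`, where `show_captcha` stands for whatever the site already uses.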
Is it common for robots to crawl inside iframes? And how do they do that? Do they index the content instantly, or do they just 'remember' the URL and come back to it some time later?
...
What is Robot Army Testing? Where is it used? How can I learn it?
...
Hello guys.
I have a few questions about this robots.txt file.
User-agent: *
Disallow: /administrator/
Disallow: /css/
Disallow: /func/
Disallow: /images/
Disallow: /inc/
Disallow: /js/
Disallow: /login/
Disallow: /recover/
Disallow: /Scripts/
Disallow: /store/com-handler/
Disallow: /store/img/
Disallow: /store/theme/
Disallow: /store/StoreSys...
Is there a cheap and very extensible robot kit that can work with Microsoft Robotics?
I want to have a wide choice of cool parts to buy for the robot. :)
If there is no such robot kit that works with MS Robotics, is there any chance of buying a very extensible robot that can simply be programmed, maybe even in assembler?
...
I am attempting to build a system that only shows users a CAPTCHA when bot-like behavior is detected. Here are the behaviors that I am currently looking for when somebody is filling out a contact form...
how quickly the form is submitted after the page loads (if it's 5 seconds or less, it's almost humanly impossible to fill out)
how man...
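The first behavior in the list, submission within 5 seconds of the page loading, requires the server to know when the form was served. One way to do this is to embed the serve time in a hidden field and sign it with an HMAC so a bot cannot simply submit an older timestamp. This is a sketch under those assumptions. `SECRET_KEY`, `issue_token` and `submitted_too_fast` are hypothetical names, and storing the time in the session would work just as well.

```python
import hashlib
import hmac
import time

SECRET_KEY = b"change-me"  # hypothetical server-side secret
MIN_SECONDS = 5            # threshold from the list above: 5 s or less is suspicious


def issue_token(now=None):
    """Value for a hidden form field recording when the form was served."""
    ts = str(int(time.time() if now is None else now))
    sig = hmac.new(SECRET_KEY, ts.encode(), hashlib.sha256).hexdigest()
    return f"{ts}:{sig}"


def submitted_too_fast(token, now=None):
    """True if the token is missing/forged or came back within MIN_SECONDS."""
    try:
        ts, sig = token.split(":", 1)
    except (AttributeError, ValueError):
        return True  # missing or malformed token
    expected = hmac.new(SECRET_KEY, ts.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return True  # tampered timestamp: treat as bot-like
    now = time.time() if now is None else now
    return now - int(ts) <= MIN_SECONDS
```

A `True` result would be the trigger to show the CAPTCHA rather than to reject the submission outright.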
Hi all,
I'd like to write a small program which 'visits' a specific website regularly (every minute, for example) and gets specific data from there. This data is stored in a database which is used by another program I'm planning to write.
Is this legal or not? Do I need permission from the website owner?
It's a complete open we...
Hi, I'm a bit of a beginner at SEO.
Could anyone tell me how to create an XML sitemap and a robots.txt file for my site?
Is there some kind of generator for them?
Thanks for your help
Regards
Judi
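Ready-made generators exist, but both files are simple enough to produce with a short script. This sketch writes a sitemap in the sitemaps.org format (a `urlset` of `url`/`loc` entries) and a permissive robots.txt that advertises the sitemap through a `Sitemap:` line. It assumes the list of page URLs is already known and omits optional tags such as `lastmod`. The function names and example URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return sitemap XML (bytes) with one <url><loc> entry per URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        url_el = ET.SubElement(urlset, "url")
        ET.SubElement(url_el, "loc").text = url  # ElementTree escapes &, <, >
    # xml_declaration requires Python 3.8+.
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


def build_robots_txt(sitemap_url):
    """A robots.txt that allows everything and points crawlers at the sitemap."""
    return f"User-agent: *\nDisallow:\n\nSitemap: {sitemap_url}\n"
```

The outputs would be saved as `sitemap.xml` and `robots.txt` in the site root, e.g. `build_robots_txt("http://www.example.com/sitemap.xml")`.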
...
Hi sirs, what's the best way to prevent Google from showing a folder in its search results, e.g. www.example.com/support? What should I do if I want the support folder to disappear from Google?
The first thing I did was place a robots.txt file containing this code:
User-agent: *
Disallow: /support/etc
but the result is a tot...
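Taken literally, `Disallow: /support/etc` only matches paths that begin with `/support/etc`, because Disallow rules are prefix matches. `Disallow: /support/` is the rule that covers the whole folder. The sketch below checks both with Python's `urllib.robotparser`, and the example paths are hypothetical. robots.txt blocks crawling rather than indexing, so Google may still list a blocked URL it finds linked elsewhere. Removing pages from results typically needs a `noindex` directive on crawlable pages, or Google's removal tools.

```python
from urllib.robotparser import RobotFileParser


def blocked_paths(robots_txt, paths, agent="Googlebot"):
    """Return the subset of `paths` that `agent` is not allowed to crawl."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [p for p in paths if not parser.can_fetch(agent, "http://www.example.com" + p)]


paths = ["/support", "/support/", "/support/faq.html", "/support/etc/page.html"]
print(blocked_paths("User-agent: *\nDisallow: /support/etc\n", paths))
# ['/support/etc/page.html']
print(blocked_paths("User-agent: *\nDisallow: /support/\n", paths))
# ['/support/', '/support/faq.html', '/support/etc/page.html']
```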
My web site has a database lookup; filling out a CAPTCHA gives you 5 minutes of lookup time. There is also some custom code to detect any automated scripts. I do this as I don't want someone data mining my site.
The problem is that Google does not see the lookup results when it crawls my site. If someone is searching for a string that i...
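If the automated-script detection should exempt Google's crawler, the User-Agent alone can be spoofed. The check Google documents is a reverse DNS lookup on the requesting IP, a test that the host name is under googlebot.com or google.com, and a forward lookup confirming that the name resolves back to the same IP. The sketch below uses Python's `socket` functions. The resolvers are injectable so the logic can be tested offline, and `is_verified_googlebot` is an illustrative name. Serving crawlers content that users cannot see may conflict with Google's guidelines on cloaking, so this is worth reading up on first.

```python
import socket


def is_verified_googlebot(ip, reverse=socket.gethostbyaddr, forward=socket.gethostbyname_ex):
    """Reverse-resolve `ip`, check the Google domain, then forward-confirm it."""
    try:
        host = reverse(ip)[0]
    except OSError:  # socket.herror is an OSError subclass
        return False
    if not host.endswith((".googlebot.com", ".google.com")):
        return False
    try:
        addresses = forward(host)[2]  # (hostname, aliases, ip_addresses)
    except OSError:  # socket.gaierror is an OSError subclass
        return False
    return ip in addresses
```

Lookups are slow, so results would normally be cached per IP.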
Procedural techniques are common for texture synthesis, modeling plants, and modeling terrain. However, I've seen very little work on the algorithmic construction of robots, which is a bit surprising given how mechanical these systems are.
Anyone have a good resource on the algorithmic construction of robots / robotic humanoids?
Thanks!
...