I just started thinking about creating/customizing a web crawler today and know very little about web crawler/robot etiquette. Most of the writing on etiquette I've found seems old and awkward, so I'd like to get some current (and practical) insights from the web developer community.
I want to use a crawler to walk over "the web...
Say I have a site at http://website.com. I would like to allow bots to see the home page, but every other page needs to be blocked, as it is pointless to spider. In other words:
http://website.com & http://website.com/ should be allowed, but
http://website.com/anything and http://website.com/someendpoint.ashx should be blocked.
Further...
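The allow/block rule described above (home page only, everything else blocked) can be stated as a small path check. This is a minimal Python sketch of the intended policy, not robots.txt syntax; the helper name `is_crawlable` is hypothetical and does not come from the question.

```python
from urllib.parse import urlsplit


def is_crawlable(url: str) -> bool:
    """Return True only when the URL points at the site root.

    The root is taken to be an empty path or "/"; any other path
    (for example /anything or /someendpoint.ashx) is treated as blocked.
    """
    path = urlsplit(url).path
    return path in ("", "/")


if __name__ == "__main__":
    for candidate in (
        "http://website.com",
        "http://website.com/",
        "http://website.com/anything",
        "http://website.com/someendpoint.ashx",
    ):
        print(candidate, is_crawlable(candidate))
```

Note that this sketch looks only at the path, so a root URL with a query string would still count as the home page; whether that is desirable is an assumption left open here.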
I want to write an app that communicates with a web application and acts something like a human user (a bot).
What programming language would you suggest?
Things the app has to do:
Send and receive information via HTTP (GET and POST methods).
Change any HTTP header field (User-Agent, Content-Type, etc.).
Deal with received d...
What are some popular spam prevention methods besides CAPTCHA?
...
The reason I want to do this is to make it easy to parse out instructions that are emailed to a bot, the kind of thing majordomo might do to parse commands like subscribing and unsubscribing. It turns out there are a lot of crazy formats and things to deal with, like quoted text, distinguishing between header and body, etc.
A perl modu...
One of our next projects is supposed to be an MS Windows-based game (written in C#, with a WinForms GUI and an integrated DirectX display control) for a customer who wants to give away prizes to the best players. This project is meant to run for a couple of years, with championships, ladders, tournaments, player-vs.-player action and so on...
When a user clicks a link to download a file on my website, they go to this PHP file which increments a download counter for that file and then header()-redirects them to the actual file. I suspect that bots are following the download link, however, so the number of downloads is inaccurate.
How do I let bots know that they shouldn't fo...
We are running Apache (IBM HTTP Server 6.0.2.0) in front of WebSphere 6.0 on Linux. We are getting excessive traffic from a specific User-Agent from varying IP addresses. We do not want to block the User-Agent or the IP addresses, but would like to slow them down a bit.
Best scenario for us would be to use out of the box Apache config op...
I am just finishing up an Artificial Intelligence course where, as part of the assignments, I was able to program a bot in a multiplayer environment (BZFlag).
I programmed the bot to interface with the world and play capture the flag against other bots or even humans.
What I would like to know is, what other envi...
I am building stats for my users and don't want visits from bots to be counted.
Right now I have a basic PHP script with MySQL that increments a counter by 1 each time the page is called.
But bots are also added to the count.
Can anyone think of a way to exclude them?
It is mainly just the major ones that mess things up: Google, Yahoo, MSN, etc.
...
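One common approach to the counting problem above is to inspect the User-Agent header before incrementing. This is a minimal sketch in Python rather than PHP, under stated assumptions: the pattern list is illustrative and far from exhaustive, the names `is_probable_bot` and `count_visit` are hypothetical, and treating a missing User-Agent as a bot is a design choice, not an established rule. A User-Agent can be spoofed, so this only filters crawlers that identify themselves.

```python
import re

# Illustrative, non-exhaustive substrings seen in crawler User-Agent strings
# (for example "Googlebot", "Yahoo! Slurp", "msnbot"). Assumption: a real
# deployment would maintain a longer, regularly updated list.
BOT_PATTERN = re.compile(r"bot|slurp|crawler|spider", re.IGNORECASE)


def is_probable_bot(user_agent):
    """Guess whether a request came from a self-identifying crawler."""
    if not user_agent:
        # Assumption: requests with no User-Agent are not counted as visits.
        return True
    return bool(BOT_PATTERN.search(user_agent))


def count_visit(counter, page, user_agent):
    """Increment the per-page counter only for non-bot visits."""
    if not is_probable_bot(user_agent):
        counter[page] = counter.get(page, 0) + 1
    return counter


if __name__ == "__main__":
    hits = {}
    count_visit(hits, "/index", "Mozilla/5.0 (compatible; Googlebot/2.1)")
    count_visit(hits, "/index", "Mozilla/5.0 (Windows NT 10.0) Firefox/121.0")
    print(hits)
```

The same check translates directly to a `preg_match` call on `$_SERVER['HTTP_USER_AGENT']` placed in front of the existing MySQL update.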
Does anyone know what the trend is with MMORPG developers encrypting their client/server protocols these days?
The pros and cons are as follows.
Encrypting protocol:
protects trade secrets regarding client/server protocol to a degree?
Botting isn't stopped, it is only changed because people will create bots which read screen states and ...
[update] I've accepted an answer, as lc deserves the bounty due to the well thought-out answer, but sadly, I believe we're stuck with our original worst case scenario: CAPTCHA everyone on purchase attempts of the crap. Short explanation: caching / web farms make it impossible for us to actually track hits, and any workaround (sending ...
Is there an easy way to create an IM bot on multiple IM networks (AIM, GTalk, YIM, etc.) that can accept and interpret specific commands sent to it to perform a server-related task?
Let's say, for instance, I have a website for managing an RSS feed. I want to send a command to an IM bot to add another feed to my collection. The IM bot would ...
Does anyone out there have a good tutorial or introduction to the various forms of botting? This includes botting for video games and web pages (like spiders, botting, and web scraping). I am not looking for how to do anything malicious; this is purely educational, so please do not link to anything that teaches anything harmful to anyone....
I'm building an e-commerce website with a large database of products. Of course, it is nice when Google indexes all the products on the website. But what if a competitor wants to scrape the website and get all the images and product descriptions?
I was observing some websites with similar lists of products, and they place a CAPTCHA, so "only h...
The ways I can think of are:
Measure the time between actions.
Compare the posts' content (if they're too similar to each other) or, better yet, only the posted links.
Check the distribution of activity over the period the user is active (if the user is active, say posting once every hour, for a week, then either we have a superman ...
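The first idea in the list above, measuring the time between actions, can be sketched as a regularity check on the gaps between timestamps. This is a minimal Python sketch under stated assumptions: the function name `looks_automated` and the thresholds `min_events` and `max_cv` are hypothetical values chosen for illustration, and a low coefficient of variation is only one possible signal, not a definitive test.

```python
from statistics import mean, pstdev


def looks_automated(timestamps, min_events=5, max_cv=0.1):
    """Flag an actor whose gaps between actions are suspiciously regular.

    timestamps: action times in seconds (any order).
    min_events: below this many actions there is too little data to judge.
    max_cv: largest coefficient of variation (stdev / mean of the gaps)
            still considered "machine-like". Both thresholds are assumptions.
    """
    if len(timestamps) < min_events:
        return False
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    average_gap = mean(gaps)
    if average_gap == 0:
        # Every action happened at the same instant.
        return True
    return pstdev(gaps) / average_gap <= max_cv


if __name__ == "__main__":
    hourly = [0, 3600, 7200, 10800, 14400]   # exactly one post per hour
    irregular = [0, 50, 4000, 4100, 9000]    # bursty, human-like spacing
    print(looks_automated(hourly), looks_automated(irregular))
```

A bot that adds random jitter to its timing would evade this check, which is why it is usually combined with the content-based comparisons mentioned in the same list.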
I'm getting several requests in web apps that are basically wrong in ways my code shouldn't be generating...
Mainly it's requests to .ashx without any GET parameters specified.
The user agent is "Mozilla/4.0" (nothing more than that)
The IPs vary from day to day.
This is a bot, right?
Thanks!
...
I'm trying to put together a fun 'contest' of sorts. Developers will write a bot that plays some game, maybe Blackjack, and the master program will host the game and let the bots play against each other.
I've participated in such things before, but never been involved with the 'host' application. And I'm not sure how to go about doin...
Hi all,
I am constantly taking the following steps, and I know there's a way to automate this:
emailing photos from my phone to myself
saving those photos to my computer
uploading the saved photos to a website
Is there a way to write a script (perhaps in PHP), that does the following:
listens to any emails sent with attachments to...
I assume it's bots, or something like them. We have forums on our website, and every day we get thousands of attempts to post spam. These never actually make it into the database, usually because they throw a ViewState or EventValidation exception. I'm not sure if I should even really be concerned. I'd really like to do something about these bots. ...