bots

Writing a simple web crawler that interacts with the browser (Java)

Hi guys, I need to create an automated process (preferably using Java) that will: Open a browser at a specific URL. Log in using the specified username and password. Follow one of the links on the page. Refresh the browser. Log out. This is basically done to gather some statistics for analysis. Every time a user follows the link a bun...

How to avoid bot attacks on a form

I have these forms: https://www.mychabad.org/templates/articlecco.asp?aid=1188756&jewish=General-Contributions.htm&lang=en&site=chabaduc.org https://www.mychabad.org/templates/articlecco.asp?AID=1189379 https://www.mychabad.org/templates/articlecco.asp?aid=1189287&jewish=Shabbat-Holiday-Sponsorships.htm&lang=en&amp...

[Open] Negative Captchas - help me understand spam bots better

I have to decide on a technique to prevent spam bots from registering on my site. In this question I am mainly asking about negative captchas. I have learned about many weaknesses of bots but want to know more. I read somewhere that the majority of bots do not render/support JavaScript. Why is that? How do I test that the visiting program can'...

Should I block bot*?

Hello, the bandwidth on one of our sites was severely messed with on the 28th of this month. cPanel only tracks daily access logs and didn't archive them (it does now). Using AWStats, I found our bot traffic to be as follows: Unknown robot (identified by 'bot*') 91541+417 4.78 GB 28 Jul 2010 - 07:12 I have blocked bot* using .htaccess: ...

Checking if a webpage is online (best practices for making an Uptime Bot)

Hello, I'm conducting an investigation and I need to check every 15 minutes, for about 7 days, whether a web site is online. I have the URL and strong programming skills in VB6 and PHP, and some ideas of how to do that (like pinging port 80 of the URL), but due to the importance of this investigation I need recommendations from professionals, so if y...

Is HTML Email Obfuscation safe enough to stop bots?

I know that most JavaScript email obfuscation solutions stop bots dead in their tracks, but sometimes it's hard to use/insert JavaScript in places. To that end I was wondering if anyone knew whether the bots were smart enough to translate HTML entities in HEX and DEC into valid email strings? For example, let's say I have a function that ra...

HEAD request receives "403 forbidden" while GET "200 ok"?

Hello, after several months of the site disappearing from search results in every major search engine, I finally found a possible reason. I used WebBug to investigate the server headers. See the difference depending on whether the request is HEAD or GET. HEAD Sent data: HEAD / HTTP/1.1 Host: www.attu.it Connection: close Accept: */* User-Agent: WebBug/...

Counting words on a html web page using php

I need a PHP script which takes the URL of a web page and then echoes how many times a word is mentioned. Example: this is a generic HTML page: <html> <body> <h1> This is the title </h1> <p> some description text here, <b>this</b> is a word. </p> </body> </html> This will be the PHP script: <?php $htmlurl="generichtml.com"; the script ...
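As a rough illustration of the counting step asked about above, here is a minimal sketch in Python rather than PHP. It assumes the HTML has already been downloaded into a string; the names `count_word` and `_TextExtractor` are hypothetical, and text inside `<script>` or `<style>` elements would be counted too, which may not be what is wanted.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML document, ignoring the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def count_word(html, word):
    """Count whole-word, case-insensitive occurrences of `word` in the page text."""
    extractor = _TextExtractor()
    extractor.feed(html)
    extractor.close()
    text = " ".join(extractor.chunks)
    pattern = r"\b" + re.escape(word) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


# The sample page from the question.
SAMPLE = (
    "<html> <body> <h1> This is the title </h1> "
    "<p> some description text here, <b>this</b> is a word. </p> "
    "</body> </html>"
)

if __name__ == "__main__":
    print(count_word(SAMPLE, "this"))
```

Matching on word boundaries means that "is" inside "This" is not counted as an occurrence of "is".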

XMPP bot status message on GAE

Hi, the GAE XMPP documentation states that it is not possible to set a status message for an app ( https://code.google.com/appengine/docs/python/xmpp/overview.html#Google_Talk_User_Status ). On the other hand, I've seen that the vark IM client has a status message set. Obviously it is not hosted on GAE, but it is possible to set status message for app. I hav...

How to ignore pageviews by search engines on a PHP page?

I have a particular PHP page where I want to conditionally do things only if the visitor is not a search engine. Are there any good regexes to match against $_SERVER['HTTP_USER_AGENT']? Or would it be better to do a JavaScript redirect back to the page but set a flag, since search engines don't have JavaScript? (I don't have to worry much about m...
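The user-agent approach mentioned in the question can be sketched as follows. This is only an illustration in Python: the token list in `CRAWLER_PATTERN` is an assumption and far from exhaustive, `is_search_engine` is a hypothetical name, and a user-agent string can be spoofed, so the check is a heuristic at best.

```python
import re

# Assumed, non-exhaustive tokens that commonly appear in crawler user agents.
CRAWLER_PATTERN = re.compile(r"bot|crawl|spider|slurp", re.IGNORECASE)


def is_search_engine(user_agent):
    """Return True if the user-agent string looks like a crawler."""
    if not user_agent:
        return False
    return CRAWLER_PATTERN.search(user_agent) is not None


GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)

if __name__ == "__main__":
    print(is_search_engine(GOOGLEBOT_UA))
    print(is_search_engine(BROWSER_UA))
```

The same pattern could be used with PHP's `preg_match` against `$_SERVER['HTTP_USER_AGENT']`.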

Are there spam concerns when using the address tag?

I know that spam bots scour web sites and harvest emails, however I wasn't sure about the extent of information that they search for (for instance, names, physical addresses, phone numbers, etc.) In essence, my question boils down to: "Do spam bots search web pages for physical addresses, and am I helping them through the use of the <a...

How to make a bot to navigate a site?

Given a product ID, associates have to navigate a vendor's website, log in, and perform a search in order to get details on a product for a customer. My employers want a program that can use the product ID, navigate the vendor's website, and perform the search and everything to get the information, thus saving the associate from having to...

Keyword based Chatterbot in a Web Application

I'm trying to create a keyword-based Chatterbot on the web. Simply look for keywords in an input and return relevant responses. Example: User(Input): What is your phone number? Bot(Output): 555-555-5555 This would occur due to the presence of the keyword "phone" or "number". You could create a database of keywords: Output Strin...
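The keyword lookup described in the question could be sketched like this. It is a minimal illustration in Python, assuming an in-memory dictionary instead of the database the question mentions; `RESPONSES`, `DEFAULT_REPLY`, and `reply` are hypothetical names.

```python
import re

# Keyword -> response table; stands in for the database mentioned in the question.
RESPONSES = {
    "phone": "555-555-5555",
    "number": "555-555-5555",
}

# Hypothetical fallback used when no keyword matches.
DEFAULT_REPLY = "Sorry, I don't understand."


def reply(user_input):
    """Return the response for the first known keyword found in the input."""
    words = re.findall(r"[a-z0-9']+", user_input.lower())
    for word in words:
        if word in RESPONSES:
            return RESPONSES[word]
    return DEFAULT_REPLY


if __name__ == "__main__":
    print(reply("What is your phone number?"))
```

Returning on the first matching keyword keeps the sketch simple; a real bot would probably need some way to rank responses when several keywords match.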

Web log file analysis software to measure search crawlers

I need to analyze the search engine crawling going on on my site. Is there a good tool for this? I've tried AWStats and Sawmill, but both of those give me very limited insight into the crawling. I need to know information like how many unique/distinct webpages in a section of my site were crawled by a specific crawler within a time period...

BOT/Spider Trap Ideas

I have a client whose domain seems to be getting hit pretty hard by what appears to be a DDoS. In the logs it's normal looking user agents with random IPs but they're flipping through pages too fast to be human. They also don't appear to be requesting any images. I can't seem to find any pattern and my suspicion is it's a fleet of Window...

Jabber bot - how to get the availability of contacts?

I need to set up a Jabber bot, using Python, that will send messages based on the online/offline availability of several contacts. I've been looking into pyxmpp and xmpppy, but couldn't find any way (at least nothing straightforward) to check the status of a given contact. Any pointers on how to achieve this? Ideally I would like some...

Is there a chatbot framework available?

I am trying to create a program similar to ELIZA. My preference is to implement this project in a general-purpose language such as Ruby, Java, or C++. Is there some framework (open source would be great) available for any of these languages? ...

Checkbox as alternative to captcha?

Does a checkbox provide an alternative to using a captcha on a website? I am thinking I need to use a captcha for user signup. If I instead put in a checkbox for the terms, like "By clicking here I agree....", can that solve the bot issue, or is a captcha required in addition to the checkbox? ...

What's up with Facebook policies vs. graph.facebook.com/robots.txt ?

Facebook's developer principles and policies and the general terms of use seem to forbid automated data collection, but graph.facebook.com/robots.txt seems to allow it: User-agent: * Disallow: Does anybody know how to make sense of this? ...
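To see how a standard parser reads the two robots.txt lines quoted in the question, here is a small sketch using Python's `urllib.robotparser`. The helper name `can_fetch` and the example URL path are assumptions for illustration. The sketch only shows what the file says to crawlers and does not address the site's terms of use.

```python
from urllib.robotparser import RobotFileParser


def can_fetch(robots_lines, url, agent="*"):
    """Parse robots.txt lines and report whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(agent, url)


# The rules quoted in the question: an empty Disallow value excludes nothing.
QUOTED_RULES = ["User-agent: *", "Disallow:"]

# For contrast, a rule set that blocks the whole site.
BLOCK_ALL_RULES = ["User-agent: *", "Disallow: /"]

# Hypothetical path, used only for illustration.
EXAMPLE_URL = "https://graph.facebook.com/some/path"

if __name__ == "__main__":
    print(can_fetch(QUOTED_RULES, EXAMPLE_URL))
    print(can_fetch(BLOCK_ALL_RULES, EXAMPLE_URL))
```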