I wrote a script to release new pages automatically at a particular time. It shows a countdown timer and, when the timer reaches 0, renames a particular file to index.php and renames the current index.php to index-modified.php.

The script itself works fine. But at some point my customer told me that the site was not loading. I found that index.php had been renamed to index-modified.php, while all other pages were working fine. Without index.php, the site was showing a 404 error.

Then I analyzed the access log and found that the Alexa crawler had accessed the release script, which caused the problem.

I want to know how the Alexa crawler found my internal script file and crawled it. Will this happen to all my internal admin files? I don't have links to that script on any of my pages.

I wonder how it could find the files that are present inside my server?

+1  A: 

index.php is the default PHP script name in a directory. It will be executed when you navigate to the directory without giving a filename.

To solve this, use POST to invoke the modifications. If you can't do that, then at least give the script a name that is unlikely to be guessed.
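
As a rough illustration of the POST suggestion, the release script could refuse anything that is not a POST before it touches any files. This is only a sketch: the helper name `releaseStatusFor` is made up for this example, and the usage lines are commented out because the original script is not shown here.

```php
<?php
// Hypothetical helper for the release script; the name is illustrative only.

/**
 * Maps the HTTP request method to the status code the release script
 * should answer with: 200 for POST, 405 (Method Not Allowed) otherwise.
 * Crawlers fetch pages with GET (or HEAD), so they never get past this check.
 */
function releaseStatusFor(string $method): int
{
    return strtoupper($method) === 'POST' ? 200 : 405;
}

// Possible usage at the top of the release script (assumed layout):
// $status = releaseStatusFor($_SERVER['REQUEST_METHOD'] ?? 'GET');
// if ($status !== 200) {
//     http_response_code($status);
//     header('Allow: POST');
//     exit;
// }
// ... only rename the files after this point ...
```

With a check like this, the countdown page would have to trigger the rename through a form submission or a POST request from script rather than a plain link or page load.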

Ignacio Vazquez-Abrams
+1  A: 

You should use robots.txt and disallow spiders from crawling:

User-agent: *
Disallow: /index.php
dusoft
+1  A: 

If your script is located within the htdocs folder (for Apache), chances are that crawlers will find it and try to crawl it. What you can do is:

1) Put a rule in robots.txt. You can learn more about it here: http://www.javascriptkit.com/howto/robots.shtml

This will advise crawlers not to request the script, but won't prevent them from doing so.

2) Put the script in a subfolder and protect it with a password. This is the best option in your case: what you REALLY don't want is random visitors or spiders disabling your web site. More about how to do that easily with .htaccess here:

http://www.javascriptkit.com/howto/htaccess3.shtml
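
For orientation, a minimal .htaccess for such a password-protected subfolder might look like the fragment below. It assumes Apache with basic authentication available and `AllowOverride AuthConfig` enabled for that folder; the realm text and the password-file path are placeholders, not values from the question.

```apache
# Placed inside the protected subfolder (for example, the one holding the release script).
AuthType Basic
AuthName "Admin scripts"
# Placeholder path: keep the password file outside the web root.
AuthUserFile /path/to/.htpasswd
Require valid-user
```

The password file itself is usually created with Apache's `htpasswd` utility.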

Wish you best of luck, Marin

Ican Zilb
+7  A: 

I wonder how it could find the files that are present inside my server?

Probably because someone who accessed those files was using the Alexa Toolbar, which reports visited URLs to Alexa.

The crawler only managed to trigger the rename because there are two things wrong with the script.

  1. It is not protected with an authentication/authorization layer.

  2. It makes a significant change on the server in response to a GET request. The HTTP spec provides GET for "safe" requests and POST for requests which do something.
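
The first point could be addressed in many ways. One minimal sketch is a shared secret that the release request must carry. The helper name `hasValidReleaseToken`, the `token` field, and the `RELEASE_TOKEN` environment variable are all assumptions made for this example, not part of the original script.

```php
<?php
// Hypothetical helper; the function name and token scheme are illustrative only.

/**
 * Returns true when the supplied token matches the expected secret.
 * hash_equals() compares in constant time, so the check does not leak
 * how many leading characters of the secret were guessed correctly.
 */
function hasValidReleaseToken(string $supplied, string $expected): bool
{
    // An empty secret means the server is misconfigured: authorize nobody.
    if ($expected === '') {
        return false;
    }
    return hash_equals($expected, $supplied);
}

// Possible usage in the release script (assumed field and variable names):
// $expected = getenv('RELEASE_TOKEN') ?: '';
// $supplied = is_string($_POST['token'] ?? null) ? $_POST['token'] : '';
// if (!hasValidReleaseToken($supplied, $expected)) {
//     http_response_code(403);
//     exit;
// }
```

Reading the token from `$_POST` also fits the second point, since the state-changing request is then no longer a plain GET.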

David Dorward
+1 for the distinction between POST and GET
Martin Wickman
I think that must be the case; I have the Alexa Toolbar installed in my browser.
kvijayhari
I thought that, like Google and other search engines, Alexa would also crawl using links only. I'll have to remember this... :)
kvijayhari