I'm crawling an SNS with a crawler written in Python.

It worked for a long time, but a few days ago the pages fetched from my servers started coming back as ERROR 403 FORBIDDEN.

I tried changing the cookie, the browser, and the account, but nothing helped.

It also seems that all the forbidden servers are in the same network segment.

What can I do? Steal someone else's IP? = =...

thx a lot

+1  A: 

Looks like you've been blacklisted at the router level in that subnet, perhaps because you (or somebody else in the subnet) was violating terms of use, robots.txt, max crawling frequency as specified in a site-map, or something like that.
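As a rough illustration of the robots.txt and crawl-frequency points, here is a minimal sketch of how a Python crawler might check both before requesting a page. It uses the standard library's `urllib.robotparser`; the `USER_AGENT` string, the helper names, and the one-second fallback delay are assumptions made for the example, not anything taken from the site in question.

```python
import urllib.robotparser

# Hypothetical user-agent string; use whatever your crawler really sends.
USER_AGENT = "example-crawler"


def load_rules(robots_lines):
    """Build a parser from the lines of an already-downloaded robots.txt."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_lines)
    return parser


def allowed_and_delay(parser, url, default_delay=1.0):
    """Return (may_fetch, seconds_to_wait_between_requests) for a URL."""
    allowed = parser.can_fetch(USER_AGENT, url)
    # crawl_delay() returns None when robots.txt has no Crawl-delay entry,
    # so fall back to an assumed conservative default.
    delay = parser.crawl_delay(USER_AGENT)
    return allowed, (delay if delay is not None else default_delay)


# Sample robots.txt content, made up for this example.
SAMPLE_ROBOTS = [
    "User-agent: *",
    "Crawl-delay: 5",
    "Disallow: /private/",
]

rules = load_rules(SAMPLE_ROBOTS)
private_check = allowed_and_delay(rules, "http://example.com/private/page")
public_check = allowed_and_delay(rules, "http://example.com/public/page")
```

In a real crawler you would skip any URL for which the first value is false and call `time.sleep()` with the second value between requests to the same host. Against a live site, `RobotFileParser.set_url()` followed by `read()` fetches the real robots.txt instead of the sample lines.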

The solution is not technical, but social: contact the webmaster, be properly apologetic, learn what exactly you (or one of your associates) had done wrong, convincingly promise to never do it again, apologize again until they remove the blacklisting. If you can give that webmaster any reason why they should want to let you crawl that site (e.g., your crawling feeds a search engine that will bring them traffic, or something like this), so much the better!-)

Alex Martelli
I'm trying to contact the webmaster... Social engineering, that's an idea... thx~
wdestinyx
@wdestinyx, well, I'm not suggesting social engineering in the sense of any pretense or manipulation (in which it's often used), just normal social conventions among humans (when one's done something wrong one apologizes and promises not to do it again, and the other party then forgives and avoids/stops taking further countermeasures, for example - "to err is human, to forgive divine" and all that;-).
Alex Martelli
@Alex Martelli, why do I have a feeling that you've blacklisted some guy crawling your website~ just a feeling~ I will talk to the webmaster sincerely, and check my politeness :)
wdestinyx
@wdest, I've actually never been a professional webmaster/sysadmin, but as a developer and manager of developers I've developed, and managed development of, code that webmasters and sysadmins can use (in a rather automated way, as they can't spend all their waking hours manually blacklisting and de-blacklisting;-).
Alex Martelli