I'm tinkering with a web tool that, given a URL, will retrieve the text and give the user some statistics on the content.
I'm worried that letting users initiate a GET request from my box to any arbitrary URL on the net may serve as a vector for attacks (e.g. a request to http://undefended.box/broken-sw/admin?do_something_bad).
Are there ways to minimize this risk? Are there any best practices for offering a public URL-retrieval capability?
Some ideas I've thought about:
- honoring robots.txt
- accepting or rejecting only certain URL patterns (a rough sketch of these first two checks follows the list)
- checking a blacklist/whitelist of appropriate sites (if such a thing exists)
- working through a well-known third party's public web proxy, on the assumption that they've already built in these safeguards
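For the first two ideas, here's a rough sketch of the kind of pre-fetch check I have in mind. It's Python, and the user-agent string, the URL pattern, and the decision to treat an unreachable robots.txt as a refusal are just placeholders, not anything I've settled on:

```python
import re
import urllib.robotparser
from urllib.parse import urlparse

ALLOWED_SCHEMES = {"http", "https"}
# Placeholder pattern: only plain http(s) URLs
URL_PATTERN = re.compile(r"^https?://[^/]+(/.*)?$")

def is_fetch_allowed(url, user_agent="my-stats-bot"):
    """Return True if the URL passes the scheme/pattern checks and robots.txt allows it."""
    parsed = urlparse(url)
    if parsed.scheme not in ALLOWED_SCHEMES:
        return False
    if not URL_PATTERN.match(url):
        return False

    # Honor robots.txt on the target host
    rp = urllib.robotparser.RobotFileParser()
    rp.set_url(f"{parsed.scheme}://{parsed.netloc}/robots.txt")
    try:
        rp.read()
    except OSError:
        return False  # conservative: refuse if robots.txt can't be fetched
    return rp.can_fetch(user_agent, url)
```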
Thanks for your help.
Edit: The tool will evaluate only HTML or plain-text content, without downloading or evaluating linked scripts, images, etc. If the content is HTML, I'll use an HTML parser.
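To make that concrete, this is roughly what I picture the retrieval and parsing step looking like. It assumes Python with the requests library and BeautifulSoup, which are just my working choices, and the size cap and user-agent string are arbitrary placeholders:

```python
import requests
from bs4 import BeautifulSoup

def fetch_text(url, timeout=10, max_bytes=1_000_000):
    """Fetch a URL and return its text, or None if it isn't HTML or plain text."""
    resp = requests.get(url, timeout=timeout, stream=True,
                        headers={"User-Agent": "my-stats-bot"})
    content_type = resp.headers.get("Content-Type", "").split(";")[0].strip().lower()
    if content_type not in ("text/html", "text/plain"):
        return None

    body = resp.raw.read(max_bytes, decode_content=True)  # cap how much is downloaded
    if content_type == "text/plain":
        return body.decode(resp.encoding or "utf-8", errors="replace")

    # Parse the HTML and keep only the visible text; <script>/<style> contents are
    # dropped, and linked resources (images, scripts, stylesheets) are never fetched.
    soup = BeautifulSoup(body, "html.parser")
    for tag in soup(["script", "style"]):
        tag.decompose()
    return soup.get_text(separator=" ", strip=True)
```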