I'm slowly but surely working on a site using ASP.NET MVC which will contain a bunch of user-generated content, such as submitted links, comments, markers on a map etc.

I need to know what stuff I need to be mindful of when coding.

Examples include scrubbing user input for HTML such as the script tag, and guarding against SQL injection attacks.

What other key things do I need to be cautious of when considering user input?

Also, are there any readily available algorithms to scrub (and deny) inputs for rude words / porn links etc?

Additionally, what's the best way to do moderation? I'm a lone developer doing this in my spare time, so I can't afford to be a full-time moderator. I'd prefer to implement self-moderation by the users. Does this work well?

If I wanted a forum, how important is it that I moderate forum posts, and can they be self-moderating?

How much professional moderation is required in StackOverflow for example?

Thanks for any input

+3  A: 

If you haven't already, listen to the Stack Overflow podcast from the beginning. It features a lot of good discussion about how Jeff and Joel arrived at the design of this particular site, covering security, moderation, scalability and many other interesting subjects.

Rik
A: 

Rate limiting and botting come to mind. You want to make sure that one particular user doesn't have the ability to flood your site, and you also want to take some steps to make it harder for bots. It doesn't sound like the kind of site that bots would ever necessarily target, but why take the risk?
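As a rough illustration of the rate-limiting half, here is a minimal sliding-window sketch in Python. The class name, the parameters and the in-memory storage are all assumptions made for this example; nothing here comes from ASP.NET or any particular library, and the same idea would need porting to whatever stack the site runs on.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `max_events` per `window_seconds` for each key.

    The key is whatever identifies a poster: a user ID, an IP address, etc.
    This is a hypothetical helper for illustration, not a library API.
    """

    def __init__(self, max_events, window_seconds, clock=time.monotonic):
        self.max_events = max_events
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the behaviour can be tested
        self._events = defaultdict(deque)  # key -> timestamps of recent events

    def allow(self, key):
        now = self.clock()
        events = self._events[key]
        # Drop timestamps that have fallen out of the window.
        while events and now - events[0] >= self.window_seconds:
            events.popleft()
        if len(events) >= self.max_events:
            return False  # over the limit: reject or delay this submission
        events.append(now)
        return True
```

A caller would check something like `limiter.allow(user_id)` before accepting a comment or link. Because the state lives in one process's memory, a site running on several servers would need a shared store instead.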

dirtside
How do you go about implementing that kind of protection? Can you recommend any good resources?
Matthew Rathbone
A: 

Self-moderation is best for non-technical reasons.

OK, I read this about 15 years ago, so the ruling may have been overturned since. When websites were first being sued for slanderous user content posted on them, the legal ruling at the time was that if the owners of a website exercised any editorial control over the content at all, they took responsibility for all the content on their site, regardless of who wrote it originally. Hence the safest thing to do legally is to pass all moderation duties on to your users.

James Curran
I believe the legal term at the time was Common Carrier. I don't know if it still applies or not, though. (Look at all the copyright suits against places like YouTube, etc.)
John Rudy
A: 

You may want to do what you can to prevent bot signups (along with what @dirtside said about bots flooding), for instance by using captchas.

While I know captchas aren't a perfect solution by any means, I've been waiting for a reason to incorporate Recaptcha (http://recaptcha.net/) into a project which involves user signups. It looks about as cool as a captcha could be, and seems like a good cause to boot.

As far as a bad word list goes, some googling revealed this one: http://www.noswearing.com/index.php, which claims to have an API coming soon. Censorship is a tricky subject though, and it may be difficult to decide what you or your users consider acceptable versus bad words. Personally, I'd be a little wary of using a third-party list for this sort of thing, and creating my own list doesn't sound like a fun task either.
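For what it's worth, the matching itself is the easy part. Below is a small Python sketch that flags whole-word matches against a list. The two entries in `BLOCKED_WORDS` are harmless placeholders invented for this example (not taken from the site above), and `find_blocked_words` is a hypothetical helper name.

```python
import re

# Hypothetical placeholder list; a real one would be curated or loaded from a file.
BLOCKED_WORDS = {"darn", "heck"}

# \b anchors the match to word boundaries so that innocent words which merely
# contain a blocked word (e.g. "darning") are not flagged.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(w) for w in sorted(BLOCKED_WORDS)) + r")\b",
    re.IGNORECASE,
)


def find_blocked_words(text):
    """Return the sorted, lower-cased set of blocked words found in `text`."""
    return sorted({m.group(1).lower() for m in _PATTERN.finditer(text)})
```

Whole-word matching avoids some false positives, but determined users will get around it with spacing or substituted characters, so it is probably best used to flag posts for review rather than to reject them outright.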

mmacaulay
A: 

You should whitelist instead of blacklist user input, simply because the latter requires you to defend against everything anyone can think of, instead of only allowing what you know to be safe. Blacklisting means covering for every obscure hack there is. Simon Willison recently made a good slideshow available on some obscure ways pages can break: http://simonwillison.net/2008/talks/amajax-security/
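To make the difference concrete, here is a minimal Python sketch of whitelist-style validation for two kinds of input such a site might take: a submitted link and a username. The function names, the allowed schemes and the username rule are assumptions picked for illustration, not anything prescribed by the slideshow.

```python
import re
from urllib.parse import urlsplit

# Only what is listed here gets through; everything else is rejected by default.
ALLOWED_SCHEMES = {"http", "https"}
USERNAME_RE = re.compile(r"[A-Za-z0-9_]{3,20}")  # assumed rule for this example


def is_acceptable_link(url):
    """Accept only absolute http(s) URLs with a host part."""
    url = url.strip()
    # Reject embedded whitespace and control characters outright.
    if any(ch.isspace() or ord(ch) < 32 for ch in url):
        return False
    try:
        parts = urlsplit(url)
    except ValueError:
        return False
    return parts.scheme in ALLOWED_SCHEMES and bool(parts.netloc)


def is_acceptable_username(name):
    """Accept only names made entirely of the allowed characters."""
    return USERNAME_RE.fullmatch(name) is not None
```

Note that nothing in the code mentions `javascript:` or `data:` URLs. They are rejected simply because they are not on the list, which is the whole point of whitelisting.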

I hope that helps.

Kit Sunde
A: 

Also, don't forget JavaScript injection while you're doing SQL injection prevention.
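The two defences sit at opposite ends of the request: parameterized queries on the way into the database, and HTML escaping on the way out to the page. Here is a minimal sketch in Python, using the standard `sqlite3` and `html` modules as stand-ins; the `comments` table and both function names are made up for the example, and an ASP.NET site would use its own equivalents of both techniques.

```python
import html
import sqlite3


def save_comment(conn, author, body):
    # Placeholders keep user text out of the SQL grammar entirely,
    # so quotes and semicolons in the input are stored as plain data.
    conn.execute(
        "INSERT INTO comments (author, body) VALUES (?, ?)",
        (author, body),
    )


def render_comment(author, body):
    # Escape at output time so stored markup is displayed, not executed.
    return "<p><b>{}</b>: {}</p>".format(html.escape(author), html.escape(body))
```

With this arrangement, a comment body containing a script tag is stored verbatim and shows up on the page as visible text instead of running in the reader's browser.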

Bob King
A: 

On sites like these, where users can create or upload content, moderation is very difficult. Self-moderation or moderation by staff can catch some things, but such content will often slip by. If you add a forum, you are all but guaranteed to get rude, racist and pornographic discussions.

StackOverflow is a site for a very specific group of users. If it were opened to the general public, it might be abused too.

You should either restrict login to legitimate users (as StackOverflow does with OpenID) or drop this feature from your "business requirement".

Pradeep
A: 

Sorry for the copy and paste, but I would recommend sanitizing your user input through a service like this: HTML Whitelist is the latest in the "cool little Python Web service thrown up on App Engine" by my good colleague DeWitt Clinton.

It does one thing, and it does it well. You can pass the service HTML and it will return a sanitized version.

http://html-whitelist.appspot.com/

Much easier than trying to write it from scratch yourself and keeping it updated.