I currently run several research-related web-sites with active users, and these sites use some personally identifying information about these users (their email address, IP address, and query history). Ideally I'd release the code to these sites as open source, so that other people could easily run similar sites, and more importantly scrutinise and replicate my work, but I haven't been comfortable doing so, since I'm unsure of the security implications. For example, I wouldn't want my users' details to be accessed or distributed by a third party who found some flaw in my site, something which might be easy to do with full source access.

I've tried going half-way by refactoring the (Django) site into more independent modules, and releasing those, but this is very time consuming, and in practice I've never gotten around to releasing enough that a third party can replicate the site(s) easily. I also feel that maybe I'm kidding myself, and that this process is really no different to releasing the full source.

What would you recommend in cases like this? Would you open-source the site and take the risk? As an alternative, would you advertise the source as "available upon request" to other researchers, so that you at least know who has the code? Or would you just apologise to them and keep it closed in order to protect users?

+1  A: 

Don't worry about allowing others to replicate the site easily. People that want to will do so regardless of whether or not you're involved. Release clean, useful, general-purpose chunks of code (apps), and allow others to use them as they wish.

Ignacio Vazquez-Abrams
+1  A: 

There's no reason you have to go to a completely open release right out the door. Why not pass the exact code around to some trusted friends with pentesting skills first?

Most of us have that mischievous hacker friend in our peer group; take advantage of that!

scott_karana
+6  A: 

Your question basically boils down to this: how effective is security through obscurity? It does provide a measure of security. That is, by open-sourcing the code to your site, yes, you do increase the risk that someone might find a security flaw in it and exploit it.

You'd need to weigh that possible negative against the possible positives you'd gain from releasing the code, for example: increased exposure for your website, contributions from other people that improve your site's functionality and security, improving the world, etc.

You should also consider how likely of a target you are. Is your website high-profile? What is the most valuable information a hacker might hope to steal? (IP information is generally useless to hackers, email addresses a tiny bit more valuable, and so on.) In general, if your website is high-profile enough that someone wants to break into it, they're already trying. If it's not, releasing the source code won't suddenly make someone who wasn't interested in breaking in before start trying now. But you are lowering the bar for hackers by releasing it, so you do need to weigh the risks.

If you are worried about it, then making it available only by email request to other researchers would certainly cut down the risk as well.

RarrRarrRarr
+1: I'd second the statement `consider how likely of a target you are`
KMan
I'd say I'm not much of a target now, since my sites are not high-volume by web standards and the data they keep would not be that valuable to attackers. Because of this, security through obscurity has worked for me (as far as I know). I realise now my question was: "Will releasing the source make me a target?" The answer seems to be "probably not". Attackers need a motivation to break into the site, and whilst having the source is enabling, it doesn't really provide or increase that motivation.
Lars Yencken
+1  A: 

The problem here is, as I'm sure you already know, you cannot really predict the nature of the potential threat should the code be misused.

How resilient the site would be with the code exposed is really a matter of how secure your app is in general - just because you're supplying the code, you don't necessarily need to release ALL the configuration aspects of your own actual site. For instance, you don't need to give them your actual database usernames and passwords, or encryption keys, or whatever...
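To make the configuration point concrete: a common Django pattern is to have `settings.py` read secrets from the environment, so the published repository contains only variable names, never values. Below is a minimal sketch of that idea in plain Python. The variable names `DJANGO_SECRET_KEY` and `DB_PASSWORD` and the helpers `require_env` and `load_secrets` are hypothetical illustrations, not Django APIs.

```python
import os


def require_env(name: str) -> str:
    """Return environment variable `name`, failing loudly if it is unset or empty.

    Failing at startup is safer than silently falling back to a default
    secret, which would then be visible to anyone reading the published code.
    """
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value


# In a real settings.py these would be module-level assignments, e.g.
#   SECRET_KEY = require_env("DJANGO_SECRET_KEY")
# The variable names are hypothetical; use whatever your deployment defines.
def load_secrets() -> dict:
    return {
        "SECRET_KEY": require_env("DJANGO_SECRET_KEY"),
        "DB_PASSWORD": require_env("DB_PASSWORD"),
    }
```

With this arrangement, the open-sourced settings file can be identical to the one running in production; only the deployment environment holds the actual secrets.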

Of course, the source alone would be a great aid for anyone looking to use cross-site scripting against you, but there are benefits to be realised as well: scrutiny and constructive feedback from other developers, for example.

If you're only a low-profile site, I would be tempted to give it a whirl. The idea of releasing the code only on request via email, which someone else has just posted, is also a good one. You know who has the code, and you have the opportunity to form a constructive relationship with them.

Hope it goes well.

Martin

ps. And my applause for your attitude.

Martin Milan
+2  A: 

As Rarr pointed out, you currently rely on security through obscurity, which is never enough on its own. Remember that keeping your source code closed doesn't make your site any more or less secure in itself: the vulnerabilities exist in the code whether or not the code is published. If you release your code, you might face an increased risk of attack, since a hacker can review your code and find flaws.

I can almost guarantee that your application has some security flaws. I recall reading about research by OWASP indicating that more than 96% of all web applications had some degree of vulnerability (injection, XSS, information leakage, elevated access rights, insufficient transport layer protection, etc.). Unfortunately, I haven't been able to find the actual results.

I think you can go ahead and publish your code as open source if you want to. However, before doing so, I recommend you do some security testing to make sure that at least some of the most basic aspects are covered. This will decrease the risk of your site getting attacked successfully.

  • Make sure you don't have SQL injection vulnerabilities
  • XSS attacks are very common today, and they are hard to detect and counter. At the very least, make sure all your user input data (anything sent from the browser to the server) is validated before it is used in application logic.
  • Make sure your configuration details are outside the web root.
  • Handle your users' data with care. This means that their passwords should be stored as one-way hashes computed with a salt value. Other sensitive user information (for example, the email addresses you mentioned) should be encrypted before it is stored.
  • Have security policies for usernames and passwords (minimum lengths, passwords must contain alphanumeric AND non-alphanumeric characters, etc.). Weak passwords can be broken with dictionary or brute-force attacks.
  • Don't return detailed error messages. For example, if a user's authentication fails, just say that authentication failed; don't say whether it was the username or the password that was wrong. Detailed error messages can be used for account harvesting.
  • Don't allow more than N invalid login attempts (see brute-force attacks).
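As a minimal sketch of the first two bullets, here are a parameterised SQL query and HTML output escaping, using Python's standard `sqlite3` and `html` modules. The `users` table and both function names are hypothetical. Escaping on output complements the input validation the XSS bullet describes rather than replacing it. On a Django site the ORM already parameterises queries and the template engine auto-escapes variables by default, so hand-written code like this mainly matters where raw SQL is used or auto-escaping is bypassed (e.g. with `mark_safe`).

```python
import html
import sqlite3
from typing import Optional, Tuple


def find_user_by_name(conn: sqlite3.Connection, username: str) -> Optional[Tuple[int, str]]:
    # The "?" placeholder makes sqlite3 pass `username` as a bound value, never
    # as SQL text, so input such as  x' OR '1'='1  cannot alter the query.
    cur = conn.execute("SELECT id, username FROM users WHERE username = ?", (username,))
    return cur.fetchone()


def render_greeting(username: str) -> str:
    # Escape user-controlled text before putting it into HTML, so a value like
    # "<script>" is displayed as text instead of being run by the browser.
    return f"<p>Hello, {html.escape(username)}</p>"


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
    conn.execute("INSERT INTO users (username) VALUES (?)", ("alice",))
    print(find_user_by_name(conn, "alice"))         # (1, 'alice')
    print(find_user_by_name(conn, "x' OR '1'='1"))  # None
    print(render_greeting("<script>alert(1)</script>"))
```

The injection attempt simply finds no user whose name is literally `x' OR '1'='1`, instead of matching every row as string-concatenated SQL would.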

The list is by no means complete; on the contrary, it's just a start, but at least it gives you something to start from. If you are interested in security testing, I recommend you read the OWASP Testing Guide.
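The password-storage, generic-error and login-limit bullets can be sketched together as below, using salted PBKDF2 from Python's `hashlib`. Everything here (the `LoginGuard` class, the iteration count, `MAX_FAILED_ATTEMPTS` standing in for N) is a hypothetical illustration with in-memory storage. On Django, `django.contrib.auth` already stores passwords as salted PBKDF2 hashes by default, so you would normally rely on that instead.

```python
import hashlib
import hmac
import os
from typing import Dict, Tuple

# Illustrative work factor; check current OWASP guidance for a recommended value.
ITERATIONS = 100_000
# Stands in for "N" in the list above; the value 5 is arbitrary.
MAX_FAILED_ATTEMPTS = 5


def hash_password(password: str) -> Tuple[bytes, bytes]:
    # A fresh random salt per password means identical passwords hash differently.
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(digest, expected)


class LoginGuard:
    """Hypothetical in-memory login handler; a real site would persist this state."""

    def __init__(self) -> None:
        self.users: Dict[str, Tuple[bytes, bytes]] = {}
        self.failures: Dict[str, int] = {}

    def register(self, username: str, password: str) -> None:
        self.users[username] = hash_password(password)

    def login(self, username: str, password: str) -> str:
        # No expiry here: a real site would unlock after a timeout or a reset.
        if self.failures.get(username, 0) >= MAX_FAILED_ATTEMPTS:
            return "Too many failed attempts."
        record = self.users.get(username)
        if record is not None and verify_password(password, *record):
            self.failures[username] = 0
            return "Welcome."
        # Count failures for unknown usernames too, and return one generic
        # message, so responses don't reveal which usernames exist.
        self.failures[username] = self.failures.get(username, 0) + 1
        return "Authentication failed."
```

Because failures are counted for unknown usernames as well, the lock-out message doesn't reveal which accounts exist. Response time still differs between an unknown user and a wrong password, which a production version would also want to even out.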

Kim L
I'm likewise sure that my site has flaws. What's more, as this is a research project I've largely moved on from, I'll never be able to invest the resources to audit it fully. Perhaps that is another reason to open-source it, though; someone else with fresh energy might well create a better user experience, and better security too, by starting where I left off. Thanks for the suggestions.
Lars Yencken
+1  A: 

You might want to do a code audit of your website, and also run some website scanners such as skipfish and ratproxy, both written by the same author.

Nitrodist
Thanks for the tool suggestions, I'll give them a go.
Lars Yencken
+1  A: 

I have been in precisely this situation. You have a large body of code that is hard to attack because it is not public. (It is dramatically harder to attack code that is not open source than code that is open source.) You would like to open source what you have, but there are people using it.

You are going about this the correct way. Move as much of your website as possible to an established platform, put your functionality in modules, and slowly evaluate and release the modules. This is the correct approach for several reasons:

  1. Many more people have looked at the code base you are moving to than have looked at your code base, so you are moving to a more secure system.
  2. Once you make the migration, your underlying system will get better without your doing anything. (This is why I eventually gave up on my system and moved to an open source system, not out of commitment to open source, but because I liked the fact that it got better while I was watching movies.)
  3. More people are likely to use your code if you make it a module to a system that is already in use.

Another part of your question, though, has to do with the content that your website has. You said that there is a lot of identifying information, such as email address, IP address, and query history.

Why are you keeping this information?

  • You don't need to keep email addresses unless you are actively sending out email. (You can handle password resets by keeping a hash of the email address: when somebody enters their email address to reset their password, you compute its hash and search the existing hashes for a match.)
  • You don't need to keep IP addresses (you can hash them with the same results).
  • Search history is harder to store securely, but why are you storing it? You could store it client-side in cookies. Or you could store it keyed by a hash of the username and password, so there is no obvious way to match a particular search history to a user. These are
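A sketch of the IP-address suggestion follows, with one deliberate change: it uses a keyed hash (HMAC-SHA256) instead of a plain hash. There are only about 4.3 billion IPv4 addresses, so a plain, unsalted hash of one can be reversed by hashing every possible address; a secret key stored separately from the logs prevents that. `LOG_KEY`, `pseudonymize_ip` and `thread_queries` are hypothetical names.

```python
import hashlib
import hmac
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical secret: load it from configuration (not the published source)
# and never store it alongside the logs it protects.
LOG_KEY = b"replace-with-a-long-random-secret"


def pseudonymize_ip(ip: str, key: bytes = LOG_KEY) -> str:
    # HMAC-SHA256 maps the same IP to the same token, so queries can still be
    # grouped, but without the key nobody can rebuild the mapping by hashing
    # every possible address.
    return hmac.new(key, ip.encode("ascii"), hashlib.sha256).hexdigest()


def thread_queries(events: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (ip, query) events by pseudonymous client token, keeping arrival order."""
    threads: Dict[str, List[str]] = defaultdict(list)
    for ip, query in events:
        threads[pseudonymize_ip(ip)].append(query)
    return dict(threads)
```

Only the tokens would be written to the query log; the raw address is used once, in memory, and then discarded.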

vy32
I never thought of hashing IP addresses for privacy; good idea! We only keep IP addresses to thread sequential queries together into a more coherent log, but now I see we can avoid storing them at all. For the other site, the email address is necessary, since we need some way to verify the user before creating expensive user models. All in all, nice suggestions though.
Lars Yencken
You don't need to store the email address to verify it. The user enters the email address every time they come to your website. You store the hash of the email address. When the user enters their email, you hash it and use that for the lookup, then tie the user to the hashed email with your session identifier.
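vy32's lookup flow might look roughly like this. The sketch swaps the plain hash for an HMAC with a server-side secret, because email addresses are easy to guess and a plain hash of a guessed address can be checked directly. The class, key and method names are hypothetical, storage is in-memory, and lower-casing the whole address is a simplifying assumption (the local part of an address is technically case-sensitive). Actually sending a verification message would need the plaintext address only transiently, at the moment the user types it in.

```python
import hashlib
import hmac
import secrets
from typing import Dict, Optional

# Hypothetical server-side key; load it from configuration, not from the source.
EMAIL_KEY = b"replace-with-a-long-random-secret"


def email_token(email: str, key: bytes = EMAIL_KEY) -> str:
    # Normalise so "Alice@Example.com " and "alice@example.com" give one token.
    # Lower-casing the local part is a simplifying assumption.
    normalised = email.strip().lower()
    return hmac.new(key, normalised.encode("utf-8"), hashlib.sha256).hexdigest()


class HashedEmailStore:
    """Hypothetical store that keeps only email tokens, never plaintext addresses."""

    def __init__(self) -> None:
        self.users_by_token: Dict[str, dict] = {}
        self.sessions: Dict[str, str] = {}  # session id -> email token

    def register(self, email: str) -> None:
        self.users_by_token.setdefault(email_token(email), {"queries": []})

    def start_session(self, email: str) -> Optional[str]:
        # Hash what the user typed and look it up; tie a fresh session id to the token.
        token = email_token(email)
        if token not in self.users_by_token:
            return None
        session_id = secrets.token_urlsafe(32)
        self.sessions[session_id] = token
        return session_id

    def user_for_session(self, session_id: str) -> Optional[dict]:
        token = self.sessions.get(session_id)
        return None if token is None else self.users_by_token[token]
```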
vy32