How would you go about keeping sensitive information out of log files? Yes, you can consciously choose not to log sensitive bits of information in the first place, but there are general cases where you blindly log error messages upon failures, or trace messages while investigating a problem, and end up with sensitive information landing in your log files.

For example, you could be trying to insert an order record that contains the credit card number of a customer into the database. Upon a database failure, you may want to log the SQL statement that was just executed. You would then end up with the credit card number of the customer in a log file.

Is there a design paradigm that can be employed to "tag" certain bits of information as sensitive so that a generic logging pipeline can filter them out?

+3  A: 

I would personally regard the log files themselves as sensitive information and make sure to restrict access to them.

Fredrik Mörk
True! I'm thinking of cases where you're a software vendor and asking your clients to send you the log files from their system in order to diagnose a system crash etc. Would the onus be on the client to first clean up their log files from sensitive information? Wouldn't it be nice if your system had a way to let clients get that for free?
Ates Goral
"Restricting access" isn't specific enough to provide sufficient protection for credit card information. The logs need to be encrypted, and access to the decryption keys needs to be spelled out in the security policy.
erickson
+1  A: 

In your example, you should be encrypting the credit card number or, better yet, not even storing it in the first place.

If, say, you were logging something else, like a login, you might want to explicitly replace a password with *.

However, this manages to neatly avoid answering the question you've posed in the first place. In general, when dealing with sensitive information, it should be encrypted on its way to any form of permanent storage, be it a database file or a log file. Assume that a Bad Guy is going to be able to get their hands on either, and protect the information accordingly.

Bob Kaufman
I think encryption can be the answer: as soon as sensitive info enters your system, it gets encrypted and lives as encrypted. So, if you're doing low level logging (semantics-agnostic) or even getting a memory dump, the information will be reasonably secure. I think I like the idea of encrypting the info instead of the entire log file as suggested in other answers.
Ates Goral
+1  A: 

If you know what you're trying to filter, you can run your log output through a regex-based scrubber before it is written.
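
To make the regex idea concrete, here is a minimal sketch using Python's standard logging module. The two patterns and the names scrub and ScrubFilter are my own assumptions for illustration; real patterns would have to be tuned to whatever counts as sensitive in your system, and a digit-run pattern like this one will also catch long numbers that are not cards.

```python
import logging
import re

# Hypothetical patterns: 13 to 16 digits with optional space/dash separators,
# and anything that follows "password=".
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")
PASSWORD_RE = re.compile(r"(password\s*=\s*)\S+", re.IGNORECASE)


def scrub(text: str) -> str:
    """Replace anything that looks like a card number or a password value."""
    text = CARD_RE.sub("[REDACTED CARD]", text)
    return PASSWORD_RE.sub(r"\1***", text)


class ScrubFilter(logging.Filter):
    """Rewrites each record's message before the handler emits it."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Format the message first so values passed as arguments are scrubbed too.
        record.msg = scrub(record.getMessage())
        record.args = None  # the message is already fully formatted
        return True
```

Attaching the filter to a handler (handler.addFilter(ScrubFilter())) rather than to individual loggers means every record that reaches that handler is scrubbed, whichever part of the code produced it.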

Esteban Araya
Yes, I thought of that. In fact, this may be a viable solution since there will always be a discrete number of different types of "sensitive" strings which you can identify with regexes.
Ates Goral
+1  A: 

Logging a credit card number could be a PCI violation. And if you aren't PCI compliant, you will be charged higher card-processing fees. Either don't log sensitive information, or encrypt your entire log files.

Your idea of "tagging" sensitive information is intriguing. You could have a special data type for sensitive information that wraps the real, underlying data type. Whenever this object is rendered as a character string, it just returns "***" or whatever.

However, this could require widespread coding changes, and it requires a level of conscious vigilance similar to that needed to avoid logging sensitive information in the first place.
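
A rough sketch of such a wrapper type, in Python. The class name Sensitive and the reveal method are hypothetical names chosen for illustration; the only point is that every route to a string representation returns the mask.

```python
class Sensitive:
    """Wraps a value so that rendering it as text never reveals it."""

    __slots__ = ("_value",)

    def __init__(self, value):
        self._value = value

    def reveal(self):
        """Explicit access to the real value, easy to search for in code review."""
        return self._value

    def __str__(self) -> str:
        return "***"

    # Also mask %r, f"{x!r}" and the repr of containers holding the wrapper.
    __repr__ = __str__

    def __format__(self, spec: str) -> str:
        return "***"


card = Sensitive("4111111111111111")
message = f"charging card {card} for order {42}"
```

Code that genuinely needs the number, such as the call to the payment gateway, has to ask for it with card.reveal(), so accidental string formatting in a log call only ever sees the mask.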

erickson
+1  A: 

Regarding SQL statements specifically, if your language supports it, you should be using parameters instead of putting values in the statement itself. In other words:

select * from customers where credit_card = ?

Then set the parameter to the credit card number.

Of course, if you plan to log SQL statements with parameters filled in, you'd need some other way to filter out sensitive data.
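
As an illustration of the parameter approach, here is a small sketch using Python's built-in sqlite3 module. The orders table, the insert_order function and the logger name are assumptions made up for the example. On failure only the statement template is logged, so the bound values never reach the log.

```python
import logging
import sqlite3

log = logging.getLogger("orders")

# The statement text contains placeholders only, never the actual values.
INSERT_ORDER = "INSERT INTO orders (customer, credit_card) VALUES (?, ?)"


def insert_order(conn: sqlite3.Connection, customer: str, credit_card: str) -> bool:
    """Insert an order; on failure, log the statement template and error type."""
    try:
        conn.execute(INSERT_ORDER, (customer, credit_card))
        conn.commit()
        return True
    except sqlite3.Error as exc:
        # The parameters are deliberately left out of the log message.
        log.error("statement failed: %s (%s)", INSERT_ORDER, type(exc).__name__)
        return False
```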

Adam Crume
True. But this only covers SQL statements.
Ates Goral
That's why I prefaced it with "Regarding SQL statements specifically". It wasn't intended to be 100% general.
Adam Crume
Noted. I was actually looking for more of a silver bullet solution instead of a case-by-case analysis, but thanks for this answer!
Ates Goral
+2  A: 

My current practice for the case in question is to log a hash of such sensitive information. This enables us to identify log records that belong to a specific claim (for example a specific credit-card number) but does not give anybody the power to just grab the logs and use the sensitive information for their evil purposes.

Of course, doing this consistently involves good coding practices. I usually choose to log all objects using their toString overrides (in Java or .NET), which serialize a hash of the value for any field marked with a Sensitive attribute.

Of course, SQL strings are more problematic, but we rely more on our ORM for data persistence and log the state of the system at various stages rather than logging SQL queries, so it becomes a non-issue.
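
The same idea can be sketched in Python, with dataclass field metadata standing in for the Sensitive attribute. All names here (LogSafe, sensitive, fingerprint, LOG_HASH_KEY, Order) are hypothetical. Note one substitution: this uses a keyed hash (HMAC-SHA-256) instead of a bare hash, because card numbers come from a small enough space that an unkeyed hash can be reversed by enumeration.

```python
import hashlib
import hmac
from dataclasses import dataclass, field, fields

# Hypothetical key; in practice it would come from a secret store, not source code.
LOG_HASH_KEY = b"replace-with-a-real-secret"


def fingerprint(value: str) -> str:
    """Keyed hash, truncated for readability; equal inputs give equal tokens."""
    digest = hmac.new(LOG_HASH_KEY, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def sensitive():
    """Marks a dataclass field as sensitive (plays the role of the attribute)."""
    return field(metadata={"sensitive": True})


class LogSafe:
    """Mixin whose __str__ prints fingerprints for fields marked sensitive."""

    def __str__(self) -> str:
        parts = []
        for f in fields(self):
            value = getattr(self, f.name)
            if f.metadata.get("sensitive"):
                value = "#" + fingerprint(str(value))
            parts.append(f"{f.name}={value}")
        return f"{type(self).__name__}({', '.join(parts)})"


@dataclass(repr=False)  # no generated __repr__ that would print raw field values
class Order(LogSafe):
    order_id: int
    credit_card: str = sensitive()
```

Two orders paid with the same card produce the same fingerprint in the log, so records can still be correlated, while the number itself never appears.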

paracycle
I like the idea of overriding toString; that way, the logging code doesn't have to care about what's being logged. Although it doesn't address the issue with memory dumps, I'll accept this as the best answer since it directly answers the original question. Thanks!
Ates Goral