I am working on adding hash digest generating functionality to our code base. I wanted to use a String as a hash salt so that a pre-known key/passphrase could be prepended to whatever it was that needed to be hashed. Am I misunderstanding this concept?
You are understanding the concept perfectly. Just make sure the prepended salt is repeatable each and every time.
If I'm understanding you correctly, it sounds like you've got it right. The psuedocode for the process looks something like:
string saltedValue = plainTextValue + saltString;
// or string saltedalue = saltString + plainTextValue;
Hash(saltedValue);
The Salt just adds another level of complexity for people trying to get at your information.
And it's even better if the salt is different for each encrypted phrase since each salt requires its own rainbow table.
A salt is a random element which is added to the input of a cryptographic function, with the goal of impacting the processing and output in a distinct way upon each invocation. The salt, as opposed to a "key", is not meant to be confidential.
One century ago, cryptographic methods for encryption or authentication were "secret". Then, with the advent of computers, people realized that keeping a method completely secret was difficult, because this meant keeping software itself confidential. Something which is regularly written to a disk, or incarnated as some dedicated hardware, has trouble being kept confidential. So the researchers split the "method" into two distinct concepts: the algorithm (which is public and becomes software and hardware) and the key (a parameter to the algorithm, present in volatile RAM only during processing). The key concentrates the secret and is pure data. When the key is stored in the brain of a human being, it is often called a "password" because humans are better at memorizing words than bits.
Then the key itself was split later on. It turned out that, for proper cryptographic security, we needed two things: a confidential parameter, and a variable parameter. Basically, reusing the same key for distinct usages tends to create trouble; it often leaks information. In some cases (especially stream ciphers, but also for hashing passwords), it leaks too much and leads to successful attacks. So there is often a need for variability, something which changes every time the cryptographic method runs. Now the good part is that most of the time, variability and secret need not be merged. That is, we can separate the confidential from the variable. So the key was split into:
- the secret key, often called "the key";
- a variable element, usually chosen at random, and called "salt" or "IV" (as "Initial Value") depending on the algorithm type.
Only the key needs to be secret. The variable element needs to be known by all involved parties but it can be public. This is a blessing because sharing a secret key is difficult; systems used to distribute such a secret would find it expensive to accommodate a variable part which changes every time the algorithm runs.
In the context of storing hashed passwords, the explanation above becomes the following:
- "Reusing the key" means that two users happen to choose the same password. If passwords are simply hashed, then both users will get the same hash value, and this will show. Here is the leakage.
- Similarly, without a hash, an attacker could use precomputed tables for fast lookup; he could also attack thousands of passwords in parallel. This still uses the same leak, only in a way which demonstrates why this leak is bad.
- Salting means adding some variable data to the hash function input. That variable data is the salt. The point of the salt is that two distinct users should use, as much as possible, distinct salts. But password verifiers need to be able to recompute the same hash from the password, hence they must have access to the salt.
Since the salt must be accessible to verifiers but needs not be secret, it is customary to store the salt value along with the hash value. For instance, on a Linux system, I may use this command:
openssl passwd -1 -salt "zap" "blah"
This computes a hashed password, with the hash function MD5, suitable for usage in the /etc/password
or /etc/shadow
file, for the password "blah"
and the salt "zap"
(here, I choose the salt explicitly, but under practical conditions it should be selected randomly). The output is then:
$1$zap$t3KZajBWMA7dVxwut6y921
in which the dollar signs serve as separators. The initial "1"
identifies the hashing method (MD5). The salt is in there, in cleartext notation. The last part is the hash function output.
There is a specification (somewhere) on how the salt and password are sent as input to the hash function (at least in the glibc source code, possibly elsewhere).
Edit: in a "login-and-password" user authentication system, the "login" could act as a passable salt (two distinct users will have distinct logins) but this does not capture the situation of a given user changing his password (whether the new password is identical to an older password will leak).
Its worth mentioning that even though the salt should be different for each password usage, your salt should in NO WAY be computed FROM the password itself! This sort of thing has the practical upshot of completely invalidating your security.