Since passwords are oft-times the easiest-to-attack part of cryptography it's actually sort of a real dictionary. The assumption is that people are lazy and choose proper words as password or construct passphrases out of them. The dictionary can include other things, though, such as commonly used non-words or letter/number combination. Essentially everything that is likely to be a poor-chosen password.
There are programs out there which will take an entire hard drive and build a dictionary out of every typable string on it on the assumption that the user's password was at some point in time put in plaintext into memory (and then into the pagefile) or that it simply exists in the corpus if text stored on the drive1:
Even so, none of this might actually matter. AccessData sells another program, Forensic Toolkit, that, among other things, scans a hard drive for every printable character string. It looks in documents, in the Registry, in e-mail, in swap files, in deleted space on the hard drive ... everywhere. And it creates a dictionary from that, and feeds it into PRTK.
And PRTK breaks more than 50 percent of passwords from this dictionary alone.
Actually, you can make dictionaries more effective even if you include knowledge on how people usually build passwords. Schneier talks about this lengthily1:
- Common word dictionary: 5,000 entries
- Names dictionary: 10,000 entries
- Comprehensive dictionary: 100,000 entries
- Phonetic pattern dictionary: 1/10,000 of an exhaustive character search
The phonetic pattern dictionary is interesting. It's not really a dictionary; it's a Markov-chain routine that generates pronounceable English-language strings of a given length. For example, PRTK can generate and test a dictionary of very pronounceable six-character strings, or just-barely pronounceable seven-character strings. They're working on generation routines for other languages.
PRTK also runs a four-character-string exhaustive search. It runs the dictionaries with lowercase (the most common), initial uppercase (the second most common), all uppercase and final uppercase. It runs the dictionaries with common substitutions: "$" for "s," "@" for "a," "1" for "l" and so on. Anything that's "leet speak" is included here, like "3" for "e."
The appendage dictionaries include things like:
- All two-digit combinations
- All dates from 1900 to 2006
- All three-digit combinations
- All single symbols
- All single digit, plus single symbol
- All two-symbol combinations
1 Bruce Schneier: Choosing Secure Passwords. In: Schneier on Security. (URL)