views:

194

answers:

2

This problem might be relatively simple, but I'm given two text files. One text file contains all encrypted passwords encrypted via crypt.crypt in python. The other list contains over 400k+ normal dictionary words.

The assignment is that given 3 different functions which transform strings from their normal case to all different permutations of capitalizations, transforms a letter to a number (if it looks alike, e.g. G -> 6, B -> 8), and reverses a string. The thing is that given the 10 - 20 encrypted passwords in the password file, what is the most efficient way to get the fastest running solution in python to run those functions on dictionary word in the words file? It is given that all those words, when transformed in whatever way, will encrypt to a password in the password file.

Here is the function which checks if a given string, when encrypted, is the same as the encrypted password passed in:

def check_pass(plaintext,encrypted):
 crypted_pass = crypt.crypt(plaintext,encrypted)
 if crypted_pass == encrypted:
  return True
 else:
  return False

Thanks in advance.

+3  A: 

Without knowing details about the underlying hash algorithm and possible weaknesses of the algorithm all you can do is to run a brute-force attack trying all possible transformations of the words in your password list.

The only way to speed up such a brute-force attack is to get more powerful hardware and to split the task and run the cracker in parallel.

0xA3
As a slight meatspace optimization, it may be successful to try the words unaltered before you start altering them. People tend to use real words, and are far less likely to use permutations including numbers, etc. Of course, if this is homework, YMMV. ;)
Joseph Mastey
Yeah @JosephMastey, but this looks like homework, in which case, your assumption would not hold
inspectorG4dget
@inspectorG4dget Surely does, but let's pretend that the prof came up with a reasonably realistic dataset, just to make me feel better about my own CS degree.
Joseph Mastey
Thanks for the help, guys. I was able to successfully crack 90% of the passwords, but the other ones just took longer than 10 minutes, which is the maximum amount of time we get.
Luminance
@Luminance: How much longer than 10 minutes does it take? Maybe there is still some way to improve performance. Maybe you can get a profiler to check that.
0xA3
+1  A: 

On my slow laptop, crypt.crypt takes about 20 microseconds:

$ python -mtimeit -s'import crypt' 'crypt.crypt("foobar", "zappa")'
10000 loops, best of 3: 21.8 usec per loop

so, the brute force approach (really the only sensible one) is "kinda" feasible. By applying your transformation functions you'll get (ballpark estimate) about 100 transformed words per dictionary word (mostly from the capitalization changes), so, about 40 million transformed words out of your whole dictionary. At 20 microseconds each, that will take about 800 seconds, call it 15 minutes, for the effort of trying to crack one of the passwords that doesn't actually correspond to any of the variations; expected time about half that, to crack a password that does correspond.

So, if you have 10 passwords to crack, and they all do correspond to a transformed dictionary word, you should be done in an hour or two. Is that OK? Because there isn't much else you can do except distribute this embarassingly parallel problem over as many nodes and cores as you can grasp (oh, and, use a faster machine in the first place -- that might buy you perhaps a factor of two or thereabouts).

There is no deep optimization trick that you can add, so the general logic will be that of a triple-nested loop: one level loops over the encrypted passwords, one over the words in the dictionary, one over the variants of each dictionary word. There isn't much difference regarding how you nest things (except the loop on the variants must come within the loop on the words, for simplicity). I recommend encapsulating "give me all variants of this word" as a generator (for simplicity, not for speed) and otherwise minimizing the number of function calls (e.g. there is no reason to use that check_pass function since the inline code is just as clear, and will be microscopically faster).

Alex Martelli