Hello,

I am trying to brute force a RAR archive which is protected by a password with 3 characters:

import os
Alphabets = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
for a in range(0,26):
 for b in range(0,26):
  for c in range(0,26):
   Brute = Alphabets[a] + Alphabets[b] + Alphabets[c]
   os.popen4("Rar.exe x -p" + Brute + " Protected.rar")
#   raw_input()
raw_input("Done !")

The code works fine, except that it is very slow!

I think what makes it slow is launching a new process through "popen4" for every password: when I stored the generated words in a txt file instead, the program finished in less than 5 seconds.

Any ideas to increase the performance?

+5  A: 

You might consider using some stdlib modules:

>>> import string
>>> import itertools
>>> from subprocess import Popen, PIPE
>>> for i in itertools.product(string.ascii_uppercase, repeat=3):
...     # rar expects the password attached to the switch: -pPASSWORD
...     pr = Popen(['rar.exe', 'x', '-p' + ''.join(i), 'protected.rar'], stdin=PIPE, stdout=PIPE)
...     pr.communicate()

It might not necessarily improve performance, but it does make your code cleaner.

SilentGhost
itertools.permutations won't give you all possible 3-character passwords. You want itertools.product(*[string.uppercase]*3). Also, there's no function called os.join; the closest is os.path.join, which does something completely different. Use ''.join(i) instead. Finally, none of this is relevant since the questioner's problem is performance, and the time taken by the Python script itself is negligible.
David
@David: it is `product` indeed, thanks for that. I would argue that my code provides all possible performance improvements that could be made to the original code; whether they're negligible or not is not really relevant.
SilentGhost
+7  A: 

You could use (or learn from) rarcrack. It is written in C and compiles without problems on Linux (on Windows it needs lots of changes).

In general, opening a process for every single tested password is very expensive. You should try to open the archive yourself and then test all passwords against it. In any case, you need to check the return value of rar.exe to find out whether extraction succeeded.
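A minimal sketch of the return-value check, assuming rar.exe exits with status 0 only when the password is accepted. The helper names `build_command` and `succeeded` are made up for illustration; `t` (test without extracting), `-inul` (no messages) and `-y` (answer yes to prompts) are standard rar switches, and the password has to be glued to `-p`:

```python
import subprocess

def build_command(password, archive="Protected.rar", rar="Rar.exe"):
    """Build a rar 'test archive' command line for one candidate password."""
    # The password is attached directly to -p; a separate argument would not work.
    return [rar, "t", "-inul", "-y", "-p" + password, archive]

def succeeded(cmd):
    """Run cmd and report whether it exited with status 0 (assumed to mean success)."""
    completed = subprocess.run(
        cmd,
        stdin=subprocess.DEVNULL,   # never wait for an interactive password prompt
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return completed.returncode == 0

# Usage (needs Rar.exe on the PATH and the archive in the current directory):
# if succeeded(build_command("ABC")):
#     print("password found: ABC")
```

This still starts one process per attempt, so it only tells you when to stop; it does not make the individual attempts any cheaper.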

For best performance, you should write the program in C (or similar). There's a Linux package called "libunrar" that might help you with opening RAR files.

AndiDog
+2  A: 

Generating the passwords is trivial; that is why it takes only 5 seconds to create the 26^3 = 17576 passwords. What takes the most time is opening and attempting to decrypt the archive - and you don't have control over that.
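To see how little of the run time is spent on generation, here is a rough sketch that builds all candidates in memory and times it. The exact timing depends on the machine, so none is quoted here:

```python
import itertools
import string
import time

start = time.perf_counter()
# Every 3-letter combination of A-Z, in the same order as the nested loops.
candidates = ["".join(letters)
              for letters in itertools.product(string.ascii_uppercase, repeat=3)]
elapsed = time.perf_counter() - start

print(len(candidates))                     # 17576, i.e. 26 ** 3
print("generated in %.4f s" % elapsed)
```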

There isn't much you can do about speeding this up - the rar binary and the input file will be cached in memory after the first few tries: just let it run overnight or over the week-end as need be.

florin
Taking almost 5 seconds to generate a 100k text file seems kind of slow to me.
Mike DeSimone
A: 

What about generating the passwords first and then parallelizing the rar.exe process calls (which seem to be the bottleneck)?

maxwell
Looks like a good idea. Can you show a small example?
Goblin
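A small sketch of that idea, not a tuned implementation: it runs several attempts at once from a thread pool (threads are enough here because each one just waits on an external process). The names `rar_accepts` and `find_password` are made up for this example, and it assumes that rar.exe exits with status 0 only for the right password:

```python
import itertools
import string
import subprocess
from concurrent.futures import ThreadPoolExecutor

def rar_accepts(password, archive="Protected.rar", rar="Rar.exe"):
    """Hypothetical check: True if rar's test command exits with status 0."""
    cmd = [rar, "t", "-inul", "-y", "-p" + password, archive]
    return subprocess.run(cmd, stdin=subprocess.DEVNULL).returncode == 0

def find_password(candidates, check=rar_accepts, workers=4):
    """Run check() on the candidates in parallel; return the first hit or None."""
    candidates = list(candidates)  # the list is iterated twice below
    pool = ThreadPoolExecutor(max_workers=workers)
    try:
        # map() yields results in input order while the workers run ahead.
        for password, ok in zip(candidates, pool.map(check, candidates)):
            if ok:
                return password
        return None
    finally:
        # Drop attempts that have not started yet (cancel_futures needs Python 3.9+).
        pool.shutdown(wait=True, cancel_futures=True)

# Usage (needs Rar.exe and Protected.rar):
# words = ("".join(t) for t in itertools.product(string.ascii_uppercase, repeat=3))
# print(find_password(words, workers=8))
```

How much this helps depends on how many cores are free; a worker count near the number of cores is a reasonable starting point.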
A: 

You may not be able to cut down on the time it takes to make the attempt to decrypt the archive, but, assuming that the password is not completely random (which it may be), you may get to the correct password more quickly if you order the letters in decreasing likelihood of use.

For example, in Linux Journal, the shell script column analyzed a few large texts to determine that e, t, a, o, n, i, s, r, h, and d were the most common letters in those texts (and presumably this is close to English as a whole). So changing your second line to: Alphabets = "ETAONISRHDBCFGJKLMPQUVWXYZ" could cause your algorithm to arrive at the password in fewer iterations.

Edit: On second thought, if the password is, as someone indicated, "cat", the original ordering reaches it on the 3rd pass through the outer loop, whereas the new version needs 12 passes, so in this case it won't solve it faster. So maybe you need to optimize the list for the outer loop by trying to predict the most likely first letter.
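To compare orderings without running rar at all, you can compute at which attempt a given password would come up. A quick sketch, where `attempt_number` is a made-up helper and the frequency alphabet is simply the ten letters named above followed by the rest in alphabetical order:

```python
import string

def attempt_number(password, alphabet):
    """1-based position of password among candidates generated from alphabet."""
    n = len(alphabet)
    index = 0
    for ch in password:
        # Same order as the nested loops: the first letter varies slowest.
        index = index * n + alphabet.index(ch)
    return index + 1

alphabetical = string.ascii_uppercase
common = "ETAONISRHD"  # the ten most common letters listed above
by_frequency = common + "".join(c for c in alphabetical if c not in common)

print(attempt_number("CAT", alphabetical))   # 1372
print(attempt_number("CAT", by_frequency))   # 7490
print(attempt_number("THE", alphabetical))   # 13031
print(attempt_number("THE", by_frequency))   # 885
```

So the reordering helps a lot for a word like "THE" and hurts for "CAT"; it only pays off on average if the password really follows English letter frequencies.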

Robert Gowland