The logic of the requirements is complex enough to justify the use of Python instead of bash. It should provide a more readable, extensible, and maintainable solution.
#!/usr/bin/env python
import hashlib, os
def ishash(h, size):
"""Whether `h` looks like hash's hex digest."""
if len(h) == size:
try:
int(h, 16) # whether h is a hex number
return True
except ValueError:
return False
for root, dirs, files in os.walk("."):
dirs[:] = [d for d in dirs if not d.startswith(".")] # skip hidden dirs
for path in (os.path.join(root, f) for f in files if not f.startswith(".")):
suffix = hash_ = "." + hashlib.md5(open(path).read()).hexdigest()
hashsize = len(hash_) - 1
# extract old hash from the name; add/replace the hash if needed
barepath, ext = os.path.splitext(path) # ext may be empty
if not ishash(ext[1:], hashsize):
suffix += ext # add original extension
barepath, oldhash = os.path.splitext(barepath)
if not ishash(oldhash[1:], hashsize):
suffix = oldhash + suffix # preserve 2nd (not a hash) extension
else: # ext looks like a hash
oldhash = ext
if hash_ != oldhash: # replace old hash by new one
os.rename(path, barepath+suffix)
Here's a test directory tree. It contains:
- files without extension inside directories with a dot in their name
- filename which already has a hash in it (test on idempotency)
- filename with two extensions
- newlines in names
$ tree a
a
|-- b
| `-- c.d
| |-- f
| |-- f.ext1.ext2
| `-- g.d41d8cd98f00b204e9800998ecf8427e
|-- c.ext^Mnewline
| `-- f
`-- f^Jnewline.ext1
7 directories, 5 files
Result
$ tree a
a
|-- b
| `-- c.d
| |-- f.0bee89b07a248e27c83fc3d5951213c1
| |-- f.ext1.614dd0e977becb4c6f7fa99e64549b12.ext2
| `-- g.d41d8cd98f00b204e9800998ecf8427e
|-- c.ext^Mnewline
| `-- f.0bee89b07a248e27c83fc3d5951213c1
`-- f^Jnewline.b6fe8bb902ca1b80aaa632b776d77f83.ext1
7 directories, 5 files
The solution works correctly for all cases.
Whirlpool hash is not in Python's stdlib, but there are both pure Python and C extensions that support it e.g., python-mhash
.
To install it:
$ sudo apt-get install python-mhash
To use it:
import mhash
print mhash.MHASH(mhash.MHASH_WHIRLPOOL, "text to hash here").hexdigest()
Output:
cbdca4520cc5c131fc3a86109dd23fee2d7ff7be56636d398180178378944a4f41480b938608ae98da7eccbf39a4c79b83a8590c4cb1bace5bc638fc92b3e653
Invoking whirlpooldeep
in Python
from subprocess import PIPE, STDOUT, Popen
def getoutput(cmd):
return Popen(cmd, stdout=PIPE, stderr=STDOUT).communicate()[0]
hash_ = getoutput(["whirlpooldeep", "-q", path]).rstrip()
git
can provide with leverage for the problems that need to track set of files based on their hashes.