If the paths all look like in your example, this would work:
counts = {}
for p in paths:
parts = p.split('/')
branch = counts
for part in parts[1:-1]:
branch = branch.setdefault(part, {})
branch[parts[-1]] = 1 + branch.get(parts[-1], 0)
This uses dictionary methods like setdefault()
and get()
to avoid having to write a lot of if-statements.
Note that this will not work if a path that has subdirectories can also appear on it's own. Then it's not clear if the corresponding part of counts
should contain a number or another dictionary. In this case it would probably be best to store both a count and a dict for each node, using a tuple or a custom class.
The basic algorithm stays the same:
class Stats(object):
def __init__(self):
self.count = 0
self.subdirs = {}
counts = Stats()
for p in paths:
parts = p.split('/')
branch = counts
for part in parts[1:]:
branch = branch.subdirs.setdefault(part, Stats())
branch.count += 1
With some pretty printing you get:
def printstats(stats, indent=''):
print indent + str(stats.count) + ' times'
for (d, s) in stats.subdirs.items():
print indent + d + ':'
printstats(s, indent + ' ')
>>> printstats(counts)
0 times
a:
0 times
c:
0 times
d:
1 times
b:
0 times
c:
2 times
d:
1 times
...