Hello,

I have a list of lists that looks like this:

dupe = [['95d1543adea47e88923c3d4ad56e9f65c2b40c76', 'ron\\c', 'apa.txt'], ['95d1543adea47e88923c3d4ad56e9f65c2b40c76', 'ron\\c', 'knark.txt'], ['b5cc17d3a35877ca8b76f0b2e07497039c250696', 'ron\\a', 'apa2.txt'], ['b5cc17d3a35877ca8b76f0b2e07497039c250696', 'ron\\a', 'jude.txt']]

I write it to a file using some very basic code:

try:
    file_name = open("dupe.txt", "w")
except IOError:
    pass

for a in range (len(dupe)):
    file_name.write(dupe[a][0] + " " + dupe[a][1] + " " + dupe[a][2] + "\n");

file_name.close()

With the output in the file looking like this:

95d1543adea47e88923c3d4ad56e9f65c2b40c76 ron\c apa.txt
95d1543adea47e88923c3d4ad56e9f65c2b40c76 ron\c knark.txt
b5cc17d3a35877ca8b76f0b2e07497039c250696 ron\a apa2.txt
b5cc17d3a35877ca8b76f0b2e07497039c250696 ron\a jude.txt

However, how can I make the output in the dupe.txt file look like this instead:

95d1543adea47e88923c3d4ad56e9f65c2b40c76 ron\c apa.txt, knark.txt
b5cc17d3a35877ca8b76f0b2e07497039c250696 ron\a apa2.txt, jude.txt
A: 

If this is your actual data, you can:

  1. Output one line for every two elements in dupe. This is easier. Or,
  2. If your data isn't as structured, you can make a dictionary where the long hash is the key and the rest of the line is the output. Make sense?

For idea one, I mean that you can do something like this:

tmp_string = ""
for a in range(len(dupe)):
    if a % 2 == 0:
        # first row of a pair: start a new output line
        tmp_string = dupe[a][0] + " " + dupe[a][1] + " " + dupe[a][2]
    else:
        # second row of the pair: append its file name and write the line
        tmp_string += ", " + dupe[a][2]
        file_name.write(tmp_string + "\n")

In idea two, you may have something like this:

x = dict()
for a in range(len(dupe)):
    key = dupe[a][0]
    # check whether the hash is already a key in x
    if key in x:
        x[key] += ", " + dupe[a][2]
    else:
        x[key] = dupe[a][0] + " " + dupe[a][1] + " " + dupe[a][2]
for b in x:  # for every key in dictionary x
    file_name.write(x[b] + "\n")
montooner
A: 

Use a dict to group them:

data = [['95d1543adea47e88923c3d4ad56e9f65c2b40c76', 'ron\\c', 'apa.txt'], \
    ['95d1543adea47e88923c3d4ad56e9f65c2b40c76', 'ron\\c', 'knark.txt'], \
    ['b5cc17d3a35877ca8b76f0b2e07497039c250696', 'ron\\a', 'apa2.txt'], \
    ['b5cc17d3a35877ca8b76f0b2e07497039c250696', 'ron\\a', 'jude.txt']]

dupes = {}
for row in data:
    # group rows by their hash (row[0])
    if row[0] in dupes:
        dupes[row[0]].append(row)
    else:
        dupes[row[0]] = [row]

for dupe in dupes.values():
    print("%s\t%s\t%s" % (dupe[0][0], dupe[0][1], ",".join([x[2] for x in dupe])))
Emil H
You don't need to escape newlines when you have an unmatched (, [, or {. It's even a Python style guideline that you prefer this to newline escapes.
Roger Pate
Thanks. I'll keep that in mind. :)
Emil H
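To illustrate the comment above, here is a minimal sketch of the same list literal without the backslashes. Inside an unclosed bracket Python carries the statement over to the next line by itself, so nothing needs escaping.

```python
# Inside an open [ the statement continues across lines; no backslashes needed.
data = [
    ['95d1543adea47e88923c3d4ad56e9f65c2b40c76', 'ron\\c', 'apa.txt'],
    ['95d1543adea47e88923c3d4ad56e9f65c2b40c76', 'ron\\c', 'knark.txt'],
    ['b5cc17d3a35877ca8b76f0b2e07497039c250696', 'ron\\a', 'apa2.txt'],
    ['b5cc17d3a35877ca8b76f0b2e07497039c250696', 'ron\\a', 'jude.txt'],
]
```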
+1  A: 

I take it your last question didn't solve your problem?

Instead of keeping each row with a repeated ID and directory in a separate list, why not make the file element a sub-list that contains all the files sharing the same ID and directory?

So dupe would look like this:

dupe = [['95d1543adea47e88923c3d4ad56e9f65c2b40c76', 'ron\\c', ['apa.txt','knark.txt']],
['b5cc17d3a35877ca8b76f0b2e07497039c250696', 'ron\\a', ['apa2.txt','jude.txt']]]
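
One possible way to get from the flat list in the question to this nested shape, as a rough sketch: walk the rows once and start a new entry whenever the hash or directory changes. This assumes rows with the same hash and directory sit next to each other, as they do in the sample data. The names flat and nested are made up for illustration.

```python
# Flat rows as in the question: [hash, directory, file name]
flat = [
    ['95d1543adea47e88923c3d4ad56e9f65c2b40c76', 'ron\\c', 'apa.txt'],
    ['95d1543adea47e88923c3d4ad56e9f65c2b40c76', 'ron\\c', 'knark.txt'],
    ['b5cc17d3a35877ca8b76f0b2e07497039c250696', 'ron\\a', 'apa2.txt'],
    ['b5cc17d3a35877ca8b76f0b2e07497039c250696', 'ron\\a', 'jude.txt'],
]

nested = []
for file_hash, directory, name in flat:
    # Same hash and directory as the previous entry: extend its file list
    # (assumption: matching rows are adjacent in the input)
    if nested and nested[-1][0] == file_hash and nested[-1][1] == directory:
        nested[-1][2].append(name)
    else:
        # Otherwise start a new entry with a one-element file list
        nested.append([file_hash, directory, [name]])
```

Assigning nested to dupe gives the structure shown above. If matching rows can be scattered through the list, grouping with a dictionary is the safer route.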

Then your print loop could be similar to:

for i in dupe:
   # hash and directory, then every file name for that pair on the same line
   print("%s %s %s" % (i[0], i[1], ", ".join(i[2])))
Victor
+2  A: 

First, group the lines by the "key" (the first two elements of each array):

dupedict = {}
for a, b, c in dupe:
  dupedict.setdefault((a,b),[]).append(c)

Then print it out:

for key, values in dupedict.items():
  print(' '.join(key) + ' ' + ', '.join(values))
chrispy
+1  A: 
from collections import defaultdict

dupe = [
  ['95d1543adea47e88923c3d4ad56e9f65c2b40c76', 'ron\\c', 'apa.txt'],
  ['95d1543adea47e88923c3d4ad56e9f65c2b40c76', 'ron\\c', 'knark.txt'],
  ['b5cc17d3a35877ca8b76f0b2e07497039c250696', 'ron\\a', 'apa2.txt'],
  ['b5cc17d3a35877ca8b76f0b2e07497039c250696', 'ron\\a', 'jude.txt'],
]
with open("dupe.txt", "w") as f:
  data = defaultdict(list)
  for hash, dir, fn in dupe:
    data[(hash, dir)].append(fn)
  for hash_dir, fns in data.items():
    f.write("{0[0]} {0[1]} {1}\n".format(hash_dir, ', '.join(fns)))
Roger Pate