Hello, I'm stumped with a problem illustrated in the sample below:
"ID","NAME","PHONE","REF","DISCARD"
1,"JOHN",12345,,
2,"PETER",6232,,
3,"JON",12345,,
4,"PETERSON",6232,,
5,"ALEX",7854,,
6,"JON",12345,,
I want to detect duplicates in column "PHONE", and mark the subsequent duplicates using the column "REF", with a value pointing to the "ID" of the first item and the value "Yes" for the "DISCARD" column
"ID","NAME","PHONE","REF","DISCARD"
1,"JOHN",12345,1,
2,"PETER",6232,2,
3,"JON",12345,1,"Yes"
4,"PETERSON",6232,2,"Yes"
5,"ALEX",7854,,
6,"JON",12345,1,"Yes"
So, how do I go about it? I tried this code but my logic wasn't right, of course.
import csv
myfile = open("C:\Users\Eduardo\Documents\TEST2.csv", "rb")
myfile1 = open("C:\Users\Eduardo\Documents\TEST2.csv", "rb")
dest = csv.writer(open("C:\Users\Eduardo\Documents\TESTFIXED.csv", "wb"), dialect="excel")
reader = csv.reader(myfile)
verum = list(reader)
verum.sort(key=lambda x: x[2])
for i, row in enumerate(verum):
if row[2] == verum[i][2]:
verum[i][3] = row[0]
print verum
Your direction and help would be much appreciated.