ansaurus

Question

How to get the difference between two lists based on substrings within each string in the separate lists.

Answer 1

+1 A:

line.split() splits at whitespace. Use line.split(',') instead.

Also: does the order of the lines matter? If not, then you should really use a set() instead of a list. Membership tests on a set take constant time on average, while a list is scanned item by item, so the code will be much faster.
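As a small illustration of both points, here is a sketch using a made-up sample line (the field layout and the addresses are assumptions, not taken from the asker's data). It shows what split() and split(',') each return for a comma-separated line, and a membership test against a set:

```python
# Hypothetical sample line; the field layout is an assumption for illustration.
line = "156464,bob,otherguy,bob@example.com,45644562\n"

# No whitespace inside the line, so split() returns the whole line as one item.
whitespace_parts = line.split()

# Splitting on commas yields the five separate fields.
comma_parts = line.strip().split(",")

# Membership in a set is a hash lookup rather than a scan of every element.
emails = {"alice@example.com", "bob@example.com"}
is_known = comma_parts[3] in emails
```

Here whitespace_parts has a single element, which is why a comparison against individual email addresses never matches until the separator is given explicitly.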

Aaron Digulla 2010-01-05 17:23:04

*facepalm* Can't believe I missed that!

Chance 2010-01-05 17:37:14

Now my code works, mere hours after I first said "I'll just write a quick script". Thanks for saving me from myself!

Chance 2010-01-05 17:48:18

Answer 2

+1 A:

You could create the set of emails as you do and then:

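import fileinput

# emails is a set of email addresses built beforehand
for line in fileinput.input("csvfile.csv", inplace=True):
    parts = line.split(',')
    # with inplace=True, print() writes back into the file;
    # end='' avoids doubling the newline already in line
    if parts[3].strip() not in emails:
        print(line, end='')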

This only works if the email is always the fourth field (index 3) of each line in the CSV file.

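fileinput enables in-place editing: with inplace set, standard output is redirected into the file, so whatever you print replaces the file's contents.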

And use a set for the emails instead of a list, as Aaron said, not only for speed but also to eliminate duplicates.
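If the fields might be quoted or contain commas, the standard csv module is a more robust splitter than split(','). The following is only a sketch under assumptions: filter_rows and its email_index parameter are hypothetical names, and it works on text in memory rather than editing the file in place.

```python
import csv
import io


def filter_rows(csv_text, emails, email_index=3):
    """Return the CSV rows whose email field is not in `emails`.

    `email_index` is an assumption: the zero-based column holding the address.
    """
    kept = []
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines
        # Drop the row only when it has an email field that is in the set.
        if len(row) > email_index and row[email_index].strip() in emails:
            continue
        kept.append(row)
    return kept


sample = "1,ann,x,ann@example.com,10\n2,bob,y,bob@example.com,20\n"
remaining = filter_rows(sample, {"bob@example.com"})
```

With this sample, remaining holds only the row for ann, because bob's address is in the set.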

Felix Kling 2010-01-05 17:31:37

Perfect. Although my problem was actually a typo pointed out by Aaron Digulla, this answers the question I asked in a very clear way, and taught me something.

Chance 2010-01-05 17:45:45

Answer 3

A:

Here's another way, with a minimal check on the email address's position.

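import fileinput

# Collect the addresses that appear between "<" and ">" in file1.
emails = set()
with open("file1") as f:
    for line in f:
        start = line.find("<")
        end = line.find(">")
        if start != -1 and end > start:
            emails.add(line[start + 1:end])

# Rewrite file2 in place, keeping only lines that contain an address not found in file1.
for line in fileinput.input("file2", inplace=True):
    for item in line.split(","):
        item = item.strip()
        if "@" in item and item not in emails:
            print(line, end="")
            break  # print each line at most once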

output

$ ./python.py
156464,bob,otherguy,[email protected],45644562
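The angle-bracket extraction can also be sketched with a regular expression, which picks up more than one address per line. This assumes the first file writes addresses as <user@host>; the pattern and the extract_emails helper are illustrative names, not part of the code above.

```python
import re

# Assumed format: an address enclosed in angle brackets, e.g. "Bob <bob@example.com>".
ANGLE_EMAIL = re.compile(r"<([^<>\s]+@[^<>\s]+)>")


def extract_emails(lines):
    """Return the set of addresses found between "<" and ">" in the given lines."""
    found = set()
    for line in lines:
        found.update(ANGLE_EMAIL.findall(line))
    return found


addresses = extract_emails([
    "Bob <bob@example.com>",
    "no address here",
    "A <a@example.com>, B <b@example.com>",
])
```

The result is a set, so it can be used directly for the membership test in the second loop.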

ghostdog74 2010-01-06 00:39:23
