I have a CSV file containing some user data. It looks like this:

"10333","","an.10","Kenyata","","Aaron","","","","","","","","","",""
"12222","","an.4","Wendy","","Aaron","","","","","","","","","",""
"14343","","aaron.5","Nanci","","Aaron","","","","","","","","","",""

I also have a file which has an item on each line like this:

an.10
aaron.5

What I want is to find only the lines in the CSV file whose third field (the user ID) appears in the list file.

So the desired output would be:

"10333","","an.10","Kenyata","","Aaron","","","","","","","","","",""
"14343","","aaron.5","Nanci","","Aaron","","","","","","","","","",""

(Note that the an.4 line is excluded because an.4 is not in the list.)

I can use any environment and am willing to try just about anything except doing this by hand: the CSV contains millions of records, and the list has about 100k entries.

+1  A: 

How unique are the identifiers an.10 and the like?

Maybe a very small Unix shell script would be enough:

sort -u list.txt | while IFS= read -r i; do grep -F "\"$i\"" data.csv; done

For every unique entry in the list, this prints all matching lines in the CSV file. However, it does not restrict the match to the third column (the ID column); that could be done with awk, for example.
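A minimal sketch of that awk approach, assuming every field is wrapped in double quotes and no field contains the sequence `","` itself (true for the sample rows). It loads the list into an awk associative array and then reads the CSV once, comparing only the third field, instead of scanning the CSV once per list entry. The helper name `filter_by_id` is hypothetical.

```shell
#!/bin/sh
# filter_by_id LIST CSV (hypothetical helper name): print the rows of CSV
# whose third field is listed in LIST.
# Assumption: every CSV field is wrapped in double quotes and no field
# contains the sequence "," itself, so splitting on "," isolates the fields.
filter_by_id() {
    awk -F'","' '
        FILENAME == ARGV[1] {         # first file: collect wanted IDs in a hash
            sub(/\r$/, "")            # tolerate Windows line endings in the list
            if ($0 != "") want[$0] = 1
            next
        }
        $3 in want                    # second file: print rows with a wanted 3rd field
    ' "$1" "$2"
}

# Demo on the sample rows from the question, in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' \
  '"10333","","an.10","Kenyata","","Aaron","","","","","","","","","",""' \
  '"12222","","an.4","Wendy","","Aaron","","","","","","","","","",""' \
  '"14343","","aaron.5","Nanci","","Aaron","","","","","","","","","",""' \
  > "$tmp/data.csv"
printf '%s\n' 'an.10' 'aaron.5' > "$tmp/list.txt"
result=$(filter_by_id "$tmp/list.txt" "$tmp/data.csv")
printf '%s\n' "$result"
rm -r "$tmp"
```

On the sample data this prints the an.10 and aaron.5 rows in their original order. Memory use grows with the list (about 100k short keys), not with the CSV, which is streamed.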

relet
They are unique as can be. :-)
Chris
What a scary coincidence in the choice of file names! But your code will not work: $i will only ever take the value "list.txt".
Joy Dutta
Indeed. I stand corrected. :)
relet
This does not output the entire line from the CSV file; it only outputs the matched string. I am also getting duplicate entries.
Chris
Another cause of duplicates could be a user ID like an.10 also matching a line containing joan.100; that is why I have included the double quotes in the search term.
relet
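To illustrate the substring problem described in that comment, here is a small sketch on two hypothetical rows (only their third fields matter). It counts matching lines with and without the surrounding double quotes; `-F` is used so that the dot in an.10 is treated literally rather than as a regex wildcard.

```shell
#!/bin/sh
# Hypothetical rows: third fields are an.10 and joan.100.
tmp=$(mktemp -d)
printf '%s\n' '"1","","an.10","A"' '"2","","joan.100","B"' > "$tmp/sample.csv"

# Unquoted: an.10 is a substring of joan.100, so both rows match.
unquoted=$(grep -cF 'an.10' "$tmp/sample.csv")
# Quoted: "an.10" only matches a field that is exactly an.10.
quoted=$(grep -cF '"an.10"' "$tmp/sample.csv")
echo "unquoted: $unquoted, quoted: $quoted"   # prints: unquoted: 2, quoted: 1
rm -r "$tmp"
```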
+1  A: 

If the csv file is data.csv and the list file is list.txt, I would do this:

while IFS= read -r i; do grep -F "\"$i\"" data.csv; done < list.txt
Joy Dutta
I'm ending up with duplicates, though?
Chris
Do you have duplicates in your list, then? If you want a quick fix to remove them, pipe either your list or the result through `sort -u` (plain `uniq` only collapses adjacent duplicate lines, so it needs sorted input).
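Both loops above run one grep per list entry, so with about 100k entries data.csv is read about 100k times. A single-pass alternative, sketched below under the assumption that a quoted ID never appears in a column other than the ID column, hands all patterns to one grep via `-f`. Because grep prints each matching line once, duplicate IDs in the list do not produce duplicate output. The helper name `filter_by_id_grep` is hypothetical.

```shell
#!/bin/sh
# filter_by_id_grep LIST CSV (hypothetical helper name): print each row of CSV
# that contains one of the IDs in LIST, wrapped in double quotes.
# Assumption: a quoted ID never appears in a column other than the ID column.
filter_by_id_grep() {
    patterns=$(mktemp)
    # Drop blank lines (a bare "" would match every empty field), then
    # wrap each ID in quotes so an.10 cannot match "an.100" or "joan.10".
    sed -e '/^$/d' -e 's/.*/"&"/' "$1" > "$patterns"
    # -F: fixed strings, so the dot in an.10 is literal, not a wildcard.
    # -f: read every pattern from the file; the CSV is scanned once and
    #     each matching row is printed once, even if several patterns hit it.
    grep -F -f "$patterns" "$2"
    status=$?
    rm -f "$patterns"
    return "$status"
}

# Demo on the question's sample rows; the list deliberately repeats an.10.
tmp=$(mktemp -d)
printf '%s\n' \
  '"10333","","an.10","Kenyata","","Aaron","","","","","","","","","",""' \
  '"12222","","an.4","Wendy","","Aaron","","","","","","","","","",""' \
  '"14343","","aaron.5","Nanci","","Aaron","","","","","","","","","",""' \
  > "$tmp/data.csv"
printf '%s\n' 'an.10' 'aaron.5' 'an.10' > "$tmp/list.txt"
result=$(filter_by_id_grep "$tmp/list.txt" "$tmp/data.csv")
printf '%s\n' "$result"
rm -r "$tmp"
```

If IDs could also appear quoted in other columns, the awk version, which compares only the third field, is the safer choice.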
relet