views:

118

answers:

1

Good day,

I have a local CSV file, DailyValues.csv, whose values change daily.
I need to extract the value field of the category2 and category4 rows,
then combine, sort, and remove duplicates (if any) from the extracted values,
and save the result to a new local file, NewValues.txt.

Here is an example of the DailyValues.csv file:

category,date,value  
category1,2010-05-18,value01  
category1,2010-05-18,value02  
category1,2010-05-18,value03  
category1,2010-05-18,value04  
category1,2010-05-18,value05  
category1,2010-05-18,value06  
category1,2010-05-18,value07  
category2,2010-05-18,value08  
category2,2010-05-18,value09  
category2,2010-05-18,value10  
category2,2010-05-18,value11  
category2,2010-05-18,value12  
category2,2010-05-18,value13  
category2,2010-05-18,value14  
category2,2010-05-18,value30  
category3,2010-05-18,value16  
category3,2010-05-18,value17  
category3,2010-05-18,value18  
category3,2010-05-18,value19  
category3,2010-05-18,value20  
category3,2010-05-18,value21  
category3,2010-05-18,value22  
category3,2010-05-18,value23  
category3,2010-05-18,value24  
category4,2010-05-18,value25  
category4,2010-05-18,value26  
category4,2010-05-18,value10  
category4,2010-05-18,value28  
category4,2010-05-18,value11  
category4,2010-05-18,value30  
category2,2010-05-18,value31  
category2,2010-05-18,value32  
category2,2010-05-18,value33  
category2,2010-05-18,value34  
category2,2010-05-18,value35  
category2,2010-05-18,value07

I've found some helpful parsing examples at http://www.php.net/manual/en/function.fgetcsv.php and managed to extract all the values of the value column, but I don't know how to restrict the extraction to category2/category4 only, then sort the values and remove duplicates.

The solution needs to be in PHP, Perl, or shell script.

Any help would be much appreciated.
Thank you in advance.

A: 

Here's a shell script solution.

egrep 'category4|category2' input.file | cut -d"," -f1,3 | sort -u > output.file

I used the cut command just to show that you can extract only certain columns; the -f switch for cut selects which columns you want to extract.

The -u switch for sort removes duplicates, so the output is unique.
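
If you only want the value column, with the file names from your question (DailyValues.csv in, NewValues.txt out), a minimal variant of the same pipeline would be:

egrep 'category2|category4' DailyValues.csv | cut -d"," -f3 | sort -u > NewValues.txt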

Edit: It's important that you use egrep and not plain grep, since plain grep uses basic regular expressions (where | is not an alternation operator), while egrep uses extended regular expressions.
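
For illustration (assuming GNU grep, where \| is also available as a basic-regular-expression extension), the difference looks like this:

grep 'category4|category2' input.file     # plain grep: | is a literal character, so this matches nothing here
grep -E 'category4|category2' input.file  # -E enables extended regular expressions, same as egrep
grep 'category4\|category2' input.file    # escaped alternation, a GNU grep extension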

Edit (for people who only have grep available):

grep 'category2' input.file > temp.file && grep 'category4' input.file >> temp.file && cut -d"," -f1,3 temp.file | sort -u > output.file && rm temp.file

It produces some overhead (a temporary file), but it still works...
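
As a side note, a rough single-pass alternative with plain grep (no temporary file), assuming the categories you want are exactly category2 and category4, is to anchor a character class to the category column:

grep '^category[24],' input.file | cut -d"," -f1,3 | sort -u > output.file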

dare2be
Thank you dare2be, much appreciated. The `cut` portion works great on its own (new to me), but when I use the full command with egrep to do the restriction, it produces an empty file.
Yallaa
Now that's weird. To check whether I had copied it properly from my terminal to SO, I pasted it back into the terminal and it worked... Are you sure you have `egrep` installed? Check with `which egrep`.
dare2be
It is installed: `which egrep` gives `/bin/egrep`, and `ls -l /bin/egrep` shows `lrwxrwxrwx 1 root root 4 Mar 1 2008 /bin/egrep -> grep`. I tried both grep and egrep, and it's the same thing: no output.
Yallaa
Haha you see, `egrep` is linked to `grep`, so you actually DON'T have `egrep` and the regular expression I posted won't work in `grep`. Try the non-egrep version I just posted.
dare2be
Is egrep linked to grep, or the other way around? I actually have both files, /bin/egrep and /bin/grep. I tried your new example and it works, thanks a million. But then I went back and ran the first one with egrep using the full path, /bin/egrep, and it worked perfectly this time. So it is an issue with the environment PATH.
Yallaa
Both solutions work perfectly... Thank you!!
Yallaa
You're welcome. Please accept my answer to mark the question as solved.
dare2be