Hi,
I have a set of 1000 text files with names in_s1.txt, in_s2.txt and so on. Each file contains millions of rows, and each row has 7 columns, like:
ccc245 1 4 5 5 3 -12.3
The most important values for me are those in the first and seventh columns; the pairs like ccc245, -12.3.
What I need to do is find, across all the in_sXXXX.txt files, the 10 cases with the lowest values in the seventh column, and I also need to know where each value is located, i.e. in which file. I need something like:
FILE 1st_col 7th_col
in_s540.txt ccc3456 -9000.5
in_s520.txt ccc488 -723.4
in_s12.txt ccc34 -123.5
in_s344.txt ccc56 -45.6
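
For reference, a single-pass idea is roughly what I am picturing in Python; this is an untested sketch that assumes whitespace-separated columns and that all the files match in_s*.txt in the current directory:

import glob
import heapq

def rows():
    # Yield (seventh_col, filename, first_col) for every row of every file.
    # Assumes whitespace-separated columns and the in_s*.txt naming scheme.
    for path in sorted(glob.glob("in_s*.txt")):
        with open(path) as fh:
            for line in fh:
                cols = line.split()
                if len(cols) == 7:  # skip malformed rows
                    yield (float(cols[6]), path, cols[0])

# heapq.nsmallest keeps only the 10 best candidates in memory,
# so millions of rows per file should still be manageable.
print("FILE 1st_col 7th_col")
for value, path, first in heapq.nsmallest(10, rows()):
    print(path, first, value)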
I was thinking about using Python and bash for this, but so far I have not found a practical approach beyond the sketch above. All I know how to do is:
- concatenate all the in_s files into IN.TXT
- search for the lowest values there using:
sort -k7,7n IN.TXT | head -n 10
- given the 1st_col and 7th_col values of the top-ten list, use them to filter the in_s files with grep -n VALUE in_s*, so that for each value I get the name of the file it came from (a Python version of this lookup is sketched after this list)
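
In Python I imagine that lookup step would look roughly like this (again an untested sketch; the pair below is a placeholder, not real data, and the string comparison assumes the value appears in the file exactly as printed):

import fileinput
import glob

# Hypothetical pair taken from the top-ten list; placeholder values.
target_first, target_seventh = "ccc3456", "-9000.5"

# fileinput tracks which file and line the current row came from,
# much like grep -n does.
for line in fileinput.input(sorted(glob.glob("in_s*.txt"))):
    cols = line.split()
    if cols and cols[0] == target_first and cols[-1] == target_seventh:
        print(fileinput.filename(), fileinput.filelineno())
        break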
It all works, but it is quite tedious. I wonder about a faster approach using only bash or Python, or both, or another language better suited to this task.
Thanks