Hi, I have a ton of data spread over multiple CSV files, and I filter out a data set using grep:
user@machine:~/$ cat data.csv | grep -a "63[789]\...;"
637.05;1450.2
637.32;1448.7
637.60;1447.7
637.87;1451.5
638.14;1454.2
638.41;1448.6
638.69;1445.8
638.96;1440.0
639.23;1431.9
639.50;1428.8
639.77;1427.3
I want to find the data set with the highest count (the column right of the ;) and then know the corresponding value (left of the ;). In this case the set I'm looking for would be 638.14;1454.2.
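To make the goal concrete, here is a tiny self-contained sketch of the "keep the row with the largest second field" step using awk (the file name data_sample.csv and its contents are made up for illustration):

```shell
# Create a small sample file like the grep output above (made-up name/values)
cat > data_sample.csv <<'EOF'
637.05;1450.2
638.14;1454.2
639.77;1427.3
EOF

# Track the line whose second ;-separated field is largest; print it at the end
best=$(awk -F';' '$2 > max { max = $2; line = $0 } END { print line }' data_sample.csv)
echo "$best"
```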
I tried different things and ended up using a combination of bash and Python, which works but isn't very pretty:
import os, csv

os.system('ls | grep csv > filelist')
files = open("filelist")
files = files.read()
files = files.split("\n")
for filename in files[0:-1]:
    os.system('cat ' + filename + ' | grep -a "63[6789]\...;" > filtered.csv')
    filtered = csv.reader(open('filtered.csv'), delimiter=';')
    # sort numerically on the second column, highest count first
    sortedlist = sorted(filtered, key=lambda row: float(row[1]), reverse=True)
    dataset = sortedlist[0][0] + ';' + sortedlist[0][1] + '\n'
I would love to have a bash-only solution (cut, awk, arrays?!?) but couldn't figure it out. I also don't like the workaround of writing the bash output to files and then reading it back into Python variables. Can I read it into variables directly, or is there a better way to approach this problem? (Probably Perl etc., but I'm really interested in a bash solution.)
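Something along these lines is what I'm imagining, as an untested sketch: loop over the CSV files directly and capture each result in a shell variable via command substitution, with no intermediate files (the file name part1.csv and its contents are made up here):

```shell
# Made-up sample input file standing in for one of the real CSVs
cat > part1.csv <<'EOF'
637.05;1450.2
638.14;1454.2
638.96;1440.0
EOF

for f in part*.csv; do
    # sort -t';' -k2,2g sorts numerically on the second ;-separated field;
    # tail -n1 then keeps the row with the highest count
    best=$(grep -a '63[789]\...;' "$f" | sort -t';' -k2,2g | tail -n1)
    echo "$f: $best"
done
```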
Thank you very much!!