Hi

Is there a script that converts a tab-delimited data file into libSVM format? For example, here is my data (class label first, then the feature values):

-1 9.45 1.44 8.90 
-1 8.12 7.11 8.90
-1 8.11 6.12 8.78

and I would like to prefix each feature value with its index:

-1 1:9.45 2:1.44 3:8.90 
-1 1:8.12 2:7.11 3:8.90
-1 1:8.11 2:6.12 3:8.78

I believe this can be done with sed or awk, but I don't know how.

Thanks!
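For reference, a libSVM line is the class label followed by whitespace-separated `index:value` pairs, where the indices are positive integers in ascending order. The minimal Ruby sketch below (the helper `parse_libsvm_line` is hypothetical, not part of libSVM) parses such a line back into a label and a hash of features, which can be handy for sanity-checking converted output:

```ruby
# Parse one libSVM-format line into [label, {index => value}].
# Hypothetical helper for checking converted files; not part of libSVM itself.
def parse_libsvm_line(line)
  label, *pairs = line.split
  features = {}
  prev = 0
  pairs.each do |pair|
    idx, val = pair.split(':', 2)
    idx = Integer(idx)
    # libSVM expects feature indices to be positive and strictly ascending.
    raise ArgumentError, "bad index #{idx} after #{prev}" unless idx > prev
    features[idx] = Float(val)
    prev = idx
  end
  [label, features]
end

label, features = parse_libsvm_line("-1 1:9.45 2:1.44 3:8.90")
puts label        # prints -1
puts features[3]  # prints 8.9
```

Indices missing from a line are simply absent from the hash; libSVM treats omitted features as zero.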

A: 

You could use Ruby:

# Keep the class label and prefix each feature with its 1-based index.
File.foreach('data.txt') do |line|
  label, *values = line.split
  puts [label, *values.each_with_index.map { |v, i| "#{i + 1}:#{v}" }].join(' ')
end
perimosocordiae
A:

Give this a try:

awk '{out=$1; for (i=2; i<=NF; i++) out=out"\t"(i-1)":"$i; print out}' inputfile
Dennis Williamson
It worked like a charm, exactly what I wanted! :) I'm going to put this in a shell script for batch labelling. Thanks a lot!
Faiz
Actually, ghostdog74's answer is a bit better.
Dennis Williamson
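For batch labelling, one option is a small Ruby script instead of a shell loop. This is a sketch under a few assumptions: every input file has the class label in the first column and feature values in the rest, and each result is written next to its input with a `.libsvm` suffix (an arbitrary, hypothetical naming choice):

```ruby
# Convert each file named on the command line into libSVM format.
# Assumes: first column is the class label, remaining columns are feature values.
# Output goes to "<input>.libsvm" (hypothetical naming convention).

def to_libsvm(line)
  label, *values = line.split   # splits on runs of spaces and/or tabs
  return nil if label.nil?      # skip blank lines
  pairs = values.each_with_index.map { |v, i| "#{i + 1}:#{v}" }
  [label, *pairs].join(' ')
end

ARGV.each do |path|
  File.open("#{path}.libsvm", 'w') do |out|
    File.foreach(path) do |line|
      converted = to_libsvm(line)
      out.puts(converted) if converted
    end
  end
end
```

Run it as, for example, `ruby to_libsvm.rb data1.txt data2.txt` (the script name is also just an example). Splitting on runs of whitespace behaves like awk's default field separator, so tab- and space-delimited files both work.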
A:
$ awk -F'\t' '{for (i=2; i<=NF; i++) $i=(i-1)":"$i} 1' OFS='\t' file
-1 1:9.45 2:1.44 3:8.90
-1 1:8.12 2:7.11 3:8.90
-1 1:8.11 2:6.12 3:8.78
ghostdog74
You can leave off the `-F'\t'`, since awk's default field separator already splits on runs of spaces and tabs.
Dennis Williamson
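One caveat when dropping `-F'\t'`: awk's default field separator treats any run of blanks as a single separator, so an empty field between two tabs disappears and the following values shift to lower indices. Ruby's `String#split` shows the same difference between whitespace splitting and splitting on an explicit tab; the sample row below is made up for illustration:

```ruby
# Compare whitespace splitting (like awk's default FS) with strict tab splitting
# (like awk -F'\t') on a row whose second feature is empty.
row = "-1\t9.45\t\t8.90"

loose  = row.split           # runs of whitespace collapse: the empty field vanishes
strict = row.split("\t", -1) # every tab separates; -1 also keeps trailing empty fields

p loose   # prints ["-1", "9.45", "8.90"]
p strict  # prints ["-1", "9.45", "", "8.90"]

# With loose splitting, 8.90 would be labelled as feature 2 instead of feature 3.
```

If the input never contains empty fields, the two behave identically and the `-F'\t'` is indeed unnecessary.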