I have a file that looks like this:

1 543423 34354 
2 5654656 3423 xyz_1378,xyz_1379
3 4645656 34234354 xyz_1384,xyz_1385
4 5654 78678 xyz_1390,xyz_1391,xyz_1392
5 54654 76867 xyz_1411,xyz_1412,xyz_1413
6 54654 8678 
7 56546 67867 xyz_1711
8 678 7867 
9 76867 7876 xyz_2940
10 6786 678678 xyz_3101,xyz_3102,xyz_3103,xyz_3104,xyz_3105,xyz_3106,xyz_3107
11 67867 78678 

Note that it contains 4 space-separated fields. The last (fourth) field might be empty, or it may contain several values separated by commas.

I would like to print all the values from the last column, one per line. How can I do that (preferably using awk)?

UPDATE: I need to do this in batch for many files, with the output of all the files concatenated together.

This works:

for x in *; do awk '{print $4}' "$x/filename" | awk --field-separator="," '{if ($0 != "") {for (i=1; i<NF+1; i++) print $i}}'; done

and returns something like

xyz_1378
xyz_1221
xyz_97
xyz_132523
xyz_242

The only thing I am missing now is that I want each of the above lines to begin with an extra field: $x (the one from the for loop).

I tried changing `print $i` to `print $x,$i`, but `x` does not seem to be recognized correctly in this scope. Any ideas?

Thanks!

A: 

Use if($4) to see if there is anything in the field. Then split($4,a,/,/) will give you an array a with all values. Put that into a large result array:

 {
    if($4) {
        n = split($4, a, /,/);
        for( i=1; i<=n; i++ ) {
            result[a[i]] = 0;
        }
    }
}

and print it at the end:

END {
    for( val in result ) {
        print val;
    }
}

If you want the output sorted, pipe it through sort(1).
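Put together, the two fragments above might look like the following sketch. The sample data is a hypothetical subset of the question's file written to a temporary file, and `sort` is added only to make the order predictable, since `for (val in result)` visits keys in an unspecified order. Note that using the values as array keys also removes duplicates.

```shell
# Hypothetical sample data: a few lines taken from the question's file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
1 543423 34354
2 5654656 3423 xyz_1378,xyz_1379
7 56546 67867 xyz_1711
8 678 7867
EOF

# Store every comma-separated value of field 4 as an array key,
# then print the keys at the end and sort them.
out=$(awk '
    {
        if ($4) {
            n = split($4, a, /,/)
            for (i = 1; i <= n; i++) {
                result[a[i]] = 0
            }
        }
    }
    END {
        for (val in result) {
            print val
        }
    }
' "$sample" | sort)

printf '%s\n' "$out"
rm -f "$sample"
```

One caveat: `if ($4)` is also false when the field is the number 0, so `if ($4 != "")` would be the safer test if such values can occur.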

Aaron Digulla
I tried testing, but this prints 1 to 7, each on a line. Did I make a mistake somewhere?
Adriano Varoli Piazza
-1: This seems to be printing the last digits of the last line's values, which coincidentally, go from 1 to 7.
Adriano Varoli Piazza
Sorry, I forgot that `for(x in y)` iterates over an array's indices, not its values. Fixed.
Aaron Digulla
A: 

Use awk's -v option to pass the variable into the awk script instead of relying on the shell's substitution. Also, you only need one call to awk:

for dir in *; do 
    awk -v "dir=$dir" '
        NF==4 {
            n = split($4, a, ",")
            for (i=1; i<=n; i++) {print dir "\t" a[i]}
        }
    ' "$dir/filename"
done

or, if you don't mind seeing "dir/filename":

awk '
    NF==4 {
        n = split($4, a, ",")
        for (i=1; i<=n; i++) {print FILENAME "\t" a[i]}
    }
' */filename
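As an aside on why `print $x,$i` did not work in the question, here is a small sketch. The name `dirA` and the one-line input are made up for illustration. Inside single quotes the shell never expands `$x`, so awk sees its own unset variable `x`, and `$x` then means `$0`, the whole input line.

```shell
x=dirA   # hypothetical directory name, standing in for the loop variable

# Single quotes: the shell leaves $x alone; awk's own x is unset (0),
# so $x is $0, i.e. the entire input line.
wrong=$(echo 'xyz_1' | awk '{print $x "," $1}')

# With -v the shell value is copied into an awk variable named x.
right=$(echo 'xyz_1' | awk -v "x=$x" '{print x "," $1}')

printf '%s\n%s\n' "$wrong" "$right"
```

The first command prints the input line twice, the second prints the directory name followed by the value.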

If you have huge numbers of directories, your shell may choke when expanding "*/filename", so use find and xargs:

find . -type f -name filename -print0 | xargs -0 awk '...'

(-print0 and -0 are extensions found in GNU and BSD find/xargs; they may be missing on older or more minimal systems)
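For illustration, here is a sketch of the find/xargs variant with the awk program spelled out. It assumes the '...' stands for the same program as in the previous example. The directory names `dirA` and `dirB` and their contents are hypothetical, and the trailing `sort` is there only because find does not guarantee an order.

```shell
# Hypothetical layout: two directories, each holding a file named "filename".
work=$(mktemp -d)
mkdir "$work/dirA" "$work/dirB"
printf '1 10 20\n2 30 40 xyz_1,xyz_2\n' > "$work/dirA/filename"
printf '1 50 60 xyz_3\n' > "$work/dirB/filename"

# find emits NUL-terminated paths; xargs -0 passes them to awk in batches,
# so the shell never has to expand one huge glob.
out=$(cd "$work" && find . -type f -name filename -print0 |
    xargs -0 awk '
        NF == 4 {
            n = split($4, a, ",")
            for (i = 1; i <= n; i++) print FILENAME "\t" a[i]
        }
    ' | sort)

printf '%s\n' "$out"
rm -rf "$work"
```

Each output line starts with the path as find printed it (for example `./dirA/filename`), followed by a tab and one value.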

glenn jackman
A: 

You can probably change the first awk call in your command to

awk '{print FILENAME "," $4}' "$x/filename"

and then work on the output of this.

FILENAME is the built-in awk variable that holds the name of the file currently being processed.
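One possible way to "work on the output" without a second awk call is sketched below. It assumes the file in each directory is literally named `filename`, as in the question, and the directory `dirA` and its contents are made up. The idea is to copy FILENAME, strip the trailing `/filename`, and prefix each value with what remains.

```shell
# Hypothetical layout: one directory holding a file named "filename".
work=$(mktemp -d)
mkdir "$work/dirA"
printf '1 10 20\n2 30 40 xyz_1,xyz_2\n' > "$work/dirA/filename"

out=$(cd "$work" && for x in *; do
    awk '
        NF == 4 {
            dir = FILENAME
            sub(/\/filename$/, "", dir)   # keep only the directory part
            n = split($4, a, ",")
            for (i = 1; i <= n; i++) print dir "," a[i]
        }
    ' "$x/filename"
done)

printf '%s\n' "$out"
rm -rf "$work"
```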

Vijay Sarathi