I have to deal with various input files with a number of fields, arbitrarily arranged, but all consistently named and labelled with a header line. These files need to be reformatted such that all the desired fields are in a particular order, with irrelevant fields stripped and missing fields accounted for. I was hoping to use AWK
to handle this, since it has done me so well when dealing with field-related dilemmata in the past.
After a bit of mucking around, I ended up with something much like the following (writing from memory, untested):
# imagine a perfectly-functional BEGIN {} block here
NR==1 {
fldname[1] = "first_name"
fldname[2] = "last_name"
fldname[3] = "middle_name"
maxflds = 3
# this is just a sample -- my real script went through forty-odd fields
for (i=1;i<=NF;i++) for (j=1;j<=maxflds;j++) if ($i == fldname[j]) fldpos[j]=i
}
NR!=1 {
for (j=1;j<=maxflds;j++) {
if (fldpos[j]) printf "%s",$fldpos[j]
printf "%s","/t"
}
print ""
}
Now this solution works fine. I run it, I get my output exactly how I want it. No complaints there. However, for anything longer than three fields or so (such as the forty-odd fields I had to work with), it's a lot of painfully redundant code which always has and always will bother me. And the thought of having to insert a field somewhere else into that mess makes me shudder.
I die a little inside each time I look at it.
I'm sure there must be a more elegant solution out there. Or, if not, perhaps there is a tool better suited for this sort of task. AWK
is awesome in it's own domain, but I fear I may be stretching it's limits some with this.
Any insight?