I have some semicolon-separated data, close to 240 rows, in a text file temp1.txt. A second file, temp2.txt, stores 204 rows of data in the same format.

I want to:

  1. Sort the data in both files by field1, i.e. the first data field in every row.
  2. Compare the data in both files and redirect the rows that are not equal into separate files.

Sample data:

temp1.txt
1000xyz400100xyzA00680xyz0;19722.83;19565.7;157.13;11;2.74;11.00
1000xyz400100xyzA00682xyz0;7210.68;4111.53;3099.15;216.95;1.21;216.94
1000xyz430200xyzA00651xyz0;146.70;0.00;0.00;0.00;0.00;0.00

temp2.txt
1000xyz400100xyzA00680xyz0;19722.83;19565.7;157.13;11;2.74;11.00
1000xyz400100xyzA00682xyz0;7210.68;4111.53;3099.15;216.95;1.21;216.94

The sort command I'm using:

sort -k1,1 temp1 -o temp1.tmp
sort -k1,1 temp2 -o temp2.tmp

I'd appreciate it if someone could show me how to redirect only the missing or mismatched rows into two separate files for analysis.

+3  A: 

Try

cat temp1 temp2 | sort -k1,1 -o tmp
# mis-matching/missing rows:
uniq -u tmp
# matching rows:
uniq -d tmp
abbot
You can pipe into uniq and eliminate the tmp file. It is also a useless use of `cat`, since sort can take multiple files: `sort temp1.txt temp2.txt | uniq -u`
Dennis Williamson
The temporary file was there just so the sorted data could be used twice, for `uniq -u` and `uniq -d`. And yes, `cat` is not required.
abbot
A: 

Using gawk, output the lines in file1 whose first field does not appear in file2:

awk -F";" 'FNR==NR { a[$1]=$0; next }
!($1 in a) { print $0 > "afile.txt" }' file2 file1

Swap the order of file2 and file1 to output the lines in file2 that are not in file1.

ghostdog74
Neat solution to get the missing records. thanks
novice
+1  A: 

Look at the comm command.
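
A rough sketch of how that might look, assuming the file names from the question. The `.sorted` and `only_in_*` names are placeholders made up for illustration, and the first two commands only recreate the sample data so the snippet runs on its own. Note that comm compares whole lines, not just field 1, and needs both inputs sorted in the same collation order.

```shell
# Recreate the sample data from the question (temp1.txt has one extra row).
printf '%s\n' \
  '1000xyz400100xyzA00680xyz0;19722.83;19565.7;157.13;11;2.74;11.00' \
  '1000xyz400100xyzA00682xyz0;7210.68;4111.53;3099.15;216.95;1.21;216.94' \
  '1000xyz430200xyzA00651xyz0;146.70;0.00;0.00;0.00;0.00;0.00' > temp1.txt
printf '%s\n' \
  '1000xyz400100xyzA00680xyz0;19722.83;19565.7;157.13;11;2.74;11.00' \
  '1000xyz400100xyzA00682xyz0;7210.68;4111.53;3099.15;216.95;1.21;216.94' > temp2.txt

# Sort both files the same way; LC_ALL=C keeps sort and comm in agreement.
LC_ALL=C sort temp1.txt > temp1.sorted
LC_ALL=C sort temp2.txt > temp2.sorted

# -23 hides columns 2 and 3, leaving lines found only in the first file.
LC_ALL=C comm -23 temp1.sorted temp2.sorted > only_in_temp1.txt
# -13 hides columns 1 and 3, leaving lines found only in the second file.
LC_ALL=C comm -13 temp1.sorted temp2.sorted > only_in_temp2.txt
```

With the sample data, only_in_temp1.txt should end up holding the single `...A00651xyz0` row and only_in_temp2.txt should be empty.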

rjp
+1  A: 

You want the difference as described at http://www.pixelbeat.org/cmdline.html#sets

sort -t';' -k1,1 temp1 temp1 temp2 | uniq -u > only_in_temp2
sort -t';' -k1,1 temp1 temp2 temp2 | uniq -u > only_in_temp1

Notes:

  • Use join rather than uniq, as shown at the link above, if you want to compare only particular fields
  • If the first field is fixed width then you don't need the -t';' -k1,1 parameters above
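
To illustrate the join note, here is a minimal sketch that compares on field 1 only. The demo file names and their contents are hypothetical, invented for this example and not taken from the question. It assumes each key appears at most once per file and that both files are sorted on the join field.

```shell
# Hypothetical demo data: key k2 has different values, key k3 is missing from demo2.
printf '%s\n' 'k1;10' 'k2;20' 'k3;30' > demo1.txt
printf '%s\n' 'k2;99' 'k1;10' > demo2.txt

# join requires both inputs sorted on the join field (field 1 here).
LC_ALL=C sort -t';' -k1,1 demo1.txt > demo1.sorted
LC_ALL=C sort -t';' -k1,1 demo2.txt > demo2.sorted

# -v1 prints rows of the first file whose key has no partner in the second.
LC_ALL=C join -t';' -v1 demo1.sorted demo2.sorted > key_only_in_demo1.txt
# -v2 does the same for the second file.
LC_ALL=C join -t';' -v2 demo1.sorted demo2.sorted > key_only_in_demo2.txt

# For keys present in both files, print key plus field 2 from each side,
# then keep only the rows where the two values differ.
LC_ALL=C join -t';' -o 0,1.2,2.2 demo1.sorted demo2.sorted |
  awk -F';' '$2 != $3' > value_mismatch.txt
```

Unlike the uniq approach, a row whose key matches but whose other fields differ shows up in value_mismatch.txt and not in either of the "only in" files.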
pixelbeat
If you're sorting by field 1, do you need to specify the field at all?
Dennis Williamson
Yes, as I noted above, you need to specify the field boundaries iff the fields are not fixed width.
pixelbeat
Excellent solution. Thanks to everyone who shared their input.
novice