I have two files: a master file and a subset file that holds some of the master's records plus additional data. Both files consist of fields separated by ^A. I want to generate the subset file from the master file. The subset file already has some data, which I can duplicate, but I want the fields available in the master file to also appear in the subset file.

Example:

subset file format:
1234^A56^A78^A910^A1112^A13^A14^A151617^A18^A192021.000000^A22.000000

master file format:
1242^A2282^A2^A1^A0
1234^A78^A910^A4^A4
1380^A2594^A2^A25^A3
1404^A2447^A6^A44^A9

In the above example, the master file has 4 rows, while the subset file has 1 row. The values of the 2nd row of the master file match the row in the subset file.

I want to add the remaining rows of the master file to the subset file as well. Basically, the first, third and fourth fields of each subset row should match the first three fields of the corresponding master row; the rest can be any randomly generated values.

Also, I wish to retain the ^A separators in the subset file.

A: 
  1. Use split to get the fields from the master file.
  2. Use rand($range) to generate random values.
  3. Use join to put your fields and random values together.
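Those three steps could be sketched roughly as follows. This is only an illustration, not a definitive implementation: the helper name subset_line, the random ranges, and the assumption that subset fields 5-9 are integers while fields 10-11 are six-decimal floats are all guesses based on the sample row in the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build one subset record from one master record (hypothetical helper).
# Subset fields 1, 3 and 4 come from master fields 1-3; the remaining
# eight fields are random filler whose ranges are made up for this sketch.
sub subset_line {
    my ($master_line) = @_;

    # Step 1: split the master record on Ctrl-A and keep the first 3 fields.
    my ($id, $x, $y) = (split /\cA/, $master_line)[0 .. 2];

    # Step 2: generate random values for the fields the master lacks.
    my $second = int(rand(100));                                 # field 2
    my @ints   = map { int(rand(1000)) } 1 .. 5;                 # fields 5-9
    my @floats = map { sprintf("%.6f", rand(1000)) } 1 .. 2;     # fields 10-11

    # Step 3: join everything back together with the same separator.
    return join "\cA", $id, $second, $x, $y, @ints, @floats;
}

print subset_line("1380\cA2594\cA2\cA25\cA3"), "\n";
```

Each call yields an 11-field record, so appending its output to the subset file keeps the ^A separators intact.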

Does that answer your question?

Vanessa MacDougal
+2  A: 

Assuming you want to append all records to the same subset file, use

#! /usr/bin/perl -l

use warnings;
use strict;

# demo only
my $buf = join "" =>
          map "$_\n" =>
          "1242\cA2282\cA2\cA1\cA0",
          "1234\cA78\cA910\cA4\cA4",
          "1380\cA2594\cA2\cA25\cA3",
          "1404\cA2447\cA6\cA44\cA9";
open my $master, "+<", \$buf or die "$0: open: $!";

open my $subset, ">>", "subset.dat" or die "$0: open: $!";

while (<$master>) {
  chomp;
  my($id,$x,$y) = (split /\cA/)[0..2];

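  # Subset fields 1, 3 and 4 come from the master record; the rest are
  # fixed filler. The bare numeric literals print as 192021 and 22;
  # quote them ("192021.000000") to keep the trailing zeros.
  print $subset join "\cA" =>
    $id, 56, $x, $y,
    1112, 13, 14, 151617, 18, 192021.000000, 22.000000;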
}

close $subset or warn "$0: close: $!";

As documented in perlop, the escape sequence \cA produces the Ctrl-A (ASCII SOH) separator you're using. To keep the demo self-contained, the code above reads $buf as though it were a file, but of course you'd open the master file in production.
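As a quick check of both points, here is a small sketch showing that \cA is the single character chr(1) and that a filehandle opened on a scalar reference reads the string like a file. The variable names are only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# "\cA" is Ctrl-A, i.e. the single character with code 1 (ASCII SOH).
my $sep = "\cA";
print "same as chr(1)\n" if $sep eq chr(1) && $sep eq "\x01";

# Opening a reference to a scalar gives a filehandle that reads the
# string's contents, which is how the demo avoids needing a real file.
my $buf = "1234\cA78\cA910\cA4\cA4\n";
open my $fh, "<", \$buf or die "$0: open: $!";

my @fields;
while (my $line = <$fh>) {
    chomp $line;
    @fields = split /\cA/, $line;   # break the record on Ctrl-A
}
close $fh;

print scalar(@fields), " fields: @fields\n";
```

To read a real master file, the scalar reference in open would be replaced with the file's path; the rest of the loop stays the same.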

Output viewed through less, where ^A again denotes ASCII SOH:

1242^A56^A2282^A2^A1112^A13^A14^A151617^A18^A192021^A22
1234^A56^A78^A910^A1112^A13^A14^A151617^A18^A192021^A22
1380^A56^A2594^A2^A1112^A13^A14^A151617^A18^A192021^A22
1404^A56^A2447^A6^A1112^A13^A14^A151617^A18^A192021^A22
Greg Bacon