Suppose File 1 has two columns and looks something like:

fuzz          n.  flowering shrub of the rhododendron family
dyspeptic     adj. bright blue, as of the sky 
dyslexic      adj. of Byzantium or the E Roman Empire
eyrie         adj. of the Czech Republic or Bohemia
azalea        adj. suffering from dyslexia
Czech         adj. suffering from dyspepsia
Byzantine     n. eagle's nest
azure         n. mass of soft light particle

File 2 has only one column and looks something like:

azalea
azure
Byzantine
Czech
dyslexic
dyspeptic
eyrie
fuzz

I want the first column of File 1 replaced with the column of File 2. Thus, File 3 should look like this:

azalea        n.  flowering shrub of the rhododendron family
azure         adj. bright blue, as of the sky 
Byzantine     adj. of Byzantium or the E Roman Empire
Czech         adj. of the Czech Republic or Bohemia
dyslexic      adj. suffering from dyslexia
dyspeptic     adj. suffering from dyspepsia
eyrie         n. eagle's nest
fuzz          n. mass of soft light particle

I have a feeling there is some simple way of doing this kind of job, and very likely some handy modules for it, but for now I can't manage it even in the most inefficient way. I tried a bunch of code like

while<$line1 = file1>{
while<$line2 = file2>{
join $line,$line2

but no luck at all. Can someone kindly point me in the right direction? Thanks, as always, for any guidance.

+4  A: 

I first read this as wanting the first file output in the order of the second. After rereading it, it seems you just want to replace the column without changing the order. Here's the solution to that, assuming you can handle opening the files.

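while (defined($line1 = <FILE1>) && defined($line2 = <FILE2>)) {
  chomp $line2;               # drop the newline from the replacement word
  $line1 =~ s/^\w+/$line2/;   # swap the first word of the File 1 line
  print FILE3 $line1;
}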
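For anyone unsure about the file-opening part, here is a minimal, self-contained sketch of the same loop. It uses lexical filehandles opened on in-memory strings so that it runs without any files on disk. The sample data and the sub name `replace_first_column` are made up for illustration; in real use you would open the three actual files instead.

```perl
use strict;
use warnings;

# Replace the first word of each line read from $in1 with the
# corresponding line read from $in2, and write the result to $out.
sub replace_first_column {
    my ($in1, $in2, $out) = @_;
    while (defined(my $line1 = <$in1>) and defined(my $line2 = <$in2>)) {
        chomp $line2;
        $line1 =~ s/^\w+/$line2/;
        print {$out} $line1;
    }
}

# Hypothetical sample data standing in for File 1 and File 2.
my $file1 = "fuzz          n. flowering shrub\ndyspeptic     adj. bright blue\n";
my $file2 = "azalea\nazure\n";
my $file3 = '';

# Opening a handle on a scalar reference reads from / writes to that string.
open my $in1, '<', \$file1 or die "Can't read sample data: $!";
open my $in2, '<', \$file2 or die "Can't read sample data: $!";
open my $out, '>', \$file3 or die "Can't write sample output: $!";

replace_first_column($in1, $in2, $out);
close $out;
print $file3;
```

Note that the substitution keeps whatever spacing followed the old word, so the second column will shift left or right when the new word has a different length.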

Here's my original solution, to sort the entries in the order they appear in the second file.

Create a hash of file 1.

my %dictionary;
while (<FILE1>) {
  # capture the headword and the rest of the line
  next unless m/^(\w+)\s+(.*)$/;
  $dictionary{$1} = $2;
}

Look up the definition for each key in file 2 and print a joined line

while (my $key = <FILE2>) {
  $key =~ s/\s+//g;   # strip the newline and any stray whitespace
  print FILE3 "$key\t\t$dictionary{$key}\n";
}
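Put together, the hash approach might look like the sketch below. It works on arrays of lines rather than filehandles so that it runs as-is. The sub name `reorder_by_keys` and the sample entries are hypothetical, and skipping words that are missing from file 1 is my own assumption about what you would want.

```perl
use strict;
use warnings;

# Build a headword => definition lookup from the lines of file 1,
# then return one line per headword in the order given by file 2.
sub reorder_by_keys {
    my ($entries, $keys) = @_;    # two array refs of lines
    my %dictionary;
    for my $line (@$entries) {
        next unless $line =~ m/^(\w+)\s+(.*)$/;
        $dictionary{$1} = $2;
    }
    my @output;
    for my $key (@$keys) {
        (my $word = $key) =~ s/\s+//g;            # strip newline and spaces
        next unless exists $dictionary{$word};    # assumption: skip unknown words
        push @output, "$word\t\t$dictionary{$word}";
    }
    return @output;
}

# Hypothetical sample data, already correctly paired but out of order.
my @file1 = ("fuzz   n. mass of soft light particles",
             "azalea n. flowering shrub");
my @file2 = ("azalea", "fuzz");
print "$_\n" for reorder_by_keys(\@file1, \@file2);
```

This only makes sense when each word in file 1 already sits next to its own definition, which is not the case in the question.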
EmFi
Where is this `strip()` function?
Chris Lutz
It was on the next line. Forgot for a moment that Perl doesn't have a strip.
EmFi
Mike
@EmFi, Perl also reports that this line "$line2 ~= s/^\w+/$line1/;" has something wrong.
Mike
So this line "$line2 ~= s/^\w+/$line1/" will replace the beginning words with the chomped $line1? But it seems something's wrong with the syntax.
Mike
Yeah I didn't test it so there's syntax errors all over the place. I also mixed up $line1 and $line2 in my loop. Everything's fixed now.
EmFi
@EmFi, now the code works great :) Thanks for sharing!
Mike
Sorry it took a half hour to work out my errors.
EmFi
@EmFi, now I've learnt first how to read two lines at the same time, second how to replace part of one line with the content of another. Thanks for the code. I think my Perl knowledge has been improved. Thanks :)
Mike
+6  A: 

If you want to read two lines at the same time, try this:

while(defined(my $line1 = <file1>)
      and defined(my $line2 = <file2>)) {
  # replace contents in $line1 with $line2 and do something with $line1
}

This loop stops as soon as either file runs out of lines, so it may be a good idea to check that both files have reached end-of-file once the loop is done:

die "Files are different sizes!\n" unless eof(file1) == eof(file2);
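One caveat with the `eof` comparison: if the second file runs out first, the loop has already read (and thrown away) one more line from the first file, so a first file that is exactly one line longer can slip through. A hedged alternative is to read both handles on every pass and compare afterwards. The sub name `each_line_pair` and the sample strings below are invented for this sketch.

```perl
use strict;
use warnings;

# Read two handles in lockstep. Both reads happen on every pass, so a
# length mismatch shows up as exactly one of the two lines being undefined.
sub each_line_pair {
    my ($in1, $in2, $callback) = @_;
    while (1) {
        my $line1 = <$in1>;
        my $line2 = <$in2>;
        last if !defined($line1) && !defined($line2);   # both ended together
        die "Files are different sizes!\n"
            unless defined($line1) && defined($line2);
        $callback->($line1, $line2);
    }
}

# Hypothetical in-memory data standing in for the two files.
my $data1 = "one\ntwo\n";
my $data2 = "uno\ndos\n";
open my $in1, '<', \$data1 or die "Can't read sample data: $!";
open my $in2, '<', \$data2 or die "Can't read sample data: $!";

each_line_pair($in1, $in2, sub {
    my ($line1, $line2) = @_;
    chomp($line1, $line2);
    print "$line1 / $line2\n";
});
```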

Of course, in modern Perl you can store filehandles in lexically scoped variables like this:

open my $fh, ...

And then replace ugly global <FILEHANDLES> with nice lexically scoped <$filehandles>. It's much nicer, and it makes

Chris Lutz
@Chris, it's good to know how to read two lines at the same time. Why doesn't the book Learning Perl say anything about this line of code?
Mike
What is there to say? Could you really not infer this?
jrockway
It's a tad hairy, and it's uncommon. If your files are decently small, you could just as easily slurp both of them into arrays and work with them in memory. However, if this code is expected to run on older computers, or on large files, that approach will be very memory intensive, and is discouraged.
Chris Lutz
@jrockway - If you're only looping through one file with `while($line = <file>)` Perl adds the `defined()` check, so the OP may not know about it and might naively write `while($line1 = <file1> and $line2 = <file2>)`, which will stop early if the last line of a file is "0" with no trailing newline.
Chris Lutz
+4  A: 

Think about what you want to do, in small steps.

  • read in one line from each file.
  • File 1 has two columns, so split it into two columns.
  • now you have one line from File 1 (in two parts), and a line from File 2.
  • print the parts you want to keep: the line from File 2, followed by the second part of File 1.

And then you keep doing that until you run out of lines from one file or the other.

Here's some of the pieces you need:

  • open a file: open(my $filehandle, '<', 'filename') or die "Can't open filename";
  • read a single line: my $line = <$filehandle>;
  • split it into two columns: there's lots of ways of doing this - with a regexp, or split(), or even substr()
  • print a line out: pretty simple
  • if you run out of lines, you're done: exit if !defined $line;
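As a rough illustration of the split and print pieces, here is one way to handle a single pair of lines. The sub name `merge_line` is made up, and the column width of 14 is an assumption read off the sample in the question.

```perl
use strict;
use warnings;

# Combine one line of File 1 with one line of File 2: drop File 1's first
# column and put the word from File 2 in its place.
sub merge_line {
    my ($line1, $line2) = @_;
    chomp($line1, $line2);
    # Split on whitespace into at most two fields: the old word and the rest.
    my (undef, $definition) = split ' ', $line1, 2;
    $definition = '' unless defined $definition;
    # Pad the new word to 14 characters (assumed width) to keep columns aligned.
    return sprintf "%-14s%s", $line2, $definition;
}

print merge_line("fuzz          n.  flowering shrub of the rhododendron family\n",
                 "azalea\n"), "\n";
```

Calling this inside a loop that reads one line from each filehandle covers the remaining steps.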
Ether
Based on that horrendous `while`-loop-ish thing in his question, I think the OP was confused on syntax. But good stuff.
Chris Lutz
@Chris, yes, yes, my Perl grammar is terrible. I know. It often happens I simply can't use the correct syntax to implement my ideas.
Mike
@Ether, thanks for the step by step suggestions. I think I need to improve my syntax knowledge.
Mike
@Mike I highly recommend you get a hold of the llama book ("Learning Perl") - it's really good for going through the fundamentals and showing you where to look up all the details that are hard to remember.
Ether
@Ether, I am actually in the process of reading the Llama Book and I'm almost finished. But thanks the same :)
Mike
A: 

On *nix you can use " cut -c 10- file1 | paste file2 - ", provided no word in the first column of file1 is longer than nine characters.

aartist