I have two text files which need to have the same values.

$ diff A.txt B.txt
4a5
> I have this extra line.
$

Open files in Perl

open(my $one, '<', 'A.txt') or die "Cannot open A.txt: $!";
open(my $two, '<', 'B.txt') or die "Cannot open B.txt: $!";

How can I do such a diff from within Perl? Does Perl have a built-in diff, or do I need to use the Unix diff utility? I don't want to implement my own diff algorithm for this.

I do need to know where my files differ, but I don't necessarily need to use the Unix diff utility. That was just an example.

+8  A: 

You could try using Text::Diff.

Alternatively, the UNIX utility could be an option.
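As a rough sketch of the Text::Diff route, assuming the module is installed from CPAN (it does not ship with core Perl): its exported diff function accepts two file names and returns the differences as a string. The helper name diff_files is made up for this example.

```perl
use strict;
use warnings;
use Text::Diff;   # CPAN module, not part of core Perl

# Hypothetical helper: returns a unified-style diff of two files as one string.
sub diff_files {
    my ($old_path, $new_path) = @_;
    return diff($old_path, $new_path, { STYLE => 'Unified' });
}

# Example call (file names taken from the question):
# print diff_files('A.txt', 'B.txt');
```

Lines present only in the second file show up prefixed with "+", and lines present only in the first with "-".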

cubic1271
+4  A: 

If I only needed to know that they were the same (i.e. not discover how they are different), I'd just use Digest::MD5 to see if they come up with the same digest. There's a vanishingly small chance that two different files could have the same MD5 digest; if that worries you, you might try Digest::SHA1 instead.
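A minimal sketch of the digest comparison, using the core Digest::MD5 module. The helper names file_md5 and same_digest are invented for illustration.

```perl
use strict;
use warnings;
use Digest::MD5;   # ships with core Perl

# Hypothetical helper: hex MD5 digest of a file's contents.
sub file_md5 {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    binmode($fh);   # hash the raw bytes, not decoded text
    my $digest = Digest::MD5->new->addfile($fh)->hexdigest;
    close($fh);
    return $digest;
}

# Hypothetical helper: true when both files produce the same digest.
sub same_digest {
    my ($path_a, $path_b) = @_;
    return file_md5($path_a) eq file_md5($path_b);
}

# Example call (file names taken from the question):
# print same_digest('A.txt', 'B.txt') ? "same\n" : "different\n";
```

This only answers "same or not"; it says nothing about where the files differ.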

If you want to find out which lines are different, then you can use Algorithm::Diff, perhaps in conjunction with Tie::File. There is also a diff program that comes with Algorithm::Diff, in case you don't have a diff tool on your target platform. Although you can shell out to that, you might just want to copy what it does into a subroutine. Text::Diff is built on top of Algorithm::Diff, so it might already do what you want.
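Here is one way the Algorithm::Diff approach might look, assuming the module is installed from CPAN. Its diff function takes two array references and returns a list of hunks, each hunk being a list of [sign, index, element] triples. The helpers read_lines and changed_lines are hypothetical names, and this sketch reads whole files into arrays rather than using Tie::File.

```perl
use strict;
use warnings;
use Algorithm::Diff qw(diff);   # CPAN module, not part of core Perl

# Hypothetical helper: reads a file into an array ref of chomped lines.
sub read_lines {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    chomp(my @lines = <$fh>);
    close($fh);
    return \@lines;
}

# Hypothetical helper: takes two array refs of lines and returns strings such
# as "+5: text" (line 5 exists only in the second list) or "-2: text"
# (line 2 exists only in the first list).
sub changed_lines {
    my ($old, $new) = @_;
    my @report;
    for my $hunk (diff($old, $new)) {
        for my $change (@$hunk) {
            my ($sign, $index, $text) = @$change;
            # Algorithm::Diff reports 0-based positions; convert to line numbers.
            push @report, sprintf('%s%d: %s', $sign, $index + 1, $text);
        }
    }
    return @report;
}

# Example call (file names taken from the question):
# print "$_\n" for changed_lines(read_lines('A.txt'), read_lines('B.txt'));
```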

brian d foy
@brian: I need to know which lines differ.
Lazer
Well, then you should say that. In your comment to Ira you said otherwise.
brian d foy
There's zero chance of a collision if you read both files and compare each byte. Hashing is great for some things, such as comparing files over the network (without sending the whole file), or comparing thousands of files against each other. For comparing two files that are located on the same machine (at the same point in time), what is the advantage of hashing?
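For the direct comparison described here, one option is the core File::Compare module, whose compare function returns 0 for equal files, 1 for different files, and -1 on error. The wrapper name same_bytes is made up for this sketch.

```perl
use strict;
use warnings;
use File::Compare qw(compare);   # ships with core Perl

# Hypothetical helper: true if the two files have identical contents.
sub same_bytes {
    my ($path_a, $path_b) = @_;
    my $result = compare($path_a, $path_b);   # 0 = equal, 1 = different, -1 = error
    die "Cannot compare $path_a and $path_b: $!" if $result < 0;
    return $result == 0;
}

# Example call (file names taken from the question):
# print same_bytes('A.txt', 'B.txt') ? "same\n" : "different\n";
```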
bk1e
Well, you might save a lot of time. Let's say you have 1000 files. That's a lot of two-file combinations to check. Never mind that these files might be large. You can generate the hash just once for each file and do a simple comparison.
brian d foy
+1  A: 

No, Perl doesn't have a built-in "diff" facility. You can use an external module, use Perl's data structures (hashes, arrays, etc.), or open filehandles for both files and iterate over them in a while loop, comparing line by line. The line-by-line method assumes your files are sorted, so that matching lines sit at the same position in both files. Another, not so elegant, way is to call "diff" from Perl, but I advise against that.
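The filehandle loop described above could be sketched roughly like this, using only core Perl. The helper name differing_line_numbers is invented for the example. Note that it compares by position only, so a line inserted in the middle of one file makes every later line count as different.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the 1-based numbers of the lines at which
# the two files differ, comparing position by position.
sub differing_line_numbers {
    my ($path_a, $path_b) = @_;
    open(my $fh_a, '<', $path_a) or die "Cannot open $path_a: $!";
    open(my $fh_b, '<', $path_b) or die "Cannot open $path_b: $!";

    my @numbers;
    my $line_no = 0;
    while (1) {
        my $line_a = <$fh_a>;
        my $line_b = <$fh_b>;
        last unless defined $line_a or defined $line_b;   # both files exhausted
        $line_no++;
        # A missing line (one file is shorter) counts as a difference.
        push @numbers, $line_no
            unless defined $line_a and defined $line_b and $line_a eq $line_b;
    }
    close($fh_a);
    close($fh_b);
    return @numbers;
}

# Example call (file names taken from the question):
# print "Line $_ differs\n" for differing_line_numbers('A.txt', 'B.txt');
```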

Lastly, if Perl is not a must, just use the Unix diff utility (write a shell script).

ghostdog74