views:

4561

answers:

9

Hi Guys,

I am working on a Perl script to read a CSV file and do some calculations. The CSV file has only two columns, something like below.

One Two
1.00 44.000
3.00 55.000

Now, this CSV file is very big; it can be anywhere from 10 MB to 2 GB.

Currently I am working with a CSV file of about 700 MB. I tried to open it in Notepad and Excel, but it looks like no program is going to open it.

I want to read maybe the last 1000 lines of the CSV file and see the values. How can I do that? I cannot open the file in Notepad or any other program.

If I write a Perl script, it seems I would need to process the complete file to get to the end and then read the last 1000 lines.

Is there any better way to do that? I am new to Perl, and any suggestions will be appreciated.

I have searched the net and there are some modules available, like File::Tail, but I don't know whether they will work on Windows.

+4  A: 

In *nix, you can use the tail command.

tail -n 1000 yourfile | perl ...

That will pipe only the last 1000 lines to the Perl program.
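For example, the consuming Perl program could read the piped lines from standard input. This is only a sketch; the file name, the whitespace splitting, and the column-summing are assumptions for illustration, not part of the original answer:

tail -n 1000 yourfile.csv | perl -ne '
    my ($one, $two) = split;        # the sample data is whitespace-separated
    $total += $two if defined $two; # running total of the second column
    END { print "Total of column Two: $total\n" }
'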

On Windows, the GnuWin32 and UnxUtils packages both provide a tail utility.

S.Lott
Thanks for adding the Windows versions of tail.
S.Lott
+1  A: 

You could use the Tie::File module, I believe. It presents the file's lines as an array without slurping the whole file into memory, so you could get the size of the array and process elements arraySize-1000 up to arraySize-1.

Tie::File
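A minimal sketch of that approach (the file name is a placeholder, and opening read-only is just a precaution):

use strict;
use warnings;
use Fcntl 'O_RDONLY';
use Tie::File;

my $filename = 'big.csv';    # placeholder path

tie my @lines, 'Tie::File', $filename, mode => O_RDONLY
    or die "Can't tie $filename: $!";

my $last  = $#lines;         # index of the last line
my $first = $last - 999;     # start of the last 1000 lines
$first = 0 if $first < 0;

print "$lines[$_]\n" for $first .. $last;

untie @lines;

Note that asking for $#lines still forces Tie::File to scan the whole file to locate the line offsets, so this saves memory but not the full read.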

Another option would be to count the number of lines in the file, then loop through the file once and start reading values at numberOfLines-1000 (see the sketch below).

$count = `wc -l < $file`;
die "wc failed: $?" if $?;
chomp($count);

That would give you the number of lines (on most systems).
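Putting the two steps together, a rough sketch might look like this (the file name is a placeholder, and it assumes the shelled-out wc is available):

use strict;
use warnings;

my $file  = 'big.csv';          # placeholder path
my $count = `wc -l < $file`;
die "wc failed: $?" if $?;
chomp($count);

my $start = $count - 1000;      # skip everything before the last 1000 lines
$start = 0 if $start < 0;

open my $fh, '<', $file or die "Can't open $file: $!";
while (my $line = <$fh>) {
    print $line if $. > $start; # $. is the current line number
}
close $fh;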

Adam Lerman
A: 

If you know the number of lines in the file, you can do

perl -ne "print if ($. > N);" filename.csv

where N is $num_lines_in_file - $num_lines_to_print. You can count the lines with

perl -e "while (<>) {} print $.;" filename.csv
David Nehme
+2  A: 
perl -n -e "shift @d if (@d >= 1000); push(@d, $_); END { print @d }" < bigfile.csv

Although really, the fact that Unix systems can simply run tail -n 1000 should convince you to install Cygwin or coLinux.

geocar
Thanks ...I used tail and it worked perfectly....
Alien01
A: 

Without tail, a Perl-only solution isn't that unreasonable.

One way is to seek from the end of the file, then read lines from it. If you don't have enough lines, seek even further from the end and try again.

sub last_x_lines {
    my ($filename, $lineswanted) = @_;
    my ($line, $filesize, $seekpos, $numread, @lines);

    open F, $filename or die "Can't read $filename: $!\n";

    $filesize = -s $filename;
    $seekpos = 50 * $lineswanted;                  # rough guess at how many bytes to read back
    $seekpos = $filesize if $seekpos > $filesize;  # never seek before the start of the file
    $numread = 0;

    while ($numread < $lineswanted) {
        @lines = ();
        $numread = 0;
        seek(F, $filesize - $seekpos, 0);
        <F> if $seekpos < $filesize; # Discard probably fragmentary line
        while (defined($line = <F>)) {
            push @lines, $line;
            shift @lines if ++$numread > $lineswanted;
        }
        if ($numread < $lineswanted) {
            # We didn't get enough lines. Double the amount of space to read from next time.
            if ($seekpos >= $filesize) {
                die "There aren't even $lineswanted lines in $filename - I got $numread\n";
            }
            $seekpos *= 2;
            $seekpos = $filesize if $seekpos >= $filesize;
        }
    }
    close F;
    return @lines;
}
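Usage would look something like this (the file name is just a placeholder):

my @last = last_x_lines('big.csv', 1000);
print for @last;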

P.S. A better title would be something like "Reading lines from the end of a large file in Perl".

Joshua Swink
P.P.S. Adding a comment to explain a downvote would be appreciated. If I felt the answer wasn't helpful/responsive, I'd delete it.
Joshua Swink
+13  A: 

The File::ReadBackwards module allows you to read a file in reverse order. This makes it easy to get the last N lines, as long as you aren't order dependent. If you are, and the needed data is small enough (which it should be in your case), you could read the last 1000 lines into an array and then reverse it.
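A minimal sketch of that (the file name is a placeholder):

use strict;
use warnings;
use File::ReadBackwards;

my $bw = File::ReadBackwards->new('big.csv')
    or die "Can't read big.csv: $!";

my @last;
while (defined(my $line = $bw->readline)) {
    push @last, $line;          # lines arrive last-to-first
    last if @last == 1000;
}

print reverse @last;            # restore top-to-bottom order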

Michael Carman
Second the recommendation. You can brew your own seek/read stuff, but there's no point when it's already done for you in a widely used, well tested CPAN module.
ysth
A: 

Without relying on tail (which is what I would probably do), if you have more than $FILESIZE [2GB?] of memory, then I'd just be lazy and do:

my @lines = <>;                     # slurp the whole file into memory
my @lastKlines = @lines[-1000..-1]; # array slice: the last 1000 lines

Though the other answers involving tail or seek() are pretty much the way to go on this.

dlamblin
Okay fine, use tail, like you didn't know. You asked about perl and this works in perl. If there's any reason it should be considered an inappropriate answer I'd appreciate a comment.
dlamblin
+5  A: 
brian d foy
A: 

You should absolutely use File::Tail, or better yet, another module. It's not a script; it's a module (a programming library). It likely works on Windows. As somebody said, you can check this on CPAN Testers, or often just by reading the module documentation, or by simply trying it.

You selected the tail utility as your preferred answer, but that's likely to be more of a headache on Windows than File::Tail.

skiphoppy