views: 511
answers: 5

Requirements:

a) I have a very large CSV file to read (about 3 GB).

b) I won't need all of the records; there are some conditions we can use to filter them, for example, keeping a row only if its 3rd CSV column contains 'XXXX' and its 4th column contains '999'. Can I use these conditions to improve the read process? If so, how can I do that using Perl?

Please show an example (a Perl script) in your answer.

Thanks in advance.

+4  A: 

Use Text::CSV

Maxwell Troy Milton King
for a really big file like this you should be using Text::CSV_XS
singingfish
it will use ::_XS if it is present on your system.
Evan Carroll
in other words: XS modules typically provide better memory and/or CPU performance than pure Perl modules, which would be helpful with large files such as the one you described. See http://en.wikipedia.org/wiki/XS_%28Perl%29
molecules
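
To make the Text::CSV suggestion concrete, here is a minimal sketch. The file name and the column-3/column-4 checks are assumptions taken from the question; Text::CSV picks up the XS backend automatically when it is installed.

#!/usr/bin/env perl
use strict;
use warnings;
use Text::CSV;   # delegates to Text::CSV_XS if it is installed

# Assumed input file; the 'XXXX'/'999' tests come from the question.
my $csv = Text::CSV->new({ binary => 1 }) or die Text::CSV->error_diag;
open my $fh, '<', 'data.csv' or die "Cannot open data.csv: $!";
while (my $row = $csv->getline($fh)) {
    # fields are zero-indexed: [2] is the 3rd column, [3] is the 4th
    next unless $row->[2] eq 'XXXX' && $row->[3] eq '999';
    # process the matching record here
}
$csv->eof or $csv->error_diag;
close $fh;
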
+2  A: 

Use a module like Text::CSV. However, if you know that your data will not have embedded commas and is in a simple CSV format, then a plain while loop that iterates over the file will suffice:

while (<>) {
  chomp;
  my @s = split /,/;   # naive split; safe only for CSV without quoted commas
  if ( $s[2] eq "XXXX" && $s[3] eq "999" ) {
    # do something with the matching record;
  }
}
ghostdog74
+12  A: 

Here's a solution:

#!/usr/bin/env perl
use warnings;
use strict;
use Text::CSV_XS;
use autodie;
my $csv = Text::CSV_XS->new();
open my $FH, "<", "file.txt";      # autodie throws an exception if the open fails
while (<$FH>) {
    $csv->parse($_) or next;       # skip lines that fail to parse
    my @fields = $csv->fields;
    next unless $fields[1] =~ /something I want/;
    # do the stuff to the fields you want here
}
singingfish
You're missing part of the dereference operator in the call to parse, and the regex is malformed, but other than that it's a great example.
Evan Carroll
+2  A: 

The Text::CSV module is a great solution for this. Another option is the DBD::CSV module, which provides a slightly different interface. The DBI interface is really useful if you're developing applications that have to access data from different forms of databases, including relational databases and comma-separated text files.

Here's some example code:

#!/usr/bin/perl

use strict;
use warnings;
use DBI;

my $dbh = DBI->connect ("DBI:CSV:f_dir=/home/joe/csvdb")
    or die "Cannot connect: $DBI::errstr";

my $sth = $dbh->prepare ("SELECT id, name FROM info.txt WHERE id > 1 ORDER BY id");
$sth->execute;

my ($id, $name);
$sth->bind_columns (\$id, \$name);
while ($sth->fetch) {
    print "Found result row: id = $id, name = $name\n";
}
$sth->finish;

I'd use Text::CSV for this task unless you're planning on talking to other types of databases, but in Perl TIMTOWTDI and it helps to know your options.

James Thompson
+5  A: 

Your a) question has been answered a few times over already, but b) has not yet been addressed:

I won't need all records, I mean, there are some conditionals that we can use, for example, if the 3rd CSV column content has 'XXXX' and 4th column has '999'. Can I use these conditionals to improve the read process?

No. How would you know whether the 3rd CSV column contains 'XXXX' or the 4th is '999' without reading the line first? (DBD::CSV lets you hide this behind an SQL WHERE clause, but, because CSV is unindexed data, it still needs to read in every line to determine which lines match the condition(s) and which don't.)

Pretty much the only way the content of a line could be used to let you skip reading parts of the file is if it contained information telling you 1) "skip the section following this line" and 2) "continue reading at byte offset nnn".
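
The most you can do is keep the per-line work cheap, for example by rejecting lines with a quick substring check before paying for a full CSV parse. A rough sketch (the file name and the 'XXXX'/'999' values are assumptions taken from the question):

#!/usr/bin/env perl
use strict;
use warnings;
use Text::CSV_XS;

my $csv = Text::CSV_XS->new({ binary => 1 });

open my $fh, '<', 'data.csv' or die "Cannot open data.csv: $!";
while (my $line = <$fh>) {
    # cheap pre-filter: every line is still read, but lines that cannot
    # possibly match are skipped before the more expensive CSV parse
    next if index($line, 'XXXX') < 0 || index($line, '999') < 0;

    $csv->parse($line) or next;
    my @fields = $csv->fields;
    next unless $fields[2] eq 'XXXX' && $fields[3] eq '999';
    # process the matching record here
}
close $fh;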

Dave Sherohman
Yeah, that's true. Thanks.
André Diniz