WHY are you trying to do this? Is it to minimize storage? Or to eliminate the processing cost of parsing many unneeded columns?
If the latter, you can't avoid that processing cost: any solution you come up with would STILL have to read and parse 100% of the file.
If the former, there are many methods, some more efficient than others.
Also, what exactly do you mean by "help me do a quick check of the column names"? If you want to get the column names, there's the column_names() method, provided you previously set the column names using $csv->column_names($csv->getline($fh)).
If you want to return only specific columns in a hash, to avoid wasting memory on unneeded columns, there's no clear-cut API for that. You can roll your own, or abuse a "bug/feature" of the getline_hr() method.
For the former (roll your own), you can do something like:
    my $headers = $csv->getline( $fh );    # First line is headers.
    # Flag each column as keep (1) or drop (0), based on its header name.
    my @headers_keep = map { /^cpu\.usagemhz\.average/ ? 1 : 0 } @$headers;
    my @rows;
    while ( my $row = $csv->getline( $fh ) ) {
        my $i = 0;
        my @row_new = grep { $headers_keep[$i++] } @$row;
        push @rows, \@row_new;
    }
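A variation on the same roll-your-own idea is to compute the indices of the wanted columns once and then take an array slice of each row, which avoids keeping a per-row counter. This is a minimal sketch: the helper wanted_indices and the sample rows are hypothetical, and in real use the header and rows would come from $csv->getline($fh) as in the loop above.

```perl
use strict;
use warnings;

# Return the indices of the header fields whose names match $pattern.
sub wanted_indices {
    my ( $headers, $pattern ) = @_;
    return grep { $headers->[$_] =~ $pattern } 0 .. $#$headers;
}

# Made-up sample data; in real use these come from $csv->getline($fh).
my $headers = [qw( date mem_total cpu.usagemhz.average_0 cpu.usagemhz.average_1 )];
my @data = (
    [ '2014-01-01', 2048, 10, 20 ],
    [ '2014-01-02', 4096, 11, 21 ],
);

# Compute the wanted indices once, then take an array slice of each row.
my @keep = wanted_indices( $headers, qr/^cpu\.usagemhz\.average/ );
my @rows = map { [ @$_[@keep] ] } @data;
print join( ';', map { join( ',', @$_ ) } @rows ), "\n";    # prints 10,20;11,21
```

Because the pattern is matched against the header only once, the per-row work is just the slice.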
Alternatively, you can use a "feature" of getline_hr(): if a column name is duplicated, the hash doesn't get a separate entry per column; only the value from the LAST column with that name gets assigned. In your case, for the column names:
date,mem_total,cpu.usagemhz.average_0,cpu.usagemhz.average_1,cpu.usagemhz.average_2
merely set the column_names array to contain the value "cpu.usagemhz.average_0" in its first 2 elements. The date and mem_total values will then NOT be saved by getline_hr(), because the real cpu.usagemhz.average_0 column that follows them overwrites that key.
In general, you can go over the list of columns, find each consecutive range of "not needed" columns, and replace their names with the name of the first needed column following that range. The only sticking point is an unneeded range at the very end of the columns: there is no needed column after it, so replace those names with "JUNK" or something similar, and delete that key from each returned hash.
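A minimal pure-Perl sketch of that renaming step follows. The sub collapse_unwanted_names, the "JUNK" placeholder and the sample row are hypothetical. The hash-slice assignment at the end only mimics the last-one-wins behaviour described above, so the example runs without a CSV file; with Text::CSV you would instead pass the renamed list to $csv->column_names(@names) and read rows with $csv->getline_hr($fh).

```perl
use strict;
use warnings;

# Rename every unwanted column to the name of the first wanted column that
# follows it; unwanted columns after the last wanted one get the $junk name.
sub collapse_unwanted_names {
    my ( $names, $wanted, $junk ) = @_;
    my @renamed     = @$names;
    my $next_wanted = $junk;
    for my $i ( reverse 0 .. $#renamed ) {    # walk right to left
        if ( $wanted->( $renamed[$i] ) ) {
            $next_wanted = $renamed[$i];      # nearest wanted name so far
        }
        else {
            $renamed[$i] = $next_wanted;      # later overwritten by the real column
        }
    }
    return @renamed;
}

my @header = qw( date mem_total cpu.usagemhz.average_0
                 cpu.usagemhz.average_1 cpu.usagemhz.average_2 );
my @names = collapse_unwanted_names(
    \@header, sub { $_[0] =~ /^cpu\.usagemhz\.average/ }, 'JUNK' );

# Mimic the last-one-wins behaviour: later duplicate keys overwrite earlier ones.
my @row = ( '2014-01-01', 2048, 10, 20, 30 );    # made-up sample row
my %hr;
@hr{@names} = @row;
delete $hr{JUNK};    # only present if unwanted columns trail the last wanted one
print join( ',', map { $_ . '=' . $hr{$_} } sort keys %hr ), "\n";
# prints cpu.usagemhz.average_0=10,cpu.usagemhz.average_1=20,cpu.usagemhz.average_2=30
```

Walking the columns right to left means each unwanted column simply inherits the nearest wanted name to its right, so the "consecutive range" bookkeeping falls out for free.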