I am looping through a big data file and like to detect the type of variable in each column, eg if it is an Intenger or a Float etc. It works perfectly fine, however, at the moment it is still very basic and I like to add another idea. So far the declaration of the variable is based on the second row of the data set. (The first one is used as the header.) Here is the beginning of the code:
#!/usr/bin/perl
use warnings;
use diagnostics;
use Getopt::Std;
getopts("i:s:t:") or die "bad options: $!";
if($opt_i) {
open INFILE, "< $opt_i";
chomp($headerline = <INFILE>);
$second = <INFILE>;
} else {
die "the input file has to be given\n";
}
if($opt_t) {
$tablename = $opt_t;
} else {
$tablename = $opt_i;
$tablename =~ s/\.\w+//;
}
if($opt_s) {
$sep = $opt_s;
} else {
$sep = ",";
}
$headerline =~ s/\"//g;
$headerline =~ s/\./\_/g;
@header = split/$sep/, $headerline;
$second =~ s/\"//g ;
@second = split/$sep/, $second;
@terms = split/$sep/, $second;
@types = split/$sep/, $second;
And now I have implemented a small loop.
The problem is that I don't know how to handle the missing values which are declared with NULL. At the moment the loop simply assigns "" i.e. nothing to the variable $vartype[$j]
.
$j = 0;
while($j <= $#second) {
if ($types[$j] =~ /NULL/) {
$vartype[$j] = "";
} elsif($types[$j] =~ /[A-Za-z]/) {
$vartype[$j] = "varchar";
} elsif ($types[$j] =~ /\./) {
$vartype[$j] = "double";
} else {
$vartype[$j] = "int";
}
$j++;
}
So how can I implement another loop structure into the existing loop so that whenever I have a NULL value in one column the loop reads the next value in that same column and does so until he finds a number or a word.
A sample of my data would be eg:
Country.Name Time.Name AG.LND.AGRI.ZS NY.GDP.MKTP.CD NE.IMP.GNFS.ZS
Brunei Darussalam 1960 NULL 1139121335.16 3.46
Brunei Darussalam 1960 NULL 1677595756.64 0.9
Brunei Darussalam 1960 NULL 1488339328.59 4.19
Brunei Darussalam 1961 3.98 1869828587.8 3.14
Brunei Darussalam 1961 3.98 2346769422.22 3.38
Brunei Darussalam 1961 3.98 2363109706.3 3.17
As already mentioned the for loop only uses the second row to decide on the type of the variable.
Now I would like to implement another loop so that eg in the third column (AG.LND.AGRI.ZS) he goes through the column until he detects the first real value, in this case 3.98. At the moment the loop recognizes the missing value marked with NULL and just assigns an empty value.