views:

2132

answers:

4

I am trying to take a rather large CSV file and insert it into a MySQL database for referencing in a project. I would like to use the first line of the file to create the table using proper data types and not varchar for each column. The ultimate goal is to automate this process as I have several similar files but the each has different data and a different amount of "columns" in CSV files. The problem that I am having is gettype() is returning 'string' for each column instead of int, float and string as I would like it to.

Platform is PHP 5, OS is ubuntu 8.04

here is my code so far:

<?php

// GENERATE TABLE FROM FIRST LINE OF CSV FILE

$inputFile = 'file.csv';
$tableName = 'file_csv';

$fh = fopen($inputFile, 'r');
    $contents = fread($fh, 5120); // 5KB
fclose($fh);

$fileLines = explode("\n", $contents); // explode to make sure we are only using the first line.

$fieldList = explode(',', $fileLines[0]); // separate columns, put into array
echo 'CREATE TABLE IF NOT EXISTS `'.$tableName.'` ('."<br/>\n";
for($i = 0; $i <= count($fieldList); $i++)
{
    switch(gettype($fieldList[$i])) {
        case 'integer':
            $typeInfo = 'int(11)';
            break;
        case 'float':
            $typeInfo = 'float';
            break;
        case 'string':
            $typeInfo = 'varchar(80)';
            break;
        default:
            $typeInfo = 'varchar(80)';
            break;
    }
if(gettype($fieldList[$i]) != NULL) echo "\t".'`'.$i.'` '.$typeInfo.' NOT NULL, --'.gettype($fieldList[$i]).' '.$fieldList[$i]."<br/>\n";
}
echo '  PRIMARY KEY  (`0`)'."<br/>\n";
echo ') ENGINE=MyISAM  DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=1 ;';

Example First line: 1,0,0,0,0,0,0,0,0,0,0,0,0.000000,0.000000,0,0,0,,0,0,1,0,50,'Word of Recall (OLD)',

+1  A: 

Try casting the value and comparing it with the original one:

define('DECIMAL_SEPARATOR', '.');

switch ($fieldList[$i])
{
    case (string)(int)$fieldList[$i]:
        $typeInfo = (strpos($fieldList[$i], DECIMAL_SEPARATOR) === false) ? 'int(11)' : 'float';
        break;
    case (string)(float)$fieldList[$i]:
        $typeInfo = 'float';
        break;
    default:
        $typeInfo = 'varchar(80)';
        break;
}

Additionaly, check for the presence of decimal separator in the first case for numbers that are round, yet they have the decimal fraction part.

Nouveau
Just tried this suggestion and it returned $typeInfo = 'int(11)' for each item in the list.
Jayrox
Well, that's why I wrote the "Additionally.." part at the end. Anyway, check the code, I put that check in there. You still have to decide what you want to do with empty strings, though.
Nouveau
OK, well, apparrently you have to cast those things back to string to compare them, otherwise the original string gets casted to number upon comparing.
Nouveau
I'm starting to think I may need to go a different route with selecting the type. such as: if(preg_match('/[0-9]/', $fieldList[$i])) $typeInfo = 'int(11)'; if(preg_match('/[\.]/', $fieldList[$i])) $typeInfo = 'float'; if(preg_match('/[a-z]/i', $fieldList[$i])) $typeInfo = 'varchar(80)';
Jayrox
+4  A: 

Building on Nouveau's code you could do this

for($i = 0; $i <= count($fieldList); $i++)
{
    if (is_numeric($fieldList[$i]))
    {
     if (strpos($fieldList[$i],'.') > 0){
      $fieldList[$i] = (int)$fieldList[$i];
     }else{
      $fieldList[$i] = (float)$fieldList[$i];
     }
    }

    switch(gettype($fieldList[$i])) {
        case 'integer':
            $typeInfo = 'int(11)';
            break;
        case 'float':
        case 'double':
            $typeInfo = 'float';
            break;

        case 'string':
            $typeInfo = 'varchar(80)';
            break;
        default:
            $typeInfo = 'varchar(80)';
            break;
    }
if(gettype($fieldList[$i]) != NULL) echo "\t".'`'.$i.'` '.$typeInfo.' NOT NULL, --'.gettype($fieldList[$i]).' '.$fieldList[$i]."<br/>\n";

}

That works, note the addition of "case 'double':" in the switch, but there may be a better way to do the int/float check as they would only work with standard uk/us numbers.

Re0sless
+1  A: 

Using regular expressions will give you a flexible solution. If you need to detect date fields then this is the way to go.

Liam
+1  A: 
<?php

// GENERATE TABLE FROM FIRST LINE OF CSV FILE

$inputFile = 'file.csv';
$tableName = 'file_csv';

$fh = fopen($inputFile, 'r');
    $contents = fread($fh, 5120); // 5KB
fclose($fh);

$fileLines = explode("\n", $contents);

$fieldList = explode(',', $fileLines[0]);
echo 'CREATE TABLE IF NOT EXISTS `'.$tableName.'` ('."<br/>\n";
for($i = 0; $i <= count($fieldList); $i++)
{

 if(strlen($fieldList[$i]) == 0) $typeInfo = 'varchar(80)';
 if(preg_match('/[0-9]/', $fieldList[$i])) $typeInfo = 'int(11)';
 if(preg_match('/[\.]/', $fieldList[$i])) $typeInfo = 'float';
 if(preg_match('/[a-z\\\']/i', $fieldList[$i])) $typeInfo = 'varchar(80)';

 echo "\t".'`'.$i.'` '.$typeInfo.' NOT NULL, -- '.gettype($fieldList[$i]).' '.$fieldList[$i]."<br/>\n";
}
echo '  PRIMARY KEY  (`0`)'."<br/>\n";
echo ') ENGINE=MyISAM  DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=1 ;';
Jayrox