I have a simple script that accepts a CSV file and reads every row into an array. I then cycle through each column of the first row (in my case, it holds the survey questions) and print them out. The survey is in French, and whenever the first character of a question is a special character (é, ê, ç, etc.), fgetcsv() simply omits it.

Special characters in the middle of a value are not affected. The problem only occurs when they are the first character.

I tried to debug this, but I am baffled. I did a var_dump() of the file's contents, and the characters are definitely there:

var_dump(utf8_encode(file_get_contents($_FILES['csv_file']['tmp_name'])));
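The var_dump() above passes the data through utf8_encode(), which only makes sense if the file is not already UTF-8, so it can help to look at the raw bytes directly. A minimal sketch of such a check; the helper name describeEncoding() is hypothetical, not from the original post, and the idea is that preg_match() with the /u modifier fails on invalid UTF-8 while bin2hex() shows which encoding the accented bytes use:

```php
<?php
// Hypothetical helper, not from the original post: reports whether a byte
// string is valid UTF-8 and shows the raw bytes at its start.
function describeEncoding($data, $bytes = 8)
{
    return [
        // preg_match() with the /u modifier returns false for invalid UTF-8
        'valid_utf8' => preg_match('//u', $data) === 1,
        // "é" is the single byte e9 in ISO-8859-1 but c3a9 in UTF-8
        'head_hex'   => bin2hex(substr($data, 0, $bytes)),
    ];
}

// The same word, "été", in both encodings
var_dump(describeEncoding("\xE9t\xE9"));         // ISO-8859-1 bytes
var_dump(describeEncoding("\xC3\xA9t\xC3\xA9")); // UTF-8 bytes
```

Applied to the upload, that would be describeEncoding(file_get_contents($_FILES['csv_file']['tmp_name'])); a head_hex starting with e9 rather than c3a9 would point to a one-byte encoding such as ISO-8859-1.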

And here's my code:

if (file_exists($_FILES['csv_file']['tmp_name']) && ($csv = fopen($_FILES['csv_file']['tmp_name'], "r")) !== false)
{
    $csv_arr = array();

    // Populate an array with all the rows of the CSV file.
    // Testing fgetcsv()'s return value instead of feof() avoids
    // appending a trailing `false` entry after the last row.
    while (($row = fgetcsv($csv)) !== false)
    {
        $csv_arr[] = $row;
    }

    // Close the file, no longer needed
    fclose($csv);

    // Cycle through the cells of the first row (the questions)
    foreach ($csv_arr[0] as $question)
    {
        echo utf8_encode($question) . "<br />";
    }
}

Any help would be greatly appreciated as I just don't know what's going on! :)

+1  A: 

Have you already checked the manual page on fgetcsv()? Offhand, nothing there addresses this specific problem, but a number of the user contributions may be worth looking through if nothing comes up here.

There's this, for example:

Note: Locale setting is taken into account by this function. If LANG is e.g. en_US.UTF-8, files in one-byte encoding are read wrong by this function.
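One way to act on that note is to make sure fgetcsv() never sees one-byte-encoded data at all. A sketch under the assumption that the upload is either UTF-8 or ISO-8859-1 (the function name readCsvAsUtf8() is hypothetical): re-encode the whole file to UTF-8 with iconv() first, then parse it from an in-memory php://temp stream.

```php
<?php
// Hypothetical helper. Assumption: the input is either UTF-8 or ISO-8859-1.
// Re-encodes to UTF-8 before parsing so the bytes agree with a UTF-8 locale.
function readCsvAsUtf8($path)
{
    $data = file_get_contents($path);
    if ($data === false) {
        return false;
    }
    // preg_match() with /u fails on invalid UTF-8; treat such input as ISO-8859-1
    if (preg_match('//u', $data) !== 1) {
        $data = iconv('ISO-8859-1', 'UTF-8', $data);
    }
    // Parse from an in-memory stream so fgetcsv() only ever sees UTF-8 bytes
    $stream = fopen('php://temp', 'r+');
    fwrite($stream, $data);
    rewind($stream);

    $rows = [];
    // Length 0 = no line-length limit; delimiter, enclosure and escape passed explicitly
    while (($row = fgetcsv($stream, 0, ',', '"', '\\')) !== false) {
        $rows[] = $row;
    }
    fclose($stream);
    return $rows;
}
```

Since the returned cells are already UTF-8, they could be echoed without the utf8_encode() call used in the question.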

Also, seeing as it always happens at the beginning of the line, could this really be a hidden line-break problem? There's this:

Note: If PHP is not properly recognizing the line endings when reading files either on or created by a Macintosh computer, enabling the auto_detect_line_endings run-time configuration option may help resolve the problem.

You may also want to try saving the file with different line endings.
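To check the line-break theory before re-saving anything, one could count which line endings the file actually contains. A minimal sketch; detectLineEndings() is a hypothetical diagnostic, not something from the manual:

```php
<?php
// Hypothetical diagnostic: counts each kind of line ending so you can tell
// Windows, Unix and classic Mac style files apart.
function detectLineEndings($data)
{
    $crlf = substr_count($data, "\r\n");       // Windows
    $cr   = substr_count($data, "\r") - $crlf; // lone CR: classic Mac OS
    $lf   = substr_count($data, "\n") - $crlf; // lone LF: Unix
    return ['crlf' => $crlf, 'cr' => $cr, 'lf' => $lf];
}

print_r(detectLineEndings("a\r\nb\rc\nd")); // crlf => 1, cr => 1, lf => 1
```

If lone CRs dominate, that matches the Macintosh case in the manual note, and enabling auto_detect_line_endings (deprecated as of PHP 8.1) or re-saving the file is worth trying.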

Pekka
I've read the manual page on how to use the function, and a quick search through the comments didn't turn up anything about special characters or UTF-8 encoding. I had noticed that it could have trouble with UTF-8 encoding, but if I don't encode the values, the character still doesn't show up. I'm not sure whether there's another way around this. I tried using "|" as an end-of-line delimiter and I get the same problem. This is very confusing :)
Gazillion
+2  A: 

Are you setting your locale correctly before calling fgetcsv()?

setlocale(LC_ALL, 'fr_FR.UTF-8');

Otherwise, fgetcsv() is not mb-safe.

Brock Batsell
Sorry if I come off as ignorant, but what is mb-safe? I added the line, but it had no effect on the behaviour of my script. The manual says that the function is binary-safe since PHP 4.3.5 (we have PHP 5 installed).
Gazillion
Multi Byte Safe = able to handle encodings in which a single character can consist of more than one byte (e.g. UTF-8).
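A quick illustration of the difference, as a sketch: strlen() counts bytes, so it is not multibyte-safe, while a PCRE pattern with the /u modifier matches whole UTF-8 characters.

```php
<?php
// "été" in UTF-8: each "é" occupies two bytes (0xC3 0xA9)
$s = "\xC3\xA9t\xC3\xA9";

// strlen() counts bytes
echo strlen($s), "\n"; // 5

// preg_match_all() with /u counts UTF-8 characters
echo preg_match_all('/./u', $s), "\n"; // 3
```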
Pekka
Ah thanks! I guess I'll leave it there :)
Gazillion
+1  A: 

This behaviour has a bug report filed for it, but apparently it isn't a bug.

David Johnstone