I am programmatically exporting data (using PHP 5.2) into a .csv test file.
Example data: Numéro 1 (note the accented e). The data is utf-8 (no prepended BOM)

When I open this file in MS Excel, it displays as NumÃ©ro 1

I am able to open this in a text editor (UltraEdit) which displays it correctly. UE reports the character is decimal 233.

How can I export text data in a .csv file so that MS excel will correctly render it, preferably without forcing the use of the import wizard, or non-default wizard settings?

A: 

This is just a question of character encodings. It looks like you're exporting your data as UTF-8: é in UTF-8 is the two-byte sequence 0xC3 0xA9, which when interpreted as Windows-1252 is Ã©. When you import your data into Excel, make sure to tell it that the character encoding you're using is UTF-8.
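
The mismatch can be reproduced in a few lines of Python. This is only an illustration of the encoding mix-up, not of what Excel does internally; the variable names are made up for the example.

```python
# "é" is U+00E9; written as an escape so the example does not depend on the
# encoding of this source file.
utf8_bytes = "\u00e9".encode("utf-8")

# Decoding the same two bytes as Windows-1252 yields two characters, not one.
misread = utf8_bytes.decode("cp1252")

print(utf8_bytes)                      # b'\xc3\xa9'
print([hex(ord(c)) for c in misread])  # ['0xc3', '0xa9'], i.e. "Ã" then "©"
```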

Adam Rosenfield
I've confirmed that the data is UTF-8. What do I put into the file to let excel know that my data is utf-8 (BOM?)
Freddo411
I think that you need to change the file encoding; Excel uses the system default code page to handle CSV files.
AlbertEin
A BOM might do the trick, yep.
I'm not entirely sure, since I don't have Excel installed on the machine I'm currently using, but with OpenOffice, there's a dropdown box for character encoding when you import a CSV file. From there, choose Unicode (UTF-8).
Adam Rosenfield
Excel doesn't have the dropdown AFAIK
AlbertEin
A: 

Check the encoding in which you are generating the file. To make Excel display the file correctly, you must use the system default code page.

Which language are you using? If it's .NET, you only need to use Encoding.Default while generating the file.
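
For data that is already UTF-8, the re-encoding step might look like the Python sketch below. It assumes the target machine's default code page is Windows-1252, which is common on Western European Windows installs but is not guaranteed. The function name `to_windows_1252` is hypothetical, and any character that Windows-1252 cannot represent is replaced with `?`, so the conversion is lossy.

```python
def to_windows_1252(utf8_data: bytes) -> bytes:
    """Re-encode UTF-8 bytes as Windows-1252; unmappable characters become '?'."""
    text = utf8_data.decode("utf-8")
    return text.encode("cp1252", errors="replace")


# "Numéro 1" plus one CJK character that Windows-1252 cannot represent.
row = "Num\u00e9ro 1,\u4e2d\n".encode("utf-8")
print(to_windows_1252(row))  # b'Num\xe9ro 1,?\n'
```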

AlbertEin
The export data is utf-8. I am writing the export file with php 5
Freddo411
Transcode the data to the Windows-1252 code page. I'm not sure how to accomplish that with PHP.
AlbertEin
A: 

The CSV format is implemented as ASCII, not unicode, in Excel, thus mangling the diacritics. We experienced the same issue which is how I tracked down that the official CSV standard was defined as being ASCII-based in Excel.

Jeff Yates
Actually, CSV is not bound to a specific encoding. It's Excel that's assuming ASCII. http://en.wikipedia.org/wiki/Comma-separated_values
spoulson
That's what I said. "implemented as ASCII in Excel", "CSV defined as ASCII-based in Excel". Not sure what point you're making as you appear to be agreeing with me.
Jeff Yates
Actually, you say "The CSV format is implemented as ASCII"; I think that is where the confusion stems from.
RichardOD
+12  A: 

This is a known bug with Excel and opening UTF8 csv files via file association: it assumes that they are ascii. This can not be fixed by any system default codepage or language setting. A BOM (EF BB BF) will not clue in Excel - it just won't work. (This statement is made for Excel 2000, 2003, and 2007.)

Note that you can correctly open UTF8 csv files in Excel using the "Import Text" wizard, which allows you to specify the encoding of the file you're opening.

So in one sense, you already are exporting text data so that Excel can read it. However, if you want "double-click" open-by-Excel-association to work, you'll have to export your text file as UTF16 instead of UTF8. Excel does handle UTF16 just fine, presumably by recognizing the byte order mark.
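
A minimal Python sketch of the UTF-16 approach, not the code anyone in this thread used. The function name `utf16_tsv_bytes` is hypothetical, and the tab delimiter is an assumption taken from other answers here that report Excel splitting UTF-16 files on tabs rather than commas.

```python
import codecs
import csv
import io


def utf16_tsv_bytes(rows):
    """Serialize rows as tab-separated text in UTF-16LE, prefixed with the FF FE BOM."""
    buf = io.StringIO(newline="")
    writer = csv.writer(buf, delimiter="\t", lineterminator="\r\n")
    writer.writerows(rows)
    # "utf-16-le" does not emit a BOM by itself, so prepend it explicitly.
    return codecs.BOM_UTF16_LE + buf.getvalue().encode("utf-16-le")


data = utf16_tsv_bytes([["id", "name"], [1, "Num\u00e9ro 1"]])
print(data[:2])  # b'\xff\xfe'
```

The returned bytes should be written to a file opened in binary mode (or sent as the HTTP response body) so that nothing re-encodes them on the way out.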

James Baker
Adding a BOM appears to encourage Excel to show the import wizard. Useful, but not sufficiently elegant. I'll try the utf-16 idea.
Freddo411
Took me forever to find where to specify the encoding. Save Dialog > Tools Button > Web Options > Encoding Tab. They sure are good at hiding such important things.
Triynko
+1  A: 
daniels
This is useful. I have modified the question to ask how to do this without resorting to the wizard
Freddo411
A: 

The option to use UTF-8 only applies to Excel 2008(Win).

There is no such version of Excel.
Tony Meyer
+6  A: 

Prepending a BOM (\uFEFF) worked for me (Excel 2007), in that Excel recognised the file as UTF-8. Otherwise, saving it and using the import wizard works, but is less ideal.
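
For comparison, here is a short Python sketch of the same idea. It relies on Python's `utf-8-sig` codec, which writes the UTF-8 form of U+FEFF (the bytes EF BB BF) before the data. The function name `utf8_bom_csv_bytes` is made up for the example, and whether Excel honours the BOM depends on the version, as other posts in this thread report.

```python
import csv
import io


def utf8_bom_csv_bytes(rows):
    """Serialize rows as comma-separated UTF-8 text with a leading BOM."""
    buf = io.StringIO(newline="")
    csv.writer(buf).writerows(rows)
    # "utf-8-sig" prepends the three bytes EF BB BF when encoding.
    return buf.getvalue().encode("utf-8-sig")


data = utf8_bom_csv_bytes([["id", "name"], [4, "H\u00e9l\u00e8ne"]])
print(data[:3])  # b'\xef\xbb\xbf'
```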

It still opens the text import wizard, so the difference is that you can simply double click, so still not ideal but the only known solution anyway.
haridsv
+1  A: 

As Fregal said \uFEFF is the way to go.

<%@LANGUAGE="JAVASCRIPT" CODEPAGE="65001"%>
<%
Response.Clear();
Response.ContentType = "text/csv";
Response.Charset = "utf-8";
Response.AddHeader("Content-Disposition", "attachment; filename=excelTest.csv");
Response.Write("\uFEFF");
// csv text here
%>
Kristof Neirynck
A: 

I see this question was 'answered' a long time ago and is probably lost in the depths of StackOverflow. But I wanted to add that you can save an html file with the extension 'xls' and accents will work (pre 2007 at least).

Example: save this (using Save As utf8 in Notepad) as test.xls:

<html>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<table>
<tr>
  <th>id</th>
  <th>name</th>
</tr>
<tr>
 <td>4</td>
 <td>Hélène</td>
</tr>
</table>
</html>
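
If the table has to be generated rather than typed by hand, a rough Python sketch could look like this. The function name `html_table_for_excel` is hypothetical; cell values are HTML-escaped, and the result is returned as UTF-8 bytes to be saved under an .xls name as described above.

```python
from html import escape


def html_table_for_excel(header, rows):
    """Build a minimal UTF-8 HTML table that can be saved with an .xls extension."""
    head = "".join(f"<th>{escape(str(h))}</th>" for h in header)
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(c))}</td>" for c in row) + "</tr>\n"
        for row in rows
    )
    doc = (
        "<html>\n"
        '<head><meta http-equiv="Content-Type" content="text/html; charset=utf-8" /></head>\n'
        f"<table>\n<tr>{head}</tr>\n{body}</table>\n</html>\n"
    )
    return doc.encode("utf-8")


page = html_table_for_excel(["id", "name"], [[4, "H\u00e9l\u00e8ne"], [5, "a<b"]])
```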
Benjol
A: 

I've also noticed that the question was "answered" some time ago but I don't understand the stories that say you can't open a utf8-encoded csv file successfully in Excel without using the text wizard.

My reproducible experience: Type Old MacDonald had a farm,ÈÌÉÍØ into Notepad, hit Enter, then Save As (using the UTF-8 option).

Using Python to show what's actually in there:

>>> open('oldmac.csv', 'rb').read()
'\xef\xbb\xbfOld MacDonald had a farm,\xc3\x88\xc3\x8c\xc3\x89\xc3\x8d\xc3\x98\r\n'
>>> ^Z

Good. Notepad has put a BOM at the front.

Now go into Windows Explorer, double click on the file name, or right click and use "Open with ...", and up pops Excel (2003) with display as expected.
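
To check an existing export for a BOM without reading a byte dump by eye, something like the following Python sketch would do. The helper name `sniff_bom` is hypothetical, and it only looks for the UTF-8 and UTF-16 marks discussed in this thread (a UTF-32LE file would be misreported as UTF-16LE, since its BOM starts with the same two bytes).

```python
import codecs

# (BOM bytes, codec name) pairs, checked in this order.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes):
    """Return a codec name if data starts with a known BOM, else None."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None


sample = b"\xef\xbb\xbfOld MacDonald had a farm,\xc3\x88\xc3\x8c\xc3\x89\xc3\x8d\xc3\x98\r\n"
print(sniff_bom(sample))  # utf-8-sig
```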

John Machin
+1  A: 

Below is the PHP code I use in my project to send an Excel-readable file to the user:

  /**
   * Export an array as a downloadable Excel CSV
   * (tab-separated, UTF-16LE with a leading BOM).
   * @param array   $header
   * @param array   $data
   * @param string  $filename
   */
  function toCSV($header, $data, $filename) {
    $sep  = "\t";
    $eol  = "\n";
    $csv  =  count($header) ? '"'. implode('"'.$sep.'"', $header).'"'.$eol : '';
    foreach($data as $line) {
      $csv .= '"'. implode('"'.$sep.'"', $line).'"'.$eol;
    }

    // Convert to UTF-16LE and prepend the BOM (FF FE) first, so that
    // Content-Length matches the bytes that are actually sent.
    $out = chr(255) . chr(254) . mb_convert_encoding($csv, 'UTF-16LE', 'UTF-8');

    header('Content-Description: File Transfer');
    header('Content-Type: application/vnd.ms-excel');
    header('Content-Disposition: attachment; filename='.$filename.'.csv');
    header('Content-Transfer-Encoding: binary');
    header('Expires: 0');
    header('Cache-Control: must-revalidate, post-check=0, pre-check=0');
    header('Pragma: public');
    header('Content-Length: '. strlen($out));
    echo $out;
    exit;
  }
Marc Carlucci
A: 

Excel 2007 properly reads UTF-8 with BOM (EF BB BF) encoded csv.

Excel 2003 (and maybe earlier) reads UTF-16LE with BOM (FF FE), but with TABs instead of commas or semicolons.

A: 

I can only get CSV to parse properly in Excel 2007 as tab-separated little-endian UTF-16 starting with the proper byte order mark.

Manfred Stienstra