tags:

views:

3442

answers:

4

I need to generate a file for Excel, some of the values in this file contain multiple lines.

there's also non-English text in there, so the file has to be Unicode.

The file I'm generating now looks like this: (in UTF8, with non English text mixed in and with a lot of lines)

Header1,Header2,Header3
Value1,Value2,"Value3 Line1
Value3 Line2"

Note the multi-line value is enclosed in double quotes, with a normal everyday newline in it.

According to what I found on the web this supposed to work, but it doesn't, at least not win Excel 2007 and UTF8 files, Excel treats the 3rd line as the second row of data not as the second line of the first data row.

This has to run on my customer's machines and I have no control over their version of Excel, so I need a solution that will work with Excel 2000 and later.

Thanks

EDIT: I "solved" my problem by having two CSV options, one for Excel (Unicode, tab separated, no newlines in fields) and one for the rest of the world (UTF8, standard CSV).

Not what I was looking for but at least it works (so far)

A: 

On a PC, ASCII character #10 is what you want to place a newline within a value.

Once you get it into Excel, however, you need to make sure word wrap is turned on for the multi-line cells or the newline will appear as a square box.

DanM
+2  A: 

You should have space characters at the start of fields ONLY where the space characters are part of the data. Excel will not strip off leading spaces. You will get unwanted spaces in your headings and data fields. Worse, the " that should be "protecting" that line-break in the third column will be ignored because it is not at the start of the field.

If you have non-ASCII characters (encoded in UTF-8) in the file, you should have a UTF-8 BOM (3 bytes, hex EF BB BF) at the start of the file. Otherwise Excel will interpret the data according to your locale's default encoding (e.g. cp1252) instead of utf-8, and your non-ASCII characters will be trashed.

[Following comments apply to Excel 2003 and 2007; not tested on Excel 2000]

If you open the file by double-clicking on its name in Windows Explorer, everything works OK.

If you open it from within Excel, the results vary:

(1) you have only ASCII characters in the file (and no BOM): works.

(2) you have non-ASCII characters (encoded in UTF-8) in the file, with a UTF-8 BOM at the start: it recognises that your data is encoded in UTF-8 but it ignores the csv extension and drops you into the Text Import not-a-Wizard, unfortunately with the result that you get the line-break problem.

Options include:

(a) train the users not to open the files from within Excel :-(

(b) consider writing an XLS file directly ... there are packages/libraries available for doing that in Python/Perl/PHP/.NET/etc

John Machin
Thanks, I fixed the leading spaces issue in the question, I typed the CSV example manually and didn't copy-paste from a real file, the real file doesn't include those spaces, good catch.
Nir
@Nir: Now let's talk about your real problem. So that means you had a UTF-8 BOM, and opened the file from within Excel and got the Text Import Wizard not recognising that your Value3 newline should be "protected" -- correct? Or perhaps you didn't have a UTF-8 BOM and you had to tell the TIW that your data was UTF-8 encoded and it still bungled the newline?
John Machin
A: 

I'm generating a comma delimited file from a Java application and I use "\n" as the lineSeparator

A: 

The way we do it (we use VB.Net) is to enclose the text with new lines in Chr(34) which is the char representing the double quotes and replace all CR-LF characters for LF.

Sebastian