After updating to PHP 5.3, one of our systems has developed an interesting bug. It parses CSV files, and the first step is to show the user what they have uploaded so they can check it over before confirming.

However, we have run into a bug where some files upload but aren't read. The weird thing is that if we take the data from those files, copy and paste it into Notepad, and save it as a .csv file, it uploads fine.

My first thought was that it might have something to do with people creating CSV files from a specific program. I noticed that the file that doesn't work (even though it contains the same data) is a little smaller than the one we create by copying and pasting from it.

Any help greatly appreciated.

+7  A: 

Sounds like one has carriage return + line feed pairs and the other has only line feeds (or is it carriage returns?).

Either that, or it has a different encoding: ASCII versus UTF-16.
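
One way to check the line-ending theory is to count each style in the raw bytes of a working file and a failing file. This is only a minimal sketch; `countLineEndings` is a hypothetical helper name, not something from the asker's code.

```php
<?php
// Hypothetical helper: counts each line-ending style in a raw string.
function countLineEndings($data)
{
    $crlf = substr_count($data, "\r\n");
    // Subtract the CRLF pairs so that lone CR and lone LF are counted separately.
    $cr = substr_count($data, "\r") - $crlf;
    $lf = substr_count($data, "\n") - $crlf;

    return array('CRLF' => $crlf, 'CR' => $cr, 'LF' => $lf);
}
```

Running it on `file_get_contents()` of both files should show whether they differ only in their newlines, which would also explain the small difference in size.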

Mitch Wheat
Thanks, now I just have to sanitize it.
Tjkoopa
+1  A: 

Are the CSV files in the same encoding?

Perhaps some have the UTF-8 BOM at the start, or others are in something like UTF-16.

Or, there could be a difference in the line ending characters - the evil one could use just LF, and the good CR+LF.
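
To test the BOM idea, you could look at the first few bytes of the upload. The following is a rough sketch under the assumption that only UTF-8 and UTF-16 matter; `detectBom` is a made-up name, and UTF-32 is deliberately ignored.

```php
<?php
// Hypothetical helper: names the byte-order mark at the start of $data,
// or returns null when none of the listed marks is present.
// Assumption: UTF-32 is not handled (a UTF-32LE mark would be reported as UTF-16LE).
function detectBom($data)
{
    $boms = array(
        'UTF-8'    => "\xEF\xBB\xBF",
        'UTF-16BE' => "\xFE\xFF",
        'UTF-16LE' => "\xFF\xFE",
    );

    foreach ($boms as $name => $bom) {
        // strncmp is binary safe, so it can compare raw bytes.
        if (strncmp($data, $bom, strlen($bom)) === 0) {
            return $name;
        }
    }

    return null;
}
```

A file without a BOM can still be in a different encoding, so a `null` result does not rule the encoding theory out.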

lavinio
Classic Mac OS uses CR, *nix (including Mac OS X) uses LF, and Windows uses CR+LF. So in other words, you just said Linux, Unix, and Macs are evil, and Windows is good.
Coding With Style
A: 

If the files look identical, but are different sizes, it may be that the newline characters are different between the files.

You could open the files in a text editor other than Notepad (e.g. Notepad++) to confirm this.

Patrick McDonald
A: 

Check the file encoding. Notepad saves in the ANSI charset by default. I noticed this when Excel would parse the fields in one CSV saved as ANSI, but not in another saved as UTF-8. In the UTF-8 file, it was all one field, with the commas included. Only when I saved as ANSI would Excel parse the CSV. A similar thing may be happening with PHP.

Edit: If this turns out to be the case, you could implement a procedure to "sanitize" the incoming CSV files: in effect, do in PHP what you are doing by hand when you copy and paste. It will be hard to insist on a particular encoding when the files are uploaded by clients, but you should be able to transform the incoming data.
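
A sanitizing step along those lines might look like the sketch below. It assumes the mbstring extension is available for the UTF-16 case, and it assumes data without a BOM is already in an encoding the parser accepts. `sanitizeCsv` is a hypothetical name.

```php
<?php
// Hypothetical helper: returns $data without a BOM, converted from UTF-16
// to UTF-8 when a UTF-16 BOM is present, and with LF line endings.
function sanitizeCsv($data)
{
    if (strncmp($data, "\xEF\xBB\xBF", 3) === 0) {
        // UTF-8 BOM: just drop the three marker bytes.
        $data = (string) substr($data, 3);
    } elseif (strncmp($data, "\xFF\xFE", 2) === 0) {
        // UTF-16 little endian (requires mbstring).
        $data = mb_convert_encoding((string) substr($data, 2), 'UTF-8', 'UTF-16LE');
    } elseif (strncmp($data, "\xFE\xFF", 2) === 0) {
        // UTF-16 big endian (requires mbstring).
        $data = mb_convert_encoding((string) substr($data, 2), 'UTF-8', 'UTF-16BE');
    }
    // Assumption: anything else is left in its original encoding.

    // Normalise CRLF first, then any remaining lone CR, to LF.
    return str_replace(array("\r\n", "\r"), "\n", $data);
}
```

The cleaned string could then be written to a temporary file and read with `fgetcsv` as before. Note that this also rewrites newlines inside quoted fields, which may or may not matter for your data.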

maxwellb