I am looking for best practices for handling CSV and tab-delimited files.

For CSV files I already apply some formatting if a value contains a comma or a double quote, but what if the value contains a newline character? Should I leave the newline intact, enclose the value in double quotes, and escape any double quotes within the value?

Same question for tab-delimited files. I assume the answer would be very similar, if not the same.

A: 

Usually you keep the \n unaltered and rely on the fact that the newline character is enclosed in a double-quoted string. This doesn't create any ambiguity, but it's really ugly if you have to look at the file in a normal text editor.

But that is how you should do it, since you don't escape anything inside a string in a CSV except for the double quote itself (written as two consecutive double quotes).
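A minimal sketch of that rule, assuming Python and its standard `csv` module (the question doesn't name a language, and the sample row is made up for illustration). The writer leaves the newline untouched, wraps the value in double quotes, and doubles the embedded double quotes; the reader recovers the original value:

```python
import csv
import io

# Hypothetical sample data: the second field contains a comma,
# double quotes, and a newline.
rows = [["id", "note"], ["1", 'She said "hi",\nthen left']]

# newline="" stops Python from translating line endings behind csv's back.
buf = io.StringIO(newline="")
csv.writer(buf).writerows(rows)  # default dialect: quote only when needed
text = buf.getvalue()

# The newline stays as-is inside the quoted field; only " is escaped (as "").
print(repr(text))

# Reading it back yields the original values, newline included.
parsed = list(csv.reader(io.StringIO(text, newline="")))
assert parsed == rows
```

The same applies when reading from disk: open the file with `newline=""` so the embedded newlines reach the parser unchanged.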

Jack
Thanks for the quick answer! That is what I was thinking; I just wanted to check with someone else to make sure.
rushonerok
A: 

@Jack is right that your best bet is to keep the \n unaltered, since a newline is expected to appear inside double quotes in that case.

As with most things, I think consistency here is key. As far as I know, your values only need to be double-quoted if they span multiple lines, contain commas, or contain double quotes. In some implementations I've seen, all values are escaped and double-quoted, since it makes the parsing algorithm simpler (the writer never has to decide whether a value needs quoting, and the reader can always expect quoted values and reverse the escaping).

This isn't the most space-efficient solution, but it makes reading and writing the file a trivial affair, both for your own library and for others that may consume it in the future.
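To make the trade-off concrete, here is a small sketch assuming Python's standard `csv` module; the helper names `dump` and `load` and the sample row are hypothetical. It writes the same row with quote-only-when-needed and quote-everything policies, and then as a tab-delimited file, where the same quoting rules carry over:

```python
import csv
import io

def dump(rows, **fmt):
    """Hypothetical helper: serialize rows to a string with the given csv options."""
    buf = io.StringIO(newline="")
    csv.writer(buf, lineterminator="\n", **fmt).writerows(rows)
    return buf.getvalue()

def load(text, **fmt):
    """Hypothetical helper: parse a string back into a list of rows."""
    return list(csv.reader(io.StringIO(text, newline=""), **fmt))

# Made-up row: a plain value, one with a comma, one spanning two lines.
rows = [["plain", "a,b", "line1\nline2"]]

minimal = dump(rows, quoting=csv.QUOTE_MINIMAL)   # quote only where required
quoted_all = dump(rows, quoting=csv.QUOTE_ALL)    # quote every value
tabbed = dump(rows, delimiter="\t", quoting=csv.QUOTE_ALL)  # tab-delimited

# All three variants round-trip to the same data.
assert load(minimal) == rows
assert load(quoted_all) == rows
assert load(tabbed, delimiter="\t") == rows
```

Quoting everything costs two extra characters per value, but every field then has the same shape on disk regardless of its content.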

Robert Hui