+1  Q: 

Simple CSV reader?

All,

I started out with what I thought was going to be a pretty simple task (convert a CSV to "wiki" format), but I'm hitting a few snags that I'm having trouble working through.

I have 3 main problems:

1) Some of the cells contain \r\n (so when reading line by line, each new line is treated as a new cell)

2) Some of the rows contain "," (I tried switching to \t-delimited files, but I'm still running into a problem escaping when it's between two "")

3) Some rows are completely blank except for the delimiter ("," or "\t"); others are incomplete (which is fine, I just need to make sure that each cell goes in the correct place)

I've tried a few of the CSV reader classes, but they would bump up against the problems listed above.

I'm trying to keep this app as small as possible, so I am also trying to avoid DLLs and large classes where only a small portion does what I want.

So far I have two "attempts" that are not working:

Attempt 1 (doesn't handle \r\n in a cell)

OpenFileDialog openFileDialog1 = new OpenFileDialog();

            openFileDialog1.InitialDirectory = Environment.GetFolderPath(Environment.SpecialFolder.Desktop);
            openFileDialog1.Filter = "tab sep file (*.txt)|*.txt|All files (*.*)|*.*";
            openFileDialog1.FilterIndex = 1;
            openFileDialog1.RestoreDirectory = true;

            if (openFileDialog1.ShowDialog() == DialogResult.OK)
            {
                if (cb_sortable.Checked)
                {
                    header = "{| class=\"wikitable sortable\" border=\"1\" \r\n|+ Sortable table";
                }

                StringBuilder sb = new StringBuilder();
                string line;
                bool firstline = true;
                StreamReader sr = new StreamReader(openFileDialog1.FileName);

                sb.AppendLine(header);

                while ((line = sr.ReadLine()) != null)
                {

                    if (line.Replace("\t", "").Length > 1)
                    {
                        string[] hold;
                        string lead = "| ";

                        if (firstline && cb_header.Checked == true)
                        {
                            lead = "| align=\"center\" style=\"background:#f0f0f0;\"| ";
                        }

                        hold = line.Split('\t');
                        sb.AppendLine(table);
                        foreach (string row in hold)
                        {
                            sb.AppendLine(lead + row.Replace("\"", ""));
                        }


                        firstline = false;
                    }
                }
                sb.AppendLine(footer);
                Clipboard.SetText(sb.ToString());
                MessageBox.Show("Done!");
        }


        }
        string header = "{| class=\"wikitable\" border=\"1\" ";
        string footer = "|}";
        string table = "|-";

Attempt 2 (can handle \r\n but shifts cells over blank cells) (it's not complete yet)

OpenFileDialog openFileDialog1 = new OpenFileDialog();

        openFileDialog1.InitialDirectory = Environment.GetFolderPath(Environment.SpecialFolder.Desktop);
        openFileDialog1.Filter = "txt file (*.txt)|*.txt|All files (*.*)|*.*";
        openFileDialog1.FilterIndex = 1;
        openFileDialog1.RestoreDirectory = true;

        if (openFileDialog1.ShowDialog() == DialogResult.OK)
        {
            if (cb_sortable.Checked)
            {
                header = "{| class=\"wikitable sortable\" border=\"1\" \r\n|+ Sortable table";
            }


            using (StreamReader sr = new StreamReader(openFileDialog1.FileName))
            {


                string text = sr.ReadToEnd();
                string[] cells = text.Split('\t');
                int columnCount = 0;
                foreach (string cell in cells)
                {

                    if (cell.Contains("\r\n"))
                    {
                        break;
                    }
                    columnCount++;
                }          


            }

Basically all I need is a "split if not between \"", but I'm just at a loss right now.

Any tips or tricks would be greatly appreciated.

+2  A: 

Check out this project instead of rolling your own CSV parser.

Darin Dimitrov
Michael Kropat
+1  A: 

You might take a look at http://www.filehelpers.com/ as well...

Don't try to do it by yourself if you can use libraries!

Yves M.
A: 

Try taking a look here. Your code doesn't make web requests, but effectively this shows you how to parse a CSV that is returned from a web service.

cfarm54
It splits on "," but doesn't allow for a "," in a cell
Crash893
A: 

There's a decent implementation here...

It makes much more sense in this case to use tried-and-tested code rather than trying to roll your own.

LukeH
A: 

For a specification that's essentially two pages long, the CSV format is deceptive in its simplicity. The majority of short parser implementations that can be found on the internet are blatantly incorrect in one way or another. That notwithstanding, the format hardly seems to call for 1k+ SLOC implementations.

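using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;  // ToArray() and Contains() below
using System.Text;

public static class CsvImport {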
    /// <summary>
    /// Parse a Comma Separated Value (CSV) source into rows of strings. [1]
    /// 
    /// The header row (if present) is not treated specially. No checking is
    /// performed to ensure uniform column lengths among rows. If no input
    /// is available, a single row containing String.Empty is returned. No
    /// support is provided for debugging invalid CSV files. Callers who
    /// desire such assistance are encouraged to use a TextReader that can
    /// report the current line and column position.
    /// 
    /// [1] http://tools.ietf.org/html/rfc4180
    /// </summary>
    public static IEnumerable<string[]> Deserialize(TextReader input) {
        if (input.Peek() == Sentinel) yield return new [] { String.Empty };
        while (input.Peek() != Sentinel) {
            // must read in entire row *now* to see if we're at end of input
            yield return DeserializeRow(input).ToArray(); 
        }
    }

    const int Sentinel = -1;
    const char Quote = '"';
    const char Separator = ',';

    static IEnumerable<string> DeserializeRow(TextReader input) {
        var field = new StringBuilder();
        while (true) {
            var c = input.Read();
            if (c == Separator) {
                yield return field.ToString();
                field = new StringBuilder();
            } else if (c == '\r') {
                if (input.Peek() == '\n') {
                    input.Read();
                }
                yield return field.ToString();
                yield break;
            } else if (new [] { '\n', Sentinel }.Contains(c)) {
                yield return field.ToString();
                yield break;
            } else if (c == Quote) {
                field.Append(DeserializeQuoted(input));
            } else {
                field.Append((char) c);
            }
        }
    }

    static string DeserializeQuoted(TextReader input) {
        var quoted = new StringBuilder();
        while (input.Peek() != Sentinel) {
            var c = input.Read();
            if (c == Quote) {
                if (input.Peek() == Quote) {
                    quoted.Append(Quote);
                    input.Read();
                } else {
                    return quoted.ToString();
                }
            } else {
                quoted.Append((char) c);
            }
        }
        throw new UnexpectedEof("End-of-file inside quoted section.");
    }

    public class UnexpectedEof : Exception {
        public UnexpectedEof(string message) : base(message) { }
    }
}
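
To tie this back to the wiki conversion in the question, here is a minimal sketch of the output side. WikiTable and Render are hypothetical names, not part of the parser above, and the markup strings are copied from the question. It assumes the rows have already been parsed into string arrays, pads short or blank rows to the widest row so each cell stays in its column (problem 3), and turns line breaks inside a cell into <br />, which is my assumption about how they should appear in the wiki.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical helper (not part of CsvImport): renders already-parsed rows
// as wiki table markup, using the same markup strings as the question.
public static class WikiTable {
    public static string Render(IEnumerable<string[]> rows, bool firstRowIsHeader) {
        var all = rows.ToList();
        // Pad every row to the widest row so blank or short rows keep their columns.
        int width = all.Count == 0 ? 0 : all.Max(r => r.Length);

        var sb = new StringBuilder();
        sb.AppendLine("{| class=\"wikitable\" border=\"1\"");
        bool first = true;
        foreach (var row in all) {
            // Only the first row gets the header styling, and only if requested.
            string lead = (first && firstRowIsHeader)
                ? "| align=\"center\" style=\"background:#f0f0f0;\"| "
                : "| ";
            sb.AppendLine("|-");
            for (int i = 0; i < width; i++) {
                string cell = i < row.Length ? row[i] : String.Empty;
                // Assumption: a line break inside a cell becomes <br />.
                cell = cell.Replace("\r\n", "\n").Replace("\n", "<br />");
                sb.AppendLine(lead + cell);
            }
            first = false;
        }
        sb.AppendLine("|}");
        return sb.ToString();
    }
}
```

With the parser above, the call would look something like WikiTable.Render(CsvImport.Deserialize(reader), cb_header.Checked), where reader is a StreamReader over the chosen file. Unlike attempt 1 in the question, this sketch keeps completely blank rows as rows of empty cells rather than skipping them; filter them out first if that is not what you want.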
Michael Kropat