I'm adding a function to my own personal toolkit lib to do simple CSV to HTML table conversion.

I would like the smallest possible piece of code to do this in C#, and it needs to handle CSV files larger than 500 MB.

So far my two contenders are

  • splitting the CSV into arrays by delimiter and building the HTML output

  • search-and-replace on the delimiters, swapping in table, th, tr and td tags

Assume that the file/read/disk operations are already handled, i.e. I'm passing a string containing the contents of the CSV into this function. The output will be plain, style-free HTML markup, and yes, the data may contain stray commas and line breaks.

Update: some folks asked. 100% of the CSV I deal with comes straight out of Excel, if that helps.

Example string:

a1,b1,c1\r\n
a2,b2,c2\r\n
+3  A: 

Read All Lines into Memory

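    // Loads the whole file into memory; simple, but costly for very large inputs.
    var lines = File.ReadAllLines(args[0]);
    // CreateText overwrites the target; AppendText would append to an existing file.
    using (var outfs = File.CreateText(args[1]))
    {
        outfs.Write("<html><body><table>");
        foreach (var line in lines)
            outfs.Write("<tr><td>" + string.Join("</td><td>", line.Split(',')) + "</td></tr>");
        outfs.Write("</table></body></html>");
    }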

or Read one line at a time

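    // Streams the input: only the current line is held in memory.
    using (var inFs = File.OpenText(args[0]))
    // CreateText overwrites the target; AppendText would append to an existing file.
    using (var outfs = File.CreateText(args[1]))
    {
        outfs.Write("<html><body><table>");
        while (!inFs.EndOfStream)
            outfs.Write("<tr><td>" + string.Join("</td><td>", inFs.ReadLine().Split(',')) + "</td></tr>");
        outfs.Write("</table></body></html>");
    }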

@Jimmy: I created an extended version using LINQ and posted it on my blog. Here is the highlight, with lazy evaluation for line reading (Load() and Write() are custom extension methods that are not shown here):

    using (var lp = args[0].Load())
        lp.Select(l => "<tr><td>" + string.Join("</td><td>", l.Split(',')) + "</td></tr>")
        .Write("<html><body><table>", "</table></body></html>", args[1]);
Matthew Whited
Your second solution is by far the best given the memory requirements mentioned. If the solution really does need to handle files on the order of 500 MB, then storing the entire contents in memory is not a good idea.
Bryce Kahle
Yeah, I wrote the first version for simplicity and then saw the new requirement, so I figured I'd extend it. Full-featured, clean, and still pretty short. I'm thinking I could stuff in some LINQ, but then it would still try to load everything into memory.
Matthew Whited
If you wrap the ReadLine() calls in a function that yield returns each line, you could get an IEnumerable for LINQ.
Jimmy
But that would make the code longer
Matthew Whited
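
The yield return idea from the comments above can be sketched roughly as follows. This is a minimal illustration, not the code from the blog post: the class name CsvLines and the methods ReadLines and WriteTable are hypothetical names chosen for this example, and the conversion is still the naive split on commas (no quote handling, no HTML encoding).

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class CsvLines
{
    // Hypothetical helper: yields one line at a time, so only the
    // current line is held in memory while the caller enumerates.
    public static IEnumerable<string> ReadLines(TextReader reader)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
            yield return line;
    }

    // Naive conversion, streamed row by row through a LINQ projection.
    public static void WriteTable(TextReader input, TextWriter output)
    {
        output.Write("<table>");
        var rows = ReadLines(input)
            .Select(l => "<tr><td>" + string.Join("</td><td>", l.Split(',')) + "</td></tr>");
        foreach (var row in rows)   // rows are produced lazily, one per input line
            output.Write(row);
        output.Write("</table>");
    }
}
```

It could be called with File.OpenText(...) as the reader and File.CreateText(...) as the writer. On .NET 4 and later, File.ReadLines already enumerates a file lazily, so the custom iterator is only needed on older frameworks or for arbitrary TextReader sources.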
+1  A: 

You probably can't get much shorter than this, but remember that any real solution would have to handle quotes, commas inside quotes, and conversion to HTML entities.

return "<table><tr><td>"+s
   .Replace("\n","</td></tr><tr><td>")
   .Replace(",","</td><td>")+"</td></tr></table>";

EDIT: here's a (largely untested) addition of HTML encoding and quote matching. I HTML-encode first, then turn every unquoted comma into '<' (which can't collide, because any '<' already in the data has been encoded by then).

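bool q = false; // true while inside a quoted field
return "<table><tr><td>"
  + new string(HttpUtility.HtmlEncode(s)
       // HtmlEncode turns '"' into &quot;; restore it so the quote test below can fire
       .Replace("&quot;", "\"")
       .Select(c => c == '"' ? ((q = !q) ? c : c) : (c == ',' && !q) ? '<' : c).ToArray())
    .Replace("<", "</td><td>")
    .Replace("\n", "</td></tr><tr><td>")
  + "</td></tr></table>";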
Jimmy
Chaining the .Replace(), nice - I was thinking of something like this originally but didn't know how to deal with the beginning and end rows - it's all so obvious now lol
John Rasch
well, the shorter the code-golf answer, the less useful it is, so you shouldn't have deleted your answer ;)
Jimmy
+1  A: 

Here's a fun version using lambda expressions. It's not as short as replacing commas with "</td><td>", but it has its own special charm:

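var r = new StringBuilder("<table>");
s.Split('\n').ToList().ForEach(t => r.Append("<tr>")
    // Join the cells first; appending the Select result directly would append its type name
    .Append(string.Join("", t.Split(',').Select(u => "<td>" + u + "</td>").ToArray()))
    .Append("</tr>"));
return r.Append("</table>").ToString();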

If I were to write this for production, I'd use a state machine to track nested quotes, newlines, and commas, because Excel can put newlines in the middle of a column. IIRC you can also specify a different delimiter entirely.

Joel Coehoorn
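
A state machine along the lines described above might look something like the sketch below. It is an illustration under assumptions, not a production parser: the class name CsvStateMachine, the method WriteTable and its delimiter parameter are hypothetical, and it assumes Excel-style quoting, where a doubled quote inside a quoted field stands for a literal quote. It reads one character at a time, so it can stream from a file instead of taking the whole CSV as a string.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

public static class CsvStateMachine
{
    // Two states: inside a quoted field or outside one.
    public static void WriteTable(TextReader input, TextWriter output, char delimiter = ',')
    {
        var field = new StringBuilder();
        bool inQuotes = false, rowOpen = false;
        int ch;
        output.Write("<table>");
        while ((ch = input.Read()) != -1)
        {
            char c = (char)ch;
            if (inQuotes)
            {
                if (c != '"') field.Append(c);   // delimiters and newlines are plain data here
                else if (input.Peek() == '"') { input.Read(); field.Append('"'); } // "" is an escaped quote
                else inQuotes = false;           // closing quote
            }
            else if (c == '"') inQuotes = true;
            else if (c == delimiter) EndField(output, field, ref rowOpen);
            else if (c == '\n')
            {
                EndField(output, field, ref rowOpen);
                output.Write("</tr>");
                rowOpen = false;
            }
            else if (c != '\r') field.Append(c); // drop the CR of a CRLF row ending
        }
        // Flush a final row that has no trailing newline.
        if (rowOpen || field.Length > 0)
        {
            EndField(output, field, ref rowOpen);
            output.Write("</tr>");
        }
        output.Write("</table>");
    }

    // Writes the buffered field as an HTML-encoded cell, opening the row if needed.
    private static void EndField(TextWriter output, StringBuilder field, ref bool rowOpen)
    {
        if (!rowOpen) { output.Write("<tr>"); rowOpen = true; }
        output.Write("<td>" + WebUtility.HtmlEncode(field.ToString()) + "</td>");
        field.Length = 0;
    }
}
```

Newlines inside quoted fields are kept as-is in the cell text, where HTML treats them as ordinary whitespace. Passing '\t' or ';' as the delimiter covers exports that don't use commas.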