views:

418

answers:

6

Hello, I have a string like this:

string s = @"
    <tr>
    <td>11</td><td>12</td>
    </tr>
    <tr>
    <td>21</td><td>22</td>
    </tr>
    <tr>
    <td>31</td><td>32</td>
    </tr>";

How can I create Dictionary<int, int> d = new Dictionary<int, int>(); from string s so that I get the same result as:

d.Add(11, 12);
d.Add(21, 22);
d.Add(31, 32);
+10  A: 

You should use the HTML Agility Pack.

For example: (Tested)

var doc = new HtmlDocument();
doc.LoadHtml(s);
var dict = doc.DocumentNode.Descendants("tr")
              .ToDictionary(
                  tr => int.Parse(tr.Descendants("td").First().InnerText),
                  tr => int.Parse(tr.Descendants("td").Last().InnerText)
              );

If the HTML will always be well-formed, you can use LINQ-to-XML; the code would be almost identical.
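
As a rough illustration of the LINQ-to-XML variant mentioned above, here is a minimal sketch. It assumes the markup is well-formed XML. Because the fragment has several top-level <tr> elements, it is wrapped in a <root> element first; that wrapper name is made up for this example and is not part of the original string.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

class Program
{
    static void Main()
    {
        string s = @"
            <tr>
            <td>11</td><td>12</td>
            </tr>
            <tr>
            <td>21</td><td>22</td>
            </tr>
            <tr>
            <td>31</td><td>32</td>
            </tr>";

        // An XML document needs exactly one root element, so wrap the rows
        // in a hypothetical <root> element before parsing.
        XElement root = XElement.Parse("<root>" + s + "</root>");

        // The explicit (int) conversion on XElement parses the element's text.
        Dictionary<int, int> dict = root.Elements("tr")
            .ToDictionary(
                tr => (int)tr.Elements("td").First(),
                tr => (int)tr.Elements("td").Last());

        foreach (var pair in dict)
            Console.WriteLine("{0}={1}", pair.Key, pair.Value);
    }
}
```

Unlike the HTML Agility Pack, XElement.Parse throws an XmlException on markup that is not well-formed (unclosed tags, unquoted attributes, undefined entities such as &nbsp;), so this only suits input you control.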

SLaks
Very helpful tips and answers, and I learned about the HTML Agility Pack. Best solution for tasks like this. Thanks.
loviji
A: 

If you don't want to use the HTML agility pack you could try something similar to:

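// Strip the opening <tr> tags, then split into rows on the closing </tr> tag.
var arr = s.Replace("<tr>", "").Split(new[] { "</tr>" }, StringSplitOptions.RemoveEmptyEntries);

var d = new Dictionary<int, int>();
foreach (var row in arr) {
  // Strip the opening <td> tags, then split into cell values on </td>.
  var itm = row.Replace("<td>", "").Split(new[] { "</td>" }, StringSplitOptions.RemoveEmptyEntries);
  // int.Parse tolerates the leading whitespace and newlines left in each value.
  d.Add(int.Parse(itm[0]), int.Parse(itm[1]));
}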

(untested)

Sani Huttunen
As Andrew M mentioned, that way lies madness. It's equivalent to using regular expressions. http://www.codinghorror.com/blog/2009/11/parsing-html-the-cthulhu-way.html
TrueWill
+3  A: 

Code

using RE=System.Text.RegularExpressions;

....

public void Run()
{
    string s=@"
<tr>
<td>11</td><td>12</td>
</tr>
<tr>
<td>21</td><td>22</td>
</tr>
<tr>
<td>31</td><td>32</td>
</tr>";

    var mcol= RE.Regex.Matches(s,"<td>(\\d+)</td><td>(\\d+)</td>");
    var d = new Dictionary<int, int>();

    foreach(RE.Match match in mcol)
        d.Add(Int32.Parse(match.Groups[1].Value),
              Int32.Parse(match.Groups[2].Value));

    foreach (var key in d.Keys)
        System.Console.WriteLine("  {0}={1}", key, d[key]);
}
Cheeso
You should not do this. If you do, you should probably ignore whitespace in the tags. http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags
SLaks
Maybe, but it worked for his HTML. I think you're saying that it won't work for other HTML. Fair point.
Cheeso
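
For reference, a whitespace-tolerant version of the pattern discussed in these comments might look like the sketch below. The sample string is made up for illustration. This only relaxes whitespace and letter case; it still does not cope with attributes, nested tables, or otherwise irregular HTML, which is why a real parser is the safer choice.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        // Hypothetical input with stray whitespace and mixed-case tags.
        string s = "<tr>\n< td >11</ td ><TD> 12 </TD>\n</tr>\n<tr><td>21</td>\n<td>22</td></tr>";

        // \s* allows optional whitespace inside the tags, around the digits,
        // and between the two cells; IgnoreCase accepts <TD> as well as <td>.
        var pattern = @"<\s*td\s*>\s*(\d+)\s*<\s*/\s*td\s*>\s*<\s*td\s*>\s*(\d+)\s*<\s*/\s*td\s*>";
        var d = new Dictionary<int, int>();

        foreach (Match match in Regex.Matches(s, pattern, RegexOptions.IgnoreCase))
            d.Add(int.Parse(match.Groups[1].Value), int.Parse(match.Groups[2].Value));

        foreach (var pair in d)
            Console.WriteLine("  {0}={1}", pair.Key, pair.Value);
    }
}
```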
A: 
var s = "<tr><td>11</td><td>12</td></tr><tr><td>21</td><td>22</td></tr><tr><td>31</td><td>32</td></tr>";

var rows = s.Split( new[] { "</tr>" }, StringSplitOptions.None );

var results = new Dictionary<int, int>();
foreach ( var row in rows )
{
    var cols = row.Split( new[] { "</td>" }, StringSplitOptions.None );
    var vals = new List<int>();

    foreach ( var col in cols )
    {
        var val = col.Replace( "<td>", string.Empty ).Replace( "<tr>", string.Empty );

        int intVal;
        if ( int.TryParse( val, out intVal ) )
            vals.Add( intVal );
    }

    if ( vals.Count == 2 )
        results.Add( vals[0], vals[1] );
}
Thomas
+1  A: 
string s =
@"<tr> 
<td>11</td><td>12</td> 
</tr> 
<tr> 
<td>21</td><td>22</td> 
</tr> 
<tr> 
<td>31</td><td>32</td> 
</tr>";

XPathDocument doc = new XPathDocument(XmlReader.Create(new StringReader(s), new XmlReaderSettings { ConformanceLevel = ConformanceLevel.Fragment, IgnoreWhitespace = true }));

Dictionary<int, int> dict = doc.CreateNavigator()
   .Select("tr")
   .Cast<XPathNavigator>()
   .ToDictionary(
      r => r.SelectSingleNode("td[1]").ValueAsInt,
      r => r.SelectSingleNode("td[2]").ValueAsInt
   );
Max Toro
A: 

using RE=System.Text.RegularExpressions;

....

public void Run()
{
    string s=@"
<tr>
<td>11</td><td>12</td>
</tr>
<tr>
<td>21</td><td>22</td>
</tr>
<tr>
<td>31</td><td>32</td>
</tr>";

    var mcol= RE.Regex.Matches(s,"<td>(\\d+)</td><td>(\\d+)</td>");
    var d = new Dictionary<int, int>();

    foreach(RE.Match match in mcol)
        d.Add(Int32.Parse(match.Groups[1].Value),
              Int32.Parse(match.Groups[2].Value));

    foreach (var key in d.Keys)
        System.Console.WriteLine("  {0}={1}", key, d[key]);
}