Hello! I'm looking to parse some information in my application. Let's say I have a string that contains, somewhere in it:

<tr class="tablelist_bg1">

<td>Beja</td>

<td class="text_center">---</td>

<td class="text_center">19.1</td>

<td class="text_center">10.8</td>

<td class="text_center">NW</td>

<td class="text_center">50.9</td>

<td class="text_center">0</td>

<td class="text_center">1016.6</td>

<td class="text_center">---</td>

<td class="text_center">---</td>

</tr>

Everything above or below this doesn't matter. Remember this is all inside a string. I want to get the values inside the td tags: ---, 19.1, 10.8, etc. Note that there are many entries like this on the page. Probably also a good idea to link the page here.

As you probably guessed, I have absolutely no idea how to do this... none of the string functions I know (Split etc.) help.

Thanks in advance

A: 

Assuming your string is valid XHTML, you can use an XML parser to get the content you want. There's a simple example here that shows how to use XmlTextReader to parse XML content. The example reads from a file, but you can change it to read from a string:

new XmlTextReader(new StringReader(someString));

You specifically want to keep track of td element nodes, and the text node that follows them will contain the values you want.
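
To make that concrete, here is a minimal sketch of the idea, assuming the string (or at least the fragment you feed in) is well-formed XML with a single root element, such as the tr row from the question. The names TdReader and ReadTdValues are made up for illustration. Note that a full real-world HTML page, or one using entities like &nbsp;, will most likely make the reader throw an XmlException.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class TdReader
{
    // Hypothetical helper: returns the text found directly inside each <td> element.
    // Assumes the input is well-formed XML; otherwise Read() throws XmlException.
    public static List<string> ReadTdValues(string xhtml)
    {
        var values = new List<string>();

        using (var reader = new XmlTextReader(new StringReader(xhtml)))
        {
            bool insideTd = false;

            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "td")
                {
                    // An empty element such as <td/> has no text node to wait for.
                    insideTd = !reader.IsEmptyElement;
                }
                else if (reader.NodeType == XmlNodeType.EndElement && reader.Name == "td")
                {
                    insideTd = false;
                }
                else if (reader.NodeType == XmlNodeType.Text && insideTd)
                {
                    // The text node inside the td holds the value.
                    values.Add(reader.Value);
                }
            }
        }

        return values;
    }
}
```

Calling TdReader.ReadTdValues on the row from the question should give you Beja, ---, 19.1 and so on, in document order. The whitespace between the td elements is reported as a separate whitespace node type, so it is not picked up.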

casablanca
+1  A: 

Just use String.IndexOf(string, int) to find a "<td", again to find the next ">", and again to find "</td>". Then use String.Substring to pull out a value. Put this in a loop.

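    // Collects the contents of every <td>...</td> pair found in the input.
    public static List<string> ParseTds(string input)
    {
        List<string> results = new List<string>();

        // Current search position; ParseTd advances it after each match.
        int index = 0;

        while (true)
        {
            string next = ParseTd(input, ref index);

            // null means there are no more complete td elements.
            if (next == null)
                return results;

            results.Add(next);
        }
    }

    // Returns the content of the next td at or after index, or null if none is left.
    private static string ParseTd(string input, ref int index)
    {
        // Start of the opening tag; "<td" also matches tags with attributes.
        int tdIndex = input.IndexOf("<td", index);
        if (tdIndex == -1)
            return null;
        // End of the opening tag.
        int gtIndex = input.IndexOf(">", tdIndex);
        if (gtIndex == -1)
            return null;
        // Start of the closing tag.
        int endIndex = input.IndexOf("</td>", gtIndex);
        if (endIndex == -1)
            return null;

        // The next search resumes from the closing tag.
        index = endIndex;

        // Everything between '>' and "</td>".
        return input.Substring(gtIndex + 1, endIndex - gtIndex - 1);
    }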
arx
A very nice answer and easy to understand.
Queops
.. Thank you! ..
arx
A: 
Use a loop to load each non-empty line from the file into a string.
Process the string character by character.
 Check for characters indicating the beginning of a td tag.
  Use a substring function, or just build a new string character by character, to collect all the content until the </td> tag begins.
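
The steps above could look roughly like this. It is only a sketch: it scans one string instead of reading lines from a file, TdScanner, ScanTds and StartsWithAt are names I made up, and it assumes lowercase tags with no nested td elements.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class TdScanner
{
    // Hypothetical helper: walks the input one character at a time and
    // collects whatever sits between each "<td ...>" and the next "</td>".
    public static List<string> ScanTds(string input)
    {
        var results = new List<string>();
        int i = 0;

        while (i < input.Length)
        {
            // Check for the characters that begin a td tag.
            if (!StartsWithAt(input, i, "<td"))
            {
                i++;
                continue;
            }

            // Skip the rest of the opening tag, including any attributes.
            while (i < input.Length && input[i] != '>')
                i++;
            if (i == input.Length)
                break; // opening tag never closed

            i++; // step past '>'

            // Build the content character by character until "</td>" begins.
            var content = new StringBuilder();
            while (i < input.Length && !StartsWithAt(input, i, "</td>"))
            {
                content.Append(input[i]);
                i++;
            }
            if (i == input.Length)
                break; // no closing tag; drop the partial content

            results.Add(content.ToString());
            i += "</td>".Length;
        }

        return results;
    }

    // True if token occurs in s starting exactly at position pos.
    private static bool StartsWithAt(string s, int pos, string token)
    {
        return string.CompareOrdinal(s, pos, token, 0, token.Length) == 0;
    }
}
```

This does by hand what IndexOf and Substring do for you, so it is mostly useful if you want full control over the scan.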
sca