Hi, I'm working on a "personal-can-it-work" sort of thing, and I have everything working great except for parsing some information from an .asp source file into my program.

This is the parsing code I have so far:

            // parse out the results
            try 
            {
                int snr_start = result.IndexOf("SNR");
                int snr_end = result.IndexOf("</TR>", snr_start);
                snr = result.Substring(snr_start, snr_end - snr_start);
                snr = snr.Substring(snr.IndexOf("<TD>") + 1);
                snr = snr.Substring(0, snr.Length - 6);
                iSNR = Convert.ToInt32(snr.Substring(0, snr.IndexOf(" ")));

                int dnpwr_start = result.IndexOf("Downstream Power", snr_end);
                int dnpwr_stop = result.IndexOf("</TR>", dnpwr_start);
                dnpwr = result.Substring(dnpwr_start, dnpwr_stop - dnpwr_start);
                dnpwr = dnpwr.Substring(dnpwr.IndexOf("<TD>") + 1);
                dnpwr = dnpwr.Substring(0, dnpwr.IndexOf("<TABLE") - 1);
                iDPWR = Convert.ToInt32(dnpwr.Substring(0, dnpwr.IndexOf(" ")));

                int uppwr_start = result.IndexOf("Upstream Power", dnpwr_stop);
                int uppwr_stop = result.IndexOf("</TR>", uppwr_start);
                uppwr = result.Substring(uppwr_start, uppwr_stop - uppwr_start);
                uppwr = uppwr.Substring(uppwr.IndexOf("<TD>") + 1);
                uppwr = uppwr.Substring(0, uppwr.IndexOf("</TD>") - 1);
                iUPWR = Convert.ToInt32(uppwr.Substring(0, uppwr.IndexOf(" ")));
            }
            catch 

And this is the source file and the information I'm trying to scrape from it (SNR, Downstream Power, Upstream Power):

<td class="headerR">Downstream Power</td>
<td class="contentL">1.0 dBmV</td>
</tr>
<tr>
<td class="headerR">SNR</td>
<td class="contentL">39.656 dB</td>
</tr>
<tr>
<td class="headerR">Upstream Power</td>
<td class="contentL">42.0 dBmV</td>
</tr>

I'm not too sure where I'm going wrong, but any help would be greatly appreciated. The focus of the project is to parse the signal levels off of my modem (I'm an MSO employee) for extended monitoring. If needed, I can post the full source from the .asp page.

Thanks, Matt

A: 

I am not too keen on using those string methods for screen scraping unless it's your last resort.

You can try using a regex, or even better, if you can guarantee that your HTML source is well formed (XHTML), you could load it (or the snippet of XML you want) into an XML document object and use either XPath or, if you're on .NET 3.5, LINQ to XML (XLinq).
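As a rough sketch of the LINQ to XML route: the class and method names below are made up for illustration, and the sample markup is the rows from the question wrapped in a `<table>` so that it is well formed. It assumes every data row has exactly two `<td>` cells (header, then value) and that header texts are unique.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Xml.Linq;

public static class SignalTableParser
{
    // Hypothetical helper: maps each header cell's text to the numeric
    // part of the value cell that sits next to it in the same row.
    public static Dictionary<string, double> Parse(string xhtml)
    {
        // XElement.Parse throws XmlException if the markup is not well-formed XML.
        XElement root = XElement.Parse(xhtml);

        return root.DescendantsAndSelf("tr")          // element names are case-sensitive
            .Select(tr => tr.Elements("td").ToList())
            .Where(cells => cells.Count == 2)         // skip rows that are not header/value pairs
            .ToDictionary(
                cells => cells[0].Value.Trim(),
                // "39.656 dB" -> "39.656"; parse with the invariant culture so '.' is the decimal point
                cells => double.Parse(cells[1].Value.Trim().Split(' ')[0], CultureInfo.InvariantCulture));
    }

    public static void Main()
    {
        // Assumed sample: the question's rows, wrapped so the snippet is well formed.
        string sample =
            "<table>" +
            "<tr><td class=\"headerR\">Downstream Power</td><td class=\"contentL\">1.0 dBmV</td></tr>" +
            "<tr><td class=\"headerR\">SNR</td><td class=\"contentL\">39.656 dB</td></tr>" +
            "<tr><td class=\"headerR\">Upstream Power</td><td class=\"contentL\">42.0 dBmV</td></tr>" +
            "</table>";

        foreach (var level in Parse(sample))
        {
            Console.WriteLine(level.Key + ": " + level.Value.ToString(CultureInfo.InvariantCulture));
        }
    }
}
```

If the real page is not well-formed XML (unclosed tags, unquoted attributes), `XElement.Parse` will throw, and you'd have to fall back to a regex or some other approach.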

Thiago Silva
A: 

ASP source? Your best bet is probably a regular expression - they're designed for this kind of task. Any kind of scraping usually means it will be worth your while to dig into them.

What language are you using to parse it? If it's .NET, you can get your name/value pairs easily with the Regex class.

Something like this for the regex:

"<tr>\s*<td\s+class\s*=\s*\"headerR\"\s*>\s*(?<name>[^<]+)\s*</td\s*>\s*<td\s+class\s*=\s*\"contentL\"\s*>\s*(?<value>[^<]+)\s*</td\s*>\s*</tr\s*>"

Then, you can loop through the matches and get your list of name/value pairs:

"Downstream Power":"1.0 dBmV" "SNR":"39.656 dB" "Upstream Power":"42.0 dBmV"
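Here is a minimal sketch of that loop in C#. The class and method names are invented for illustration, and the pattern is a variant of the one above: it is written as a verbatim string, matches one or more characters in each cell, and leaves out the `<tr>` wrappers, on the assumption that only the `headerR`/`contentL` cell pair matters (the first row of the posted snippet has no opening `<tr>`).

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SignalRegexScraper
{
    // Matches a headerR cell immediately followed by a contentL cell.
    // The lazy [^<]+? plus the trailing \s* keeps surrounding whitespace out of the groups.
    private static readonly Regex CellPair = new Regex(
        @"<td\s+class\s*=\s*""headerR""\s*>\s*(?<name>[^<]+?)\s*</td\s*>\s*" +
        @"<td\s+class\s*=\s*""contentL""\s*>\s*(?<value>[^<]+?)\s*</td\s*>",
        RegexOptions.IgnoreCase);

    // Hypothetical helper: returns header text -> value text, e.g. "SNR" -> "39.656 dB".
    public static Dictionary<string, string> Scrape(string html)
    {
        var pairs = new Dictionary<string, string>();
        foreach (Match m in CellPair.Matches(html))
        {
            // The indexer overwrites on a repeated header instead of throwing.
            pairs[m.Groups["name"].Value] = m.Groups["value"].Value;
        }
        return pairs;
    }

    public static void Main()
    {
        // Assumed sample: the rows exactly as posted in the question.
        string sample =
            "<td class=\"headerR\">Downstream Power</td>\n<td class=\"contentL\">1.0 dBmV</td>\n</tr>\n<tr>\n" +
            "<td class=\"headerR\">SNR</td>\n<td class=\"contentL\">39.656 dB</td>\n</tr>\n<tr>\n" +
            "<td class=\"headerR\">Upstream Power</td>\n<td class=\"contentL\">42.0 dBmV</td>\n</tr>";

        foreach (var pair in Scrape(sample))
        {
            Console.WriteLine("\"" + pair.Key + "\":\"" + pair.Value + "\"");
        }
    }
}
```

From there, `pairs["SNR"].Split(' ')[0]` gives you the numeric part to convert. Note that the readings are decimals, so `Convert.ToInt32` on the raw string would throw; parse them as `double` first.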

Should be straightforward.

Computer Linguist
A: 

This should work if you only want to pull the data from one table:

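int start = result.IndexOf("<table>");
int end = result.IndexOf("</table>", start) + 8;
var doc = new XmlDocument();
doc.LoadXml(result.Substring(start, end - start)); // throws XmlException unless the markup is well-formed

// Select the value cell that follows each header cell. The readings are decimals
// (e.g. "39.656 dB"), so parse as double first (needs System.Globalization);
// Convert.ToInt32 then rounds to the nearest integer.
iSNR = Convert.ToInt32(double.Parse(doc.SelectSingleNode("//td[text() = 'SNR']/following-sibling::td").InnerText.Split(' ')[0], CultureInfo.InvariantCulture));
iDPWR = Convert.ToInt32(double.Parse(doc.SelectSingleNode("//td[text() = 'Downstream Power']/following-sibling::td").InnerText.Split(' ')[0], CultureInfo.InvariantCulture));
iUPWR = Convert.ToInt32(double.Parse(doc.SelectSingleNode("//td[text() = 'Upstream Power']/following-sibling::td").InnerText.Split(' ')[0], CultureInfo.InvariantCulture));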
ChaosPandion