I am parsing HTML tabular data with the help of the HTML Agility Pack. First I find the rows in the table like this:

var rows = table.Descendants("tr");

Then I read the cell data for each row like this:

foreach(var row in rows)
{
     string rowInnerText = row.InnerText;
}

That gives me the cell data, but with no spaces between the cells, like NameAddressPhone No. I want the inner text to be Name Address Phone No, that is, wherever there is a td tag I want one space between the cells of different columns.

+1  A: 

Here is an idea, however completely untested:

var rows = table.Descendants("tr").Select(tr =>
    string.Join(" ", tr.Descendants("td").Select(td => td.InnerText).ToArray()));

This should give you an IEnumerable<string> where each element represents one row from the table, in the format described in your question. If you actually need your loop over the rows for other processing, keep your foreach loop and use the LINQ magic in its body:

var rows = table.Descendants("tr");

foreach (var row in rows)
{
     string rowInnerText = string.Join(" ",
         row.Descendants("td").Select(td => td.InnerText).ToArray());
}
Jørn Schou-Rode
@Jorn Schou-Rode, thank you very much for the answer. Is something missing in the second code block? I get an error saying `tr` does not exist in this context.
Harikrishna
@Harikrishna: My bad, mixed up `td` and `row`. Should be fixed now.
Jørn Schou-Rode
@Jorn Schou-Rode, no, I have checked it, but there is still no space between the column headers. I get the same result as before.
Harikrishna
@Jorn Schou-Rode, when I get the first row (the column headers) from the loop, there are no spaces between the column header names, like namephonenoaddress.
Harikrishna
@Har: Sounds weird! How many items are returned by the `row.Descendants("td")` expression?
Jørn Schou-Rode
@Jorn Schou-Rode, rowInnerText contains a row with all the td inner text.
Harikrishna
@Har: Please see the question in my last comment. Also, please check that you have a space between the quotes in `string.Join(" ",`
Jørn Schou-Rode
@Jorn Schou-Rode, thank you, it works perfectly. Sorry, the mistake was mine. Thanks again for your help.
Harikrishna
@Jorn Schou-Rode, it works, but for the column names in the first row, can we remove the spaces between the words of each column name? For example, with two columns Trade Number, Trade Time, I want TradeNumber TradeTime. How can we do that?
Harikrishna
@Har: Change the expression in the lambda to `td.InnerText.Replace(" ", "")`
Jørn Schou-Rode
@Jorn Schou-Rode, with that there is no space even between the column names now. There are 3 columns, **Trade Number, Trade Time, Amount Rs.**, and I want **TradeNumber TradeTime AmountRs.**, meaning spaces between column names but no space between the words within each column name.
Harikrishna
@Har: I think you might have misplaced that `Replace(" ", "")` - if it is placed right after `InnerText`, it should give the result you are looking for.
Jørn Schou-Rode
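
To illustrate why the placement of `Replace(" ", "")` matters, here is a minimal sketch that uses plain strings in place of the `InnerText` of each `td`, so it runs without the HTML Agility Pack. The `RowText` class, the `JoinCells` helper, and the sample header values are hypothetical names made up for this example, not part of the library or the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class RowText
{
    // Hypothetical helper: removes spaces inside each cell first,
    // then joins the cleaned cells with a single space.
    public static string JoinCells(IEnumerable<string> cellTexts)
    {
        return string.Join(" ",
            cellTexts.Select(text => text.Replace(" ", "")).ToArray());
    }

    static void Main()
    {
        // Stand-in for row.Descendants("td").Select(td => td.InnerText)
        string[] cells = { "Trade Number", "Trade Time", "Amount Rs." };

        // Replace applied per cell, inside the lambda: separators survive.
        Console.WriteLine(JoinCells(cells));
        // prints "TradeNumber TradeTime AmountRs."

        // Replace applied after the join: the separators are removed as well.
        Console.WriteLine(string.Join(" ", cells).Replace(" ", ""));
        // prints "TradeNumberTradeTimeAmountRs."
    }
}
```

With the real table, the same lambda body would presumably be `td => td.InnerText.Replace(" ", "")`, as suggested in the comments above.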