My problem concerns a particular case occurring in my project.

In my HTML document, I want to replace <td> with <td class="right"> for every td except the first one in a <tr> tag. (If there is a <tr> inside a <tr> tag, that also needs to be handled.)

If the input is like

<tr>
  <td>1</td>
  <td>2</td>
  <td>3</td>
</tr>

the output should be like

<tr>
  <td>1</td>
  <td class="right">2</td>
  <td class="right">3</td>
</tr>

I have tried this code:

public static string tableFormat(string html)   // Add extra attribute to td
        {
            int start = 0, end = 0, trstart = 0, trend = 0;
           // html = CleanUpXHTML(html);  // clean unnecessary p tags
            while (html.Contains("<tr>"))
            {
                //start=end;
                trstart = html.IndexOf("<tr>", end);
                if (trstart == -1)
                    break;
                trend = html.IndexOf("</tr>", trstart);
                start = html.IndexOf("<td>", trstart);
                end = html.IndexOf("</td>", trend);
                while (end < trend)
                {

                    start = html.IndexOf("<td>", end);
                    html = html.Insert(start + 3, " class=\"right\"");
                    end = html.IndexOf("</td>", trstart);

                }
            }
            return html;
        }
A: 

Just call this function from Main. Note: this code only works for valid HTML, i.e. XHTML.

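 public static string TableFormat(string xhtml)
    {
        const string attribute = " class=\"right\"";
        int start = 0, end = 0, trstart = 0, trend = 0;

        // Note: nested rows are not handled; each <tr> is paired with the next </tr>.
        while (trstart != -1)
        {
            // Find the next row, starting after the previous one.
            trstart = xhtml.IndexOf("<tr>", end);
            if (trstart == -1)
                break;
            trend = xhtml.IndexOf("</tr>", trstart);
            if (trend == -1)
                break;  // unclosed row: stop rather than guess

            // The first <td> of the row is located but left unchanged.
            start = xhtml.IndexOf("<td>", trstart);
            while (start != -1 && start < trend)
            {
                start = xhtml.IndexOf("<td>", start + 4);
                if (start == -1 || start > trend)
                    break;
                xhtml = xhtml.Insert(start + 3, attribute);
                // The insertion shifts the closing </tr>, so move trend with it.
                trend += attribute.Length;
            }
            end = trend;
        }
        return xhtml;
    }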
Smack
A: 

Have you stepped through this code and verified that it works as intended? HTML is very forgiving about things like tag case and whitespace, but your method is not; if the HTML isn't formatted very specifically, your method will likely fail. I'd take a look at that.

Also, you might want to build some more flexibility into it. It might work now (once you get the issue resolved), but if the source HTML ever changes, it may not in the future.
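
To make the fragility concrete, here is a minimal sketch of how a literal, case-sensitive search reacts to a few tag spellings. The class and method names (`LiteralSearchDemo`, `FindsCell`) are made up for illustration, and `StringComparison.Ordinal` is an assumption chosen to keep the comparison independent of culture settings.

```csharp
using System;

static class LiteralSearchDemo
{
    // Hypothetical helper: the same kind of literal "<td>" lookup
    // that an IndexOf-based method relies on.
    public static bool FindsCell(string html) =>
        html.IndexOf("<td>", StringComparison.Ordinal) != -1;

    static void Main()
    {
        // Spellings that a browser treats as a table cell.
        string[] variants = { "<td>", "<TD>", "<td >", "<td class=\"left\">" };
        foreach (string tag in variants)
            Console.WriteLine(tag + " -> " + FindsCell("<tr>" + tag + "1</td></tr>"));
    }
}
```

Only the first spelling is found; the other three are silently skipped, so those cells would never receive the class.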

Mike Hofer
Seems like he wants it hard-coded! Maybe for some particular purpose, as he said.
Sangram
Okay. But what if, at some point down the road, the TD tag already contains a class attribute? Or what if the tag is written as "<TD>" or "<td >" or "<Td >" or some other variant? He can control it now, but once it goes live and others get their hands on the code, all bets are off.
Mike Hofer
@Mike: it will work only for valid XHTML, as he specified earlier.
Sangram
Not being argumentative, but the OP didn't specify "valid xhtml." And with that I'll let it drop.
Mike Hofer
A: 

if there is <tr> inside a <tr> tag then that also needs to be handled

Handling nested structures like that is not possible with regex.

Regex is an extraordinarily poor tool for manipulating HTML. Do yourself a favour and grab yourself a proper parser instead and your code will be simpler and more reliable. eg. with HTML Agility Pack:

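// Requires: using HtmlAgilityPack;
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(html);
// SelectNodes returns null when nothing matches
HtmlNodeCollection cells = doc.DocumentNode.SelectNodes("//tr/td[position()>1]");
if (cells != null)
    foreach (HtmlNode td in cells)
        td.SetAttributeValue("class", "right");
html = doc.DocumentNode.OuterHtml;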
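
If adding a third-party library is not an option and the input really is well-formed XHTML, the same parser-based idea can be sketched with the framework's own `System.Xml.Linq`. This is only a sketch under assumptions: `RightAlign` and `Apply` are hypothetical names, the markup must have a single root element, the elements must carry no XML namespace, and HTML-only entities such as `&nbsp;` would make parsing fail.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

static class RightAlign
{
    // Hypothetical helper: adds class="right" to every <td> except
    // the first one in each <tr>, including rows of nested tables.
    public static string Apply(string xhtml)
    {
        XElement root = XElement.Parse(xhtml);
        // ToList() takes a snapshot of the rows before any of them is modified.
        foreach (XElement tr in root.DescendantsAndSelf("tr").ToList())
        {
            // Elements("td") looks only at direct children, so cells of a
            // nested table are counted against their own row.
            foreach (XElement td in tr.Elements("td").Skip(1))
                td.SetAttributeValue("class", "right");
        }
        return root.ToString(SaveOptions.DisableFormatting);
    }

    static void Main()
    {
        string input =
            "<table><tr><td>1</td><td><table><tr><td>a</td><td>b</td></tr></table></td><td>3</td></tr></table>";
        Console.WriteLine(Apply(input));
    }
}
```

Because each row is examined through its direct children, the first cell of the inner row stays untouched while the second inner cell and the later outer cells get the class.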
bobince
A: 

Consider using a regular expression...

        string pattern = @"(?<!(<tr>\s*))<td>";
        string test = @"<tr> 
                          <td>1</td> 
                          <td>2</td> 
                          <td>3</td> 
                        </tr> ";
        string result = Regex.Replace(test, pattern, "<td class=\"right\">", RegexOptions.IgnoreCase | RegexOptions.Multiline);
        Console.WriteLine("{0}", result);

This works with upper or lower case and any amount of whitespace between the <tr> and the <td>. Anything other than whitespace between them would cause this to fail.
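
A small sketch of both behaviours, using the same pattern. The names `LookbehindDemo` and `Mark` are made up for illustration, and the two inputs are invented test strings rather than anything from the question.

```csharp
using System;
using System.Text.RegularExpressions;

static class LookbehindDemo
{
    // Same negative lookbehind as above: match <td> unless it directly
    // follows <tr> plus optional whitespace.
    const string Pattern = @"(?<!(<tr>\s*))<td>";

    // Hypothetical helper wrapping the replacement.
    public static string Mark(string html) =>
        Regex.Replace(html, Pattern, "<td class=\"right\">", RegexOptions.IgnoreCase);

    static void Main()
    {
        // Upper case and whitespace: the first cell is left alone.
        Console.WriteLine(Mark("<TR>\n  <TD>1</TD><TD>2</TD></TR>"));
        // A comment between <tr> and <td>: the first cell is changed too.
        Console.WriteLine(Mark("<tr><!-- x --><td>1</td><td>2</td></tr>"));
    }
}
```

Note that the replacement text is always lower case, so an upper-case `<TD>` that gets replaced ends up paired with its original `</TD>`.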

Les
What about a <tr> inside another <tr> tag? I guess that's not possible?
TERNA_staff
It's possible, but it would not be valid HTML. The example finds the first <td> in the <tr>, ignoring only whitespace.
Les