I have a template in which I want to replace certain regions. In the example below, I want to extract the regions between the DynamicSlotStart and DynamicSlotFinish comments, manipulate them, and then put them back into the template.

I do not need help with the logic that merges the fields; I only need to extract the regions so I can apply my own logic and place the results back into the template.

Does anyone know of an elegant or simple way to extract these regions? I would also like to extract the url values along the way, if that is easy to do.

<table width="700" border="0" align="center" cellpadding="4" cellspacing="0">
 <tr>
  <td align="center" valign="top">
   <!--DynamicSlotStart url="http://www.test.com/itemdisplay0_10751_-1_57436_10001"-->
   <table>
    <tbody>
     <tr>
      <td><p><a title="[element='title']" href="[url]"><img border="0" alt="[element='title']" src="[element='photo' property='src' maxwidth='135']" width="135" height="135" /></a></p></td>
     </tr>
     <tr>
      <td><span>[element='h1']</span></td>
     </tr>
     <tr>
      <td><span><strong>[element='price']<br />
      </strong></span><span>[element='was_price']</span></td>
     </tr>
     <tr>
      <td><span><a title="[element='title']" href="[url]">Details</a></span></td>
     </tr>
    </tbody>
   </table>
   <!--DynamicSlotFinish-->
  </td>
  <td align="center" valign="top">
   <!--DynamicSlotStart url="http://www.test.com/itemdisplay0_10751_-1_3379_10001"-->
   <table>
    <tbody>
     <tr>
      <td><p><a title="[element='title']" href="[url]"><img border="0" alt="[element='title']" src="[element='photo' property='src' maxwidth='135']" width="135" height="135" /></a></p></td>
     </tr>
     <tr>
      <td><span>[element='h1']</span></td>
     </tr>
     <tr>
      <td><span><strong>[element='price']<br />
      </strong></span><span>[element='was_price']</span></td>
     </tr>
     <tr>
      <td><span><a title="[element='title']" href="[url]">Details</a></span></td>
     </tr>
    </tbody>
   </table>
   <!--DynamicSlotFinish-->
  </td>
  <td align="center" valign="top">
   <!--DynamicSlotStart url="http://www.test.com/itemdisplay0_10751_-1_104854_10001"-->
   <table>
    <tbody>
     <tr>
      <td><p><a title="[element='title']" href="[url]"><img border="0" alt="[element='title']" src="[element='photo' property='src' maxwidth='135']" width="135" height="135" /></a></p></td>
     </tr>
     <tr>
      <td><span>[element='h1']</span></td>
     </tr>
     <tr>
      <td><span><strong>[element='price']<br />
      </strong></span><span>[element='was_price']</span></td>
     </tr>
     <tr>
      <td><span><a title="[element='title']" href="[url]">Details</a></span></td>
     </tr>
    </tbody>
   </table>
   <!--DynamicSlotFinish-->
  </td>
  <td align="center" valign="top">
   <!--DynamicSlotStart url="http://www.test.com/itemdisplay0_10751_-1_80977_10001"-->
   <table>
    <tbody>
     <tr>
      <td><p><a title="[element='title']" href="[url]"><img border="0" alt="[element='title']" src="[element='photo' property='src' maxwidth='135']" width="135" height="135" /></a></p></td>
     </tr>
     <tr>
      <td><span>[element='h1']</span></td>
     </tr>
     <tr>
      <td><span><strong>[element='price']<br />
      </strong></span><span>[element='was_price']</span></td>
     </tr>
     <tr>
      <td><span><a title="[element='title']" href="[url]">Details</a></span></td>
     </tr>
    </tbody>
   </table>
   <!--DynamicSlotFinish-->
  </td>
 </tr>
</table>
A: 

Maybe this project will be helpful: Html Agility Pack

What is exactly the Html Agility Pack (HAP)?

This is an agile HTML parser that builds a read/write DOM and supports plain XPATH or XSLT (you actually don't HAVE to understand XPATH or XSLT to use it, don't worry...). It is a .NET code library that allows you to parse "out of the web" HTML files. The parser is very tolerant of "real world" malformed HTML. The object model is very similar to that of System.Xml, but for HTML documents (or streams).

Html Agility Pack now supports LINQ to Objects (via a LINQ to XML-like interface). Check out the new beta to play with this feature.

Sample applications:

  • Page fixing or generation. You can fix a page the way you want, modify the DOM, add nodes, copy nodes, well... you name it.

  • Web scanners. You can easily get to img/src or a/hrefs with a bunch of XPATH queries.

  • Web scrapers. You can easily scrape any existing web page into an RSS feed, for example, with just an XSLT file serving as the binding. An example of this is provided.
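
If a full HTML parser turns out to be more than the task needs, the slots can also be pulled out with a regular expression, since the comment markers have a fixed shape. The following is only a minimal sketch in Python, not Html Agility Pack code: the names SLOT_RE, extract_slots and replace_slots are made up for illustration, and it assumes that slots are never nested and that each start marker carries exactly one double-quoted url attribute. It accepts a start marker that ends in either --> or the escaped --&gt;.

```python
import re

# One slot = start comment (with its url attribute) + body + finish comment.
# Hypothetical pattern: assumes non-nested slots and a double-quoted url.
SLOT_RE = re.compile(
    r'(?P<start><!--DynamicSlotStart\s+url="(?P<url>[^"]*)"\s*--(?:>|&gt;))'
    r'(?P<body>.*?)'                      # non-greedy: stop at the first finish marker
    r'(?P<end><!--DynamicSlotFinish-->)',
    re.DOTALL,                            # let the body span several lines
)


def extract_slots(template):
    """Return a list of (url, body) tuples in document order."""
    return [(m.group("url"), m.group("body")) for m in SLOT_RE.finditer(template)]


def replace_slots(template, transform):
    """Return the template with each slot body replaced by transform(url, body).

    The start and finish comments are copied back unchanged, so the
    result can be processed again later.
    """
    def _substitute(match):
        new_body = transform(match.group("url"), match.group("body"))
        return match.group("start") + new_body + match.group("end")

    return SLOT_RE.sub(_substitute, template)
```

A possible use would be `replace_slots(template, lambda url, body: body.replace("[url]", url))`, which fills the [url] placeholder in every slot from that slot's own marker. The field-merging logic itself would go inside the transform function.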

Nick Martyshchenko