How to get HTML text between H1 tags in C# | ansaurus

Q:

How to get HTML text between H1 tags in C#

I need to parse an HTML document to extract all the H1 tags and all HTML between them. I have been playing with HtmlAgilityPack to achieve this with some success. I could extract all H1 tags using:

foreach (HtmlNode node in doc.DocumentNode.SelectNodes("//h1"))

But how do I extract all the HTML that follows each H1 tag, up to the next H1 tag? That HTML could contain anything found on an HTML page (a table, an image, a link, and so on) except another H1 tag.

Thanks in advance.

A:

Possible solution: get the complete HTML as a string, replace every <h1> start tag with a character that should not appear literally in the markup (e.g. ü, which HTML writes as the entity &uuml;), then split the string on that character into an array.

Then search the array elements (with a regex, for example) for nodes that have both a start and an end tag, and parse only those.

Quick and dirty, but should work.

Please be aware that, as drachenstern mentioned, nested H1 tags will lead to parent nodes not being parsed.
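
A minimal sketch of the replace-and-split idea, using only System.Text.RegularExpressions. The class name H1Splitter, the method name Split, and the choice of the control character U+0001 as the placeholder (instead of ü, which can appear literally in UTF-8 pages) are assumptions made for illustration. Chunks without a closing </h1> are skipped, which is how the nested-H1 limitation shows up in practice.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class H1Splitter
{
    // Hypothetical placeholder: a control character that should not occur in normal HTML.
    private const char Sentinel = '\u0001';

    // Opening <h1> tag, with or without attributes, in any letter case.
    private static readonly Regex OpenH1 = new Regex(@"<h1\b[^>]*>", RegexOptions.IgnoreCase);

    // Closing </h1> tag, in any letter case.
    private static readonly Regex CloseH1 = new Regex(@"</h1\s*>", RegexOptions.IgnoreCase);

    // Returns one (heading, body) pair per <h1>: the heading's inner HTML and
    // everything after its </h1> up to the next <h1> or the end of the input.
    public static List<KeyValuePair<string, string>> Split(string html)
    {
        var sections = new List<KeyValuePair<string, string>>();

        // Step 1: replace every <h1 ...> start tag with the placeholder.
        string marked = OpenH1.Replace(html, Sentinel.ToString());

        // Step 2: split on the placeholder.
        string[] chunks = marked.Split(Sentinel);

        // chunks[0] is whatever precedes the first <h1>, so start at index 1.
        for (int i = 1; i < chunks.Length; i++)
        {
            Match close = CloseH1.Match(chunks[i]);
            if (!close.Success)
            {
                // No end tag in this chunk (for example the outer part of a nested <h1>).
                continue;
            }

            string heading = chunks[i].Substring(0, close.Index);
            string body = chunks[i].Substring(close.Index + close.Length);
            sections.Add(new KeyValuePair<string, string>(heading, body));
        }

        return sections;
    }
}
```

With HtmlAgilityPack already loaded, the input could be doc.DocumentNode.OuterHtml, and each returned body could be loaded into a fresh HtmlDocument if it needs further parsing. Being regex-based, this stays quick and dirty: an <h1> inside a comment or script block would also be treated as a heading.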

Semyazas 2010-10-12 00:13:27
