views:

55

answers:

5

Hi,

I am trying to get a subsection of an HTML page. The functionality I am looking for is similar to the one implemented on most blogs. Usually, on the main page of the blog, you only see a section of the post, and when you click on the title you get the full content of that blog post. There must be code that exists to get that subsection without breaking the HTML.

Does anyone know of good .NET code that does that?

EDIT: I need to keep the HTML formatting of the content, so stripping all the HTML isn't really an option. I wouldn't mind taking a fixed-length substring of the content (i.e. the first 800 characters or so) but then not breaking the HTML would be a nightmare.

Thanks!

+3  A: 

I would first strip the HTML from the content string (http://stackoverflow.com/questions/785715/asp-net-strip-html-tags), then take the leftmost N characters of the resulting string.
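As a rough illustration of the strip-then-truncate idea, here is a minimal sketch. It is written in Python only for brevity; the approach itself is not tied to any language. The names `plain_text_excerpt` and `max_chars` are made up for this example, and the 800-character default simply mirrors the figure mentioned in the question. Note that this discards all formatting, which is exactly the trade-off the asker wants to avoid.

```python
from html.parser import HTMLParser

# Tags after which a space is inserted so adjacent blocks don't run together.
BLOCK_TAGS = {"p", "div", "li", "br", "h1", "h2", "h3", "h4", "h5", "h6"}


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode &amp; etc. into text
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.parts.append(" ")


def plain_text_excerpt(html, max_chars=800):
    """Hypothetical helper: strip tags, then keep roughly the first max_chars."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace into single spaces.
    text = " ".join("".join(parser.parts).split())
    if len(text) <= max_chars:
        return text
    cut = text[:max_chars]
    # Back up to the last space so the excerpt does not end mid-word.
    head, sep, _ = cut.rpartition(" ")
    return (head if sep else cut) + "..."
```

For example, `plain_text_excerpt(post_html, 800)` would return a plain-text teaser for a post stored in a (hypothetical) `post_html` string.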

Daniel A. White
A: 

Usually that isn't done by chunking off a piece of the HTML. Rather, there's a database that contains the blog posts, and the main page has its own HTML/CSS, which dynamically loads only the first X paragraphs of each blog post.

BFree
+1  A: 

Usually this works by taking a substring of the blog post's contents before the post is rendered into HTML.

Kragen
But then you would lose all the HTML from your post (links, tables, etc.)?
Hugo Migneron
You need to strip out the HTML tags from your post; Daniel's answer links to a good way of doing this.
Kragen
+1  A: 

That wouldn't be done by cutting the page output directly (messing with the HTML).

Handle that with server-side code that displays a trimmed version of the blog content.

jfrobishow
A: 

To my mind the "simplest thing that could possibly work" would be to scan the blog post that you want to summarize until you get to the first close-paragraph </p> tag.

Don't be tempted to scan the HTML with a regex.
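A minimal sketch of that idea, using a real HTML parser rather than a regex. It is written in Python purely for illustration; the technique is not language-specific. The helper name `first_paragraph` is invented for this example. It assumes the post is an HTML fragment, copies everything up to and including the first closing </p>, and then closes any tags that are still open (such as a wrapping <div>) so the excerpt stays well-formed. HTML comments are dropped.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they are not tracked as open.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class _FirstParagraph(HTMLParser):
    """Re-emits markup until the first </p>, tracking open tags."""

    def __init__(self):
        # Keep entities such as &amp; undecoded so they can be re-emitted as-is.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.open_tags = []
        self.done = False

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        self.out.append(self.get_starttag_text())  # original tag text
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_startendtag(self, tag, attrs):
        if not self.done:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.done or tag not in self.open_tags:
            return
        # Close any unclosed inner tags down to the matching one.
        while self.open_tags:
            top = self.open_tags.pop()
            self.out.append(f"</{top}>")
            if top == tag:
                break
        if tag == "p":
            self.done = True  # ignore everything after the first paragraph

    def handle_data(self, data):
        if not self.done:
            self.out.append(data)

    def handle_entityref(self, name):
        if not self.done:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if not self.done:
            self.out.append(f"&#{name};")


def first_paragraph(html):
    """Hypothetical helper: return the post up to its first closing </p>."""
    parser = _FirstParagraph()
    parser.feed(html)
    parser.close()
    # Close whatever is still open (e.g. a wrapping <div>).
    closers = "".join(f"</{t}>" for t in reversed(parser.open_tags))
    return "".join(parser.out) + closers
```

Because the excerpt ends on a paragraph boundary, links and inline formatting inside that paragraph survive intact. The length of the excerpt then depends on how long the first paragraph is, not on a fixed character count.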

Jeremy McGee