I'm displaying some content using the following bit of code:

<% foreach (var m in ViewData.Model) { %>

<div class="content">
    <%= m.article %>
</div>

<% } %>

This shows a news article from my database. I want to truncate the story so that only a shorter excerpt is displayed. How would I do this? The article column also contains HTML tags such as <p>, so splitting the article purely by character count would cause problems (for example, leaving tags unclosed).

Any ideas on where to start? I'm new to MVC.

Thanks

A: 

This is not a simple problem, but you can start by using the HtmlAgilityPack.

This is an agile HTML parser that builds a read/write DOM and supports plain XPath or XSLT (you don't actually have to understand XPath or XSLT to use it, don't worry...). It is a .NET code library that allows you to parse "out of the web" HTML files. The parser is very tolerant of "real world" malformed HTML. The object model is very similar to what System.Xml provides, but for HTML documents (or streams).

Once you have your DOM in memory, you can extract or truncate elements in a way that does not break the DOM structure; e.g., you can remove entire <p></p> nodes, or just truncate some of the text within a node.
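For illustration only: without HtmlAgilityPack, and assuming the article is well-formed XHTML with no named entities such as &nbsp; (the XML parser rejects those), the "remove entire <p></p> nodes" option can be sketched with the built-in System.Xml.Linq classes. The helper name KeepFirstParagraphs and the wrapper <root> element are hypothetical; the wrapper is needed because an article fragment usually has several top-level <p> elements, while XML requires a single root.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: keep only the first `count` top-level elements
// (e.g. <p> blocks) of a well-formed XHTML fragment.
static string KeepFirstParagraphs(string xhtml, int count)
{
    // Wrap the fragment so it has a single root, as XML requires.
    XElement root = XElement.Parse("<root>" + xhtml + "</root>");

    // Snapshot the elements past the first `count` with ToList(),
    // so the tree is not modified while it is being enumerated.
    foreach (XElement extra in root.Elements().Skip(count).ToList())
    {
        extra.Remove();
    }

    // Serialize the remaining children without the wrapper element.
    return string.Concat(root.Nodes().Select(n => n.ToString(SaveOptions.DisableFormatting)));
}

string article = "<p>First paragraph.</p><p>Second paragraph.</p><p>Third paragraph.</p>";
Console.WriteLine(KeepFirstParagraphs(article, 2));
// prints <p>First paragraph.</p><p>Second paragraph.</p>
```

In a real MVC project the helper would presumably live in a static class and be called from the view inside <%= %> (for example as a hypothetical ArticleHelpers.KeepFirstParagraphs(m.article, 2)); that wiring is an assumption and is not shown.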

RedFilter
Hi, I don't want to download anything additional. I want to use only the default framework.
Cameron
The default framework does not contain anything to handle the messiness of real-world HTML. If you know that your content is valid XHTML (or you can convert it to be), then you can do something similar with the built-in XML classes (System.Xml or System.Xml.Linq).
RedFilter
Yes, it is valid XHTML. Could you explain how to do that? As I said, I'm new to C# and MVC, so I'm not really sure what you mean.
Cameron
Or copy and paste the code from the library you don't want to download.
Paco
@Paco: HtmlAgilityPack is provided as source.
RedFilter
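Since the content is valid XHTML, a minimal sketch of the XML-classes approach suggested above might look like this. It is an illustration under stated assumptions, not a library API: the hypothetical TruncateXhtml helper walks the text nodes in document order, counts visible characters (so &amp; counts as one), cuts the text node that crosses the limit, and then removes every node after the cut at each level up to the root. The ancestors of the cut point are kept, so LINQ to XML still writes their closing tags and the result stays well-formed. As before, the fragment is wrapped in a hypothetical <root> element, and XElement.Parse will throw an XmlException on named HTML entities such as &nbsp;.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: truncate well-formed XHTML to at most `maxChars`
// characters of visible text while keeping every open tag properly closed.
static string TruncateXhtml(string xhtml, int maxChars, string ellipsis = "...")
{
    // Wrap the fragment so it has a single root, as XML requires.
    XElement root = XElement.Parse("<root>" + xhtml + "</root>");
    int remaining = maxChars;

    // Snapshot the text nodes in document order before modifying the tree.
    foreach (XText text in root.DescendantNodes().OfType<XText>().ToList())
    {
        if (text.Value.Length <= remaining)
        {
            remaining -= text.Value.Length;
            continue;
        }

        // This text node crosses the limit: cut it and append the ellipsis.
        text.Value = text.Value.Substring(0, remaining) + ellipsis;

        // Remove everything after the cut point at every level up to the root.
        // Ancestors stay in place, so their closing tags are still written.
        for (XNode node = text; node.Parent != null; node = node.Parent)
        {
            foreach (XNode after in node.NodesAfterSelf().ToList())
            {
                after.Remove();
            }
        }
        break;
    }

    // Serialize the children without the wrapper element.
    return string.Concat(root.Nodes().Select(n => n.ToString(SaveOptions.DisableFormatting)));
}

Console.WriteLine(TruncateXhtml("<p>Hello <b>brave</b> new world</p><p>Second</p>", 8));
// prints <p>Hello <b>br...</b></p>
```

Removing following siblings level by level, rather than cutting the raw string, is what keeps the tags balanced. One trade-off: XElement.Parse does not preserve whitespace-only text between tags, so the output may be serialized slightly differently from the stored article.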
A: 

Do you have control over how articles are submitted? Perhaps you can ask for a short intro/summary; I believe WordPress does this.

Or have the user insert a marker that signifies the end of the intro/summary, something like:

<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit.</p>
<!-- END INTRO -->
<p>Ut malesuada porttitor consectetur. </p>

Because the marker is an HTML comment, the full article still displays without any visible change, and you can still "split" the article programmatically.
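A minimal sketch of splitting on the marker, assuming the exact marker text <!-- END INTRO --> from the example above and a hypothetical GetIntro helper. Plain string search is enough here: as long as authors place the marker between complete top-level elements, the text before it is already balanced HTML, so no parser is needed.

```csharp
using System;

// Hypothetical helper: return the part of the article before the marker,
// or the whole article if the marker is missing.
static string GetIntro(string article, string marker = "<!-- END INTRO -->")
{
    // Ordinal comparison: the marker is matched character for character.
    int index = article.IndexOf(marker, StringComparison.Ordinal);
    return index >= 0 ? article.Substring(0, index) : article;
}

string article =
    "<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit.</p>\n" +
    "<!-- END INTRO -->\n" +
    "<p>Ut malesuada porttitor consectetur. </p>";

Console.WriteLine(GetIntro(article));
```

Falling back to the whole article when no marker is present is a design choice; you could instead combine it with a length-based truncation for older articles that lack the marker.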

nsr81