You could loop over the HTML string, detect the angle brackets, and build up a list of tags, recording whether each one has a matching closing tag. The problem is that HTML allows elements without a closing tag, such as img, br and meta, so you'd need to know about those. You would also need rules to check the order of closing, because matching every opening tag with a closing tag doesn't by itself make valid HTML: if you open a div, then a p, then close the div before closing the p, that isn't valid.
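A minimal Python sketch of that approach (the same logic ports directly to C#). It assumes comments, scripts and attribute values containing < or > can be ignored. tags_are_balanced and VOID_ELEMENTS are hypothetical names; the set holds the void elements listed in the HTML standard. Each opening tag is pushed onto a stack, and every closing tag must match the most recently opened one. That check is what rejects the div/p example above.

```python
# The void elements listed in the HTML standard: they never take a closing tag.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "track", "wbr"}


def tags_are_balanced(html: str) -> bool:
    """Return True if every non-void tag is closed, in the right order (hypothetical helper)."""
    stack = []  # names of elements opened but not yet closed
    pos = 0
    while (start := html.find("<", pos)) != -1:
        end = html.find(">", start)
        if end == -1:
            return False  # a '<' with no matching '>'
        tag = html[start + 1:end].strip()
        pos = end + 1
        if tag.startswith(("!", "?")):
            continue  # doctype, comment or processing instruction: skipped naively
        parts = tag.lstrip("/").split()
        if not parts:
            return False  # an empty tag such as "<>" or "</>"
        name = parts[0].rstrip("/").lower()
        if tag.startswith("/"):
            # A closing tag must match the most recently opened element.
            if not stack or stack.pop() != name:
                return False
        elif name in VOID_ELEMENTS or tag.endswith("/"):
            continue  # void or self-closing: nothing to close later
        else:
            stack.append(name)
    return not stack  # anything left on the stack was never closed


print(tags_are_balanced("<div><p>Hi<br></p></div>"))  # True
print(tags_are_balanced("<div><p>Hi</div></p>"))      # False: closed in the wrong order
```

A True result only means the tags nest correctly. It says nothing about unknown elements, attributes or the end tags HTML lets you omit.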
Your requirement is very unclear, so most of this is guesswork. You also haven't posted any code, which would help clarify what you are trying to do.
One solution could be:
a. Find the text between the <p> and </p> tags. You can use a simple string search or a regex such as the following (in .NET, add RegexOptions.Singleline if a paragraph can span several lines, because . does not match a newline by default):
<p>(.*?)</p>
b. Apply Substring() to the found text to extract the part you need.
c. Put the extracted text back between the <p> and </p> tags.
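Steps a to c can be sketched in Python as below. Here re.DOTALL plays the role of .NET's RegexOptions.Singleline, and slicing stands in for Substring(). truncate_paragraphs and max_len are hypothetical names. The sketch assumes plain <p> tags with no attributes and no markup inside them.

```python
import re

# Non-greedy match for the text between <p> and </p>; DOTALL lets "." match newlines.
P_TAG = re.compile(r"<p>(.*?)</p>", re.DOTALL)


def truncate_paragraphs(html: str, max_len: int) -> str:
    """Keep only the first max_len characters inside each <p>...</p> (hypothetical helper)."""
    def shorten(match: re.Match) -> str:
        inner = match.group(1)    # step a: the text between the tags
        kept = inner[:max_len]    # step b: the equivalent of Substring(0, max_len)
        return f"<p>{kept}</p>"   # step c: put it back between the tags
    return P_TAG.sub(shorten, html)


print(truncate_paragraphs("<div><p>Hello, world of HTML</p><p>Short</p></div>", 5))
# <div><p>Hello</p><p>Short</p></div>
```

If a paragraph does contain child tags, the slice can cut a tag in half. That is where the parser-based approaches are safer.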
Your code needs to understand that the string is actually HTML or XML; treating it as a plain string won't let you work with it the way you want. That means first parsing it into a structured form (an XML document or an HTML DOM) and then working with that structure.
Use an XSL stylesheet
If your HTML is well-formed XML, load it into an XmlDocument and run it through an XSL stylesheet (for example with XslCompiledTransform) that does something like the following:
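<xsl:template match="p">
<!-- XPath substring() is 1-based: this keeps the first 10 characters of the paragraph's text -->
<p><xsl:value-of select="substring(., 1, 10)" /></p>
</xsl:template>
<!-- Pair this with an identity template so other elements are copied unchanged -->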
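Python's standard library has no XSLT processor. This sketch therefore swaps in xml.etree.ElementTree to get the same effect: parse the markup as XML, then keep only the first max_len characters of each <p>. truncate_p_xml is a hypothetical name. Like the stylesheet, it reduces each <p> to plain text. Everything outside the <p> elements is left as it was.

```python
import xml.etree.ElementTree as ET


def truncate_p_xml(xhtml: str, max_len: int) -> str:
    """Keep the first max_len characters of every <p> (hypothetical helper)."""
    root = ET.fromstring(xhtml)  # raises ET.ParseError unless the input is well-formed XML
    for p in list(root.iter("p")):  # snapshot the matches before modifying the tree
        text = "".join(p.itertext())  # all text inside the <p>, including child elements
        for child in list(p):
            p.remove(child)  # flatten: drop child elements (their text is already in `text`)
        p.text = text[:max_len]  # p.attrib and p.tail (text after </p>) are left alone
    return ET.tostring(root, encoding="unicode")


print(truncate_p_xml("<div><p>Hello, <b>world</b> of XML</p><p>Short</p></div>", 10))
# <div><p>Hello, wor</p><p>Short</p></div>
```

With markup that is not well-formed, such as a stray </p>, ET.fromstring raises ParseError. That is the case where an HTML parser is needed instead.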
Use an HTML parser
If it's not well-formed XML (as in your example, where a stray </p> appears in the middle), you'll need an HTML parser of some kind, such as the HTML Agility Pack (see this question about C# HTML parsers).
Don't use regular expressions: HTML is too complex to parse reliably with regex, since elements nest arbitrarily deeply and closing tags may be missing or out of order.
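The HTML Agility Pack is a .NET library. As a rough Python analogue, this sketch uses the standard library's html.parser, an event-based tokenizer that does not reject mismatched tags. ParagraphCollector is a hypothetical class. Its recovery rules are simplified assumptions, not the full HTML parsing algorithm: a new <p> closes an open one, a stray </p> is ignored, and an unclosed <p> ends at end of input.

```python
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Collect the text of each <p>, tolerating stray or missing </p> tags (hypothetical)."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []   # finished paragraph texts, in document order
        self._current = None   # text fragments of the open <p>, or None

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._finish()     # simplification: a new <p> closes any open one
            self._current = []

    def handle_endtag(self, tag):
        if tag == "p":
            self._finish()     # a stray </p> with no open <p> does nothing

    def handle_data(self, data):
        if self._current is not None:
            self._current.append(data)  # includes text inside child tags like <b>

    def close(self):
        super().close()        # flush buffered input (the base close() must be called)
        self._finish()         # a <p> still open at end of input is kept

    def _finish(self):
        if self._current is not None:
            self.paragraphs.append("".join(self._current))
            self._current = None


parser = ParagraphCollector()
parser.feed("<div><p>First paragraph</p> stray</p><p>Second <b>bold</b> one</div>")
parser.close()
print([text[:10] for text in parser.paragraphs])  # ['First para', 'Second bol']
```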