I'm creating a blog (and the rest of a website) using Python and Flask. Blog posts are written in Markdown and converted to HTML using the creatively named Markdown in Python. Both the Markdown (for future editing) and the HTML (for display) are stored in the database.
I want to automatically take the first 300 characters of text (or 500, or 200; I haven't settled on the number) to use on pages where I don't want to display the full blog post, such as the front page. The problem is that any simple way of doing this can leave me with invalid HTML or Markdown:
HTML:
<p><em>Here</em> is <strong>formatted</strong> text.</p>
If I get the first ten characters of this, it will leave me halfway through formatted, and I would somehow need to close the <strong> and <p> tags.
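To make the problem concrete, here is a small illustration of what a naive slice does to the example above. The cut-off index of 30 is chosen only because it corresponds to ten visible characters ("Here is fo") in this particular string; the variable names are just for illustration.

```python
# The example post body from above, as stored HTML.
html = "<p><em>Here</em> is <strong>formatted</strong> text.</p>"

# Slicing the raw string counts markup as well as text. Index 30 happens to
# fall ten visible characters in, halfway through the word "formatted".
snippet = html[:30]

# The result still has <p> and <strong> open, so it is not valid HTML.
print(snippet)
```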
Markdown:
*Here* is **formatted** text.
Likewise, getting the first ten characters will leave me needing to close the ** for bold.
Is there any way I can do this without needing to write an HTML or Markdown parser? Or would I be better off just converting the HTML into plain text?
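For reference, the plain-text fallback could look roughly like the sketch below. It uses only the standard library's html.parser; the names TextExtractor and plain_text_excerpt are hypothetical, and it assumes the stored HTML contains no script or style elements, since their contents would be treated as text too. All formatting is lost, which is the trade-off I am asking about.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment, ignoring tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def plain_text_excerpt(html, limit=300):
    """Return at most roughly `limit` characters of plain text from `html`."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Join the text nodes and collapse runs of whitespace into single spaces.
    text = " ".join("".join(parser.parts).split())
    if len(text) <= limit:
        return text
    cut = text[:limit]
    # Drop the last (possibly partial) word so the excerpt ends on a word.
    if " " in cut:
        cut = cut.rsplit(" ", 1)[0]
    return cut + "..."


html = "<p><em>Here</em> is <strong>formatted</strong> text.</p>"
print(plain_text_excerpt(html, limit=10))
print(plain_text_excerpt(html))
```

Because the excerpt is plain text, it can be escaped and wrapped in a single paragraph in the template without any risk of unclosed tags.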