Question: Is there a clever way to parse plain-text lists into HTML?
Or, must we resort to esoteric recursive methods, or sheer brute force?
I've been wondering this for a while now. In my own ruminations I keep coming back to the brute-force and odd recursive methods, but they always seem so clunky. There must be a better way, right?
So what's the clever way?
Assumptions
It is necessary to set up a scenario, so these are my assumptions.
Lists may be nested at least 3 levels deep, and may be either unordered or ordered. The list type and depth are controlled by the prefix:
- There is a mandatory space following the prefix.
- List depth is controlled by how many non-space characters there are in the prefix; "*****" would be nested five lists deep.
- List type is enforced by character type: "*" or "-" being an unordered list, "#" being an ordered list.
Items are separated by only one \n character. (Let's pretend two consecutive newlines qualify as a "group": a paragraph, div, or some other HTML tag, as in Markdown or Textile.)
List types may be freely mixed.
Output should be valid HTML 4, preferably with closing </li>s.
Parsing can be done with or without regex, as desired.
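To make the prefix rules concrete, here is a minimal Python sketch of how a single line could be split into its prefix and item text. The names LINE_RE and parse_line are hypothetical, chosen only for illustration, and the regex is just one possible reading of the assumptions above (one or more prefix characters, exactly one mandatory space, then the item text).

```python
import re

# Hypothetical helper, not part of any library: one or more prefix
# characters (*, # or -), the mandatory space, then the item text.
LINE_RE = re.compile(r"^([*#-]+) (.*)$")

def parse_line(line):
    """Return (kinds, text) for a list line, or None if it is not one.

    kinds holds one entry per nesting level, outermost first:
    'ol' for '#', 'ul' for '*' or '-'.
    """
    match = LINE_RE.match(line)
    if match is None:
        return None  # no prefix, or the mandatory space is missing
    prefix, text = match.groups()
    kinds = ["ol" if ch == "#" else "ul" for ch in prefix]
    return kinds, text
```

Under this reading, the depth of an item is simply len(kinds), and the last entry of kinds is the type of the list the item itself belongs to.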
Sample Markup
* List
*# List
** List
**# List
** List
# List
#* List
## List
##* List
## List
Desired Output
Broken up a bit for readability, but it should be a valid variation of this (remember that I'm just spacing it nicely!):
<ul>
<li>List</li>
<li>
<ol><li>List</li></ol>
<ul><li>List</li></ul>
</li>
<li>List</li>
<li>
<ol><li>List</li></ol>
</li>
<li>List</li>
</ul>
<ol>
<li>List</li>
<li>
<ul><li>List</li></ul>
<ol><li>List</li></ol>
</li>
<li>List</li>
<li>
<ul><li>List</li></ul>
</li>
<li>List</li>
</ol>
In Summary
Just how do you do this? I'd really like to understand the good ways to handle unpredictably recursing lists, because it strikes me as an ugly mess for anyone to tangle with.
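For comparison, one non-recursive possibility is to keep a stack of the lists that are currently open and, for each line, close whatever no longer matches the prefix before opening whatever is missing. The Python sketch below is only an illustration under the assumptions listed earlier; lists_to_html and LINE_RE are hypothetical names, non-list lines are simply skipped, and item text is escaped with html.escape.

```python
import html
import re

LINE_RE = re.compile(r"^([*#-]+) (.*)$")  # prefix, mandatory space, item text

def lists_to_html(text):
    """Convert prefixed list lines to nested HTML using a stack of open lists."""
    out = []
    stack = []  # kinds ('ul'/'ol') of the lists currently open, outermost first

    def close_to(depth):
        # Close the innermost lists (and their last item) until `depth` remain.
        while len(stack) > depth:
            out.append("</li></%s>" % stack.pop())

    for line in text.split("\n"):
        match = LINE_RE.match(line)
        if match is None:
            continue  # assumption: non-list lines are handled elsewhere
        prefix, item = match.groups()
        kinds = ["ol" if ch == "#" else "ul" for ch in prefix]

        # Count how many leading levels of the prefix match the open lists.
        keep = 0
        while keep < min(len(stack), len(kinds)) and stack[keep] == kinds[keep]:
            keep += 1

        close_to(keep)
        if keep == len(kinds):
            # The item belongs to a list that is already open: start a sibling.
            out.append("</li><li>")
        else:
            # Open the missing lists inside the item that is currently open.
            for kind in kinds[keep:]:
                out.append("<%s><li>" % kind)
                stack.append(kind)
        out.append(html.escape(item))

    close_to(0)  # close everything still open at the end of the input
    return "".join(out)

print(lists_to_html("* List\n*# List\n** List\n# List"))
```

Note that this sketch nests each sublist inside the preceding item's li rather than in a separate li as in the listing above, so its output is a variation of the desired output, not a character-for-character match. As far as I know, both forms are valid HTML 4.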