I'm trying to generate a table of contents from a block of HTML (not a complete file, just content) based on its <h2> and <h3> tags.
My plan so far was to:
1. Extract a list of headers using beautifulsoup.
2. Use a regex on the content to place anchor links before/inside the header tags (so the user can click on the table of contents). There might be a method for replacing inside beautifulsoup?
3. Output a nested list of links to the headers in a predefined spot.
It sounds easy when I say it like that, but it's proving to be a bit of a pain in the rear.
Is there something out there that does all this for me in one go so I don't waste the next couple of hours reinventing the wheel?
An example:
<p>This is an introduction</p>
<h2>This is a sub-header</h2>
<p>...</p>
<h3>This is a sub-sub-header</h3>
<p>...</p>
<h2>This is a sub-header</h2>
<p>...</p>
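For reference, the plan above could be sketched roughly like this. It is only an illustration under stated assumptions, not an existing library: it uses the standard-library re module for every step instead of beautifulsoup, it assumes the <h2> and <h3> tags carry no attributes, and the names add_toc and slugify are made up for this example.

```python
import re

# Assumption: headers are plain <h2>...</h2> / <h3>...</h3> with no attributes.
HEADER_RE = re.compile(r"<(h[23])>(.*?)</\1>", re.IGNORECASE | re.DOTALL)


def slugify(text, seen):
    """Turn header text into a unique id (hypothetical naming scheme)."""
    slug = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-") or "section"
    seen[slug] = seen.get(slug, 0) + 1
    # Repeated titles get a numeric suffix: "foo", "foo-2", "foo-3", ...
    return slug if seen[slug] == 1 else f"{slug}-{seen[slug]}"


def add_toc(content):
    """Return (toc_html, content_with_ids) for a fragment of HTML."""
    entries = []  # (level, anchor, title) in document order
    seen = {}

    def tag_header(match):
        tag, inner = match.group(1).lower(), match.group(2)
        # Strip inline markup; the remaining text is still HTML-escaped.
        title = re.sub(r"<[^>]+>", "", inner).strip()
        anchor = slugify(title, seen)
        entries.append((int(tag[1]), anchor, title))
        return f'<{tag} id="{anchor}">{inner}</{tag}>'

    content = HEADER_RE.sub(tag_header, content)

    # Two-level nested list: each h3 goes inside the preceding h2's item.
    parts = ["<ul>"]
    open_item = False  # an <li> for an h2 is currently open
    open_sub = False   # a nested <ul> for h3 entries is currently open
    for level, anchor, title in entries:
        link = f'<a href="#{anchor}">{title}</a>'
        if level == 2:
            if open_sub:
                parts.append("</ul>")
                open_sub = False
            if open_item:
                parts.append("</li>")
            parts.append(f"<li>{link}")
            open_item = True
        else:
            if not open_item:
                parts.append("<li>")  # h3 appearing before any h2
                open_item = True
            if not open_sub:
                parts.append("<ul>")
                open_sub = True
            parts.append(f"<li>{link}</li>")
    if open_sub:
        parts.append("</ul>")
    if open_item:
        parts.append("</li>")
    parts.append("</ul>")
    return "".join(parts), content
```

Calling toc, body = add_toc(content) on the example above gives a nested <ul> of links plus the same content with an id on each header, so the list can be written into whatever predefined spot is wanted. Putting the id on the header itself avoids inserting separate anchor tags. A regex is fragile on arbitrary HTML, so for headers with attributes or unusual nesting a real parser would be the safer choice.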