To browse a tree document (like HTML) without fully loading it, you'll have to make a few assumptions, such as the document being an actual tree. So don't bother checking close tags. Close tags are designed for human consumption anyway; computers would be happy with <> too.
The first step is to assume that the first part of your document is represented by the first part of your document. That sounds like a tautology, but with "modern" HTML and certainly JS this is technically no longer true. Still, if any line of HTML can affect any pixel, you simply cannot partially load a page.
So, if there's a simple relation between position in the HTML file and pages on screen, the next step is to define the parse state at the end of each page. This will then include a single file offset, probably (but not necessarily) at the end of a paragraph. Also part of this state is a stack of open tags.
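As a rough sketch of what that state could look like, here is a small Python version. The names `PageBoundary` and `advance`, the regex-based tag scanner and the short `VOID` list are all assumptions made for illustration, not a real HTML parser. In line with the "it's a tree" assumption, a close tag simply pops the stack without its name being checked.

```python
import re
from dataclasses import dataclass

# Hypothetical, deliberately naive tag scanner (no comments, scripts, CDATA).
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>")
# Elements that never get a close tag; this list is incomplete on purpose.
VOID = {"br", "hr", "img", "input", "link", "meta"}


@dataclass(frozen=True)
class PageBoundary:
    offset: int            # file offset where the next page starts
    open_tags: tuple = ()  # tags still open at that offset, outermost first


def advance(html, start, end_offset):
    """Scan html from start.offset to end_offset and return the new state."""
    stack = list(start.open_tags)
    for m in TAG.finditer(html, start.offset, end_offset):
        closing, name = m.group(1), m.group(2).lower()
        if closing:
            # Trust the document to be a tree: pop without comparing names.
            if stack:
                stack.pop()
        elif name not in VOID and not m.group(0).endswith("/>"):
            stack.append(name)
    return PageBoundary(end_offset, tuple(stack))
```

Because the state is just an offset plus a tuple of tag names, it is cheap to store one per page and to resume scanning from any of them.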
To make paging easier, it's smart to keep this "page boundary" state for each page you've encountered so far. This makes paging back easy.
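One way to keep those states around, as a minimal sketch: a list indexed by page number, where the end of page N is stored as the start of page N+1. `PageIndex` and its method names are made up for this example, and a boundary is represented here as a plain `(offset, open_tags)` tuple.

```python
class PageIndex:
    """Remembers the boundary state at the start of every page seen so far."""

    def __init__(self, first_boundary):
        # Page 0 starts at the initial state (offset 0, no open tags).
        self._starts = [first_boundary]

    def __len__(self):
        return len(self._starts)

    def start_of(self, page_no):
        """Boundary state to resume from when (re)rendering page_no."""
        return self._starts[page_no]

    def record_end(self, page_no, boundary):
        """Store the end of page_no as the start of page_no + 1."""
        # Only the first visit adds an entry; re-rendering a page is a no-op.
        if page_no + 1 == len(self._starts):
            self._starts.append(boundary)
```

Paging back is then just `start_of(page_no - 1)` followed by a normal forward render; nothing before that offset needs to be read again.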
Now, when rendering a new page, the previous page boundary state will give you the initial rendering state. You simply read HTML and render it element by element until you overflow a single page. You then backtrack a bit and determine the new page boundary state.
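The render-until-overflow loop might look roughly like this. To keep the sketch short it makes some strong assumptions: the document is a flat run of `<p>` elements (so the open-tag stack is empty at every boundary and the state shrinks to a single offset), "rendering" means hard-wrapping text at a fixed width, and a page is a fixed number of lines. `render_page`, `page_lines` and `width` are hypothetical names.

```python
import re

# Assumes flat, non-nested <p>...</p> elements; real HTML needs a real parser.
PARA = re.compile(r"<p>(.*?)</p>", re.S)


def render_page(html, offset, page_lines=4, width=20):
    """Render one page starting at offset; return (lines, next_offset)."""
    lines = []
    next_offset = offset
    for m in PARA.finditer(html, offset):
        text = m.group(1)
        # Toy "layout": cut the paragraph into chunks of at most `width` chars.
        wrapped = [text[i:i + width] for i in range(0, len(text), width)] or [""]
        if lines and len(lines) + len(wrapped) > page_lines:
            # Overflow: backtrack to the end of the last paragraph that fit.
            break
        # The first paragraph is always accepted, so every page makes progress
        # even when a single paragraph is longer than a page.
        lines.extend(wrapped)
        next_offset = m.end()
    return lines, next_offset
```

Calling it again with the returned `next_offset` renders the following page, so the offset returned here is exactly the page boundary state for this simplified document.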
Smooth scrolling is basically a matter of rendering two adjacent pages and showing x% of the first and (100-x)% of the second. Once you've implemented this bit, it may become smart to finish a paragraph when rendering each page. This will give you slightly different page lengths, but you don't have to deal with broken paragraphs, and that in turn makes your page boundary state a bit smaller.
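The scrolling part can be sketched as follows, assuming each rendered page is simply a list of lines and the viewport is `height` lines tall. `viewport` and its parameters are hypothetical names, and `x_percent` is taken to be an integer from 0 to 100.

```python
def viewport(first, second, x_percent, height):
    """Show the bottom x% of `first` followed by the top of `second`."""
    # Number of viewport lines taken from the bottom of the first page,
    # clamped because paragraph-aligned pages can be shorter than the viewport.
    keep = min(height * x_percent // 100, len(first))
    tail = first[len(first) - keep:]
    # Fill the rest of the viewport from the top of the second page.
    return tail + second[:height - keep]
```

Scrolling down means lowering `x_percent` step by step; when it reaches 0 the second page becomes the "first" one and the page after it is rendered as its neighbour.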