I'm creating a Linux program in C++ for a portable device, in order to render HTML files.

The problem is that the device is limited in RAM, which makes it impossible to open big files with the existing software.

One solution is to dynamically load/unload parts of the file, but I'm not sure how to implement that.

The ability to scroll is a must, with a smooth experience if possible.

I would like to hear from you what the best approach to this situation is. You can suggest an algorithm, an open-source project to take a look at, or a library that supports what I'm trying to do (WebKit?).

EDIT: I'm writing an ebook reader, so I just need pure HTML rendering: no JavaScript, no CSS, ...

A: 

Dillo is the lightest weight Linux web browser that I'm aware of.

Edit: If it (or its rendering component) won't meet your needs, then you might find Wikipedia's list of layout engines and comparison of layout engines helpful.

Edit 2: I suspect that dynamically loading and unloading parts of an HTML file would be tricky; for example, how would you know that a randomly chosen chunk of the file isn't in the middle of a tag? You'd probably have to use something like SAX to parse the file into an intermediate representation, saving discrete chunks of that representation to persistent storage so that they don't take up too much RAM. Or you could parse the file with SAX to show whatever fits in RAM at once, then re-parse it whenever the user scrolls too far. (Stylesheets and JavaScript would ruin this approach; some plain HTML might too.) If it were me, I'd try to find a simple markup language or some kind of rich text viewer rather than going to all of that trouble.
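To make the chunk-by-chunk idea concrete, here's a rough sketch (my own names, not from any real library) of a SAX-style pass that reads the file in fixed-size chunks, so only one small buffer plus the current tag or text run ever sits in RAM. The onTag/onText callbacks would feed your layout stage or spill an intermediate representation to storage:

    #include <fstream>
    #include <functional>
    #include <string>

    // Sketch: stream the file through a 4 KB buffer and emit SAX-like events.
    void streamHtml(const std::string& path,
                    const std::function<void(const std::string&)>& onTag,
                    const std::function<void(const std::string&)>& onText)
    {
        std::ifstream in(path, std::ios::binary);
        std::string pending;   // carries a tag or text run split across chunks
        char buf[4096];        // fixed-size buffer: the RAM ceiling for reading
        bool inTag = false;

        while (in.read(buf, sizeof buf) || in.gcount() > 0) {
            for (std::streamsize i = 0; i < in.gcount(); ++i) {
                char c = buf[i];
                if (!inTag && c == '<') {          // a text run ends, a tag begins
                    if (!pending.empty()) onText(pending);
                    pending.clear();
                    inTag = true;
                } else if (inTag && c == '>') {    // tag complete
                    onTag(pending);
                    pending.clear();
                    inTag = false;
                } else {
                    pending += c;
                }
            }
        }
        if (!pending.empty() && !inTag) onText(pending);   // trailing text
    }

Real HTML needs more care (comments, entities, '>' inside quoted attribute values), but the shape of the loop stays the same, and memory use stays bounded no matter how big the file is.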

Josh Kelley
This looks like a good solution, except: 1) this is a full browser, not a library; 2) even if it's lightweight, there are no indications that it can handle huge files (dynamically loading/unloading pages).
karatchov
A: 

To be able to browse a tree document (like HTML) without fully loading it, you'll have to make a few assumptions - like the document being an actual tree. So, don't bother checking close tags. Close tags are designed for human consumption anyway; computers would be happy with <> too.

The first step is to assume that the first part of the rendered output comes from the first part of your document. That sounds like a tautology, but with "modern" HTML, and certainly with JS, it is technically no longer true. Still, if any line of HTML could affect any pixel, you simply could not partially load a page at all.

So, if there's a simple relation between position in the HTML file and pages on screen, the next step is to define the parse state at the end of each page. This will include a single file offset, probably (but not necessarily) at the end of a paragraph. Also part of this state is the stack of open tags.
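As a rough sketch (the names are hypothetical, not from any particular library), that state can be as small as:

    #include <string>
    #include <vector>

    // "Page boundary": everything needed to resume parsing and rendering
    // exactly where the previous page stopped.
    struct PageBoundary {
        long long fileOffset;               // byte offset into the HTML file
        std::vector<std::string> openTags;  // tags opened but not yet closed,
                                            // e.g. {"body", "div", "p"}
    };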

To make paging easier, it's smart to keep this "page boundary" state for each page you've encountered so far. This makes paging back easy.

Now, when rendering a new page, the previous page boundary state will give you the initial rendering state. You simply read HTML and render it element by element until you overflow a single page. You then backtrack a bit and determine the new page boundary state.
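A sketch of that loop, reusing the hypothetical PageBoundary above; parseNextElement(), fitsOnPage() and renderElement() are stand-ins for the device's real parser and layout code (ownership and error handling elided):

    #include <string>
    #include <vector>

    // Stand-ins for the real parser and layout code.
    struct Element;
    Element* parseNextElement(long long& offset, std::vector<std::string>& openTags);
    bool fitsOnPage(const Element& e, int& usedHeight, int pageHeight);
    void renderElement(const Element& e);

    std::vector<PageBoundary> boundaries;   // one entry per page seen so far,
                                            // which makes paging back trivial

    PageBoundary renderPage(const PageBoundary& start, int pageHeight)
    {
        long long offset = start.fileOffset;
        std::vector<std::string> openTags = start.openTags;
        int usedHeight = 0;

        while (true) {
            long long before = offset;                      // remember where this
            std::vector<std::string> tagsBefore = openTags; // element started
            Element* e = parseNextElement(offset, openTags);
            if (!e)
                break;                                      // end of document
            if (!fitsOnPage(*e, usedHeight, pageHeight)) {
                offset = before;        // backtrack: the overflowing element
                openTags = tagsBefore;  // becomes the first one of the next page
                break;
            }
            renderElement(*e);
        }

        PageBoundary next{offset, openTags};   // parse state where we stopped
        boundaries.push_back(next);
        return next;
    }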

Smooth scrolling is basically a matter of rendering two adjacent pages and showing x% of the first and (100-x)% of the second. Once you've implemented this bit, it may be smart to finish the current paragraph when rendering each page. This will give you slightly different page lengths, but you don't have to deal with broken paragraphs, and that in turn makes your page boundary state a bit smaller.
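As a sketch of the blending step, assuming the two pages are already rendered into off-screen, row-major 8-bit buffers of width*height bytes:

    #include <cstddef>
    #include <cstring>

    // scrollPx in [0, height] is how far the user has scrolled from page A
    // into page B; copy the visible tail of A and the visible head of B.
    void drawScrolled(const unsigned char* pageA, const unsigned char* pageB,
                      unsigned char* screen, int width, int height, int scrollPx)
    {
        int rowsFromA = height - scrollPx;
        // the bottom rowsFromA rows of page A appear at the top of the screen
        std::memcpy(screen, pageA + scrollPx * width,
                    static_cast<std::size_t>(rowsFromA) * width);
        // the top scrollPx rows of page B fill the rest
        std::memcpy(screen + rowsFromA * width, pageB,
                    static_cast<std::size_t>(scrollPx) * width);
    }

On a slow display (e-ink, for instance) you'd probably quantize scrollPx to whole lines of text rather than single pixels, but the bookkeeping is the same.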

MSalters