I am writing a basic word processing application, and I am trying to settle on a native "internal" format, the one that my code parses in order to render to the screen. I'd like this to be XML, so that I can, in the future, just write XSLT to convert it to ODF or XHTML or whatever.

However, when searching for existing standards to use, the only one that looks promising is ODF. But that looks like massive overkill for what I'm thinking of. All I need is paragraph tags, font selection, font size & decoration... that's pretty much it. It would take me a long time to implement even a minimal ODF renderer and I'm not sure it's worth the trouble.

Right now I'm thinking of making my own XML format, but that's not really good practice. Better to use a standard, especially since then I can probably find the XSLTs I might need in the future already written.

Or should I just bite the bullet and implement ODF?

EDIT: Regarding the Answer

I had known about XSL-FO before, but due to the weight of the spec hadn't really considered it. But you're right, a subset would give me everything I need to work with and room to grow... thanks so much for the reminder.

Plus, by including a rendering library like FOP or RenderX, I get PDF generation for free. Not bad...

A: 

XML is an external format, not internal.

What's wrong with XHTML? It's simple and it's ubiquitous (at least HTML is). Your implementation would be easy to debug, and your users will be eternally grateful.

Frank Krueger
A: 

Well, right... But since I need to be able to convert to XML anyway, why hold both my document tree and the DOM tree in memory, when there's nothing preventing me from working right off the DOM tree?

Particularly since one unique feature of my program is that everything is always saved as you type, and I don't want to run a whole conversion to XML every time I hit a key. Easier just to tie input and output directly to my in-memory DOM tree.
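To make that concrete, here is a minimal sketch of editing straight on the DOM. It assumes Python's standard xml.dom.minidom and a made-up <doc>/<p> vocabulary; type_char is a hypothetical helper name, not part of any library or of my actual program.

```python
from xml.dom import minidom

# Hypothetical document shape: a <doc> root holding <p> paragraphs.
dom = minidom.parseString("<doc><p>Hello</p></doc>")


def type_char(paragraph, ch):
    """Append one typed character directly to the paragraph's DOM node."""
    last = paragraph.lastChild
    if last is not None and last.nodeType == last.TEXT_NODE:
        # Extend the existing trailing text node in place.
        last.appendData(ch)
    else:
        # No trailing text node yet, so create one.
        paragraph.appendChild(paragraph.ownerDocument.createTextNode(ch))


para = dom.getElementsByTagName("p")[0]
for ch in ", world":
    type_char(para, ch)

# The tree is already the document, so "saving" is just serialization.
serialized = dom.documentElement.toxml()
```

Each keystroke touches one text node, and there is no second document model to keep in sync.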

Edit: Oh, and the only problem with XHTML is that I do want to support basic pagination. Though I guess there's nothing stopping me from using some additional tags for that...

levand
A: 

If it's only for word processing, then perhaps DocBook might be a little lighter than ODF?

However, the wiki entry states:

DocBook is a semantic markup language for technical documentation. It was originally intended for writing technical documents related to computer hardware and software but it can be used for any other sort of documentation.

So it might not be so suitable for a general-purpose word-processor?

The advantage of using DocBook would be the fact that a number of DocBook -> other format converters should be available? Hope this helps.

toolkit
A: 

I like DocBook, but it doesn't really fit. It strives to be presentation-independent, the intention being that you would use XSLT to render it to a presentation format.

In a word processor, the user is editing presentation along with the content. For example, the user doesn't want to mark a "keyword", necessarily, they want to make some text bold.

A DocBook editor would be a very nice thing (I'm not sure a good one exists), but it's not really what I'm doing.

levand
+3  A: 

As you are sure about needing to represent the presentational side of things, it may be worth looking at the XSL-FO W3C Recommendation. This is a full-blown page description language and the (deeply unfashionable) other half of the better-known XSLT.

Clearly the whole thing is anything but "lightweight", but you could incorporate just a very limited subset. To match your spec of "paragraph tags, font selection, font size & decoration", that could even be just fo:block and the common font properties, something like:

<yourcontainer xmlns:fo="http://www.w3.org/1999/XSL/Format">
    <fo:block font-family="Arial, sans-serif" font-weight="bold"
        font-size="16pt">Example Heading</fo:block>
    <fo:block font-family="Times, serif"
        font-size="12pt">Paragraph text here etc etc...</fo:block>
</yourcontainer>

This would perhaps have a few advantages over just rolling your own. There's an open specification to work from, and all that implies. It reuses CSS properties as XML attributes (in a similar manner to SVG), so many of the formatting details will seem somewhat familiar. You'd have an upgrade path if you later decided that, say, intelligent paging was a must-have feature - including more sections of the spec as they become relevant to your application.
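As a rough illustration of how little code such a subset needs on the reading side, here is a sketch using Python's standard xml.etree.ElementTree. The read_blocks helper and the DEFAULTS table are hypothetical, and the default values are assumptions made for the example, not values taken from the XSL-FO spec.

```python
import xml.etree.ElementTree as ET

FO = "http://www.w3.org/1999/XSL/Format"

SAMPLE = """<yourcontainer xmlns:fo="http://www.w3.org/1999/XSL/Format">
    <fo:block font-family="Arial, sans-serif" font-weight="bold"
        font-size="16pt">Example Heading</fo:block>
    <fo:block font-family="Times, serif"
        font-size="12pt">Paragraph text here</fo:block>
</yourcontainer>"""

# Assumed fallbacks for properties a block omits (illustrative only).
DEFAULTS = {
    "font-family": "serif",
    "font-size": "12pt",
    "font-weight": "normal",
}


def read_blocks(xml_text):
    """Return a list of (text, style) pairs, one per fo:block element."""
    root = ET.fromstring(xml_text)
    blocks = []
    for block in root.iter(f"{{{FO}}}block"):
        # Pick up only the handful of properties this subset supports.
        style = {name: block.get(name, default)
                 for name, default in DEFAULTS.items()}
        blocks.append((block.text or "", style))
    return blocks


blocks = read_blocks(SAMPLE)
```

Supporting another property later would mostly mean adding an entry to the table, which is the kind of upgrade path I mean.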

There's one other thing you might get from investigating XSL-FO - seeing how even just-doing-paragraphs-and-fonts can be horrendously complicated. Trying to do text layout and line breaking 'The Right Way' for various different languages and use cases seems very daunting to me.

gz