Grab entire document tree with OpenOffice API

I would like to grab the entire tree for a Writer document in OpenOffice 3.1. I need to collect data on all the elements in the tree, not only the Text elements.

Loading the XTextDocument and calling getText() gives the XText element. However, the XEnumerationAccess of that XText only iterates over the text ranges.

From the OpenOffice documentation /DevGuide/Text/Iterating_over_Text:

The second interface of com.sun.star.text.Text is XEnumerationAccess. A Text service enumerates all paragraphs in a text and returns objects which support com.sun.star.text.Paragraph. This includes tables, because writer sees tables as specialized paragraphs that support the com.sun.star.text.TextTable service.

Some additional documentation here:

The text portion enumeration of a paragraph does not supply contents which do belong to the paragraph, but do not fuse together with the text flow. These could be text frames, graphic objects, embedded objects or drawing shapes anchored at the paragraph, characters or as character. The TextPortionType "TextContent" indicate if there is a content anchored at a character or as a character. If you have a TextContent portion type, you know that there are shape objects anchored at a character or as a character.
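The two-level traversal described in these passages can be sketched as follows. This is a minimal sketch in Python through the PyUNO bridge, not a definitive implementation: it assumes each text portion supports XContentEnumerationAccess (an optional interface), and the names `iterate` and `walk_text` are hypothetical helpers. PyUNO exposes all interfaces of an object directly; in C++ or Java the same calls need a query for XEnumerationAccess, XServiceInfo, XPropertySet and XContentEnumerationAccess first (UNO_QUERY or UnoRuntime.queryInterface).

```python
TEXT_TABLE = "com.sun.star.text.TextTable"
TEXT_CONTENT = "com.sun.star.text.TextContent"


def iterate(enumeration):
    """Yield every element of a UNO XEnumeration (hypothetical helper)."""
    while enumeration.hasMoreElements():
        yield enumeration.nextElement()


def walk_text(text):
    """Yield (kind, object) pairs for the body of an XText.

    `text` is the object returned by XTextDocument.getText().
    """
    for element in iterate(text.createEnumeration()):
        # Tables are enumerated alongside paragraphs, so check the service.
        if element.supportsService(TEXT_TABLE):
            yield ("table", element)
            continue
        yield ("paragraph", element)
        # Second level: the text portions inside the paragraph.
        for portion in iterate(element.createEnumeration()):
            portion_type = portion.getPropertyValue("TextPortionType")
            yield ("portion:" + portion_type, portion)
            # Objects anchored at or as a character are not part of the
            # text flow; assumption: the portion offers them through
            # XContentEnumerationAccess.
            contents = portion.createContentEnumeration(TEXT_CONTENT)
            for content in iterate(contents):
                yield ("anchored", content)
```

With a loaded Writer document `doc`, a loop such as `for kind, obj in walk_text(doc.getText())` would visit paragraphs, tables, portions and anchored contents in document order.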

My test documents indicate that I do get an XTextContent, and the XTextRange can be collected via getAnchor(). But how can I determine the type of content that I am collecting? The only method available on the range is getString(). If the object is an embedded image, how do I collect its data?

I am using C++ but I believe a solution in Java would be portable.

Migrated From Answer

Due to poor formatting, this comment is posted as an answer.

Thanks for your response.

I intend to use the API.

I am trying the example of collecting GraphicObjects from the document. By using an XGraphicObjectsSupplier I can get a collection via getGraphicObjects(). Each object in the collection is an Any, and printing its type via getValueTypeName() gives XTextContent.

The API documentation says that the collection holds objects of the TextGraphicObject "service". How do I grab an instance of it?
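For reference, the graphic collection lookup described here might look like the sketch below, again in Python through PyUNO. It rests on two assumptions: that the objects report the TextGraphicObject service through supportsService(), and that they expose a `GraphicURL` property as part of that service. The function name `list_graphics` is hypothetical. PyUNO unwraps the Any automatically; in C++ the XTextContent would first be extracted from the Any and then queried for XServiceInfo and XPropertySet.

```python
GRAPHIC = "com.sun.star.text.TextGraphicObject"


def list_graphics(document):
    """Return {name: GraphicURL} for the graphics of a Writer document.

    `document` is expected to support XGraphicObjectsSupplier.
    """
    graphics = document.getGraphicObjects()  # name-based collection
    result = {}
    for name in graphics.getElementNames():
        obj = graphics.getByName(name)
        # A service is not a type to instantiate: check it by name, then
        # read its properties through the property set.
        if obj.supportsService(GRAPHIC):
            # Assumption: GraphicURL is available on this object.
            result[name] = obj.getPropertyValue("GraphicURL")
    return result
```

The same supportsService() check could be applied to any XTextContent found during enumeration to tell graphics, frames and embedded objects apart.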
