I would like to grab the entire tree for a Writer document in OpenOffice 3.1. I need to collect data on all the elements in the tree, not only the Text elements.

Loading the XTextDocument and calling getText() gives the XText element. However, an XEnumerationAccess obtained from the XText only iterates over the TextRange objects.

From the OpenOffice documentation /DevGuide/Text/Iterating_over_Text:

The second interface of com.sun.star.text.Text is XEnumerationAccess. A Text service enumerates all paragraphs in a text and returns objects which support com.sun.star.text.Paragraph. This includes tables, because writer sees tables as specialized paragraphs that support the com.sun.star.text.TextTable service.

Some additional documentation here:

The text portion enumeration of a paragraph does not supply contents which do belong to the paragraph, but do not fuse together with the text flow. These could be text frames, graphic objects, embedded objects or drawing shapes anchored at the paragraph, characters or as character. The TextPortionType "TextContent" indicate if there is a content anchored at a character or as a character. If you have a TextContent portion type, you know that there are shape objects anchored at a character or as a character.

My test documents show that I do get an XTextContent, and its XTextRange can be obtained via getAnchor(). But how can I determine the type of content I am collecting? The only method available is getString(). If the object is an embedded image, how do I collect its data?

I am using C++ but I believe a solution in Java would be portable.


Migrated From Answer

Due to poor formatting, this comment is posted as an answer.

Thanks for your response.

I intend to use the API.

I am trying the example of collecting GraphicObjects from the document. Using an XGraphicObjectsSupplier, I can get a collection via getGraphicObjects(). Each object in the collection is an Any, and printing its type via getValueTypeName() gives XTextContent.

The API documentation says that the collection holds a TextGraphicObject "service". How do I grab an instance of it?

A: 

The answer to your question is complicated, but I'll try to be clear.

  • Exporting the document to XML and processing it with SAX would be easier. With the XML approach, you would have to implement XDocumentHandler and read the document (optionally filtering out what you don't need). The rest of the work would be either XSLT transformations or, for big documents, SAX.

  • If you prefer using only the API, you'll have to work a lot with XServiceInfo (to check which services an object supports) and UnoRuntime.queryInterface (to obtain the interface you need from it).
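
To give a feel for the XML route in the first bullet, here is a minimal sketch. It is not the XDocumentHandler approach itself: as a stand-in, it assumes the document has already been saved as an .odt file (a ZIP package whose body lives in content.xml) and reads that entry with the JDK's own SAX parser. The class name OdtElementCounter and its method names are made up for illustration. It simply counts every element by its prefixed name, so frames, images and tables show up next to the paragraphs.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of the OpenOffice API.
class OdtElementCounter {

    // Counts every element in an XML stream, keyed by its name as written
    // in the file (for ODF content this is typically text:p, draw:image, ...).
    static Map<String, Integer> countElements(InputStream xml)
            throws IOException, SAXException, ParserConfigurationException {
        final Map<String, Integer> counts = new TreeMap<>();
        // The factory is not namespace-aware by default, so qName carries
        // the raw prefixed element name.
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.newSAXParser().parse(xml, new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                counts.merge(qName, 1, Integer::sum);
            }
        });
        return counts;
    }

    // Opens an .odt package and counts the elements of its content.xml entry.
    static Map<String, Integer> countElementsInOdt(String odtPath)
            throws IOException, SAXException, ParserConfigurationException {
        try (ZipFile zip = new ZipFile(odtPath)) {
            ZipEntry entry = zip.getEntry("content.xml");
            if (entry == null) {
                throw new IOException("content.xml not found in " + odtPath);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                return countElements(in);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java OdtElementCounter <file.odt>");
            return;
        }
        countElementsInOdt(args[0])
                .forEach((name, n) -> System.out.println(name + ": " + n));
    }
}
```

Run against a saved document, this prints one line per element name with its count. A real filter would replace the counting in startElement with whatever data needs collecting from each element and its attributes.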

John Doe