Rather than rewriting the entire contents of an XML file when a single element is updated, is there a better way to update the file?

+3  A: 

I would recommend using VTD-XML http://vtd-xml.sourceforge.net/

From their FAQ ( http://vtd-xml.sourceforge.net/faq.html ):

Why should I use VTD-XML for large XML files?

For numerous reasons summarized below:

  • Performance: The performance of VTD-XML is far better than SAX
  • Ease of use: Random access combined with XPath makes applications easy to write
  • Better maintainability: App code is shorter and simpler to understand.
  • Incremental update: Occasional, small changes become very efficient.
  • Indexing: Pre-parsed form of XML will further boost processing performance.
  • Other features: Cutting, pasting, splitting and assembling XML documents is only possible with VTD-XML.

In order to take advantage of VTD-XML, we recommend that developers split their ultra large XML documents into smaller, more manageable chunks (<2GB).
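VTD-XML itself is a Java library, so the following is not its API. It is only a rough Python sketch of the general idea behind an incremental update: find the byte offsets of the value to change and splice new bytes in, leaving everything before the edit untouched on disk. The names `splice_in_place` and `find_text_span` are hypothetical helpers made up for this illustration, and the sketch assumes the tail of the file fits in memory.

```python
import os
import tempfile


def splice_in_place(path, start, end, new_bytes):
    """Replace bytes [start, end) of the file, rewriting only from `start` onward."""
    with open(path, "r+b") as f:
        f.seek(end)
        tail = f.read()        # everything after the edited span
        f.seek(start)
        f.write(new_bytes)     # the new value
        f.write(tail)          # shift the tail to its new position
        f.truncate()           # drop leftover bytes if the file shrank


def find_text_span(data, open_tag, close_tag):
    """Hypothetical helper: offsets of the text between the first tag pair."""
    start = data.index(open_tag) + len(open_tag)
    end = data.index(close_tag, start)
    return start, end


def demo():
    fd, path = tempfile.mkstemp(suffix=".xml")
    with os.fdopen(fd, "wb") as f:
        f.write(b"<items><item id='1'><price>10</price></item></items>")
    try:
        with open(path, "rb") as f:
            data = f.read()
        start, end = find_text_span(data, b"<price>", b"</price>")
        splice_in_place(path, start, end, b"12.50")
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)
```

If the new value has exactly the same byte length as the old one, the tail does not move at all and only the changed bytes need to be written.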

Peter
A: 

You have a few options here, but none of them are good.

Since an XML document isn't broken into independently addressable parts, you'll either have to modify the file at the filesystem level with regex pattern matching (sed is a good start), or break your XML into smaller files for manageability.
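For illustration, a sed-style edit might look roughly like this in Python. It is a minimal sketch, assuming the element to change sits on a single line and can be matched safely with a regex (which is not true of XML in general). `sed_like_replace` is a hypothetical name. Note that, like `sed -i`, it still writes a new copy of the file; what it avoids is parsing the XML and holding it in memory.

```python
import os
import re
import tempfile


def sed_like_replace(path, pattern, replacement):
    """Stream the file line by line, apply a regex substitution, then swap the result in."""
    regex = re.compile(pattern)
    dir_name = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=dir_name)
    changed = 0
    try:
        with os.fdopen(fd, "w", encoding="utf-8", newline="") as out, \
                open(path, "r", encoding="utf-8", newline="") as src:
            for line in src:
                new_line, n = regex.subn(replacement, line)
                changed += n
                out.write(new_line)
        os.replace(tmp_path, path)  # replace the original with the edited copy
    except BaseException:
        os.remove(tmp_path)
        raise
    return changed


def demo():
    fd, path = tempfile.mkstemp(suffix=".xml")
    with os.fdopen(fd, "w", encoding="utf-8", newline="") as f:
        f.write('<items>\n'
                '  <item id="1"><price>10</price></item>\n'
                '  <item id="2"><price>20</price></item>\n'
                '</items>\n')
    try:
        count = sed_like_replace(
            path,
            r'(<item id="2"><price>)[^<]*(</price>)',
            r"\g<1>25\g<2>",
        )
        with open(path, encoding="utf-8", newline="") as f:
            return count, f.read()
    finally:
        os.remove(path)
```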

A: 

If possible, serialize the XML and use the diff/patch Linux tools (or equivalent tools on your platform). This way, you don't have to deal with parsing or writing the XML yourself.
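As a small sketch of what that could look like, Python's `difflib` can produce a unified diff that the `patch` tool understands. This assumes the XML is serialized with one element per line, so that a single-element change shows up as a single-line change. `xml_patch` and the sample data are made up for the example.

```python
import difflib

OLD = """<items>
  <item id="1">10</item>
  <item id="2">20</item>
  <item id="3">30</item>
</items>
"""

NEW = OLD.replace('<item id="2">20</item>', '<item id="2">25</item>')


def xml_patch(old_text, new_text, name="data.xml"):
    """Return a unified diff describing how to turn old_text into new_text."""
    old_lines = old_text.splitlines(keepends=True)
    new_lines = new_text.splitlines(keepends=True)
    diff = difflib.unified_diff(
        old_lines,
        new_lines,
        fromfile="a/" + name,
        tofile="b/" + name,
        n=1,  # one line of context around each change
    )
    return "".join(diff)


if __name__ == "__main__":
    print(xml_patch(OLD, NEW), end="")
```

The patch only contains the changed line plus a little context, so it stays small no matter how large the file is. Applying it still means the patch tool writes out the updated file.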

+2  A: 

If your XML file is so large that updating it is a performance bottleneck, you should consider moving away from XML to a more efficient disk format (or a real database).
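To make the database option concrete, here is a minimal sketch using SQLite from the Python standard library. The `item` table, its columns and the sample XML are assumptions invented for the example; a real schema depends on your data. Once the data is loaded, updating one element becomes a single-row `UPDATE` instead of a rewrite of the whole file.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample data, standing in for the large XML file.
XML = """<items>
  <item id="1"><price>10</price></item>
  <item id="2"><price>20</price></item>
</items>"""


def load_into_sqlite(xml_text, conn):
    """One-time import: turn each <item> element into a table row."""
    conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, price TEXT)")
    root = ET.fromstring(xml_text)
    rows = [(int(item.get("id")), item.findtext("price"))
            for item in root.iter("item")]
    conn.executemany("INSERT INTO item (id, price) VALUES (?, ?)", rows)
    conn.commit()


def update_price(conn, item_id, price):
    """Update a single record; the database only touches the affected pages."""
    cur = conn.execute("UPDATE item SET price = ? WHERE id = ?",
                       (price, item_id))
    conn.commit()
    return cur.rowcount


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_into_sqlite(XML, conn)
    update_price(conn, 2, "25")
    print(conn.execute("SELECT id, price FROM item ORDER BY id").fetchall())
```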

If, however, you just feel like it might be a problem, remember the rules of optimization:

  1. Don't do it
  2. (experts only) Don't do it, yet.
davetron5000
A: 

Process large XML files with XQuery, which works with gigabyte-size XML files: http://www.xquery.com

XQuery is a query language that was designed as a native XML query language. Because most types of data can be represented as XML, XQuery can also be used to query other types of data. For example, XQuery can be used to query relational data using an XML view of a relational database. This is important because many Internet applications need to integrate information from multiple sources, including data found in web messages, relational data, and various XML sources. XQuery was specifically designed for this kind of data integration.

For example, suppose your company is a financial institution that needs to produce reports of stock holdings for each client. A client requests a report with a Simple Object Access Protocol (SOAP) message, which is represented in XML. In most businesses, the stock holdings data is stored in multiple relational databases, such as Oracle, Microsoft SQL Server, or DB2. XQuery can query both the SOAP message and the relational databases, creating a report in XML.

XQuery is based on the structure of XML and leverages that structure to make it possible to perform queries on any type of data that can be represented as XML, including relational data. In addition, the XQuery API for Java (XQJ) lets your queries run in any environment that supports the J2EE platform.

Siddharth Gaur