I have a legacy binary file format containing records that we have to convert and serve to other parts of our system as XML. To give a sense of the data sizes, a single file may be up to 50 megs with 50,000 or more records in it. The XML conversion I have to work with blows this particular file up by a factor of 20 to nearly a gig.
(Unsurprisingly) compressing the XML output with gzip brings it down to ~150 MB, so there is a lot of redundancy.
But what we have to serve out as XML is the individual records that make up the larger file. Each of these records is quite small, and random access to the records is a requirement. The records themselves contain a wide variety of fields, so there is no clean mapping of elements to columns short of a very wide table.
As other parts of the system use a PostgreSQL database, we are considering storing each of the individual record nodes as a row in the database (roughly along the lines of the sketch below the example), but we are wondering how inefficient this would be storage-wise.
<xml>
<record><complex_other_xml_nodes>...</complex_other_xml_nodes></record>
<record>...</record>
<record>...</record>
<record>...</record>
<record>...</record>
</xml>
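For concreteness, here is a minimal sketch of the one-row-per-record idea, assuming psycopg2 and Python's standard library; the table, column, and connection names are placeholders, not anything we have actually built:

import xml.etree.ElementTree as ET
import psycopg2

conn = psycopg2.connect("dbname=legacy")  # placeholder DSN
cur = conn.cursor()

# One row per <record>; the xml column type keeps each fragment well formed.
cur.execute("""
    CREATE TABLE IF NOT EXISTS legacy_records (
        id     serial PRIMARY KEY,
        source text   NOT NULL,  -- which legacy file the record came from
        record xml    NOT NULL   -- the individual <record> fragment
    )
""")

def load(path):
    # Stream the converted file so the ~1 GB of XML never sits in memory at once.
    for event, elem in ET.iterparse(path, events=("end",)):
        if elem.tag == "record":
            cur.execute(
                "INSERT INTO legacy_records (source, record) VALUES (%s, %s)",
                (path, ET.tostring(elem, encoding="unicode")),
            )
            elem.clear()  # discard the element once it has been stored
    conn.commit()

def fetch(record_id):
    # Random access: pull a single record back out by primary key.
    cur.execute("SELECT record FROM legacy_records WHERE id = %s", (record_id,))
    row = cur.fetchone()
    return row[0] if row else None

Fetching by primary key would cover the random-access requirement; the open question is what the storage overhead of 50,000+ small xml rows per file looks like.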
Or should we be evaluating an XML database (or something else) instead? Oh, and we don't need to update or change the XML after conversion; these legacy records are static.