In my application, I need to store and transmit data that contains many repeating string values (think entity names in an XML document). I have two proposed solutions:
- A) create a string table to be stored alongside the document, and then use index references (in a variable-length, multi-byte encoding) in the document body, or
- B) simply compress the document using gzip or a similar compression algorithm.
Which one is likely to perform better in terms of speed and data size? (Obviously, this depends on the quality of the implementations, but assume that option A builds an array of strings dynamically and encodes the document body in some reasonable fashion.)
Also, if you recommend option B, is there a compression method that is potentially more suitable than gzip?
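To make the comparison concrete, here is a minimal sketch of roughly what I mean by option A, next to option B. Everything in it is an assumption for illustration: the function names (`encode_with_table`, `decode_with_table`, `encode_varint`, `decode_varint`) are hypothetical, the document is simplified to a flat list of string tokens, the index encoding is a LEB128-style varint (7 bits per byte), and the sample data is made up. Option B uses Python's standard `gzip.compress`.

```python
import gzip

def encode_varint(n: int) -> bytes:
    """Encode a non-negative int using 7 data bits per byte (LEB128-style)."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def decode_varint(data: bytes, pos: int):
    """Decode one varint starting at pos; return (value, next position)."""
    result, shift = 0, 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):
            return result, pos
        shift += 7

def encode_with_table(tokens: list) -> bytes:
    """Option A (assumed layout): string table, token count, then varint indexes."""
    table = {}  # string -> index, built dynamically in first-seen order
    body = bytearray()
    for tok in tokens:
        idx = table.setdefault(tok, len(table))
        body += encode_varint(idx)
    header = bytearray(encode_varint(len(table)))
    for s in table:  # dicts keep insertion order, so position == index
        raw = s.encode("utf-8")
        header += encode_varint(len(raw)) + raw
    header += encode_varint(len(tokens))
    return bytes(header + body)

def decode_with_table(data: bytes) -> list:
    """Inverse of encode_with_table."""
    count, pos = decode_varint(data, 0)
    strings = []
    for _ in range(count):
        length, pos = decode_varint(data, pos)
        strings.append(data[pos:pos + length].decode("utf-8"))
        pos += length
    n, pos = decode_varint(data, pos)
    tokens = []
    for _ in range(n):
        idx, pos = decode_varint(data, pos)
        tokens.append(strings[idx])
    return tokens

# Made-up sample data: a few entity names repeated many times.
sample = ["customer", "order", "lineItem", "customer", "order"] * 200
raw = "\n".join(sample).encode("utf-8")

option_a = encode_with_table(sample)  # option A: string table + indexes
option_b = gzip.compress(raw)         # option B: plain gzip

print(len(raw), len(option_a), len(option_b))
```

The printed sizes only describe this toy input, so I would not draw conclusions from them; the real question is how the two behave on realistic documents. The two options are also not mutually exclusive, since the output of option A could itself be gzipped.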