If I have 50,000-100,000 product SKUs with accompanying information, including specifications and descriptions, that need to be updated on a regular basis (at least once a day), is XML the best way to go as a data interchange format? The application is written in PHP, and I'm thinking of parsing with SimpleXML and writing through PHP's native MySQL calls (as opposed to using application hooks to dump data into the appropriate location in the DB). The server will be Linux-based, and I will have full root access. I know this is a rather generic question, which is why I made it community wiki -- I'm looking for an overall approach that is considered best practice. If it matters, the application is Magento.
The only real downside to XML is that it is very verbose: XML files are generally much larger than the same data in other formats. The upside is that it is relatively easy to read (for people) and parse (for software). With only 100K records (and without knowing the size of each record), I think I would go with XML.
I currently use XML as an import format on an e-commerce project. It has over 10,000 products with attributes and descriptions, and iterating over the data is pretty quick. I don't have any other choice in this matter, though.
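To illustrate iterating over a product feed without holding every record in memory, here is a minimal sketch in Python. The feed layout (`products`, `product`, `sku`, `name`, `price`) is a hypothetical example, not the format of any particular supplier or of Magento; the same streaming idea applies in PHP.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical feed layout; real element names depend on the data supplier.
SAMPLE_FEED = b"""<?xml version="1.0" encoding="UTF-8"?>
<products>
  <product sku="A-100"><name>Widget</name><price>9.99</price></product>
  <product sku="A-101"><name>Gadget</name><price>24.50</price></product>
</products>"""


def iter_products(source):
    """Yield one dict per <product> element as the parser reaches its end tag."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "product":
            yield {
                "sku": elem.get("sku"),
                "name": elem.findtext("name"),
                "price": float(elem.findtext("price")),
            }
            # Drop the element's children and text once the row is extracted,
            # so memory use does not grow with the size of the feed.
            elem.clear()


rows = list(iter_products(io.BytesIO(SAMPLE_FEED)))
print(rows)
```

In a real import each yielded row would be handed to the database layer (ideally in batches) instead of being collected into a list.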
Using SOAP would be a viable alternative to receiving the raw XML, although I think it would add to the performance cost, since SOAP uses XML as its messaging format anyway. However, you would get your data as a native PHP type, such as an array, which you could pass directly to your DAL for insertion into the database, sidestepping the need to construct a SimpleXML object.
You have to define the parameters of "best" for your given scenario.
XML is verbose, which means two things:
- You can supply a lot of detail about the data, including metadata
- Filesize is going to be big
The other advantage you gain with XML is more advanced parsing/selection "out-of-the-box" with tools like XPath.
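As a rough illustration of that kind of selection, the sketch below uses the limited XPath subset built into Python's `xml.etree.ElementTree`. The catalog structure and the `status` attribute are assumptions made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog; the "status" attribute is invented for this example.
CATALOG = """<products>
  <product sku="A-100" status="active"><name>Widget</name></product>
  <product sku="A-101" status="discontinued"><name>Gadget</name></product>
  <product sku="A-102" status="active"><name>Doohickey</name></product>
</products>"""

root = ET.fromstring(CATALOG)

# ElementTree supports only a subset of XPath, but attribute predicates
# such as [@status='active'] are part of that subset.
active_skus = [p.get("sku") for p in root.findall("./product[@status='active']")]
print(active_skus)
```

A full XPath engine allows more elaborate queries, but even this subset avoids hand-written loops for simple filters.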
But there are many other formats you could choose, each with its own advantages and disadvantages.
And several others.
My point is that you need to figure out what's important to your system (speed? character-set support? human-readability?) and choose a format that works for both sides.
JSON takes a lot less space than XML, although XML compresses very well. XML also has the advantage of a lot of mature libraries and tools.
If you exchange data with 3rd party sources, you might want to validate their XML with a Schema. JSON has no equally established standard for this.
Personally I end up using XML most of the time. If space is an issue I apply gzip compression to the XML data.
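A minimal sketch of that approach, using Python's standard `gzip` module on a generated feed. The feed contents are invented for the example, and the actual savings depend on how repetitive the real data is.

```python
import gzip

# Build a repetitive, hypothetical product feed to mimic a catalog export.
records = "".join(
    f'<product sku="SKU-{i:06d}"><name>Item {i}</name>'
    f"<description>Sample description text for item {i}.</description></product>"
    for i in range(1000)
)
xml_bytes = f"<products>{records}</products>".encode("utf-8")

# Compress for transfer, then decompress on the receiving side.
compressed = gzip.compress(xml_bytes)
restored = gzip.decompress(compressed)

# Print raw and compressed sizes in bytes for comparison.
print(len(xml_bytes), len(compressed))
```

Because XML repeats the same tag names for every record, the compressed feed is typically far smaller than the raw one, and the round trip returns the original bytes unchanged.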