If I have 50,000-100,000 product SKUs with accompanying information, including specifications and descriptions, that needs to be updated on a regular basis (at least once a day), is XML the best way to go as a data interchange format? The application is written in PHP, and I'm thinking of going from SimpleXML to PHP's native MySQL calls (as opposed to using application hooks to dump data into the appropriate location in the DB). The server will be Linux-based, and I will have full root access. I know this is a rather generic question, which is why I made it community wiki -- I'm looking for an overall approach that is considered best practice. If it matters, the application is Magento.

+1  A: 

The only real downside to XML is that it is very verbose. XML files are generally very large compared to other formats. The upside is that it is relatively easy to read (for people) and parse (for software). With only 100K records (and without knowing the size of each record), I think I would go with XML.

Jim Blizard
The other significant downside to XML is parsing time. A binary file doesn't need to be fully parsed, and so it can be very quickly mapped into memory. However, this can get tricky when interchanging data, so here, XML is probably appropriate.
McPherrinM
A: 

I currently use XML as an import format on an e-commerce project. It currently has over 10,000 products with attributes and descriptions, and the import iterates over the data pretty quickly. I don't have any other choice in this matter, though.
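For illustration only, an import loop of this kind might look like the sketch below. The feed layout (`<products>`, `<product sku="...">`, `<name>`, `<description>`) is an assumption made up for the example, not a real Magento format, and the database write is left out.

```php
<?php
// Hypothetical feed; a real import would more likely use simplexml_load_file().
$xml = <<<XML
<products>
  <product sku="A-100"><name>Widget</name><description>A small widget</description></product>
  <product sku="B-200"><name>Gadget</name><description>A handy gadget</description></product>
</products>
XML;

$feed = simplexml_load_string($xml);

// Flatten each <product> into a plain array, e.g. as parameters
// for a prepared INSERT/UPDATE statement.
$rows = [];
foreach ($feed->product as $product) {
    $rows[] = [
        'sku'         => (string) $product['sku'],
        'name'        => (string) $product->name,
        'description' => (string) $product->description,
    ];
}

print_r($rows);
```

The string casts matter: without them each value stays a SimpleXMLElement object rather than a plain string.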

Using SOAP would be a viable alternative to just receiving the raw XML, although I think it would add to the performance cost, as SOAP uses XML as its messaging format anyway. However, you can get your data as a native PHP type, such as an array, which you could pass directly to your DAL for insertion into the database, sidestepping the need to construct a SimpleXML object.

Kieran Hall
When you say "as a native PHP type", what do you mean? If I have Server A (has the raw data) that makes a call to Server B (has the application instance), what would it send across?
hal10001
Well, the SOAP server (in its WSDL, if using one) can specify the 'type' as 'type="xsd:struct"', which would mean that your PHP SOAP client should interpret the response from that SOAP function call as an array. There would be no need to create a SimpleXML object because your client would have already returned an array for you.
Kieran Hall
+4  A: 

You have to define the parameters of "best" for your given scenario.

XML is verbose, which means two things:

  • You can supply a lot of detail about the data, including metadata
  • File size is going to be big

The other advantage you gain with XML is more advanced parsing/selection "out-of-the-box" with tools like XPath.
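As a rough illustration of that kind of selection, here is a minimal sketch using SimpleXML's `xpath()` method. The feed layout and the price threshold are made up for the example.

```php
<?php
// Hypothetical product feed; element and attribute names are assumptions.
$xml = <<<XML
<products>
  <product sku="A-100"><name>Widget</name><price>9.99</price></product>
  <product sku="B-200"><name>Gadget</name><price>24.50</price></product>
  <product sku="C-300"><name>Gizmo</name><price>14.00</price></product>
</products>
XML;

$feed = simplexml_load_string($xml);

// Select the sku attribute of every product priced above 10.
$skus = [];
foreach ($feed->xpath('//product[price > 10]/@sku') as $sku) {
    $skus[] = (string) $sku;
}

print_r($skus); // B-200 and C-300
```

The filtering happens inside the XPath expression, so no hand-written loop over every product is needed.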

But there are many other formats you could choose, each with its own advantages and disadvantages

And several others.

My point is that you need to figure out what's important to your system (speed? character-set support? human-readability?) and choose a format that's going to be compatible for both sides.

Peter Bailey
+1  A: 

JSON takes a lot less space than XML, although XML compresses very well. XML also has the advantage of a lot of mature libraries and tools.

If you exchange data with third-party sources, you might want to validate their XML against a schema. You don't have that for JSON.
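A minimal sketch of such a check in PHP, using `DOMDocument::schemaValidateSource()`, could look like this. The XSD and the feed layout are invented for the example, and `feedIsValid()` is a hypothetical helper name.

```php
<?php
// Hypothetical schema: each <product> needs a sku attribute and a <name> child.
$xsd = <<<XSD
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="products">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="product" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="name" type="xs:string"/>
            </xs:sequence>
            <xs:attribute name="sku" type="xs:string" use="required"/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
XSD;

// Collect libxml errors internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// Hypothetical helper: true only if $xml is well-formed and matches $xsd.
function feedIsValid(string $xml, string $xsd): bool
{
    $doc = new DOMDocument();
    if (!$doc->loadXML($xml)) {
        libxml_clear_errors();
        return false; // not even well-formed
    }
    $ok = $doc->schemaValidateSource($xsd);
    libxml_clear_errors();
    return $ok;
}

$good = '<products><product sku="A-100"><name>Widget</name></product></products>';
$bad  = '<products><product><name>Widget</name></product></products>'; // sku missing

var_dump(feedIsValid($good, $xsd)); // bool(true)
var_dump(feedIsValid($bad, $xsd));  // bool(false)
```

Rejecting a bad feed before touching the database keeps a malformed supplier file from half-updating the catalogue.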

Personally I end up using XML most of the time. If space is an issue I apply gzip compression to the XML data.
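For a feel of what that looks like in PHP, here is a small sketch using `gzencode()` and `gzdecode()` from the zlib extension (assumed to be enabled). The feed is generated and highly repetitive, so the ratio it prints says little about real data.

```php
<?php
// Build a repetitive sample feed; real feeds will compress differently.
$xml = "<products>\n";
for ($i = 1; $i <= 1000; $i++) {
    $xml .= sprintf(
        "  <product sku=\"SKU-%05d\"><name>Product %d</name><description>Sample description</description></product>\n",
        $i,
        $i
    );
}
$xml .= "</products>\n";

$compressed = gzencode($xml, 9);    // gzip format, maximum compression level
$restored   = gzdecode($compressed);

printf("raw: %d bytes, gzipped: %d bytes\n", strlen($xml), strlen($compressed));
var_dump($restored === $xml); // the round trip should be lossless
```

If the feed is served over HTTP, the web server can usually apply gzip transparently instead.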

Kimble