We have an architecture where we use SSIS to extract data from XML batch files into a staging database for validation, prior to exporting it into production.
We control the XML format to some extent, and I've been asked to determine the maximum number of records the XML batch file should contain. Based on the XML schema and some sample data, I can estimate the average record size and do some projections from there.
However, coming at it from the other angle, I'd like to get an indication of the technical limitations of SSIS when dealing with large XML files.
I'm aware that SSIS will flatten and transform the XML document into its own tabular, in-memory representation, so RAM is an obvious limiting factor — but in what proportion?
Can you say something like: "SSIS requires at least 2.5 times the size of the file you're trying to load in available memory"? Assuming I have a 32 GB box dedicated to this data-loading function, how large can my XML files be?
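To make the question concrete, this is the kind of back-of-envelope projection I have in mind. The 2.5× multiplier and 4 KB average record size here are placeholder assumptions, not measured SSIS behavior — the multiplier is exactly the number I'm asking about:

```python
# Back-of-envelope sizing. All figures are assumptions for illustration,
# not verified SSIS limits.
avail_mem_gb = 32        # RAM on the dedicated data-loading box
mem_multiplier = 2.5     # hypothetical: RAM needed per byte of source XML
avg_record_kb = 4        # estimated from the XML schema and sample data

# Largest XML file that would fit in memory under this assumption
max_file_gb = avail_mem_gb / mem_multiplier

# Corresponding record count for the batch-size recommendation
max_records = int(max_file_gb * 1024 * 1024 / avg_record_kb)

print(f"max file size ~ {max_file_gb:.1f} GB, ~ {max_records:,} records")
```

With these placeholder numbers that works out to roughly a 12.8 GB file, or about 3.4 million records per batch — but the answer obviously swings with whatever the real multiplier turns out to be.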
I'm aware that there might be other factors involved, such as the complexity of the schema, the number of nested elements, etc., but it'd be nice to have a starting point.