Question
What is the fastest way to dump a large (> 1GB) XML file into a MySQL database?
Data
The data in question is the StackOverflow Creative Commons Data Dump.
Purpose
This will be used in an offline StackOverflow viewer I am building, since I am looking to do some studying/coding in places where I will not have access to the internet.
I would like to release this to the rest of the StackOverflow membership for their own use when the project is finished.
Problem
Originally, I was reading from the XML and writing to the database one record at a time. This took about 10 hours to run on my machine. The hacktastic code I'm using now collects 500 records into an array, then builds a single insertion query to load all 500 at once (e.g. "INSERT INTO posts VALUES (...), (...), (...) ... ;"). While this is faster, it still takes hours to run. Clearly this is not the best way to go about it, so I'm hoping the big brains on this site will know of a better way. A rough sketch of the current batching code is below.
Constraints
- I am building the application in C# as a desktop application (i.e. WinForms).
- I am using MySQL 5.1 as my database. This means that features such as "LOAD XML INFILE filename.xml" are not usable in this project, as that feature is only available in MySQL 5.4 and above. This constraint is largely due to my hope that the project would be useful to people other than myself, and I'd rather not force people to use beta versions of MySQL.
- I'd like the data load to be built into my application (i.e. no instructions to "Load the dump into MySQL using 'foo' before running this application.").
- I'm using MySQL Connector/Net, so anything in the MySql.Data namespace is acceptable.
Thanks for any pointers you can provide!
Ideas so far
- A stored procedure that loads an entire XML file into a column, then parses it using XPath. This didn't work since the file size is subject to the limitations of the max_allowed_packet variable, which is set to 1 MB by default, far below the size of the data dump files. (A quick way to confirm that limit is sketched below.)
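For what it's worth, the 1 MB default is easy to confirm from the application itself (a minimal sketch; the connection string is a placeholder):

```csharp
using System;
using MySql.Data.MySqlClient;

class PacketCheck
{
    static void Main()
    {
        // Placeholder connection string; substitute your own server/credentials.
        var connStr = "server=localhost;database=sodump;uid=user;pwd=pass";
        using (var conn = new MySqlConnection(connStr))
        {
            conn.Open();
            var cmd = new MySqlCommand(
                "SHOW VARIABLES LIKE 'max_allowed_packet'", conn);
            using (var reader = cmd.ExecuteReader())
            {
                // Prints something like: max_allowed_packet = 1048576
                if (reader.Read())
                    Console.WriteLine("{0} = {1}",
                        reader.GetString(0), reader.GetString(1));
            }
        }
    }
}
```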