Before I go about doing this myself, I wonder: has anyone written a script or program that can quickly and efficiently load the SO data into SQL Server?

I tried the script here but it completely took out my machine.

I'm thinking a C# script that streams through the XML file forward-only and uses a bulk inserter should do the trick.

Otherwise, I'm considering just converting the SQLite dump that _nobody created to SQL Server, which should be fairly fast.

Any other ideas or scripts to do this out there? Perhaps a compressed sql server backup torrent?
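As a rough illustration of the forward-only streaming idea above, here is a minimal Python sketch. It assumes each record in the dump is a `row` element carrying its fields as attributes, and it uses an in-memory-friendly `sqlite3` connection with batched `executemany` purely as a stand-in for a SQL Server bulk inserter. The function names, the `Posts (Id, Title)` table, and the batch size are all hypothetical.

```python
import itertools
import sqlite3
import xml.etree.ElementTree as ET


def stream_rows(xml_source, columns):
    """Yield one tuple per <row/> element without loading the whole file."""
    context = ET.iterparse(xml_source, events=("start", "end"))
    _event, root = next(context)  # first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == "row":
            # Missing attributes become None, which sqlite3 stores as NULL.
            yield tuple(elem.get(col) for col in columns)
            root.clear()  # drop finished rows so memory use stays flat


def load_in_batches(conn, xml_source, batch_size=1000):
    """Insert rows in fixed-size batches; return the number of rows loaded."""
    rows = stream_rows(xml_source, ("Id", "Title"))
    total = 0
    while True:
        batch = list(itertools.islice(rows, batch_size))
        if not batch:
            break
        conn.executemany("INSERT INTO Posts (Id, Title) VALUES (?, ?)", batch)
        total += len(batch)
    conn.commit()
    return total
```

Only one batch is held in memory at a time, which is the property that should keep an import like this from taking out the machine.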

Update: I created a very fast SQL Server importer, which lives on GitHub.

Related: What interesting stats can I obtain from the Stack Overflow data-dump?

+2  A: 

I've had good results importing into MySQL by writing a program that converts the files to CSV and then bulk-loading the CSV directly into MySQL. I used the XmlTextReader in .NET to read the XML files. Converting to CSV is pretty simple: read each row, output the attributes enclosed in ", and replace any " with "". Writing posts.xml to CSV took only a couple of minutes, and importing it took about as long. I used LOAD DATA INFILE to load the data into MySQL. You can probably get similar results in SQL Server by writing the CSV file and importing it with BULK INSERT.

Some numbers to give you an idea of how long the import should take: converting posts.xml (the bulk of the data) to CSV can be done in under 2 minutes, and importing the resulting CSV into MySQL takes about 5 minutes. This is on an AMD dual-core (4200?) with 2 GB of RAM and a 7200 RPM SATA HD, running Windows XP Pro.
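The conversion step described above can be sketched in Python rather than .NET. This is only an illustration, not the program used for the timings: it assumes each record is a `row` element with its fields as attributes, and the function name and column list are hypothetical. The `csv` module's `QUOTE_ALL` mode encloses every field in " and doubles any embedded ", matching the rule described.

```python
import csv
import xml.etree.ElementTree as ET


def rows_to_csv(xml_source, csv_out, columns):
    """Stream <row/> elements from xml_source and write one CSV line each."""
    # QUOTE_ALL wraps every field in quotes; embedded quotes are doubled.
    writer = csv.writer(csv_out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    context = ET.iterparse(xml_source, events=("start", "end"))
    _event, root = next(context)  # first event is the start of the root element
    count = 0
    for event, elem in context:
        if event == "end" and elem.tag == "row":
            # Attributes missing from a row are written as empty strings.
            writer.writerow([elem.get(col, "") for col in columns])
            count += 1
            root.clear()  # release processed rows to keep memory use low
    return count
```

Hypothetical usage: open posts.xml in binary mode and the output file with `newline=""`, then call `rows_to_csv(src, dst, ["Id", "PostTypeId", "Title"])`. The resulting file can then be loaded with LOAD DATA INFILE or BULK INSERT, given matching field and quote settings.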

Kibbee
OK, I wrote a tiny import WinForms app, and I'm able to load all the data in under 5 minutes. Pretty happy with that. I get the added bonus that the thing does not take out my machine ... :)
Sam Saffron
+5  A: 

OK, I just wrote a tool that imports the full SO dump into SQL Server in under 5 minutes.

Feel free to amend and extend it (just go ahead and fork it on GitHub); it's under a BSD license.

http://github.com/samsaffron/So-Slow/tree/master

Sam Saffron
Can you add a compiled version? Us DBAs don't have the tools to compile it, unfortunately.
Brent Ozar
Would this work with MySQL as well, or is it different?
jasondavis
For downloads look at: http://github.com/SamSaffron/So-Slow/downloads
Sam Saffron
Note that it will be tricky to adapt this solution to MySQL because it relies on SqlBulkCopy, which is not available for MySQL.
Sam Saffron
Great tool! When I ran the tool from source, I had an issue with some posts having no tags, which made the whole import throw a SQLNullException. I changed the statement "select Id, Tags from Posts where PostTypeId = 1" to "select Id, Tags from Posts where PostTypeId = 1 AND Tags IS NOT NULL", which seems to work fine now.
Jurgen
@Jurgen, yep, good point. I need to update that; it happened because of the new wiki post type.
Sam Saffron
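The effect of the amended tag query from the comments can be checked with a small Python sketch. It uses an in-memory SQLite table as a stand-in for the SQL Server Posts table; the sample rows and the tag string format are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Posts (Id INTEGER, PostTypeId INTEGER, Tags TEXT)")
conn.executemany(
    "INSERT INTO Posts VALUES (?, ?, ?)",
    [
        (1, 1, "<c#><sql-server>"),  # post with tags
        (2, 1, None),                # same post type, but no tags
        (3, 2, None),                # different post type, excluded anyway
    ],
)

# The amended statement from the comment: skip rows whose Tags value is NULL.
query = "select Id, Tags from Posts where PostTypeId = 1 AND Tags IS NOT NULL"
tagged = conn.execute(query).fetchall()
```

With the extra condition, only post 1 is returned, so code that splits the Tags column never sees a NULL value.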