I've got multiple massive (multi-gigabyte) datasets I need to import into a Rails app. Each dataset currently lives in its own database on my development machine, and I need to read from them and create rows in the tables of my Rails database based on the information they contain. The tables in my Rails database will not be exactly the same as the tables in the source databases.

What's the smartest way to go about this?

I was thinking migrations, but I'm not exactly sure how to connect the migration to the databases, and even if that is possible, is that going to be ridiculously slow?

+1  A: 

Without seeing the schemas or knowing the logic you want to apply to each row, I would say the fastest way to import this data is to create a view of the table you want to export, with the columns in the order you want (doing any processing in SQL), and then do a SELECT ... INTO OUTFILE on that view. You can then take the resulting file and import it into the target DB.

This will not let you run any Rails model validations on the imported data, though.
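As a rough sketch of that export-and-load round trip, assuming MySQL on both sides (SELECT ... INTO OUTFILE and LOAD DATA INFILE are MySQL statements): the helper below only builds the SQL strings so they can be reviewed before being run by hand. The helper name and every table, column, and file name are hypothetical.

```ruby
# Builds the three statements for one table. Nothing is executed here;
# all names passed in are made up for illustration.
def export_statements(source_table:, target_table:, columns:, outfile:)
  view = "export_#{source_table}"
  # columns maps source column => target column, in the desired order.
  select_list = columns.map { |from, to| "#{from} AS #{to}" }.join(", ")
  target_cols = columns.values.join(", ")
  # %q keeps '\n' as a literal backslash-n, which is what MySQL expects.
  csv_options = %q(FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n')

  {
    # Run on the source database.
    create_view: "CREATE VIEW #{view} AS SELECT #{select_list} FROM #{source_table}",
    # Run on the source database; the file is written on the database server host.
    export: "SELECT * FROM #{view} INTO OUTFILE '#{outfile}' #{csv_options}",
    # Run on the target (Rails) database.
    import: "LOAD DATA INFILE '#{outfile}' INTO TABLE #{target_table} #{csv_options} (#{target_cols})"
  }
end

statements = export_statements(
  source_table: "legacy_customers",
  target_table: "users",
  columns: { "cust_id" => "id", "full_name" => "name" },
  outfile: "/tmp/users.csv"
)
puts statements.values.map { |sql| "#{sql};" }
```

Because the view already renames and orders the columns, the load step needs no per-row Ruby code, which is where the speed comes from.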

Otherwise, you have to go the slow way and create a model for each source DB/table to extract the data (http://programmerassist.com/article/302 explains how to connect a given model to a different DB) and import it that way. This is going to be quite slow, but you could set up a monster EC2 instance and run it as fast as possible.
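If you do go through Ruby, much of the cost is per-row overhead, so processing rows in batches usually helps. Here is a minimal plain-Ruby sketch of the transform-and-write loop, with an array standing in for the legacy table. The method names and column names are hypothetical; in a Rails app the rows might come from a model connected to the source DB, and the block might do one bulk insert per batch (for example `insert_all` in Rails 6 and later, which skips validations).

```ruby
# Maps one source row (a Hash) to the attributes of the target table.
# The column names are made up for illustration.
def transform(row)
  { name: row["full_name"].to_s.strip, email: row["email_addr"].to_s.downcase }
end

# Walks any Enumerable of source rows in fixed-size slices and hands each
# transformed batch to the block, so the writer can issue one multi-row
# insert per batch. Returns the number of rows handed over.
def import_in_batches(source_rows, batch_size: 1_000)
  total = 0
  source_rows.each_slice(batch_size) do |slice|
    batch = slice.map { |row| transform(row) }
    yield batch
    total += batch.size
  end
  total
end

# Stand-in for the legacy table.
legacy_rows = [
  { "full_name" => " Ada Lovelace ", "email_addr" => "ADA@example.com" },
  { "full_name" => "Alan Turing",    "email_addr" => "alan@example.com" },
  { "full_name" => "Grace Hopper",   "email_addr" => "Grace@Example.com" }
]

written = []
count = import_in_batches(legacy_rows, batch_size: 2) { |batch| written.concat(batch) }
puts "imported #{count} rows"
```

Keeping `transform` separate from the batching makes it easy to test the column mapping on a handful of rows before pointing it at the full dataset.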

Migrations would work, but I wouldn't recommend them for something like this.

Joshua Smith
I'm pretty weak on the database-fu. With an approach like this, would I be able to retain the relations between models in the original dataset? It seems that if I'm importing foreign keys, I'd also have to import the primary keys to retain the relationships, and to retain the primary keys I'd need to import the data into a blank table. I could conceivably dump my existing tables into a file and reimport them afterwards, manually matching up their foreign key relationships, but that obviously sounds like a huge pain. Or is there something big I'm missing?
WIlliam Jones
You would likely need to import them into a blank table if you are using the dump/reimport method. If you created a model for the data and imported via Rails models, you would not, because you could use the associations to create and manage the related items.
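One way to picture why the target tables need not be blank: while importing parent rows, remember which new id each legacy primary key received, then resolve the children's foreign keys through that map. The plain-Ruby sketch below uses a hypothetical `FakeTable` class and made-up authors/posts data in place of real models; with Rails associations, creating the child through the parent would do the same bookkeeping for you.

```ruby
# Hypothetical legacy rows: posts point at authors through author_id.
legacy_authors = [
  { "id" => 10, "name" => "Ann" },
  { "id" => 42, "name" => "Bob" }
]
legacy_posts = [
  { "id" => 7, "author_id" => 42, "title" => "Hello" },
  { "id" => 8, "author_id" => 10, "title" => "World" }
]

# Stand-in for a target table that already holds data: new rows get ids
# that continue after the existing ones, so legacy ids cannot be reused.
class FakeTable
  attr_reader :rows

  def initialize(next_id)
    @next_id = next_id
    @rows = []
  end

  # Stores the row under a freshly assigned id and returns that id.
  def insert(attrs)
    row = attrs.merge(id: @next_id)
    @next_id += 1
    @rows << row
    row[:id]
  end
end

authors = FakeTable.new(501)
posts = FakeTable.new(9_001)

# Parents first: legacy primary key => newly assigned id.
id_map = {}
legacy_authors.each do |author|
  id_map[author["id"]] = authors.insert(name: author["name"])
end

# Children second: translate each foreign key through the map.
# fetch raises KeyError if a post references an author that was never imported.
legacy_posts.each do |post|
  posts.insert(title: post["title"], author_id: id_map.fetch(post["author_id"]))
end
```

The relationships survive even though none of the original primary keys do.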
Joshua Smith
A: 

Since georgian suggested it, I'll post my comment as an answer:

If the changes are superficial (column names changed, columns removed, etc.), then I would just manually export the data from the old database, import it into the new one, and then run a migration to change the columns.

Jordan