I'm working on a pretty large data migration project which involves moving data between servers and different ActiveRecord schemas. The whole migration literally takes days.

How can I implement a caching solution so that when my Ruby migration code calls the ActiveRecord save method, it writes to a cache (which then updates the MySQL database asynchronously)? Memory is not a limitation.

All of the caching solutions in Rails seem to handle loading/query reads well, but writes are something I haven't been able to find much on.

Is this something that is easily doable in MySQL by tuning config params? Or is there an existing caching solution for Ruby/Rails?

I've looked at Delayed Job, though it's not a caching layer (and it's not apparent that anyone has used it for lazy writes to the database). I've looked at MEMORY storage engine tables in MySQL, but of course they are not written to disk. Memcached isn't built for this.
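Conceptually, I'm after something like this write-behind sketch (the WriteBehind class and its methods are made-up names for illustration, not an existing library):

    require 'thread'

    # Hypothetical write-behind wrapper: saves go into an in-memory queue,
    # and a background thread drains the queue into MySQL.
    class WriteBehind
      def initialize
        @queue = Queue.new
        @flusher = Thread.new do
          # Queue#pop blocks until something is pushed; nil is the stop signal
          while record = @queue.pop
            record.save!  # the actual MySQL write happens here, asynchronously
          end
        end
      end

      # enqueue instead of hitting the database directly
      def save_later(record)
        @queue << record
      end

      # block until everything queued so far has been written
      def drain
        @queue << nil
        @flusher.join
      end
    end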

Please advise!

A: 

If you just want to migrate the data, why don't you use stored procedures directly in the database? I'm sure that's more efficient than going through the ActiveRecord layer.
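For instance, a minimal sketch (table, column and procedure names are made up; the copy/transform runs entirely inside MySQL):

    conn = ActiveRecord::Base.connection

    # Define a procedure that copies and transforms rows inside the database
    conn.execute(<<-SQL)
      CREATE PROCEDURE migrate_users()
      BEGIN
        INSERT INTO new_users (id, full_name)
        SELECT id, CONCAT(first_name, ' ', last_name)
        FROM old_users;
      END
    SQL

    conn.execute("CALL migrate_users()")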

RDAM
The schema changes are pretty nasty and involved. I can't do this purely at the SQL level.
blizkreeg
+1  A: 

What you mean is: how can I quickly insert data without waiting for MySQL to update all its indices, so my database migration won't take ages?

There are two solutions to speed up database INSERTs, but they have serious drawbacks as well:

I've found INSERT DELAYED to be of help. It fires your query and returns immediately. MySQL queues up several of those queries for a few seconds and applies them when there's nothing else to do. This is great for logging and such.

The drawback is: if your db goes down, the queue of INSERT statements is lost.
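From Ruby you can fire delayed inserts through the raw connection. This sketch assumes a hypothetical audit_logs table; note that DELAYED only works with storage engines like MyISAM, not InnoDB:

    conn = ActiveRecord::Base.connection

    # INSERT DELAYED returns immediately; MySQL applies the row when the
    # table is idle. Queued rows are lost if the server goes down.
    conn.execute(
      "INSERT DELAYED INTO audit_logs (message, created_at)
       VALUES (#{conn.quote('migrated record')}, NOW())"
    )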

Another option is this:

  • Create the new database schema - without indices!
  • Migrate your data, INSERT away. This is faster because MySQL won't have to update your indices while INSERTing.
  • When all data is migrated, add your indices.

The drawback here is that your data may not be consistent: e.g., if the source db contains duplicate values that a unique index in the target db would normally reject, they will be inserted without complaint.
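A sketch of that flow through the raw connection (table, columns and index are made up for illustration):

    conn = ActiveRecord::Base.connection

    # 1. Create the bare table - no indices yet
    conn.create_table :users do |t|
      t.string :email
      t.string :name
    end

    # 2. Bulk-insert all of your data here; nothing slows the INSERTs down
    # ...

    # 3. Build the indices once, at the very end
    conn.add_index :users, :email, :unique => true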

A third option I would highly recommend is using pure SQL instead of ActiveRecord. If you need to migrate 1 million records, ActiveRecord has to instantiate your model 1 million times, perform the query, and then dispose of the object again.

Creating and disposing of those objects doesn't take much time individually, but if you do it 1 million times, it adds up.

If you're unlucky you'll run into a memory leak and old objects won't be released from memory. This will consume a lot of memory and might eventually crash your Ruby process.

Cut out the middleman and use raw SQL where possible. Using Ruby to drive the SQL is no problem, but keep it as simple as possible.
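A minimal sketch of what that looks like, assuming a hypothetical legacy_users source table feeding a users target table:

    conn = ActiveRecord::Base.connection

    # Read everything once, then write multi-row INSERTs of 1000 rows each -
    # no ActiveRecord objects are instantiated at any point.
    conn.select_all("SELECT id, email, name FROM legacy_users").each_slice(1000) do |batch|
      values = batch.map do |row|
        "(#{conn.quote(row['id'])}, #{conn.quote(row['email'])}, #{conn.quote(row['name'])})"
      end.join(", ")

      conn.execute("INSERT INTO users (id, email, name) VALUES #{values}")
    end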

Ariejan