views:

407

answers:

3

A long time ago in a galaxy far, far away...

I am trying to migrate a database from MySQL to PostgreSQL. All the documentation I have read covers, in great detail, how to migrate the structure. I have found very little documentation on migrating the data. The schema has 13 tables (which have been migrated successfully) and 9 GB of data.

MySQL version: 5.1.x
PostgreSQL version: 8.4.x

I want to use the R programming language to analyze the data using SQL select statements; PostgreSQL has PL/R, but MySQL has nothing (as far as I can tell).
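For context, here is a minimal sketch of what PL/R makes possible, assuming the plr language has been installed in the database (the function name and tables are my own, for illustration):

```sql
-- Assumes a superuser has already run: CREATE LANGUAGE plr;
-- PL/R passes unnamed arguments to R as arg1, arg2, ...; the body is plain R.
CREATE OR REPLACE FUNCTION r_median(double precision[])
RETURNS double precision AS $$
  median(arg1)
$$ LANGUAGE plr;

-- Hypothetical usage against a climate table:
-- SELECT r_median(array_agg(temperature)) FROM climate.station_data;
```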

A New Hope

Create the database location (/var has insufficient space; I also dislike having the PostgreSQL version number everywhere -- upgrading would break scripts!):

  1. sudo mkdir -p /home/postgres/main
  2. sudo cp -Rp /var/lib/postgresql/8.4/main /home/postgres
  3. sudo chown -R postgres:postgres /home/postgres
  4. sudo chmod -R 700 /home/postgres
  5. sudo usermod -d /home/postgres/ postgres

All good up to here. Next, restart the server and configure the database using these installation instructions:

  1. sudo apt-get install postgresql pgadmin3
  2. sudo /etc/init.d/postgresql-8.4 stop
  3. sudo vi /etc/postgresql/8.4/main/postgresql.conf
  4. Change data_directory to /home/postgres/main
  5. sudo /etc/init.d/postgresql-8.4 start
  6. sudo -u postgres psql postgres
  7. \password postgres
  8. sudo -u postgres createdb climate
  9. pgadmin3

Use pgadmin3 to configure the database and create a schema.
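If you prefer the command line, the same schema can be created from psql instead (a sketch, assuming the schema name climate used throughout):

```sql
-- Run in psql as the postgres user; equivalent to the pgadmin3 clicks.
CREATE SCHEMA climate AUTHORIZATION postgres;
```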

The episode continues in a remote shell known as bash, with both databases running, and the installation of a set of tools with a rather unusual logo: SQL Fairy.

  1. perl Makefile.PL
  2. sudo make install
  3. sudo apt-get install perl-doc (strangely, it is not called perldoc)
  4. perldoc SQL::Translator::Manual

Extract a PostgreSQL-friendly DDL and all the MySQL data:

  1. sqlt -f DBI --dsn dbi:mysql:climate --db-user user --db-password password -t PostgreSQL > climate-pg-ddl.sql
  2. Edit climate-pg-ddl.sql and convert the identifiers to lowercase, and insert the schema reference (using VIM):
    • :%s/"\([A-Z_]*\)"/\L\1/g
    • :%s/ TABLE / TABLE climate./g
    • :%s/ on / on climate./g
  3. mysqldump --skip-add-locks --complete-insert --no-create-db --no-create-info --quick --result-file="climate-my.sql" --databases climate --skip-comments -u root -p

It might be worthwhile to simply rename the tables and columns in MySQL to lowercase:

  1. select concat( 'RENAME TABLE climate.', TABLE_NAME, ' to climate.', lower(TABLE_NAME), ';' ) from INFORMATION_SCHEMA.TABLES where TABLE_SCHEMA='climate';
  2. Execute the commands from the previous step.
  3. There is probably a way to do the same for columns; I changed them manually because it was faster than figuring out how to write the query.
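For the record, a similar query can generate the column renames; this is a sketch I have not run, so review its output before executing it (note that MySQL's CHANGE clause must restate the column definition, hence COLUMN_TYPE, and that attributes such as NOT NULL and DEFAULT are not carried along):

```sql
-- Generates one ALTER TABLE ... CHANGE statement per column; execute the
-- output the same way as the table renames above.
SELECT CONCAT('ALTER TABLE climate.', TABLE_NAME,
              ' CHANGE ', COLUMN_NAME, ' ', LOWER(COLUMN_NAME),
              ' ', COLUMN_TYPE, ';')
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = 'climate';
```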

The Database Strikes Back

Recreate the structure in PostgreSQL as follows:

  1. pgadmin3 (switch to it)
  2. Click the Execute arbitrary SQL queries icon
  3. Open climate-pg-ddl.sql
  4. Search for `TABLE "` and replace with `TABLE climate."` (inserting the schema name climate)
  5. Search for `on "` and replace with `on climate."` (inserting the schema name climate)
  6. Press F5 to execute

This results in:

Query returned successfully with no result in 122 ms.

Replies of the Jedi

At this point I am stumped.

  • Where do I go from here (what are the steps) to convert climate-my.sql to climate-pg.sql so that they can be executed against PostgreSQL?
  • How do I make sure the indexes are copied over correctly (to maintain referential integrity; I don't have constraints at the moment, to ease the transition)?
  • How do I ensure that new rows inserted in PostgreSQL are numbered from after the highest imported primary key (so the sequence does not generate values that collide with existing rows)?
  • How do you ensure the schema name comes through when transforming the data from MySQL to PostgreSQL inserts?

Resources

A fair bit of information was needed to get this far:

Thank you!

+1  A: 

Convert the mysqldump file to a PostgreSQL-friendly format

Convert the data as follows (do not use mysql2pgsql.perl):

  1. Escape the quotes.

    sed "s/\\\'/\'\'/g" climate-my.sql | sed "s/\\\r/\r/g" | sed "s/\\\n/\n/g" > escaped-my.sql

  2. Replace the USE "climate"; statement with a search path, and convert the leading /* comment markers into SQL comments:

    sed "s/USE \"climate\";/SET search_path TO climate;/g" escaped-my.sql | sed "s/^\/\*/--/" > climate-pg.sql

  3. Connect to the database.

    sudo su - postgres
    psql climate

  4. Set the encoding (mysqldump ignores its encoding parameter) and then execute the script.

    \encoding iso-8859-1
    \i climate-pg.sql

This series of steps will probably not work for complex databases with many mixed types. However, it works for integers, varchars, and floats.

Indexes, primary keys, and sequences

Since mysqldump included explicit primary key values when generating the INSERT statements, those values bypass each table's automatic sequence. Upon inspection, the sequences for all tables remained at 1.

Set the sequence after import

Use the ALTER SEQUENCE command (or the setval() function) to advance each sequence past the highest imported key.
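For example, with a hypothetical climate.station table whose id column is fed by climate.station_id_seq (substitute your own table and sequence names):

```sql
-- setval() derives the restart value from the imported data:
SELECT setval('climate.station_id_seq',
              (SELECT max(id) FROM climate.station));

-- Or set an explicit value:
ALTER SEQUENCE climate.station_id_seq RESTART WITH 1000;
```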

Schema Prefix

There is no need to prefix tables with the schema name. Use:

SET search_path TO climate;
Dave Jarvis
+2  A: 

What I usually do for such migrations is two-fold:

  • Extract the whole database definition from MySQL and adapt it to PostgreSQL syntax.
  • Go over the database definition and transform it to take advantage of functionality in PostgreSQL that doesn't exist in MySQL.

Then do the conversion, and write a program in whatever language you are most comfortable with that accomplishes the following:

  • Reads the data from the MySQL database.
  • Performs whatever transformation is necessary on the data to be stored in the PostgreSQL database.
  • Saves the now-transformed data in the PostgreSQL database.

Redesign the tables for PostgreSQL to take advantage of its features.

If you just use a sed script to convert the SQL dump from one format to the other, all you are doing is putting a MySQL database in a PostgreSQL server. You can do that, and there will still be some benefit from doing so, but if you're going to migrate, migrate fully.

It will involve a little more up-front time, but I have yet to come across a situation where it isn't worth it.

Michael Trausch
@Michael: "Performs whatever transformation is necessary on the data to be stored in the PostgreSQL database." That's the part for which I was seeking an answer. :-)
Dave Jarvis
Well, that's the part that nobody can answer: only you can, after you have redesigned the schema. There is no generic means of doing it effectively which I am aware of. You read your MySQL tables using the MySQL API libraries, and if you keep the tables exactly the same in pgsql, just write them as-is. Or, if you alter the table/database structure to fit pg's capabilities, you transform it in memory and then write it. But that's a data-specific problem. You would do the same thing if you were going from MySQL to SQLite (which has fewer data types and less functionality, for example).
Michael Trausch
@Michael: I was looking to fix the `mysqldump` file so that it imports cleanly into PostgreSQL. There were a number of technical errors and gotchas as I have shown in my answer. Would these not be the same problems for anyone using `mysqldump`?
Dave Jarvis
@Michael: You were right about restructuring the database. The way the data was structured worked well with MySQL, but PostgreSQL needs a different tack. I should not have split out the DAY/MONTH/YEAR into separate tables.
Dave Jarvis
Anytime I migrate data from one RDBMS to the next (assuming I am familiar with both) I don't convert the dump to be compatible with the target. In fact, I *never* take a dump that way, except to load it into an idle server. Changing database servers is (rightly!) a fairly big deal, especially for applications which have complex data.
Michael Trausch
+2  A: 

If you've converted the schema, then migrating the data will be the easy part:

  • dump the schema from PostgreSQL (you said that you've already converted the schema to Postgres, so we will dump it now, as we will be deleting and recreating the target database to have it clean):
   pg_dump dbname > /tmp/dbname-schema.sql
   
  • split the schema into 2 parts — /tmp/dbname-schema-1.sql containing the create table statements, /tmp/dbname-schema-2.sql — the rest. PostgreSQL needs to import the data after the table definitions are imported, but before the foreign keys, triggers etc.

  • recreate database with only 1 part of schema:

   drop database dbname;
   create database dbname;
   \i /tmp/dbname-schema-1.sql
   -- now we have tables without data, triggers, foreign keys etc.
   
  • import data:
   (
       echo 'start transaction';
       mysqldump --skip-quote-names dbname | grep ^INSERT;
       echo 'commit'
   ) | psql dbname
   -- now we have tables with data, but without triggers, foreign keys etc.
   

The --skip-quote-names option was added in MySQL 5.1.3, so if you have an older version, install a newer MySQL temporarily in /tmp/mysql (configure --prefix=/tmp/mysql && make install should do) and use /tmp/mysql/bin/mysqldump.

  • import the rest of schema:
   psql dbname
   start transaction;
   \i /tmp/dbname-schema-2.sql
   commit;
   -- we're done
   
Tometzky
I don't understand the first point: dumping the PostgreSQL schema? I didn't have a PostgreSQL schema; I had a MySQL schema. :-) Also, have you actually tried the data dump? I didn't see the skip-quote-names option in mysqldump! That would have helped a great deal. Also, what about backticks versus single quotes?
Dave Jarvis
I've added some clarifications for you.
Tometzky