views:

651

answers:

8

We're migrating MySQL to PostgreSQL. I can easily audit the schema and the SQL statements used throughout the (REALbasic) program. Most of the SQL is composed by building string variables.

I already know about needing to replace our use of SELECT LAST_INSERT_ID() with a SERIAL column with UNIQUE constraint.

What, if any, differences between the two which are not obviously visible in SQL statements might bite us? I'm looking for (probably subtle) assumptions about behaviour such as any differences in autocommit, need to add constraints which aren't in MySQL etc.

I'm trying to tease out any gotchas for a couple of reasonably smart, attentive guys who aren't gurus in either of the databases.

This is a one-way commitment so if there are major benefits we will get by adding new declarations I'd appreciate them pointing out.

Note: we're not using any form of parameterised queries and, yes, I've pointed out issues with injection attacks as a required audit of the code.

Yes, for the curious, this decision was prompted by GPL issues, not that we're averse to paying for licenses but, unfortunately, the sole REALbasic driver for MySQL was GPL. As of May 2009, Real Software have released a new Community driver which is GPL, and properly includes source. They have promised a non-GPL Enterprise driver in the near future.

I am prepared to believe that the answer might be there are no invisible monsters under the bed but thought I'd ask to be sure.

A: 

Depending on the amount of queries involved (and if you have someone inhouse to do it), it might be wise to extract all the queries, adapt them as necessary and run them on two copies of the existing DB, one running mysql and one running posgresl. Looking at the produced logs and comparing resulting datas might show some interesting hints. The first step could also be done by executing unit tests, manual or script based testing of the application.

MatthieuP
+1  A: 

Unless you've been SQL92-compliant with your column-typing, you will run into differences in the names of types between the two.

In-query variable modification doesn't work in postgreSQL, for instance, this will work in MySQL but not in postgreSQL (I haven't tested this recently, maybe it does work now.)

SET @a:=1 SELECT ID,@a:=@a+1 FROM some_table;

I also personally believe that postgreSQL has better handling of complex queries that include sub-selects and the like (which most MySQL users used to avoid).

Edit: Oh, and I nearly forgot! The way postgreSQL stores tables is utterly different than how MySQL does. This might affect your backup/restore strategies, too.

scraimer
+6  A: 
  • select count(*) from table;

    will be slow, as it needs to read entire table. It needs workarounds if you need to count big tables often. This is needed to ensure multiversion concurrency control.

  • In the latest version (8.3) there's no implicit cast to text, which means that for example

    select 234 like '2%';

    will throw error. You'll need explicit cast like:

    select 234::text like '2%';

  • Update is really a delete+insert. As space used by deleted rows is not immediately freed then if you update entire table in one transaction then you'll need double the space.

Postgresql is a very good database, you'll love it in no time. It has several very useful features that you'll then miss in other, even commercial databases. For example transactional data definition language or savepoints.

Tometzky
thanks, that was EXACTLY the kind of behavioural gotcha I was worried about.
Andy Dent
A: 

Based on the WikiVS comparison I just found a number of interesting points, most not really gotchas, such as: - subqueries are a lot faster in Postgres, this isn't really a gotcha but means some workarounds can be removed

That site did lead me to the Postgres Gotchas list, which had more details on count(*) speed and another sequential scan issue: Max and Min are sequential scans. This is much more likely to hurt our performance than anything else identified so far. However, that article includes a Max(col) workaround:

SELECT col FROM table ORDER BY col DESC LIMIT 1
Andy Dent
Take a note about notes like "Affects: PostgreSQL <= 8.0. Max/min gotcha affects very old versions of PostgreSQL, it will not affect you today. Most of this gotchas on the page are outdated. That's why I did not just send you there.
Tometzky
OK, thanks, yes I see now that was fixed in 8.1 http://www.postgresql.org/docs/8.1/interactive/release-8-1.html
Andy Dent
+3  A: 

When I made the migration from MySQL to PostgreSQL, several things really got in my way:

1) The code writing to the MySQL database was broken and feeding the database garbage a real database that had foreign keys would have prevented. Be prepared to find "surprise" garbage data once you add in referential integrity.

2) MySQL's indexes on strings are case insensitive! If you have a primary key on something like a username, "Coryking" and "CORYKING" are the same according to MySQL. On PostgreSQL, they are different. I wasn't aware of this until there started to be people who were registering duplicate usernames that should already be in the database.

3) MySQL likes to automagically add nonsense default values to columns you specify as "NOT NULL". For example, if you specify a VARCHAR(255) NOT NULL, it will turn that column definition into "VARCHAR(255) NOT NULL DEFAULT ''".

4) PostgreSQL likes huge queries--MySQL doesn't. You'll have a lot of fun after the migration improving your database queries--don't be shy about it either.

Cory R. King
2) is true in MySQL if you happen to be using a case-insensitive (ci) collation for that column. MySQL also has some case-sensitive (cs) collations.
thomasrutter
8.4 will include the "citext" type (case-insensitive text type) as a contrib module, which is also available separately for 8.3: http://justatheory.com/computers/databases/postgresql/citext-patch-submitted.html
araqnid
+1  A: 

I don't know if you use PHP or not but I've found that addslashes works relatively well for MySQL but bombs fast in Postgres. Either use pg_escape_string() or better yet a prepared statement.

gradbot
In theory you should not use addslashes() for either, though it only matters when you are using a non-ascii-compatible encoding. Any ISO, ANSI, ASCII or UTF-8 is safe to use addslashes() in MySQL, but not something like UTF-16 or another multi-byte.
thomasrutter
thanks for the PHP tip but as I said in the first paragraph, we're using REALbasic.
Andy Dent
A: 

Try github.com/maxlapshin/mysql2postgres Perhaps, it will help you to convert data.

Max Lapshin