views: 38

answers: 2
I created a set of partitioned tables in Postgres and started inserting a lot of rows via the master table. When the load process blew up on me, I realized I should have declared the id column BIGSERIAL (a BIGINT backed by a sequence, behind the scenes), but had inadvertently set it as SERIAL (INTEGER). Now that I have a couple of billion rows loaded, I am trying to ALTER the column to BIGINT. The process seems to be working, but is taking a long time, so in reality I don't know whether it is working or hung. I'd rather not restart the entire load process.

Any suggestions?
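For reference, a minimal sketch of the mismatch described above (the table and column names here are hypothetical; the real schema isn't shown in the question):

```sql
-- SERIAL is shorthand for an INTEGER column backed by a sequence,
-- so it tops out at 2,147,483,647 -- too small for billions of rows.
CREATE TABLE master (
    id SERIAL,        -- should have been BIGSERIAL (BIGINT + sequence)
    payload text
);

-- The slow fix being attempted: changing the type forces Postgres
-- to rewrite the whole table.
ALTER TABLE master ALTER COLUMN id TYPE BIGINT;
```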

A: 

Restart it (clarifying edit: restart the entire load process again).

Altering a column value requires writing a new version of each row, and every index entry pointing at the old version has to be updated to point at the new one.

Additionally, see how much of the advice on populating a database (in the PostgreSQL documentation) you can follow.
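A sketch of the usual points from that advice, as SQL (the table name, file path, and memory setting are illustrative assumptions, not from the answer):

```sql
-- 1. Use COPY for bulk loads instead of row-by-row INSERTs:
COPY master (id, payload) FROM '/tmp/data.csv' WITH (FORMAT csv);

-- 2. Give index builds more memory (session-level), and create
--    indexes only after the data is loaded:
SET maintenance_work_mem = '1GB';
CREATE INDEX master_id_idx ON master (id);

-- 3. Refresh planner statistics when done:
ANALYZE master;
```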


Correction from @araqnid:

altering the type of the column will trigger a table rewrite, so the row versioning isn't a big problem, but it will still take lots of disk space temporarily. you can usually monitor progress by looking at which files in the database directory are being appended to...
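One way to find the files in question from SQL, rather than guessing in the data directory (a sketch; `master` stands in for the real table name, and `pg_relation_filepath` assumes a reasonably recent server version):

```sql
-- Path of the table's main data file, relative to the data directory:
SELECT pg_relation_filepath('master');

-- Size on disk, including the rewrite's temporary files; run this
-- repeatedly -- if it keeps growing, the ALTER is still making progress:
SELECT pg_size_pretty(pg_total_relation_size('master'));
```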

Stephen Denne
ok. It is not clear from your answer whether you are suggesting I restart the server or redo the entire data-loading process. Are you suggesting that I reload the database, since ALTERing the master table will take about the same amount of time anyway?
punkish
I should have also added... I don't have any indexes on the table, not even a primary key.
punkish
I'm suggesting you redo the entire loading process, since altering the table will take a whole lot longer. However, having no indexes at all does remove one of the biggest problems with massive bulk updates. The remaining problem is that you'll need twice the disk space, and when finished, half your table will be empty. I don't have recent experience with large *inherited* tables, so I don't know how that affects the decision.
Stephen Denne
altering the type of the column will trigger a table rewrite, so the row versioning isn't a big problem, but it will still take lots of disk space temporarily. you can usually monitor progress by looking at which files in the database directory are being appended to...
araqnid
@araqnid thanks for the correction.
Stephen Denne
+1  A: 

When you update a row to alter it in PostgreSQL, that writes out a new copy of the row and then does some cleanup later to remove the original. This means that trying to fix the problem with updates can take longer than just loading all the data in from scratch again--it's more disk I/O than loading a new copy, plus some extra processing time. The only situation where you'd want to do an update instead of a reload is when the original load was very inefficient, for example if a slow client program was inserting the data and was the bottleneck in the process.
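If you do reload from scratch, the corrected definition is the cheap part (hypothetical names again, matching nothing in the question):

```sql
-- Recreate with the right type, then bulk-load again:
CREATE TABLE master_new (
    id BIGSERIAL,    -- BIGINT backed by a sequence; won't overflow at 2^31
    payload text
);

-- The id column fills itself from the sequence during the load:
COPY master_new (payload) FROM '/tmp/data.csv' WITH (FORMAT csv);
```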

To figure out whether the process is still working, see if it's using CPU when you run top (UNIX-ish systems) or the Task Manager (Windows). On Linux, "top -c" will even show you what the PostgreSQL client processes are doing. You probably just expected it to take less time than the original load, which it won't, and it's still running rather than hung.
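Besides top, you can ask the server itself what each backend is doing via the pg_stat_activity view (a sketch; the exact column names vary by version, `state` and `query` being the modern ones):

```sql
-- One row per backend; a long-running ALTER shows up here as 'active':
SELECT pid, state, now() - query_start AS runtime, query
FROM pg_stat_activity
WHERE state <> 'idle';
```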

Greg Smith