tags:

views:

404

answers:

3

We're upgrading a large site, opendemocracy.net, from Drupal 4.7 to Drupal 6. Our existing install has a lot of superfluous tables in the database from once-used modules we won't be using: ideally, we wouldn't keep these, or our old blocks, etc. This is because: (a) it'd be nice to keep the database as small as possible, (b) it'd be nice for new blocks to start at ID 1, etc., (c) as clean an install as possible should minimise any problems (for example, our crufty old install has some strange problems, e.g. not letting us create any new blocks).

All we really want to keep is our nodes (plus comments), taxonomies, files, statistics, users, path aliases, and one custom table. I've tried copying the tables that I think govern these (listed at the end of this question) into a fresh 4.7 install and then upgrading from this, and it appears to work. My questions are: would this work? Is it a good idea? Is there a better way to achieve some of the same goals?

Also, what if we needed to do the same with another install, also adding its content to the same new install? We have a separate install for our forums. The following approach seemed to work, but I'm worried it will cause problems when creating new content reaches the new nids I've created (how does Drupal check what nids are free; is there a database variable I should change?):

  1. increment the nids (and vids) in the node and term_nodes tables of the forums install by the highest nid (or vid) in the main install
  2. create a new vocabulary in the main install with the same terms as the Forums vocab in the forums install (at maininstall.com/admin/content/forum - I'm not sure how else to specify this is the forum vocabulary)
  3. change the tids in the term_nodes tables of the forums install to the tid values of terms in this new vocabulary in the main install
  4. insert the contents of the modified node and term_nodes tables of the forums install into the node and term_nodes tables of the main install.

PS: Here are the tables I've found which need copying:

audio, audio_attach, audio_image, audio_metadata, comments, files, node, node_access, node_comment_statistics, node_counter, node_revisions, node_type, od_story, term_data, term_hierarchy, term_node, term_relation, term_synonym, url_alias, users, vocabulary, vocabulary_node_types
+1  A: 

If your approach to creating a cleaner base install seems to work then stick with it.

For the second part of the question you may consider the migrate module. It is designed to copy from non drupal cms to drupal, but should be able to help you add drupal content to another drupal site.

Jeremy French
+3  A: 

I don't quite agree with Jeremy on the first part. You have to be very careful to mash around in your Drupal database. Sometimes things may seem to work, but then later, you find that you have created a lot of problems for yourself, because you delete something in one table that's referenced in another. Also id number of your blocks really wont effect Drupals ability to create new blocks etc. If you do want "pretty" ids, make sure you test things thoroughly though. But considering the cost/benefit, this is definitely not worth it.

Your tactic with creating a new 4.7 and then upgrading from there should work just fine. I would, however, suggest that you first copy your database, into a copy of your current distro and then deactivate and uninstall all the modules you wont be using. This will erase all the data that's associated to these modules, and should also help clean up your tables in case that have altered those you are using. This might not do anything, but it could help clean up the data you are using.

When Drupal create new nodes, it simply does a SQL INSERT, so it's the database that handles the ids. So you don't need to wory about that. The migrate module Jeremy suggests, should be able to help you out transferring data into your new Drupal install. Bit if you would rather write a script, what you propose seems just fine.

Note
As many upgrade guides and upgrade handbook say. When doing a major upgrade, you should always upgrade through all of the major Drupal releases (e.g. 4 to 5, 5 to 6...), until you reach your goal. The reason is that the internal structure and thus the database schema changes in every major release. So you need the changes done in the data-structure in version 5, to successfully upgrade to version 6. I didn't mention this in my original post, as took this for common knowledge, but it might be a good idea to mention none the less.

Another thing worth noticing, is that as Henrik O correctly points out, you can change the AUTO_INCREMENT value in your database. I can't remember 4.x upgrades, but I believe that Drupal will take care of this, in part of the 4-5 upgrade, as it implements serial tables in that release instead of managing this itself. Also whether or not to run a query to alter the AUTO_INCREMENT depends on the database backend, as not all backends use this property. PostgreSQL don't manage serials in that way, so it should automatically, start creating nodes with the correct nids, if you use it for Drupal.

googletorp
I would agree normally that copying just some tables can cause problems, but from the sounds of the question there are already problems with the install which were partially rectified by this solution.
Jeremy French
Yes, Jeremy's right that that's part of the reason for wanting as clean an install as possible - other reasons I didn't mention are that upgrading from the existing database cause a couple of problems. First, it seems to take an inordinately long time otherwise, as if it's choking on upgrading the system module in particular (it does finish, so maybe this is no problem, but it *feels* worrying.) Second, it throws up a few error reports: Views was installed permissions for non-existent groups are updated, etc.
od56
I was referring to the alteration of the block id's, not copying the tables, as being something you should be very careful doing. Altering directly in the db can cause some sneaky bugs - sometimes. Only copying tables is usually not that big a problem, as you usually quick can find out if something is wrong. It should be possible, however, to let Drupal trim the database for you, letting the expertise of the module creates help you out. As they write functions to clean up the database for unwanted things - at least many do.
googletorp
Ah, that's reassuring (that copying tables is usually not that big a problem). I would simply keep the fresh install's blocks table and lose all existing blocks, so this should be fine.I take your original point about being careful not to delete any tables referenced by tables I'm saving - I'll check this in the invaluable http://webdevgeeks.com/schemaspy/index.html
od56
BE CAREFUL with the schemaspy output - with Drupals lack of explicit foreign keys, it infers relations based on column names and types, which yields some wrong results. E.g. the 'vid' of the node table is incorrectly stated as referencing the 'vid' of the vocabulary table, where in fact it is the 'vid' of the revisins table ('version id' vs. 'vocabulary id'). Same goes for most 'name' columns, which are falsely listed as referencing the 'name' column of the sequences table.
Henrik Opel
Also, googletorps statement on ids being handled by the database is only correct for Drupal >= version 5, but NOT for Drupal 4.7 - in 4.7, ids where maintained by Drupal itself, storing the last used values in the sequences table!
Henrik Opel
+3  A: 

I'd say your basic approach is fine, but you are right to be concerned about things silently breaking. Your main concern should indeed be referential integrity, as the id handling changed since version 4.7. Originally, Drupal did not use the auto increment/serial features of the Database engines, but did its own id generation by means of the sequences table. They switched to serials in Drupal 5, but for some reason that I do not remember they still kept the sequences table until dropping it in version 6.

So if I where in your position, I'd add an intermediate step and upgrade/migrate to a Drupal 5 install first, then do another upgrade to Drupal 6. The reasoning is that the upgrading process is a more or less hand tuned collection of operations that got refined using the input of users having troubles doing it. Since the most error reports came in from users doing '1 version' upgrades only, going the same route should minimize the probability of encountering unexpected errors/conditions.

Also, your post on drupal.org reveals that the forum instance you want to 'merge in' is a Drupal 5 install, so you could do the merge while you are in the intermediate Drupal 5 'phase' of your migration. (Alternatively, I'd upgrade that instance to Drupal 6 separately before merging it into the main instance.)

As for the id adjustment in Drupal >= 5, you'd need to adjust the AUTO_INCREMENT start value for each affected table explicitly. For example in MySQL, issuing:

ALTER TABLE node AUTO_INCREMENT = 5432;

would tell the node table to start setting new serials (ids) starting at 5432 from now on

(NOTE: AUTO_INCREMENT is the MySQL way of handling this, if using PostgreSQL, take a look at the documentation for the serial 'pseudo' type and the accompanying sequence generation mechanism)

Obviously, you'd need to test your new instance thoroughly. Put a heavy focus on testing the insertion of new data as well as updating existing data (nodes, terms, anything you migrated), as this would reveal errors with the referential integrity.

Be thorough and you should be fine - Good Luck :)


Edit: You should also check the 'variables' table entries carefully, as some settings there might contain a reference to an id of a 'standard' table entry (e.g. a vocabulary vid, a term tid or something similar - in your case especially forum_nav_vocabulary and forum_containers)

Henrik Opel
You might want to add, that only MYSQL use auto_increment
googletorp
Good point - did so.
Henrik Opel