We have a simple interface for tagging a question.

(e.g. an entry has one or more tags, and each tag row has a foreign key pointing back to the entry table)

1.    What is the current production version of the jdk? (Tags: jdk6 jdk-6 jdk java)
2.    In what version was java.util.spi package introduced? (Tags: jdk-6, jdk7, jdk5)
3.    Which version of java is going to be released soon? (Tags: jdk-6, jdk7, jdk8)

We would like to merge all tags named "jdk-6" into "jdk6". How do we achieve this in a system that is nearing production but already contains useful data?

In [1], jdk-6 needs to be removed, since jdk6 is already present. In [2] and [3], jdk-6 needs to be renamed to "jdk6".

What kind of scripts do I need to migrate this data in an effective fashion?

EDIT

create table entry (id, question, ...)
create table entry_tag (id, entry_id, tag)
A: 

I would first create a new table with a list of entry IDs that contain either of the tags 'jdk-6' or 'jdk6'.

Then I would remove all tag records for the tags 'jdk6' and 'jdk-6'.

And then I would add them back in as 'jdk6', one row per entry ID in the table created at the start.
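The three steps above could be sketched roughly as follows. This is only an illustration, not a tested migration script: it uses SQLite through Python's `sqlite3` module so it can be run as-is, and the names `affected_entries`, `build_sample_db` and `merge_tag_via_staging` are made up for the example. The `entry_tag` columns follow the schema in the question, with the column types assumed.

```python
import sqlite3


def build_sample_db():
    """In-memory copy of the three example questions (column types are assumed)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE entry_tag (id INTEGER PRIMARY KEY, entry_id INTEGER, tag TEXT)"
    )
    rows = [
        (1, "jdk6"), (1, "jdk-6"), (1, "jdk"), (1, "java"),
        (2, "jdk-6"), (2, "jdk7"), (2, "jdk5"),
        (3, "jdk-6"), (3, "jdk7"), (3, "jdk8"),
    ]
    conn.executemany("INSERT INTO entry_tag (entry_id, tag) VALUES (?, ?)", rows)
    conn.commit()
    return conn


def merge_tag_via_staging(conn, old_tag, new_tag):
    """Merge old_tag into new_tag via a staging table of affected entry IDs."""
    # 'with conn' commits on success and rolls back the data changes on error.
    with conn:
        # Staging table (hypothetical name) holding each affected entry once.
        conn.execute(
            "CREATE TEMP TABLE affected_entries (entry_id INTEGER PRIMARY KEY)"
        )
        conn.execute(
            "INSERT INTO affected_entries (entry_id) "
            "SELECT DISTINCT entry_id FROM entry_tag WHERE tag IN (?, ?)",
            (old_tag, new_tag),
        )
        # Remove every row carrying either spelling of the tag.
        conn.execute(
            "DELETE FROM entry_tag WHERE tag IN (?, ?)", (old_tag, new_tag)
        )
        # Add the tag back exactly once per affected entry.
        conn.execute(
            "INSERT INTO entry_tag (entry_id, tag) "
            "SELECT entry_id, ? FROM affected_entries",
            (new_tag,),
        )
        conn.execute("DROP TABLE affected_entries")


if __name__ == "__main__":
    conn = build_sample_db()
    merge_tag_via_staging(conn, "jdk-6", "jdk6")
    for row in conn.execute("SELECT entry_id, tag FROM entry_tag ORDER BY entry_id, id"):
        print(row)
```

One consequence of this approach is that the re-added rows get new `id` values, so anything that refers to `entry_tag.id` would need checking first.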

ar
This will really screw with the clustering factor of the index on entry_tag.entry_id
Stephanie Page
+2  A: 

I would do the following:

  1. Update the "bad" tags to the good one (UPDATE entry_tag SET tag = 'jdk6' WHERE tag = 'jdk-6')

  2. Remove the duplicate tags (rows where entry_id and tag are the same). Exactly how you do this depends on whether the table has a separate unique key, but a quick search will turn up a variety of methods that work under different circumstances.

  3. Assuming you have a TagsList table with the list of all available tags, remove jdk-6 from it (DELETE FROM TagsList WHERE Tag = 'jdk-6').
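As one possible reading of steps 1 and 2, here is a minimal sketch that renames the tag and then keeps only the lowest `id` per entry among the duplicates. It assumes the `entry_tag` table from the question, that `id` is a unique key, and that there is no unique constraint on (entry_id, tag), since the UPDATE would otherwise fail on entries that already carry both tags. It runs against SQLite through Python's `sqlite3` module purely so it can be tried out; `sample_db` and `merge_tag_in_place` are hypothetical names. Step 3 is left out because the posted schema shows no TagsList table.

```python
import sqlite3


def sample_db():
    """Small in-memory entry_tag table (column types are assumed)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE entry_tag (id INTEGER PRIMARY KEY, entry_id INTEGER, tag TEXT)"
    )
    conn.executemany(
        "INSERT INTO entry_tag (entry_id, tag) VALUES (?, ?)",
        [(1, "jdk6"), (1, "jdk-6"), (1, "java"), (2, "jdk-6"), (2, "jdk7")],
    )
    conn.commit()
    return conn


def merge_tag_in_place(conn, old_tag, new_tag):
    """Rename old_tag to new_tag, then drop the duplicates this creates."""
    # 'with conn' commits on success and rolls back on error.
    with conn:
        # Step 1: rename the bad tag everywhere.
        conn.execute(
            "UPDATE entry_tag SET tag = ? WHERE tag = ?", (new_tag, old_tag)
        )
        # Step 2: for each entry, keep the new_tag row with the lowest id
        # and delete any other new_tag rows for that entry.
        conn.execute(
            "DELETE FROM entry_tag "
            "WHERE tag = ? "
            "AND id NOT IN ("
            "  SELECT MIN(id) FROM entry_tag WHERE tag = ? GROUP BY entry_id"
            ")",
            (new_tag, new_tag),
        )


if __name__ == "__main__":
    conn = sample_db()
    merge_tag_in_place(conn, "jdk-6", "jdk6")
    for row in conn.execute("SELECT id, entry_id, tag FROM entry_tag ORDER BY id"):
        print(row)
```

Unlike deleting and re-inserting, this keeps the existing `id` of every row that survives.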

Larry Lustig
+1 but the whole point of tagging is usually to not specify the entire domain in advance. Which leads to: why fix it if it can only get out of 'sorts' again?
Stephanie Page
A: 
/* Step 1 - Delete 'jdk-6' rows for entries that already have 'jdk6'
   (DELETE ... FROM ... JOIN is SQL Server syntax) */
delete from et1
    from entry_tag et1
        inner join entry_tag et2
            on et1.entry_id = et2.entry_id
                and et2.tag = 'jdk6'
    where et1.tag = 'jdk-6';

/* Step 2 - Rename the remaining 'jdk-6' tags to 'jdk6' */
update entry_tag
    set tag = 'jdk6'
    where tag = 'jdk-6';
Joe Stefanelli