Let's say I have a table tbl with columns id and title. I need to change all values of the title column:

  1. from 'a-1' to 'a1',
  2. from 'a.1' to 'a1',
  3. from 'b-1' to 'b1',
  4. from 'b.1' to 'b1'.

Right now, I'm performing two UPDATE statements:

UPDATE tbl SET title='a1' WHERE title IN ('a-1', 'a.1');
UPDATE tbl SET title='b1' WHERE title IN ('b-1', 'b.1');

This isn't a problem if the table is small, each statement completes in less than a second, and you only need to execute a few statements.

You probably guessed it: I have a huge table to deal with (one statement takes about 90 seconds to complete), and I have a huge number of updates to perform.

So, is it possible to merge the updates so that the table is only scanned once? Or perhaps there's a better way to deal with a situation like this.

EDIT: Note that the real data I'm working with, and the changes I have to make to it, are not that simple. The strings are longer and they don't follow any pattern (it is user data, so no assumptions can be made; it can be anything).

+2  A: 

If the transformations are as simple as your examples, you could do the update with a little bit of string manipulation:

UPDATE tbl 
SET title = left(title, 1) + right(title, 1) 
WHERE title IN ('a-1', 'a.1', 'b-1', 'b.1')

Would something like that work for you?

Matt Hamilton
No, unfortunately, the real data I deal with isn't as simple as in my example. This wouldn't work for me. Thanks anyway, though.
Paulius Maruška
Sounds like casperOne's use of the CASE WHEN expression is the way to go then.
Matt Hamilton
+5  A: 

You can use a single UPDATE statement with a CASE expression that has one WHEN branch per target value:

update tbl
  set title = 
    case
      when title in ('a-1', 'a.1') then 'a1'
      when title in ('b-1', 'b.1') then 'b1'
      else title
    end

Of course, this will cause a write on every record, and with indexes that can be expensive, so you can add a WHERE clause to touch only the rows you want to change:

update tbl
  set title = 
    case
      when title in ('a-1', 'a.1') then 'a1'
      when title in ('b-1', 'b.1') then 'b1'
      else title
    end
where
  title in ('a.1', 'b.1', 'a-1', 'b-1')

That will cut down the number of writes to the table.
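One rough way to try the filtered single-statement approach is the sketch below, written in Python with the standard-library sqlite3 module and an in-memory database. Only the names tbl, id and title come from the question; the sample rows and the variables changed and titles are assumptions made for illustration, and behaviour on another DBMS may differ.

```python
import sqlite3

# In-memory database with illustrative sample rows (assumed, not real data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO tbl (title) VALUES (?)",
    [("a-1",), ("a.1",), ("b-1",), ("b.1",), ("keep me",)],
)

# One statement: CASE picks the new value, WHERE limits which rows are written.
cur = conn.execute(
    """
    UPDATE tbl
       SET title = CASE
                     WHEN title IN ('a-1', 'a.1') THEN 'a1'
                     WHEN title IN ('b-1', 'b.1') THEN 'b1'
                     ELSE title
                   END
     WHERE title IN ('a-1', 'a.1', 'b-1', 'b.1')
    """
)
conn.commit()

changed = cur.rowcount  # number of rows the UPDATE actually modified
titles = [row[0] for row in conn.execute("SELECT title FROM tbl ORDER BY id")]
print(changed, titles)
```

In this sketch the row titled 'keep me' is never written, which is the point of the WHERE clause.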

casperOne
I will probably end up using something similar to your second example. Thanks.
Paulius Maruška
Bravo! This is helpful.
javamonkey79
Your answer is great, but I have to accept Jonathan's answer, because I think his method is slightly better. I wish I could accept two answers, though.
Paulius Maruška
A: 

Or

   Update tbl set 
     title = Replace(Replace(title, '.', ''), '-', '')
   Where title Like '[ab][.-]1'
Charles Bretana
As I mentioned in the comments of Matt's answer - the data isn't as simple in the real database.
Paulius Maruška
At the risk of sounding obvious, well, then, the answer will also be less simple. What's the real problem?
Charles Bretana
Perhaps, I should have made the titles different in the example, so that they weren't as simple. The real problem is that the real titles are strings that do not follow any pattern - they are in fact user generated titles, so I can make no assumptions about them. I edited my question as well.
Paulius Maruška
+3  A: 

In a more general case, where there could be many hundreds of mappings to each of the new values, you would create a separate table of the old and new values, and then use that in the UPDATE statement. In one dialect of SQL:

CREATE TEMP TABLE mapper (old_val CHAR(5) NOT NULL, new_val CHAR(5) NOT NULL);
-- ...multiple inserts into mapper...
INSERT INTO mapper(old_val, new_val) VALUES('a.1', 'a1');
INSERT INTO mapper(old_val, new_val) VALUES('a-1', 'a1');
INSERT INTO mapper(old_val, new_val) VALUES('b.1', 'b1');
INSERT INTO mapper(old_val, new_val) VALUES('b-1', 'b1');
-- ...etcetera...

UPDATE tbl
   SET title = (SELECT new_val FROM mapper WHERE old_val = tbl.title)
   WHERE title IN (SELECT old_val FROM mapper);

Both select statements are crucial. The first is a correlated sub-query (not necessarily fast, but faster than most of the alternatives if the mapper table has thousands of rows) that pulls the new value out of the mapping table that corresponds to the old value. The second ensures that only those rows which have a value in the mapping table are modified; this is crucial as otherwise, the title will be set to null for those rows without a mapping entry (and those were the records that were OK before you started out).

For a few alternatives, the CASE operations are OK. But if you have hundreds or thousands or millions of mappings to perform, then you are likely to exceed the limits of the SQL statement length in your DBMS.
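To see why the second sub-query matters, here is a minimal sketch using Python's standard-library sqlite3 module with an in-memory database. The helper names fresh_db and count_nulls, the sample rows, and the use of TEXT columns are assumptions for illustration and are not part of the answer above; other DBMSs may need different syntax.

```python
import sqlite3

def fresh_db():
    """Hypothetical helper: build a small tbl plus a temporary mapper table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY, title TEXT)")
    conn.executemany(
        "INSERT INTO tbl (title) VALUES (?)",
        [("a-1",), ("a.1",), ("b-1",), ("b.1",), ("untouched",)],
    )
    conn.execute(
        "CREATE TEMP TABLE mapper (old_val TEXT NOT NULL, new_val TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO mapper (old_val, new_val) VALUES (?, ?)",
        [("a.1", "a1"), ("a-1", "a1"), ("b.1", "b1"), ("b-1", "b1")],
    )
    conn.commit()
    return conn

def count_nulls(conn):
    """Hypothetical helper: count rows whose title has become NULL."""
    return conn.execute("SELECT COUNT(*) FROM tbl WHERE title IS NULL").fetchone()[0]

# Without the WHERE filter, rows with no mapping entry get a NULL title.
unfiltered = fresh_db()
unfiltered.execute(
    "UPDATE tbl SET title = (SELECT new_val FROM mapper WHERE old_val = tbl.title)"
)
nulls_without_filter = count_nulls(unfiltered)

# With the filter, only rows that have a mapping entry are modified.
filtered = fresh_db()
cur = filtered.execute(
    """
    UPDATE tbl
       SET title = (SELECT new_val FROM mapper WHERE old_val = tbl.title)
     WHERE title IN (SELECT old_val FROM mapper)
    """
)
changed = cur.rowcount
nulls_with_filter = count_nulls(filtered)
titles = [row[0] for row in filtered.execute("SELECT title FROM tbl ORDER BY id")]
print(nulls_without_filter, changed, nulls_with_filter, titles)
```

Loading the mapping rows with executemany also keeps the statement text short, however many mappings there are.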

Jonathan Leffler
This is very VERY interesting. I never even thought of that. The inserts into a mapper would still be fast, and the update would only scan my table once and I don't need to construct huge queries.
Paulius Maruška
But it is better to use a join than a correlated subquery, for performance reasons.
HLGEM
@HLGEM: yes, if your DBMS supports the notation. Would you care to proffer a working syntax for some DBMS you know about? If so, please edit my answer - I believe you have enough rep to do that. Or let me know by email - see my profile page.
Jonathan Leffler
+4  A: 

Working off of Jonathan's answer.

UPDATE tbl
   SET title = new_val
FROM mapper
WHERE title IN (SELECT old_val FROM mapper)
     AND mapper.old_val = tbl.title;

His initial version would require a large number of reads to the mapper table.
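The join form can be tried out with the sketch below, again in Python with the standard-library sqlite3 module. It rests on some assumptions: SQLite only accepts UPDATE ... FROM from version 3.33.0 onward, so the code falls back to the correlated sub-query form on older builds; the sample rows are made up; and the redundant IN filter is left out, because in SQLite the join condition alone restricts the update to mapped rows. Other DBMSs may spell the join differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO tbl (title) VALUES (?)",
    [("a-1",), ("a.1",), ("b-1",), ("b.1",), ("untouched",)],
)
conn.execute("CREATE TEMP TABLE mapper (old_val TEXT NOT NULL, new_val TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO mapper (old_val, new_val) VALUES (?, ?)",
    [("a.1", "a1"), ("a-1", "a1"), ("b.1", "b1"), ("b-1", "b1")],
)

if sqlite3.sqlite_version_info >= (3, 33, 0):
    # Join form: each tbl row is paired with its matching mapper row.
    sql = """
        UPDATE tbl
           SET title = mapper.new_val
          FROM mapper
         WHERE mapper.old_val = tbl.title
    """
else:
    # Fallback for older SQLite builds: the correlated sub-query form.
    sql = """
        UPDATE tbl
           SET title = (SELECT new_val FROM mapper WHERE old_val = tbl.title)
         WHERE title IN (SELECT old_val FROM mapper)
    """

cur = conn.execute(sql)
conn.commit()

changed = cur.rowcount  # rows modified by whichever form was used
titles = [row[0] for row in conn.execute("SELECT title FROM tbl ORDER BY id")]
print(changed, titles)
```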

mrdenny
@MrDenny: may I copy your material up into my answer - with credit given, of course?
Jonathan Leffler
I used his query, and it worked like a charm! I was actually surprised: it was much faster than I thought it would be. Very nice.
Paulius Maruška