I have a web service in Java that receives a list of records to be inserted or updated in a database. I don't know in advance which records need an insert and which need an update.

Which approach gives the best performance?

  1. Iterate over the list (a list of objects, each carrying the table's PK) and try to insert each entry into the database. If the insert fails, run an update.

  2. Try to load the entry from the database. If a row is retrieved, update it; if not, insert the entry.

  3. Another option? Tell me about it :)

In the first calls, I believe most of the entries will be new DB entries, but there will be a saturation point after which most of the entries will be updates.

I'm talking about a DB table that could reach over 100 million entries once it matures.

What would your approach be? Performance is my most important goal.

+1  A: 

If performance is your goal, then first get rid of the word "iterate" from your vocabulary! Learn to do things in sets.

If you need to update or insert, always do the update first. Otherwise it is easy to find yourself accidentally updating the record you just inserted. It helps to have an identifier you can check to see whether the record exists. If the identifier exists, do the update; otherwise, do the insert.

HLGEM
+1  A: 

The important thing is to understand the ratio of inserts to updates in the list you receive. IMHO you should implement an abstract strategy that says "persist this in the database". Then create concrete strategies that (for example):

  1. Check for the primary key; if zero records are found, do the insert, otherwise do the update.
  2. Do the update and, if it fails, do the insert.
  3. Others.

Then pull the strategy to use (the fully qualified class name, for example) from a configuration file. This way you can switch from one strategy to another easily. If it is feasible, which may depend on your domain, you can add a heuristic that selects the best strategy based on the entities in the incoming set.
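As a rough illustration of the idea above, here is a minimal Java sketch. Every name in it (PersistStrategies, PersistStrategy, CheckThenWrite, UpdateThenInsert, load) is made up for this example, and a plain Map stands in for the database table so the snippet is self-contained; a real implementation would issue SQL instead.

```java
import java.util.Map;

/** Sketch only: all names are hypothetical and a Map stands in for the table. */
final class PersistStrategies {

    /** The abstract strategy: "persist this in the database". */
    interface PersistStrategy {
        void persist(Map<Integer, String> table, int id, String value);
    }

    /** Strategy 1: look the key up first, then insert or update. */
    static final class CheckThenWrite implements PersistStrategy {
        @Override
        public void persist(Map<Integer, String> table, int id, String value) {
            if (table.containsKey(id)) {
                table.replace(id, value); // stands in for UPDATE
            } else {
                table.put(id, value);     // stands in for INSERT
            }
        }
    }

    /** Strategy 2: attempt the update; if no row was affected, insert. */
    static final class UpdateThenInsert implements PersistStrategy {
        @Override
        public void persist(Map<Integer, String> table, int id, String value) {
            // replace() returns null when there was no existing entry for the key
            boolean updated = table.replace(id, value) != null;
            if (!updated) {
                table.put(id, value);
            }
        }
    }

    /** Loads a strategy from its fully qualified class name, e.g. read from a config file. */
    static PersistStrategy load(String className) throws ReflectiveOperationException {
        return (PersistStrategy) Class.forName(className)
                .getDeclaredConstructor()
                .newInstance();
    }
}
```

Both strategies leave the table in the same final state; they differ only in which path is cheap, which is why choosing between them from configuration (or from a heuristic on the input) can make sense.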

smink
+4  A: 

If your database supports MERGE, I would have thought that was most efficient (and treats all the data as a single set).

See:

http://www.oracle.com/technology/products/oracle9i/daily/Aug24.html

http://blogs.techrepublic.com.com/datacenter/?p=194
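For what it's worth, here is a hedged sketch of how a MERGE could be driven from Java through plain JDBC batching. The statement uses Oracle-style syntax (the `dual` table), and the table `my_table`, its columns `id` and `col1`, and the class and method names are all placeholders invented for this example; adjust the SQL to whatever your database accepts.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Map;

/** Sketch only: MergeUpsert, my_table, id and col1 are hypothetical names. */
final class MergeUpsert {

    // Oracle-style MERGE: update the row when the key matches, insert it otherwise.
    static final String MERGE_SQL =
          "MERGE INTO my_table t "
        + "USING (SELECT ? AS id, ? AS col1 FROM dual) s "
        + "ON (t.id = s.id) "
        + "WHEN MATCHED THEN UPDATE SET t.col1 = s.col1 "
        + "WHEN NOT MATCHED THEN INSERT (id, col1) VALUES (s.id, s.col1)";

    /** Sends one MERGE per row, but ships them to the server as a single JDBC batch. */
    static void mergeAll(Connection conn, Map<Integer, Integer> rows) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(MERGE_SQL)) {
            for (Map.Entry<Integer, Integer> row : rows.entrySet()) {
                ps.setInt(1, row.getKey());
                ps.setInt(2, row.getValue());
                ps.addBatch();
            }
            ps.executeBatch();
        }
    }
}
```

Note that this still merges row by row on the server; it only saves round trips. To treat the whole list as one set, the USING clause would have to select from a table that already holds all the incoming rows.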

DanSingerman
YES! It supports merge :) Thank god :)
sakana
+1  A: 

MySQL supports this:

INSERT INTO foo
SET bar='baz', howmanybars=1
ON DUPLICATE KEY UPDATE howmanybars=howmanybars+1
chaos
+1  A: 

Option 2 is not going to be the most efficient. The database will already be making this check for you when you do the actual insert or update in order to enforce the primary key. By making this check yourself you are incurring the overhead of a table lookup twice as well as an extra round trip from your Java code. Choose which case is the most likely and code optimistically.

Expanding on option 1, you can use a stored procedure to handle the insert/update. This example with PostgreSQL syntax assumes the insert is the normal case.

CREATE FUNCTION insert_or_update(_id INTEGER, _col1 INTEGER) RETURNS void
AS $$
BEGIN
    -- Optimistic case: assume the row is new.
    INSERT INTO
        my_table (id, col1)
    VALUES
        (_id, _col1);
EXCEPTION WHEN unique_violation THEN
    -- The primary key already exists, so update the row instead.
    UPDATE
        my_table
    SET
        col1 = _col1
    WHERE
        id = _id;
END;
$$
LANGUAGE plpgsql;

You could also make the update the normal case and then check the number of rows affected by the update statement to determine if the row is actually new and you need to do an insert.
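A minimal JDBC sketch of that update-first variant might look like the following. The class and method names are invented for the example, and `my_table`, `id` and `col1` are carried over from the function above; it relies on `executeUpdate()` returning the number of rows the UPDATE affected.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

/** Sketch only: UpdateFirstUpsert and upsert are hypothetical names. */
final class UpdateFirstUpsert {

    /** Returns true if a new row was inserted, false if an existing row was updated. */
    static boolean upsert(Connection conn, int id, int col1) throws SQLException {
        // Normal case: the row exists, so try the update first.
        try (PreparedStatement update =
                 conn.prepareStatement("UPDATE my_table SET col1 = ? WHERE id = ?")) {
            update.setInt(1, col1);
            update.setInt(2, id);
            if (update.executeUpdate() > 0) {
                return false; // at least one row was affected, nothing more to do
            }
        }
        // No row was affected, so the key is new: insert it.
        try (PreparedStatement insert =
                 conn.prepareStatement("INSERT INTO my_table (id, col1) VALUES (?, ?)")) {
            insert.setInt(1, id);
            insert.setInt(2, col1);
            insert.executeUpdate();
            return true;
        }
    }
}
```

Keep in mind that two concurrent callers can both see zero affected rows for the same key, in which case one of the inserts will fail on the primary key and needs to be retried as an update.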

As alluded to in some other answers, the most efficient way to handle this operation is in one batch:

  1. Take all of the rows passed to the web service and bulk insert them into a temporary table
  2. Update rows in the master table from the temp table
  3. Insert new rows in the master table from the temp table
  4. Dispose of the temp table

The type of temporary table to use and most efficient way to manage it will depend on the database you are using.
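To make the four steps concrete, here is a rough JDBC sketch. It assumes PostgreSQL syntax (to match the function above), a hypothetical temporary table called `staging`, and the same placeholder `my_table (id, col1)`; the class and method names are made up as well. Other databases will need different SQL for the temp table and the join-style UPDATE.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Map;

/** Sketch only: StagingTableMerge, staging, my_table, id and col1 are hypothetical names. */
final class StagingTableMerge {

    static void merge(Connection conn, Map<Integer, Integer> rows) throws SQLException {
        boolean oldAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false); // run all four steps as one transaction
        try (Statement st = conn.createStatement()) {
            // 1. Bulk insert the incoming rows into a temporary table.
            st.executeUpdate(
                "CREATE TEMPORARY TABLE staging (id INTEGER PRIMARY KEY, col1 INTEGER)");
            try (PreparedStatement ps =
                     conn.prepareStatement("INSERT INTO staging (id, col1) VALUES (?, ?)")) {
                for (Map.Entry<Integer, Integer> row : rows.entrySet()) {
                    ps.setInt(1, row.getKey());
                    ps.setInt(2, row.getValue());
                    ps.addBatch();
                }
                ps.executeBatch();
            }
            // 2. Update rows in the master table that already exist.
            st.executeUpdate(
                "UPDATE my_table m SET col1 = s.col1 FROM staging s WHERE m.id = s.id");
            // 3. Insert the rows that are not in the master table yet.
            st.executeUpdate(
                "INSERT INTO my_table (id, col1) SELECT s.id, s.col1 FROM staging s "
              + "WHERE NOT EXISTS (SELECT 1 FROM my_table m WHERE m.id = s.id)");
            // 4. Dispose of the temp table.
            st.executeUpdate("DROP TABLE staging");
            conn.commit();
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        } finally {
            conn.setAutoCommit(oldAutoCommit);
        }
    }
}
```

The update runs before the insert on purpose: if the insert ran first, the update would also touch the rows that were just inserted.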

cope360