I am trying to update records in a target table based on records coming in from a source. If an incoming record is already present in the target table, I update it in the target; otherwise I simply insert it. I have over one million records in my source, while my target has 46 million records. The target table is partitioned on a calendar key. I implement this whole logic using Informatica. The Informatica session log shows the mapping itself runs fine, but the update step takes a very long time (more than 5 days to update one million records).

Any suggestions as to what can be done in this scenario to improve the performance?

A: 

Sounds like either there are no applicable indexes on your table or they're out of date, so the database has to examine records one by one: up to 1 million × 46 million = 46 trillion comparisons in the worst case.

Try adding an index on whatever key you're searching on in the update. If the indexes already exist, try rebuilding them, as they might be too out of sync to be useful.
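A hedged sketch of how to check index health in Oracle; the table, index, and partition names here are placeholders, not from the question:

```sql
-- List the indexes on the target table and whether any are unusable
-- ('TARGET_TABLE' is a placeholder for the real table name).
SELECT index_name, status, last_analyzed
FROM   user_indexes
WHERE  table_name = 'TARGET_TABLE';

-- For a partitioned table, individual partitions of a local index can be
-- unusable even when the index as a whole reports a healthy status:
SELECT index_name, partition_name, status
FROM   user_ind_partitions
WHERE  status <> 'USABLE';

-- Rebuild an index (or a single index partition) that is unusable or stale:
ALTER INDEX my_index REBUILD;
-- ALTER INDEX my_index REBUILD PARTITION p_example;
```

The LAST_ANALYZED column also hints at whether optimizer statistics are stale, which can matter as much as the index itself for a large update.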

lc
Hi, thanks for the reply. I find that the indexes are in an enabled state. We have a composite primary key on the following columns: CALENDAR_KEY, DAY_TIME_KEY, SITE_KEY, RESERVATION_AGENT_KEY, LOSS_CODE, PROP_ID, AGENT_ID, LOCATION. The WHERE clause in the update statement uses all of the above columns. For your information, the table is partitioned on the CALENDAR_KEY column. Thanks.
B Senthil Kumar
+6  A: 

You can try this

  1  MERGE
  2     INTO  target_table tgt
  3     USING source_table src
  4     ON  ( src.object_id = tgt.object_id )
  5  WHEN MATCHED
  6  THEN
  7     UPDATE
  8     SET   tgt.object_name = src.object_name
  9     ,     tgt.object_type = src.object_type
 10  WHEN NOT MATCHED
 11  THEN
 12     INSERT ( tgt.object_id
 13            , tgt.object_name
 14            , tgt.object_type )
 15     VALUES ( src.object_id
 16            , src.object_name
 17            , src.object_type );

The syntax looks a little daunting at first, but if we read through it from top to bottom, it is quite intuitive. Note the following clauses:

• MERGE (line 1): as stated previously, this is now the 4th DML statement in Oracle. Any hints we might wish to add directly follow this keyword (i.e. MERGE /*+ HINT */);

• INTO (line 2): this is how we specify the target for the MERGE. The target must be either a table or an updateable view (an in-line view cannot be used here);

• USING (line 3): the USING clause represents the source dataset for the MERGE. This can be a single table (as in our example) or an in-line view;

• ON () (line 4): the ON clause is where we supply the join between the source dataset and target table. Note that the join conditions must be in parentheses;

• WHEN MATCHED (line 5): this clause is where we instruct Oracle on what to do when we already have a matching record in the target table (i.e. there is a join between the source and target datasets). We obviously want an UPDATE in this case. One of the restrictions of this clause is that we cannot update any of the columns used in the ON clause (though of course we don't need to, as they already match). Any attempt to include a join column will raise an unintuitive invalid identifier exception; and

• WHEN NOT MATCHED (line 10): this clause is where we INSERT records for which there is no current match.
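Adapted to the scenario in the question, a sketch might look like the following. The table names and the non-key column (some_measure) are assumptions; the join uses the composite primary key columns listed in the comments above:

```sql
MERGE
   INTO  target_table tgt
   USING source_table src
   ON  (     src.calendar_key          = tgt.calendar_key
         AND src.day_time_key          = tgt.day_time_key
         AND src.site_key              = tgt.site_key
         AND src.reservation_agent_key = tgt.reservation_agent_key
         AND src.loss_code             = tgt.loss_code
         AND src.prop_id               = tgt.prop_id
         AND src.agent_id              = tgt.agent_id
         AND src.location              = tgt.location )
WHEN MATCHED
THEN
   UPDATE
   SET   tgt.some_measure = src.some_measure  -- only non-key columns may appear here
WHEN NOT MATCHED
THEN
   INSERT ( tgt.calendar_key
          , tgt.day_time_key
          , tgt.site_key
          , tgt.reservation_agent_key
          , tgt.loss_code
          , tgt.prop_id
          , tgt.agent_id
          , tgt.location
          , tgt.some_measure )
   VALUES ( src.calendar_key
          , src.day_time_key
          , src.site_key
          , src.reservation_agent_key
          , src.loss_code
          , src.prop_id
          , src.agent_id
          , src.location
          , src.some_measure );
```

Because CALENDAR_KEY appears in the join, Oracle can prune the MERGE down to the relevant partitions of the target table, which is exactly what the partitioning scheme is there for.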

Bharat
Voted up. You don't want to be doing this scale of updates on a row by row basis, or pulling millions of records out of the database and pushing them back (especially over a network). Push the logic into the DB.
Gary
A: 

I am not sure how applicable this is to your project, since you may need to change a lot. Since you are dealing with millions of records, I would recommend a batch job. You can use the SQL*Loader utility, but it depends on the format of the source: if it is a file (e.g. a CSV file), it is the right choice.
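A minimal SQL*Loader control-file sketch for a CSV source; the file, table, and column names are assumptions for illustration:

```
-- load_source.ctl (hypothetical names throughout)
LOAD DATA
INFILE 'source_data.csv'
APPEND
INTO TABLE staging_table
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
( calendar_key
, day_time_key
, site_key
, some_measure
)
```

Run with something like `sqlldr userid=scott/tiger control=load_source.ctl`. Note that SQL*Loader only loads (inserts) rows; a common pattern for this scenario is to bulk-load the file into a staging table and then MERGE from the staging table into the target, as in the answer above.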

Sujee