I have a cursor that selects all rows in a table, a little over 500,000 rows. Read a row from the cursor, INSERT it into another table, which has two indexes, neither unique, one numeric, one DATE type. COMMIT. Read the next row from the cursor, INSERT... until the cursor is empty.

All my DATE column's values are the same, taken from a timestamp initialized at the start of the script.

This thing has been running for 24 hours and has only posted 464K rows, a little less than 10K rows/hr.

Oracle 11g, 10 processors (!?). Something has to be wrong. I think it's that DATE index trying to process all these entries with exactly the same value for that column.

+2  A: 

Indexes slow down inserts but speed up queries. This is normal.

If it is a problem you can remove the index, insert the rows, then add the index again. This can be faster if you are doing many inserts at once.

The way you are copying the data using cursors seems to be inefficient. You could try a set-based approach instead:

INSERT INTO table1 (x, y, z)
SELECT x, y, z FROM table2 WHERE ...
Mark Byers
Can't remove the index. Production table, in heavy use, hundreds of millions of rows.
Kelley
Better to mark the index as unusable than to drop and recreate it, as that would avoid unintended consequences from recreating the index with different options.
Adam Musch
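For reference, marking an index unusable and rebuilding it afterwards looks roughly like this (the index name is hypothetical, and it relies on SKIP_UNUSABLE_INDEXES being TRUE, the default in 10g/11g, so the inserts simply skip the unusable index):

ALTER INDEX contact_history_dt_idx UNUSABLE;   -- hypothetical index name
-- ... run the bulk insert here ...
ALTER INDEX contact_history_dt_idx REBUILD;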
It's the half-million rows with the same index value that I'm curious about. I am told either a) this makes no difference, or b) this is terrible and you should never put an index on a date column in Oracle. I'm leaning towards the latter being true based on the observed performance...
Kelley
@Kelley: Have you considered that the slowness of the inserts could be due to the "heavy use" of the server?
Mark Byers
If it was running that slow, every phone in my department would be ringing off the hook. Had the DBA review the system; all other processes are happy. No locks, no bottlenecks, no ugly plan...
Kelley
(b) is untrue - there are lots of good reasons to have an index on a date column. What is bad is having a b-tree index on something where you have 500,000 entries with the same value (relatively low cardinality). That sounds more like a bitmap index, or part of a compound index with better cardinality. That said, on a table of hundreds of millions of rows, 500,000 could still be considered low cardinality.
JulesLt
To add to JulesLt's comment: if the OP changes the column to a bitmap index, then the OP should definitely move to set-based versus row-by-row processing, since bitmap indexes don't handle small changes well.
Shannon Severance
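For illustration only (index names are hypothetical), the two alternatives being discussed would be created along these lines, bearing in mind the caveats above about bitmap indexes and row-by-row changes:

CREATE BITMAP INDEX contact_history_dt_bix ON contact_history (transaction_dt);
-- (the existing b-tree index on that column would have to be dropped first)
-- or a compound b-tree index led by a more selective column:
CREATE INDEX contact_history_sub_dt_idx ON contact_history (subscriber_id, transaction_dt);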
A: 

In general, it's often a good idea to delete the indexes before doing a massive insert and then add them back afterwards, so that the db doesn't have to try to update the indexes with each insert. It's been a long while since I've used Oracle, but have you tried putting more than one insert statement in a transaction? That should also speed it up.

GrandmasterB
+3  A: 

Why don't you just do:

insert into target (columns....)
select columns and computed values
from source;

commit;

?

This slow-by-slow (row-by-row) processing is doing far more damage to performance than an index that may not make any sense.

Shannon Severance
+1  A: 

Committing after every inserted row doesn't make much sense. If you're worried about exceeding undo capacity, for example, you can keep a count of the inserts and issue a commit after every thousand rows.
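
A minimal sketch of that pattern (table and column names are placeholders), assuming the cursor loop is kept:

declare
    l_count pls_integer := 0;
begin
    for r in (select x, y, z from source_table) loop
        insert into target_table (x, y, z) values (r.x, r.y, r.z);
        l_count := l_count + 1;
        if mod(l_count, 1000) = 0 then
            commit;  -- commit every thousand rows
        end if;
    end loop;
    commit;  -- pick up the final partial batch
end;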

Updating the indexes will have some impact, but that's unavoidable if you can't drop (or disable) them while the inserts are performed; that's just how it goes. I'd expect the commits to have a bigger impact, though I suspect that's a topic with varied opinions.

This assumes you have a good reason for inserting from a cursor rather than as a direct insert into ... select from model.

Alex Poole
This job was stopped, then restarted with 125006 rows to process. I rewrote the process to change three things: 1) COMMIT every 100 INSERTs; 2) cursor close/open every 10,000 rows; 3) change the time on the transaction_dt column by 1 second for each 10,000-row cursor cycle. Process started at 14:49:46. The first 10,000 went fast, but then I noticed a slowdown, so I started monitoring. Every 10K rows ran slower than the prior 10K: 20K to 30K took 2.39, 60 to 70 took 7.54, and 110 to 120 took 12.52. Almost done. Opening a ticket with Oracle.
Kelley
If there's an impact on the index it may come from the skew of having so many values close together unbalancing the tree - not particularly because they're dates, though it'd be interesting if that's worse than for other datatypes for some reason. Does the cursor refer to the table you're inserting into? I assume you've checked the cursor performance on its own, without the inserts?
Alex Poole
The cursor originally made no reference to the table being INSERTed into, but now makes a select that looks there to avoid already-processed rows.
Kelley
SELECT * FROM ec_acq WHERE SUBSCRIBER_ID NOT IN ( SELECT SUBSCRIBER_ID FROM CONTACT_HISTORY WHERE TRANSACTION_DT BETWEEN TO_DATE('09/07/2010 14:17:06','MM/DD/YYYY HH24:MI:SS') AND SYSDATE AND CONTACT_CD = 'EN' AND DESCRIPTION = 'blah blah blah');
Kelley
The number of already-inserted rows will be increasing as you go along, so that NOT IN subquery gets more expensive on every pass.
JulesLt
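Purely as a sketch based on the query in the comments above, rewriting the NOT IN as a NOT EXISTS anti-join is one way to check whether that growing subquery is the culprit (behaviour differs from NOT IN if SUBSCRIBER_ID can be NULL in CONTACT_HISTORY):

SELECT a.*
  FROM ec_acq a
 WHERE NOT EXISTS (SELECT 1
                     FROM contact_history ch
                    WHERE ch.subscriber_id = a.subscriber_id
                      AND ch.transaction_dt BETWEEN TO_DATE('09/07/2010 14:17:06','MM/DD/YYYY HH24:MI:SS') AND SYSDATE
                      AND ch.contact_cd = 'EN'
                      AND ch.description = 'blah blah blah');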
A: 

Can you try bulk binding? I think it might help.

Pravin Satav
A: 

For operations like this you should look at Oracle's bulk operations, using FORALL and BULK COLLECT. They considerably reduce the number of context switches between the PL/SQL and SQL engines.

create or replace procedure fast_proc is
    type MyTable is table of source_table%ROWTYPE;
    l_table MyTable;
begin
    -- fetch the whole source table into the collection in one round trip
    select * bulk collect into l_table from source_table;

    -- bind the entire collection into a single bulk insert
    forall x in l_table.first .. l_table.last
        insert into dest_table values l_table(x);
end;
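
If the source is too large to hold comfortably in memory in one go, a LIMIT clause on the bulk fetch keeps the collection small; a rough sketch of that variation, again with placeholder table names:

declare
    cursor c_src is select * from source_table;
    type t_rows is table of source_table%ROWTYPE;
    l_rows t_rows;
begin
    open c_src;
    loop
        fetch c_src bulk collect into l_rows limit 1000;  -- chunk size is arbitrary
        exit when l_rows.count = 0;
        forall i in 1 .. l_rows.count
            insert into dest_table values l_rows(i);
        commit;  -- one commit per chunk rather than per row
    end loop;
    close c_src;
end;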
Visage
I want to thank everyone here for their answers. Especially since I have another effort, exactly the same, which has resulted in 2.7 Million rows to INSERT.
Kelley
It's amazing what will make you add to your knowledge base, or remind you of it. I will experiment with the comparison of an INSERT from a SELECT as opposed to using a CURSOR. I'll see if I can get around the indexing overhead of everything having the same timestamp (thanks for reminding me of exactly why I thought that was a problem... rebalancing index trees). Changing from 1 to 100 rows per COMMIT didn't much improve things. And I will read up on the 'BULK' facilities, on DBMS_PARALLEL_EXECUTE (cool!) and on the /*+ APPEND */ hint. I know about hints, I just forget they're there. Again, thanks.
Kelley
A: 

Agreed with the comment that what is killing your time is the 'slow by slow' processing. Copying 500,000 rows should be a matter of minutes.

The single INSERT ... SELECT FROM ... approach would be the best one, provided you have big enough rollback segments. The database may even automatically apply parallel techniques to a plain SQL statement that it will not apply to PL/SQL.

In addition you could look at using the /*+ APPEND */ hint - read up on it and see if it may apply to the situation with your target table.
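
As a sketch only (placeholder table and column names), a direct-path version of the set-based insert would look like:

INSERT /*+ APPEND */ INTO target_table (x, y, z)
SELECT x, y, z FROM source_table;
COMMIT;  -- direct-path loaded rows cannot be queried by the same session until the commit

Bear in mind that a direct-path insert takes an exclusive lock on the target table for the duration of the transaction, which may rule it out on a heavily used production table.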

To use all 10 cores you will need to either use plain parallel SQL, or run 10 copies of your PL/SQL block, splitting the source table across the 10 copies.

In Oracle 10 this is a manual task (roll your own parallelism) but Oracle 11.2 introduces DBMS_PARALLEL_EXECUTE.
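
A rough outline of the 11.2 approach (task name, chunk size and the statement are placeholders; the chunked SQL must use the :start_id and :end_id binds supplied by the package):

begin
    dbms_parallel_execute.create_task('copy_task');
    dbms_parallel_execute.create_chunks_by_rowid(
        task_name   => 'copy_task',
        table_owner => user,
        table_name  => 'SOURCE_TABLE',
        by_row      => true,
        chunk_size  => 10000);
    dbms_parallel_execute.run_task(
        task_name      => 'copy_task',
        sql_stmt       => 'insert into dest_table
                             select * from source_table
                              where rowid between :start_id and :end_id',
        language_flag  => dbms_sql.native,
        parallel_level => 10);
    dbms_parallel_execute.drop_task('copy_task');
end;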

Failing that, bulking up your fetch/insert using BULK COLLECT and a bulk INSERT would be the next best option - process in chunks of 1,000 or so rows (or larger). Again, take a look at whether DBMS_PARALLEL_EXECUTE may help you, or whether you could submit the job in chunks via DBMS_JOB.

(Caveat : I don't have access to anything later than Oracle 10)

JulesLt