I have a DB table in which each row has a randomly generated primary key, a message and a user. Each user has about 10-100 messages but there are 10k-50k users.

I write the messages daily for each user in one go. I want to throw away the old messages for each user before writing the new ones to keep the table as small as possible.

Right now I effectively do this:

delete from table where user='mk'

then write all the messages for that user.

I'm seeing a lot of contention because I have lots of threads doing this at the same time.

Can anyone offer any advice?

Would it be better to:

select key from table where user='mk'

then delete individual rows from there? I'm thinking that might lead to less brutal locking.
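In other words, loop over the returned keys and run something like this for each one (still in my placeholder names):

delete from table where key = :k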

I'm using Oracle BTW.

Thanks!

EDIT

Thanks for all the answers!

I was unclear in my question. As APC says, I do have an additional requirement to retain the most recent set of messages for each user.

I don't have access to the DB directly. I'm trying to guess at the problem based on some second-hand feedback. The reason I'm focusing on this scenario is that the delete query is showing a lot of wait time (again, to the best of my knowledge), plus it's a newly added bit of functionality.

Your answers have already been helpful - thanks!

+3  A: 

No, it is always better to perform a single SQL statement on a set of rows than a series of "row-by-row" (or what Tom Kyte calls "slow-by-slow") operations. When you say you are "seeing a lot of contention", what are you seeing exactly? An obvious question: is column USER indexed?

(Of course, the column name can't really be USER in an Oracle database, since it is a reserved word!)

EDIT: You have said that column USER is not indexed. This means that each delete will involve a full table scan of up to 50K * 100 = 5 million rows (or at best 10K * 10 = 100,000 rows) to delete a mere 10-100 rows. Adding an index on USER may solve your problems.
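For illustration, the fix could be as simple as this sketch (hypothetical names, since TABLE and USER are reserved words and can't be the real ones):

create index messages_usr_ix on messages (usr);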

Tony Andrews
-1 A series of row-by-row deletes might be slower on a stand-alone machine, but it causes less contention and, in a system under sustained use, usually speeds things up overall (in my experience)
Andomar
Whether deleting in a single command causes more contention depends on the database - and the access pattern to these records. Somehow I doubt users are reading their messages so frequently they cause substantial conflicts with the deletes.
Nick Johnson
Also, if the problem is disk contention, using more transactions will cause _more_ disk contention.
Nick Johnson
@Andomar, are you seriously saying deleting "10-100" rows in a single statement is a bad idea, and looping over them and deleting one at a time is better? Seriously?
Tony Andrews
@Tony Andrews: Yes, in many situations, e.g. http://oracle-online-help.blogspot.com/2006/12/delete-in-batches.html. Of course if a single DELETE works, and doesn't break other stuff, that's infinitely preferable
Andomar
@Andomar: (1) that link is talking about the problems of deleting "huge [amounts of] data" and running out of rollback. I don't consider 10-100 rows to be anything like "huge". (2) I think the suggested code in that link is very poor indeed.
Tony Andrews
@Tony Andrews: You're probably right on the link. The question says "there are 10k-50k users" with "10-100" messages. That's definitely the range where splitting transactions makes sense in my experience.
Andomar
On a busy OLTP system it might be preferable to delete in smaller batches, but that doesn't have to mean row-by-row, and it doesn't have to mean opening a cursor and then committing every n rows. It's more likely to mean repeatedly running a delete statement with ROWNUM < 1000, for example, and taking a pause between executions using DBMS_LOCK.SLEEP. It's not as efficient as deleting with a single statement (or truncating), but it's a little more "friendly".
David Aldridge
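For concreteness, a hedged PL/SQL sketch of that batched pattern (MESSAGES and USR are hypothetical names):

begin
  loop
    delete from messages
    where  usr = 'mk'
    and    rownum < 1000;
    exit when sql%rowcount = 0;  -- nothing left to delete for this user
    commit;
    dbms_lock.sleep(1);          -- brief pause to be friendly to other sessions
  end loop;
  commit;
end;
/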
@David Aldridge: Agreed, ROWNUM<1000 is how I'd do it as well
Andomar
USER is not indexed (nor is it the column name ;) ). I edited my question to say that I am working from limited feedback - "it was faster before" type of thing.
monorailkitty
If USER is not indexed, then there is your problem (I imagine). Each delete is performing a full table scan.
Tony Andrews
A: 

Are you sure you're seeing lock contention? It seems more likely that you're seeing disk contention due to too many concurrent (but unrelated) updates. The solution to that is simply to reduce the number of threads you're using: less disk contention will mean higher total throughput.

Nick Johnson
+4  A: 

If you do this every day for every user, why not just delete every record from the table in a single statement? Or even

truncate table whatever reuse storage
/

edit

The reason why I suggest this approach is that the process looks like a daily batch upload of user messages preceded by a clearing out of the old messages. That is, the business rule seems to me to be "the table will hold only one day's worth of messages for any given user". If this process is done for every user then a single operation would be the most efficient.

However, if users do not get a fresh set of messages each day and there is a subsidiary rule which requires us to retain the most recent set of messages for each user then zapping the entire table would be wrong.
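Under that subsidiary rule, a more targeted clear-out would delete only for the users being reloaded, along these lines (a sketch; STAGING_MESSAGES is a hypothetical table holding the incoming day's messages):

delete from messages
where  usr in (select usr from staging_messages);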

APC
Why does this get voted up? The question says "I want to throw away the old messages for each user", but this throws away all messages.
Andomar
@Andomar - makes sense to me if you take the question at face value - it seems like the requirement is to have a day's worth of messages for each user, so why not just clean the table and write all new messages for the day?
dpbradley
@dpbradley: OK... I see you can read the question that way. If that's what the OP meant, this is definitely the optimal answer. :)
Andomar
"delete from table where user='mk'" says that all messages are deleted, and presumably that's for every user.
David Aldridge
I edited my question off the back of your answer. As you guessed, I need to keep the most recent set of messages for each user. I only want to delete the old messages for users for which I have new messages. Thanks!
monorailkitty
A: 

Your own suggestion seems very sensible. Deleting in small batches has two advantages:

  • the transactions will be smaller
  • locking will be limited to only a few rows at a time

Deleting in batches should be a big improvement.

Andomar
-1 Yes, a batch of ALL 10-100 of them would be the best!
Tony Andrews
In Oracle, having small transactions is not an advantage: the amount of work done will be smaller in one BIG delete than in MANY small delete statements. Furthermore, if there is only one modifying session (in this case a batch job), the locking is irrelevant: a massive update won't block readers anyway. All considered, a massive update/delete is often more efficient in Oracle.
Vincent Malgrat
@Vincent: Agreed. I read the question to mean 500,000 updates in contention with other queries, like setting "read" flags.
Andomar
A: 

I think you need to define your requirements a bit clearer...

For instance: if you know all of the users you want to write messages for, insert their IDs into a temp table, index it on ID, and batch delete. Then the threads you are firing off are doing two things: writing the ID of the user to one temp table, and writing the message to another temp table. When the threads have finished executing, the main thread should

DELETE * FROM Messages INNER JOIN TEMP_MEMBERS ON ID = TEMP_ID

INSERT INTO MESSAGES SELECT * FROM TEMP_MESSAGES

I'm not familiar with Oracle syntax, but that is the way I would approach it IF the users' messages are all written in rapid succession.
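In Oracle, the equivalent pair might look something like this sketch (using the hypothetical temp tables above):

DELETE FROM MESSAGES
WHERE  ID IN (SELECT TEMP_ID FROM TEMP_MEMBERS);

INSERT INTO MESSAGES SELECT * FROM TEMP_MESSAGES;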

Hope this helps

TerrorAustralis
Note that the INNER JOIN syntax on the DELETE is non-standard and does not work with Oracle. You'll need to use a DELETE...WHERE statement instead.
DCookie
No, it works with Oracle as long as there is a unique or primary key on the TEMP_MEMBERS.TEMP_ID column
David Aldridge
A: 

TALK TO YOUR DBA

He is there to help you. When we DBAs take access away from the developers for something such as this, it is assumed we will provide that support for you. If your code is taking too long to complete and that time appears to be tied up in the database, your DBA will be able to look at exactly what is going on and offer suggestions, or possibly even solve the problem without you changing anything.

Just glancing over your problem statement, it doesn't appear you'd be looking at contention issues, but I don't know anything about your underlying structure.

Really, talk to your DBA. He will probably enjoy looking at something fun instead of planning the latest CPU deployment.

Rob
A: 

This might speed things up:

Create a lookup table:

create table rowid_table (row_id ROWID, user VARCHAR2(100));
create index rowid_table_ix1 on rowid_table (user);

Run a nightly job:

truncate table rowid_table;
insert /*+ append */ into rowid_table
select ROWID row_id, user
from table;
commit;  -- required after a direct-path insert before the table can be queried
begin
  dbms_stats.gather_table_stats('SCHEMAOWNER','ROWID_TABLE');
end;
/

Then when deleting the records:

delete from table
where ROWID IN (select row_id
                from rowid_table
                where user = 'mk');
Robert Merkwürdigeliebe
Do you realize that lookup table is doing the work of an index?
Rob
The index on the lookup table is doing the work of the index. If the poster really does have contention on the table, and possibly on that table's indexes (maybe with OLTP transactions), it diverts the I/O to a different object.
Robert Merkwürdigeliebe