ansaurus

Question

Efficient way of phrasing multiple tuple pair WHERE conditions in SQL statement

Answer 1

+1 A:

I would do 3. (with a JOIN rather than a subquery) and measure the time of the DELETE query alone (without creating the table and inserting the pairs). This is a good starting point, because joining is a very common and well-optimized operation, so it will be hard to beat that time. Then you can compare that time with your current approach.
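A minimal timing sketch of that measurement, using Python's built-in sqlite3 module as a self-contained stand-in for PostgreSQL. SQLite has no DELETE ... USING, so the join is written as an equivalent correlated EXISTS. The table names pond_pairs and to_delete and the data sizes are hypothetical. Only the DELETE sits inside the timed region, so table creation and inserts stay out of the measurement.

```python
import sqlite3
import time


def time_join_delete(rows, pairs):
    """Return (seconds spent in the DELETE only, rows left afterwards)."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()

    # Setup, deliberately NOT timed: target table, its index, and a temp
    # table holding the pairs to delete.
    cur.execute("CREATE TABLE pond_pairs (pond1 INTEGER, pond2 INTEGER)")
    cur.execute("CREATE INDEX pond_pairs_idx ON pond_pairs (pond1, pond2)")
    cur.executemany("INSERT INTO pond_pairs VALUES (?, ?)", rows)
    cur.execute("CREATE TEMP TABLE to_delete (pond1 INTEGER, pond2 INTEGER)")
    cur.executemany("INSERT INTO to_delete VALUES (?, ?)", pairs)
    conn.commit()

    # Timed region: one set-based DELETE matched against to_delete.
    # SQLite lacks DELETE ... USING, so the join is a correlated EXISTS.
    start = time.perf_counter()
    cur.execute(
        """
        DELETE FROM pond_pairs
        WHERE EXISTS (
            SELECT 1 FROM to_delete AS t
            WHERE t.pond1 = pond_pairs.pond1
              AND t.pond2 = pond_pairs.pond2
        )
        """
    )
    conn.commit()
    elapsed = time.perf_counter() - start

    remaining = cur.execute("SELECT COUNT(*) FROM pond_pairs").fetchone()[0]
    conn.close()
    return elapsed, remaining


# Hypothetical data: a 100 x 100 grid of pairs, deleting the 100 diagonal ones.
rows = [(i, j) for i in range(100) for j in range(100)]
pairs = [(i, i) for i in range(100)]
elapsed, remaining = time_join_delete(rows, pairs)
print(f"DELETE took {elapsed:.4f} s, {remaining} rows left")
```

Timings from an in-memory SQLite database say little about PostgreSQL itself; the useful part is the shape of the measurement, which can be repeated against the real server with its own driver.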

You can also try the following approach:

1. Sort the pairs in the same order as the index.
2. Delete using method 2. from your description (probably in a single transaction).

Sorting before deleting should give better index read performance, because consecutive lookups are more likely to hit index pages that are already in the disk cache.
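A sketch of the sort-then-delete idea, again with sqlite3 standing in for the real database and an assumed index on (pond1, pond2). Python sorts tuples lexicographically, which is the same order as a two-column index on (pond1, pond2), and the whole loop runs inside one transaction. The function name delete_sorted is hypothetical.

```python
import sqlite3


def delete_sorted(conn, pairs):
    """Delete (pond1, pond2) pairs one parameterised statement at a time,
    in index order, inside a single transaction. Returns the order used."""
    # Tuples sort by pond1, then pond2: the same order as an index on
    # (pond1, pond2), so successive lookups touch neighbouring index pages.
    ordered = sorted(pairs)
    with conn:  # one transaction: commit on success, roll back on error
        conn.executemany(
            "DELETE FROM pond_pairs WHERE pond1 = ? AND pond2 = ?", ordered
        )
    return ordered


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pond_pairs (pond1 INTEGER, pond2 INTEGER)")
conn.execute("CREATE INDEX pond_pairs_idx ON pond_pairs (pond1, pond2)")
conn.executemany(
    "INSERT INTO pond_pairs VALUES (?, ?)", [(1, 2), (1, 3), (2, 1), (3, 5)]
)
conn.commit()

order = delete_sorted(conn, [(3, 5), (1, 3), (2, 1)])
left = conn.execute("SELECT pond1, pond2 FROM pond_pairs").fetchall()
```

An in-memory database will not show the cache effect itself; that only becomes visible with an on-disk table much larger than available memory.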

Tomasz Wysocki 2010-08-17 05:59:00

Does DELETE work against joined tables?

Thilo 2010-08-17 06:03:21

Yes, there is an example in Frank Heikens' answer.

Tomasz Wysocki 2010-08-17 06:12:46

That USING clause is neat. But he still needs to send the pairs into the database (unless they are already there somewhere).

Thilo 2010-08-17 06:44:29

I'm not suggesting that this is the final solution. Deleting through a temporary table is a great point of reference, because it will be very hard to delete the records any faster. So if one of the other proposals reaches a similar speed, it will be a good choice.

Tomasz Wysocki 2010-08-17 07:33:57

Answer 2

+1 A:

For a large number of pond1-pond2 pairs to be deleted in a single DELETE, I would create a temporary table and join on it.

-- Create the temp table:
CREATE TEMP TABLE foo AS SELECT * FROM (VALUES(1,2), (1,3)) AS sub (pond1, pond2);

-- Delete
DELETE FROM bar
USING foo -- the joined table
WHERE bar.pond1 = foo.pond1
  AND bar.pond2 = foo.pond2;

Frank Heikens 2010-08-17 06:04:11

Filling the TEMP TABLE with the pairs is an equivalent problem to the original DELETE question, though (unless the pairs are already in the database somewhere).

Thilo 2010-08-17 06:10:19

No, it's not: you can use COPY to fill the temp table. This is MUCH faster than any other option for getting the data into your temp table. I just gave a very simple example, but the idea is the same.

Frank Heikens 2010-08-17 06:15:48

Can you show how to use COPY to fill the temp table?

Thilo 2010-08-17 06:37:45

@Thilo: Just check http://python.projects.postgresql.org/docs/1.0/copyman.html

Frank Heikens 2010-08-17 07:07:01

I see. 'receive_stmt = destination.prepare("COPY loading_table FROM STDIN")' would be a good way to put these numbers into the table.

Thilo 2010-08-17 07:12:24
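The COPY route discussed in these comments needs the pairs in COPY's text format, which by default is one row per line with columns separated by a tab. A minimal helper that builds such a buffer from integer pairs (the function name pairs_to_copy_buffer is hypothetical):

```python
import io


def pairs_to_copy_buffer(pairs):
    """Serialise (pond1, pond2) pairs into PostgreSQL's COPY text format:
    one row per line, columns separated by a tab character."""
    buf = io.StringIO()
    for pond1, pond2 in pairs:
        # Coercing to int means no value can contain a tab, newline or
        # backslash, so no COPY escaping is needed.
        buf.write(f"{int(pond1)}\t{int(pond2)}\n")
    buf.seek(0)  # rewind so the driver reads from the start
    return buf


buf = pairs_to_copy_buffer([(1, 2), (1, 3)])
print(repr(buf.getvalue()))
```

How the buffer is streamed to the server depends on the driver. With psycopg2, for instance, it can be passed to cursor.copy_expert("COPY foo (pond1, pond2) FROM STDIN", buf), after which the DELETE ... USING from the answer above does the actual removal; the py-postgresql interface linked in the comments has its own API for this.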

Answer 3

A:

With hundreds of thousands of pairs, you cannot do 1 (run the query as is), because the SQL statement would be too long.

3 is good if you already have the pairs in a table. If not, you would need to insert them first. If you do not need them later, you might just as well run the same number of DELETE statements instead of INSERT statements.

How about a prepared statement in a loop, maybe batched (if your Python database driver supports that):

begin transaction
prepare statement "DELETE FROM pond_pairs WHERE ((pond1 = ?) AND (pond2 = ?))"
loop over your data (in Python) and run the statement with one pair (or add it to a batch)
commit
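A runnable version of that loop, sketched with sqlite3 (which uses ? placeholders; PostgreSQL drivers such as psycopg2 use %s instead). Each executemany call plays the role of one batch; the function name delete_in_batches and the batch_size value are illustrative assumptions.

```python
import sqlite3


def delete_in_batches(conn, pairs, batch_size=1000):
    """Run one parameterised DELETE per pair, sent in batches of
    batch_size, all inside one transaction. Returns rows deleted."""
    sql = "DELETE FROM pond_pairs WHERE pond1 = ? AND pond2 = ?"
    deleted = 0
    with conn:  # single transaction: commit at the end, roll back on error
        for start in range(0, len(pairs), batch_size):
            batch = pairs[start:start + batch_size]
            cur = conn.executemany(sql, batch)
            # For executemany, sqlite3 sums the modified rows into rowcount.
            deleted += cur.rowcount
    return deleted


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pond_pairs (pond1 INTEGER, pond2 INTEGER)")
conn.executemany(
    "INSERT INTO pond_pairs VALUES (?, ?)", [(i, i + 1) for i in range(10)]
)
conn.commit()

# Five matching pairs plus one that matches nothing.
to_delete = [(i, i + 1) for i in range(0, 10, 2)] + [(99, 99)]
deleted = delete_in_batches(conn, to_delete, batch_size=2)
remaining = conn.execute("SELECT COUNT(*) FROM pond_pairs").fetchone()[0]
```

Keeping every batch inside one transaction avoids a commit per pair, which is usually the dominant cost when statements are sent one by one.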

Where are the pairs coming from? If you can write a SELECT statement to identify them, you can just move this condition into the WHERE clause of your DELETE.

DELETE FROM pond_pairs WHERE (pond1, pond2) IN (SELECT pond1, pond2 FROM ......  )
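A small sqlite3 illustration of moving the identifying SELECT into the DELETE. The link_log table and its closed flag are hypothetical stand-ins for wherever the pairs really come from. The row-value form (pond1, pond2) IN (SELECT ...) needs SQLite 3.15 or later and is also accepted by PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE pond_pairs (pond1 INTEGER, pond2 INTEGER);
    -- Hypothetical source of the pairs; closed = 1 marks pairs to remove.
    CREATE TABLE link_log (a INTEGER, b INTEGER, closed INTEGER);
    INSERT INTO pond_pairs VALUES (1, 2), (1, 3), (2, 3), (4, 5);
    INSERT INTO link_log VALUES (1, 3, 1), (2, 3, 1), (4, 5, 0);
    """
)

# The SELECT that identifies the pairs becomes the subquery of the DELETE,
# so the pairs never travel between Python and the database.
with conn:
    conn.execute(
        """
        DELETE FROM pond_pairs
        WHERE (pond1, pond2) IN (SELECT a, b FROM link_log WHERE closed = 1)
        """
    )

remaining = conn.execute(
    "SELECT pond1, pond2 FROM pond_pairs ORDER BY pond1, pond2"
).fetchall()
```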

Thilo 2010-08-17 06:08:41
