views: 642
answers: 4
Right - I want to delete (e.g.) 1,000,000 records from a database. This takes a long time -> the transaction times out and fails. So I delete them in batches, say 25,000 records per transaction, using the LIMIT clause on MySQL or ROWNUM on Oracle. Great, this works.

I want to do this in a database-independent way, and from an existing Java code base that uses JPA/Hibernate.

Out of luck: JPA's Query.setMaxResults and setFirstResult have no effect for write 'queries' (e.g. delete). Selecting many entities into memory just to delete them individually is very slow and dumb, I'd say.

So I use a native query and manage the 'limit' clause in application code, as sketched below. It would be nice to encapsulate this clause in orm.xml, but... "Hibernate Annotations 3.2 does not support bulk update/deletes using native queries." - http://opensource.atlassian.com/projects/hibernate/browse/ANN-469.
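
For reference, the application-side batching looks roughly like this (a sketch: emf is the EntityManagerFactory, the audit_log table and created column are made-up stand-ins, and the LIMIT syntax is MySQL-specific - Oracle needs the ROWNUM variant instead):

EntityManager em = emf.createEntityManager();
final int BATCH_SIZE = 25000;
int deleted;
do {
    em.getTransaction().begin();
    // native SQL, because setMaxResults() has no effect on bulk deletes;
    // the LIMIT keeps each transaction short enough not to time out
    deleted = em.createNativeQuery(
            "DELETE FROM audit_log WHERE created < '2009-01-01' LIMIT " + BATCH_SIZE)
        .executeUpdate();
    em.getTransaction().commit();
} while (deleted == BATCH_SIZE);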

I'd imagine this is a common problem. Anybody got a better database independent solution?

A: 

Limits on queries are a database-specific feature; there is no SQL standard for them (I agree there should be).

A solution which works with most databases is to use a view to group several tables into one, where each table contains a subset of the data (say, one day). This allows you to drop a whole subset at once. That said, many databases have issues with running UPDATE and INSERT on such a view.

You can usually work around this by creating a view or alias for INSERT/UPDATE (which points to a single table; the "current" one) and a grouping view for searching.
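
A rough sketch of the delete side of that setup, driven through JPA native queries (all table/view names are invented, and CREATE OR REPLACE VIEW is MySQL/Oracle syntax, so adjust for your database):

// assume the grouping view was: SELECT * FROM audit_log_day1
//                     UNION ALL SELECT * FROM audit_log_day2
em.getTransaction().begin();
// rebuild the view without the expired day...
em.createNativeQuery(
        "CREATE OR REPLACE VIEW audit_log AS SELECT * FROM audit_log_day2")
    .executeUpdate();
// ...then deleting that day is a cheap DROP instead of a million-row DELETE
em.createNativeQuery("DROP TABLE audit_log_day1").executeUpdate();
em.getTransaction().commit();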

Some databases also offer partitions, which are basically the same thing, except that you can define a column which determines which underlying table a row should go into (on INSERT). When you need to delete a subset, you can drop/truncate one of the underlying tables.
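
A minimal sketch of the delete side, assuming MySQL 5.1 partitioning and an invented per-month partition name (DROP PARTITION is MySQL syntax; Oracle has an equivalent):

// dropping a partition discards its rows almost instantly,
// with none of the per-row cost of a DELETE
em.getTransaction().begin();
em.createNativeQuery("ALTER TABLE audit_log DROP PARTITION p2009_01")
    .executeUpdate();
em.getTransaction().commit();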

Aaron Digulla
+2  A: 

I hate to give a non-constructive answer, but an ORM isn't really meant for doing bulk operations on the database. So it looks like your native query is probably the best bet for these operations.

You should also make sure that your ORM's view of the data is updated to reflect the new state of the database, otherwise you may get some weirdness happening (see the sketch below).
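
For example, with JPA/Hibernate you would typically clear the persistence context after the native bulk delete, so stale entities are not served from the first-level cache (a sketch, reusing the question's made-up audit_log delete):

em.createNativeQuery(
        "DELETE FROM audit_log WHERE created < '2009-01-01' LIMIT 25000")
    .executeUpdate();
// the first-level cache may still hold entities the bulk statement just removed
em.clear();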

ORMs are great tools for mapping objects to databases, but they are not general-purpose database interfaces.

Jeremy French
A: 

I believe you can use HQL (JPA QL) direct DML operations, which bypass the persistence context and cache and execute the (resulting SQL) statements directly:

// bulk HQL delete: translated to SQL and executed directly,
// bypassing the persistence context, cascades and caching
Query q = session.createQuery("delete YourEntity ye where ye.something like :param");
q.setParameter("param", "anything");
int deletedEntities = q.executeUpdate();
stian
A: 

Thanks for the answers, guys. stian - I've tried this, but there does not seem to be a way to limit the affected rows via JPA or Hibernate, so the transaction times out when there are lots of records to delete. See the detail in my original question.