tags:

views:

224

answers:

4

I need to delete millions of rows from a table from within an EJB Timer. The problem is that the timer has a transaction timeout of 90 seconds, so I should divide the work into bite-size chunks.

Since I don't know how many rows can be deleted in 90 seconds the algorithm should loop and delete a few at a time until the time is almost up.

The problem is: How can the number of rows to delete be limited elegantly in JPA? The delete is made on all rows having a timestamp earlier than a certain date.

I guess it is possible to find the 1000th oldest row and DELETE WHERE timestamp <= {1000th-oldest-row.timestamp} This, however, is not very elegant and I would have to get to the last row in a 1000 to get the timestamp.

Secondly, the timer should trigger immediately again if the table is not clean after the 90 seconds. This can be easily solved but, again, is not very elegant.

+1  A: 

We had a similar requirement and here is how we solved it. I was using EJB 3.0.

  1. Timer is started when the app. server starts (or the module is deployed) in a ServletContextListener.
  2. When the timer fires, it process up to 100 rows that are pending. You need then to order the result of the query and limit the number of row.
  3. If there was 100 rows, the timer schedules the next timeout with 0ms. That it, the transaction is committed, and the timer fires again in a new transaction.
  4. If there was less than 100 rows, the timer schedule the next timeout in 90sec.

If there are, say, 250 rows, the timer fires three time in a sequence. There is only a minor problem if there is exactly 100 row to process, in which case the timer fires twice in a sequence, but the 2nd fire processes actually nothing. But all in all, it was working OK.

ewernli
Yes, that is just how I'd like to do it. But how can I limit the number of rows that are deleted elegantly?
Hugo
Something like `DELETE FROM MyTable ORDER BY timestamp DESC LIMIT 100`. The exact SQL syntax will depend on the database you use. With JDBC, you can then get the number of row affected by the `DELETE`, and proceed as I described.
ewernli
+1  A: 

One trick I've used within SQL is to DELETE TOP 1000 (or 100 or 10000, depending on the average number of rows in a page), like so:

DELETE top 1000 WHERE timestamp <= @ExpirationDate

Call this repeatedly until no rows are deleted (check with @@rowcount) or you run out of time. Can this technique be implemented in JPA?

Philip Kelley
A: 

Solved the problem by getting a sorted list of rows eligible to clean and using setFirstResult(int) to the same as setMaxResults(int). This way I get the ordering of an item approximately maxCount steps from the oldest.

Query expired = dm.createNamedQuery("getExpiredElements");
expired.setParameter("currentTime", getCurrentTime());
expired.setMaxResults(maxCount);
expired.setFirstResult(maxCount);
@SuppressWarnings("unchecked")
List<Item> expiredChunk = (List<Item>) expired.getResultList();
long lastChunkEndTime = expiredChunk.get(0).getEndTime();
Query query = em.createNamedQuery("deleteExpiredItems");
query.setParameter("currentTime", lastChunkEndTime);
int result = query.executeUpdate();
return result >= maxCount;

The function returns true (at least) if it should be executed again.

Hugo
+2  A: 

You will still face transaction expiration issues with the solution you have.

The trick is to execute each chunk in a separate transaction as shown below in pesudo code.

@Entity

@NamedQueries ( value = {
    @NamedQuery (
        name = pagedDeleteExpiredItems
        query=    DELETE FROM MyTable
            WHERE (<table key>) IN (
                SELECT <table key> FROM (
                SELECT ROWNUM AS row_num, <table key> FROM MyTable
                WHERE timestamp <= :currentTime
                )
                WHERE row_num < :pageSize
            )
    )
})

public class MyEntity {
    @TransactionAttribute(TransactionAttributeType.REQUIRES_NEW)
    int doPagedDeleteExpiredItems(Date currentTime, int pageSize) {
        Query query = em.createNamedQuery("pagedDeleteExpiredItems");
        query.setParameter("currentTime", currentTime);
        query.setParameter("pageSize", pageSize);
        int deleteCount = query.executeUpdate();
        return deleteCount;
    }
}


@EJBTimer
public class DeleteExpiredItemsTimer {

    @EJB(beanName = "MyEntity")
    MyEntity myEntity;

    @TransactionAttribute(TransactionAttributeType.NOT_SUPPORTED)
    void handleTimeout(Timer timer) {
        Date currentTime = getCurrentTime()
        int pageSize = 100
        int deleteCount;
        do {
            myEntity.doPagedDeleteExpiredItems(currentTime, pageSize);
        } while(deleteCount>0);
    }
}
Gladwin Burboz
Well, the transactions are solved for me in my constraint. But your solution is beautiful.
Hugo
Thanks Hugo for your comment.
Gladwin Burboz