Hi, guys,

I'm trying to integrate Hibernate Search into one of the projects I'm currently working on. The first step in such an endeavour is fairly simple: index all the existing entities with Hibernate Search (which uses Lucene under the hood). Many of the tables mapped to entities in the domain model contain a lot of records (> 1 million), so I'm using a simple pagination technique to split them into smaller units. However, I'm experiencing what looks like a memory leak while indexing the entities. Here's my code:

@Service(objectName = "LISA-Admin:service=HibernateSearch")
@Depends({"LISA-automaticStarters:service=CronJobs", "LISA-automaticStarters:service=InstallEntityManagerToPersistenceMBean"})
public class HibernateSearchMBeanImpl implements HibernateSearchMBean {
    private static final int PAGE_SIZE = 1000;

    private static final Logger LOGGER = LoggerFactory.getLogger(HibernateSearchMBeanImpl.class);

    @PersistenceContext(unitName = "Core")
    private EntityManager em;

    @Override
    @TransactionAttribute(TransactionAttributeType.NOT_SUPPORTED)
    public void init() {
        FullTextEntityManager fullTextEntityManager = Search.getFullTextEntityManager(em);

        Session s = (Session) em.getDelegate();
        SessionFactory sf = s.getSessionFactory();
        // ClassMetadata exposes both the entity name and the mapped class
        Map<String, ClassMetadata> classMetadata = sf.getAllClassMetadata();

        for (Map.Entry<String, ClassMetadata> entry : classMetadata.entrySet()) {
            LOGGER.info("Class: " + entry.getKey() + "\nEntity name: " + entry.getValue().getEntityName());

            Class entityClass = entry.getValue().getMappedClass(EntityMode.POJO);

            // only classes annotated with @Indexed are known to Hibernate Search
            if (entityClass != null && entityClass.getAnnotation(Indexed.class) != null) {
                LOGGER.info("Class: " + entityClass.getCanonicalName());
                index(fullTextEntityManager, entityClass, entry.getValue().getEntityName());
            }
        }
    }
    }

    @TransactionAttribute(TransactionAttributeType.NOT_SUPPORTED)
    public void index(FullTextEntityManager pFullTextEntityManager, Class entityClass, String entityName) {
        LOGGER.info("Class " + entityClass.getCanonicalName() + " is indexed by hibernate search");

        int currentResult = 0;

        Query tQuery = em.createQuery("select c from " + entityName + " as c order by oid asc");
        tQuery.setFirstResult(currentResult);
        tQuery.setMaxResults(PAGE_SIZE);

        List entities;

        do {
            entities = tQuery.getResultList();
            indexUnit(pFullTextEntityManager, entities);

            currentResult += PAGE_SIZE;
            tQuery.setFirstResult(currentResult);
        } while (entities.size() == PAGE_SIZE);

        LOGGER.info("Finished indexing for " + entityClass.getCanonicalName() + ", current result is " + currentResult);
    }

    @TransactionAttribute(TransactionAttributeType.REQUIRES_NEW)
    public void indexUnit(FullTextEntityManager pFullTextEntityManager, List entities) {
        for (Object object : entities) {
            pFullTextEntityManager.index(object);
            LOGGER.info("Indexed object with id " + ((BusinessObject)object).getOid());
        }
    }
}

It's just a simple MBean whose init method I execute manually via JBoss's JMX console. When I monitor the execution of the method in JVisualVM, I see that the memory usage grows constantly until the entire heap is consumed, and although a lot of garbage collections happen, no memory gets freed, which leads me to believe I have introduced a memory leak in my code. I cannot, however, spot the offending code, so I'm hoping for your assistance in locating it.

The problem is certainly not in the indexing itself, because I get the leak even without it, so I think I'm not doing the pagination right. The only reference to the entities that I hold, however, is the list `entities`, which should be easily garbage collected after each iteration of the loop calling indexUnit.

Thanks in advance for your help.

EDIT

Changing the code to

    List entities;

    do {
        Query tQuery = em.createQuery("select c from " + entityName + " as c order by oid asc");
        tQuery.setFirstResult(currentResult);
        tQuery.setMaxResults(PAGE_SIZE);

        entities = tQuery.getResultList();
        indexUnit(pFullTextEntityManager, entities);

        currentResult += PAGE_SIZE;
    } while (entities.size() == PAGE_SIZE);

alleviated the problem. The leak is still there, but not as bad as it was. I guess there is something faulty with the JPA query itself, keeping references it shouldn't, but who knows.

A: 

It looks like the injected EntityManager is holding on to references to all the entities returned from your queries. It's a container-managed EM, so it should be closed or cleared automatically at the end of a transaction - but you're doing a bunch of non-transactional queries.

If you are just going to index the entities, you might want to call em.clear() at the end of the loop in init(). The entities will be detached (the EntityManager will no longer track changes made to them), but if they're just going to be GC'ed, that shouldn't be a problem.
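
Roughly like this (just a sketch reusing the loop from your init() method, not tested against your setup):

    for (Map.Entry<String, ClassMetadata> entry : classMetadata.entrySet()) {
        Class entityClass = entry.getValue().getMappedClass(EntityMode.POJO);

        if (entityClass != null && entityClass.getAnnotation(Indexed.class) != null) {
            index(fullTextEntityManager, entityClass, entry.getValue().getEntityName());
        }

        // detach everything loaded while indexing this entity class,
        // so the persistence context cannot grow without bound
        em.clear();
    }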

Mike
Nope, that's not the problem - the leak is still there. I think the entity manager also has no reason to track entities that aren't referenced anywhere in the code, so I assume the problem lies somewhere else.
Bozhidar Batsov
A: 

Seems like this question won't be getting a real solution. In the end I've just moved the indexing code out into a separate app - the leak is still there, but it doesn't matter that much, since the app runs to completion (with a huge heap) outside of the critical container.

Bozhidar Batsov
A: 

I don't think there is a "leak"; however, I do think that you're accumulating a large number of entities in the persistence context (yes, you are, since you're loading them) and ultimately eating all the memory. You need to clear the EM after each loop iteration (without clear, paging doesn't help). Something like this:

    do {
        entities = tQuery.getResultList();
        indexUnit(pFullTextEntityManager, entities);

        // detach the current page of entities so the persistence context doesn't keep growing
        pFullTextEntityManager.clear();

        currentResult += PAGE_SIZE;
        tQuery.setFirstResult(currentResult);
    } while (entities.size() == PAGE_SIZE);
Pascal Thivent
Well, I've added clear() method calls to both entity managers, but absolutely nothing has changed.
Bozhidar Batsov
@Bozhidar I somehow missed the first EM. Anyway, this doesn't change my reasoning. I still believe that calling `clear` is a must. The question now is: who is the greedy memory pig in your app?
Pascal Thivent
@Pascal, I also believe that the leak is somehow connected to the persistence context, but I cannot understand what exactly is going wrong. When I monitor the memory usage in JVisualVM I can see the garbage collections reclaiming most of the memory, but never all of it, and gradually the whole heap gets consumed. I was wondering if moving the query out of the loop and into the transactional method might improve something, since at least that way the memory would be reclaimed on the transaction commit, or so I hope...
Bozhidar Batsov
@Bozhidar Hmm... weird. You can indeed try to move things outside the loop but I would probably generate some heap dumps and analyze them with Eclipse MAT to find out the culprit. There must be something wrong somewhere (and I'm not saying it's in your code) but I just fail at spotting it (beyond the `clear`).
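If it helps, a heap dump can be taken with jmap (jmap -dump:format=b,file=heap.hprof <pid>) or triggered from code via the HotSpot diagnostic MXBean. A minimal sketch, assuming a HotSpot JVM (the file name is just an example):

    // Dump the heap to a file that can then be opened in Eclipse MAT.
    public static void dumpHeap(String file) throws java.io.IOException {
        com.sun.management.HotSpotDiagnosticMXBean bean =
                java.lang.management.ManagementFactory.newPlatformMXBeanProxy(
                        java.lang.management.ManagementFactory.getPlatformMBeanServer(),
                        "com.sun.management:type=HotSpotDiagnostic",
                        com.sun.management.HotSpotDiagnosticMXBean.class);
        // live = true dumps only objects reachable from GC roots
        bean.dumpHeap(file, true);
    }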
Pascal Thivent