views:

664

answers:

5

I need to export big amount of data from database. Here is classes that represents my data:

public class Product{
...

    @OneToMany
    @JoinColumn(name = "product_id")
    @Cascade({SAVE_UPDATE, DELETE_ORPHAN})
    List<ProductHtmlSource> htmlSources = new ArrayList<ProductHtmlSource>();

... }

ProductHtmlSource - contains big string inside which I actually need to export.

Since size of exported data is bigger than JVM memory I'm reading my data by chunks. Like this:

final int batchSize = 1000;      
for (int i = 0; i < 50; i++) {
  ScrollableResults iterator = getProductIterator(batchSize * i, batchSize * (i + 1));
  while (iterator.getScrollableResults().next()) {
     Product product = (Product) iterator.getScrollableResults().get(0); 
     List<String> htmls = product.getHtmlSources();
     <some processing>
  }

}

Code of getProductIterator :

public ScrollableResults getProductIterator(int offset, int limit) {
        Session session = getSession(true);
        session.setCacheMode(CacheMode.IGNORE);
        ScrollableResults iterator = session
                .createCriteria(Product.class)
                .add(Restrictions.eq("status", Product.Status.DONE))
                .setFirstResult(offset)
                .setMaxResults(limit)
                .scroll(ScrollMode.FORWARD_ONLY);
        session.flush();
        session.clear();

        return iterator;
    }

The problem is that in spite of I clearing session after reading of each data chunk Product objects accumulates somewhere and I'm get OutOfMemory exception. The problem is not in processing block of code even without it I get memory error. The size of batch also is not a problem since 1000 objects easily sit into memory.

Profiler showed that objects accumulates in org.hibernate.engine.StatefulPersistenceContext class.

The stacktrace:

Caused by: java.lang.OutOfMemoryError: Java heap space
    at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:99)
    at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:518)
    at java.lang.StringBuffer.append(StringBuffer.java:307)
    at org.hibernate.type.TextType.get(TextType.java:41)
    at org.hibernate.type.NullableType.nullSafeGet(NullableType.java:163)
    at org.hibernate.type.NullableType.nullSafeGet(NullableType.java:154)
    at org.hibernate.type.AbstractType.hydrate(AbstractType.java:81)
    at org.hibernate.persister.entity.AbstractEntityPersister.hydrate(AbstractEntityPersister.java:2101)
    at org.hibernate.loader.Loader.loadFromResultSet(Loader.java:1380)
    at org.hibernate.loader.Loader.instanceNotYetLoaded(Loader.java:1308)
    at org.hibernate.loader.Loader.getRow(Loader.java:1206)
    at org.hibernate.loader.Loader.getRowFromResultSet(Loader.java:580)
    at org.hibernate.loader.Loader.doQuery(Loader.java:701)
    at org.hibernate.loader.Loader.doQueryAndInitializeNonLazyCollections(Loader.java:236)
    at org.hibernate.loader.Loader.loadCollection(Loader.java:1994)
    at org.hibernate.loader.collection.CollectionLoader.initialize(CollectionLoader.java:36)
    at org.hibernate.persister.collection.AbstractCollectionPersister.initialize(AbstractCollectionPersister.java:565)
    at org.hibernate.event.def.DefaultInitializeCollectionEventListener.onInitializeCollection(DefaultInitializeCollectionEventListener.java:63)
    at org.hibernate.impl.SessionImpl.initializeCollection(SessionImpl.java:1716)
    at org.hibernate.collection.AbstractPersistentCollection.initialize(AbstractPersistentCollection.java:344)
    at org.hibernate.collection.AbstractPersistentCollection.read(AbstractPersistentCollection.java:86)
    at org.hibernate.collection.AbstractPersistentCollection.readSize(AbstractPersistentCollection.java:109)
    at org.hibernate.collection.PersistentBag.size(PersistentBag.java:225)
    **at com.rivalwatch.plum.model.Product.getHtmlSource(Product.java:76)
    at com.rivalwatch.plum.model.Product.getHtmlSourceText(Product.java:80)
    at com.rivalwatch.plum.readers.AbstractDataReader.getData(AbstractDataReader.java:64)**
A: 

Can you post the Exception stacktrace? It may be solved by passing suitable JVM options for GC.

I think this is related - http://stackoverflow.com/questions/1707350/java-stringbuilder-huge-overhead.

Looks from the StackTrace that a very large String is being created and causing the exception.

Padmarag
Have you tried storing in LOB and using Streams for output?
Padmarag
+1  A: 

At the risk of appearing stupid - have you considered doing this another way?

Personally I would avoid doing batch processing that "far away" from the database. I don't know what database you're using but there's usually a mechanism for efficiently pulling a dataset out of the database & into a file even if it involves moderately simple manipulation on the way out. Stored procedures, specific export utilities. Investigate what else is available from your database vendor.

mcottle
+1  A: 

It looks like you are calling getProductIterator() with the starting and ending row numbers, while getProductIterator() is expecting the starting row and a row count. As your "upper limit" gets higher you are reading data in bigger chunks. I think you mean to pass batchSize as the second argument to getProductIterator().

KeithL
Oh, yes! Stupid me. You're 100% right:)
Vladimir
A: 

Not a direct answer but for this kind of data manipulation, I would use the StatelessSession interface.

Pascal Thivent
+1  A: 

KeithL is right - you're passing an ever-increasing limit. But breaking it up that way doesn't make sense anyway. The whole point of a scroll cursor is that you process a row at a time so there's no need to break it up into chunks. The fetch size reduces the trips to the database at the cost of using up more memory. The general pattern should be:

Query q = session.createCriteria(... no offset or limit ...);
q.setCacheMode(CacheMode.IGNORE);
q.setFetchSize(1000);  // experiment with this to optimize performance vs. memory
ScrollableResults iterator = query.scroll(ScrollMode.FORWARD_ONLY);
while (iterator.next()) {
  Product p = (Product)iterator.get();
  ...
  // session.evict(p);  // an alternative to setting the cache mode above
}

That said, the error is getHtmlSources so the problem may be completely unrelated to the session/cursor/scroll issue. If those html strings are huge and they're being referenced the entire time, you may just be running out of contiguous memory.

Btw, I don't see a getScrollableResults method on ScrollableResults.

Brian Deterling