I have a program which reads a text file line by line, and creates a Hibernate entity object from each line, and saves them. I have several such text files to process, each of which has about 300,000 lines. I'm finding that my current implementation is excruciatingly slow, and I'm wondering if there's anything I can do to improve things.

My main method processes the text file line by line like so:

// read the file line by line, closing the reader automatically when done
try (BufferedReader bufferedReader = new BufferedReader(
        new InputStreamReader(new FileInputStream(fileName))))
{
    int lineCount = 0;
    String line;
    while ((line = bufferedReader.readLine()) != null)
    {
        // convert the line into an Observations object and persist it
        convertAndPersistObservationsLine(line);

        // once we've processed a full JDBC batch worth of lines, flush and
        // clear the session to control the size of Hibernate's first level cache
        lineCount++;
        if (lineCount % JDBC_CACHE_SIZE == 0)
        {
            observationsDao.flush();
            observationsDao.clear();
        }
    }
}

The convertAndPersistObservationsLine() method just splits the text line into tokens, creates a new entity object, populates the entity's fields with data from the tokens, and then saves the object via a DAO that calls Hibernate's Session.saveOrUpdate() method. The DAO methods flush() and clear() are direct calls to the corresponding Hibernate Session methods.
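For reference, a minimal sketch of what the conversion part of such a method might look like. The `Observations` fields, the whitespace delimiter, and the column order are all assumptions, since the original code isn't shown; a real implementation would follow the parse with a call like `observationsDao.saveOrUpdate(obs)`:

```java
import java.time.LocalDate;

public class ObservationsLoader {

    // hypothetical entity shape -- the real Observations entity is not shown
    public static class Observations {
        public String stationId;
        public LocalDate date;
        public double value;
    }

    // split one whitespace-delimited line into tokens and populate an entity;
    // in the real program this would be followed by a DAO saveOrUpdate() call
    public static Observations parseLine(String line) {
        String[] tokens = line.trim().split("\\s+");
        Observations obs = new Observations();
        obs.stationId = tokens[0];
        obs.date = LocalDate.parse(tokens[1]);   // expects ISO format, e.g. 2010-06-15
        obs.value = Double.parseDouble(tokens[2]);
        return obs;
    }
}
```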

The Hibernate property 'hibernate.use_second_level_cache' is set to false, and the Hibernate property 'hibernate.jdbc.batch_size' is set to 50, as is the Java constant JDBC_CACHE_SIZE.

Can someone suggest a better way of going about this, or any tweaks to the above which may improve the performance of this batch loading program?

Thanks in advance for your help.

--James

+1  A: 

A few things:

  1. Can you quantify "excruciatingly slow"? How many inserts per second are you achieving? What rate do you think you should have instead? What type of load is the database itself under? Are others reading from the table at the same time?

  2. How are you connecting to the database? Is all of this occurring in a single transaction re-using the same connection?

  3. Are you by any chance using an identity identifier? The documentation states that JDBC batching is silently disabled if you are:

Hibernate disables insert batching at the JDBC level transparently if you use an identity identifier generator.

matt b
Thanks for the response, Matt.

1. It looks like I am getting roughly 4 inserts per second. Can I expect to dramatically improve this?

2. My DAO objects contain a SessionFactory object which is wired in via Spring dependency injection. The SessionFactory class I'm using is org.springframework.orm.hibernate3.annotation.AnnotationSessionFactoryBean. The transaction manager class I'm using is org.springframework.orm.hibernate3.HibernateTransactionManager, and I have enabled transactional behavior based on annotations in my Spring application context config file.
James Adams
The DAO class has this as the transactional setting:

    @Transactional(propagation = Propagation.REQUIRED, readOnly = false, isolation = Isolation.DEFAULT)

The SessionFactory has a DataSource property, and this is configured in the Spring application context as well, with the normal settings for URL, user/password, driver class, etc.

3. I don't think I'm using an identity identifier; the ID field for my entity class looks like this:

    @Id
    @Column(name = "ENTITY_ID")
    @GeneratedValue(strategy = GenerationType.AUTO)
    public T getId() { return id; }
James Adams
I apologize for not formatting the above comments -- I tried using basic HTML and Markdown but was never able to get anything to take and it always ended up as one run-on string.
James Adams
+2  A: 

The code itself and the Hibernate configuration look correct (by correct I mean that they follow the batch insert idiom from the documentation). But here are some additional suggestions:

As already mentioned, make absolutely sure that you aren't using an ID generator that defeats batching, such as IDENTITY. When using GenerationType.AUTO, the persistence provider picks an appropriate strategy depending on the database, so, depending on your database, you might have to change it to a TABLE or SEQUENCE strategy (Hibernate can then cache blocks of IDs using a hi-lo algorithm).
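For example, switching from AUTO to an explicit sequence generator might look like the following. The generator name, sequence name, and allocationSize are illustrative, not taken from the question; allocationSize lets Hibernate hand out a block of IDs without a database round trip per insert:

```java
@Id
@Column(name = "ENTITY_ID")
@SequenceGenerator(name = "obs_seq", sequenceName = "OBSERVATIONS_SEQ", allocationSize = 50)
@GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "obs_seq")
public T getId() { return id; }
```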

Also make sure that Hibernate is batching as expected. To do so, activate logging and monitor the BatchingBatcher to track the size of the batches it executes (the batch size is logged at DEBUG level).
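Assuming a log4j setup, something along these lines should surface the batch sizes; the category names below are from Hibernate 3's internals, so adjust if your logging framework or Hibernate version differs:

```properties
# illustrative log4j.properties fragment -- adapt to your logging setup
# AbstractBatcher/BatchingBatcher log the executed batch size at DEBUG
log4j.logger.org.hibernate.jdbc=DEBUG
# optionally, to see the SQL statements themselves:
log4j.logger.org.hibernate.SQL=DEBUG
```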

In your particular case, you might actually consider using the StatelessSession interface (once the current problem is solved, of course).
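A sketch of the stateless approach, assuming a configured SessionFactory and a parseLine() helper like the one in the question; note that a StatelessSession bypasses the first-level cache, dirty checking, and cascades, so the flush()/clear() bookkeeping disappears entirely:

```java
// sketch only -- sessionFactory, reader and parseLine() are assumed to exist
StatelessSession session = sessionFactory.openStatelessSession();
Transaction tx = session.beginTransaction();
try {
    String line;
    while ((line = reader.readLine()) != null) {
        Observations obs = parseLine(line);
        session.insert(obs);   // no first-level cache to fill up
    }
    tx.commit();
} finally {
    session.close();
}
```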

Pascal Thivent
A: 

If you are using MySQL, you might want to turn on rewriteBatchedStatements, since the MySQL driver does not otherwise support batched prepared statement parameter bindings. With the flag on, it rewrites your insert statements into the form "INSERT INTO YourEntity VALUES (...), (...), (...)".
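The flag is set as a connection property on the JDBC URL; the host, port, and database name below are placeholders:

```properties
# illustrative DataSource URL -- substitute your own host/port/schema
jdbc.url=jdbc:mysql://localhost:3306/mydb?rewriteBatchedStatements=true
```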

Please refer to: http://shengchien.blogspot.com/2010/10/hibernate-batch-processing-with-mysql.html

Sheng Chien