views:

173

answers:

2

I am sure I can improve the performance of the following findByName query of hibernate:

public List<User> findByName(String name) {
  session.createCriteria(User.class).add(Restrictions.eq("name", name)).list();
}

The bottleneck is that findByName method and I cannot use the id instead.

In my case I know that the name is unique but adding an Index annotation to the name attribute didn't improve the performance. I did the following:

class User {
  @Index(name = "nameIdx")
  private String name;      
}

In which way should I improve it or even more important: in which ways should I improve it first? I will need the full object with all the collections (layz or not) and deps of this class.

Or can I improve it, if I want several User objects (and know several names)?

Update1:

The @Index annotation didn't improve the performance, because the database already had an index, because of my unique constraint annotation:

@UniqueConstraint(columnNames = {"name"})

Update2:

  1. Read the answers carefully!

  2. With the help of the SQL logging I saw that the real issue was that a lot of update and insert statements were raised although I did not commit or flush the transaction. The reason behind that was that I did (in a loop):

    User u = findByName(name);
    if(u == null) 
       attach(u = new User(name));
    

    and so hibernate needs to flush the newly created users to the db before every findByName query. I solved this with my own cache workaround (LinkedHashMap).

  3. Another improvement I made through Jens Schauder's tip:

    public Collection<User> findByNames(Collection<String> names) {
       return session.createCriteria(User.class).
              add(Restrictions.in("name", names)).list();
    }
    
  4. A further improvement could be made when specifying some of the user collection as not lazy:

    @LazyCollection(LazyCollectionOption.FALSE)
    

    Read this answer to get an even better option.

  5. The last and most important one for me is: replacing my SortedSet items with a list and doing the following in the getItems method:

    Set set = new LinkedHashSet(items);
    items.clear();
    items.addAll(set);
    Collections.sort(items, itemComparator);
    return Collections.unmodifiableCollection(items);
    

    with that, hibernate can work on the items collection (i.e. adding) without loading the entire collection from database.

@Pascal Thivent and @Jens Schauder: bunch of thanks! Sorry, that I can only accept one answer :-/

Helpful Logging settings:

log4j.logger.org.hibernate.tool.hbm2ddl=INFO, StdoutApp
log4j.logger.org.hibernate.SQL=INFO, StdoutApp
# additionally provide the information which parameters will be bound:
log4j.logger.org.hibernate.type=TRACE

Another helpful link.

+3  A: 

You are not providing enough information for a complete answer but here are some ideas:

  • can you use the id instead? Hibernate will prepare queries for select by id, so these will be (a little) faster than other queries
  • is the name properly indexed? For the purpose of this query it should have a unique key (you are hinting, you are expecting a single result). Of course such an index does cost performance on insert, update and delete.
  • when we come to the references, it depends on what you mean by performance: The time until the statement returns? Then you should use lazy loading. It makes the first statement faster and therefore probably faster. Of course you'll have more statements afterwards, once you the references get dehydrated. Otherwise (some) eager loading is probably faster, although that depends heavily on the details.
  • employ caching, this might help especially for the references, if the can get retrieved from the cache.
  • tune your database. Give it enough memory to hold everything in memory all the time.
  • tune your network. With small queries like the one shown, latency might be an issue
  • remove the network by putting the db on the same machine as the code. Assuming it is big enough.

As you can see you have tons of options for tuning. The only one I'd expect a good effect for the effort with is considering an index. Of course this might change when we have more information about the problem (e.g. complete Table structure, indexes, hibernate mapping, size of tables ...)


UPDATE based on the comment:

When tuning, the first question is: What is it we need to tune? Is it the conversion of the Criteria into a SQL statement? If so providing a sql statement directly might do the job.

Is it the actual execution of the sql statement? If so, determining the sql statement resulting from the posted code would be the first thing.

I have never seen a real case where a stored procedure made things faster. Of course that doesn't mean such cases don't exists. But the optimizers of modern rdbms's are pretty smart.

So in order to get this started properly: Set up logging so you see every sql statement with a precise time stamp. As well as the start and end time of the complete process you are tuning. If this is about hundreds of executions, you'll have to agreggate stuff.

This will tell you if it is about the sql statements are executed and which is taking up much time and if it is an sql statement at all which is causing the problem.

Most of the time sql statements are guilty of bad performance, but one shouldn't jump to conclusions.


Update on the many names part:

You can use an InExpression: http://docs.jboss.org/hibernate/core/3.3/api/org/hibernate/criterion/InExpression.html to find multiple Objects in one go. This will be faster than single queries.

Jens Schauder
Sorry for not providing not enough infos it is a commercial project and I will try to give you the maximum information without making my boss angry. First: I cannot use the id instead. I though that I could use native sql queries or stored procedure? Would that improve performance?
Karussell
+2  A: 

In my case I know that the name is unique but adding an Index annotation to the name attribute didn't improve the performance. The bottleneck is that findByName method.

I won't believe this... until you show some numbers proving I'm wrong :) So:

  • Double check that the index was generated (check the DDL statements and the database). You'll need an index on this column for this query.
  • Check the query plan for the generated query (should be something like SELECT * FROM USER u WHERE u.NAME = 'foo') and the execution time.

Later, you might consider activating the second level cache and cache the query. But the database is the place to start with (caching things too early will just hide the real problem).

And measure things! If you cannot measure it, you cannot improve it. --Lord Kelvin.

Pascal Thivent
It might be possible, when the table is small, that the index doesn't get used, although it looks like a perfect match.And of course it is perfectly possible, that the OP added the annotation, but didn't change the schema :)
Jens Schauder
Thanks for your suggestion! I will now check if the index was created ...
Karussell
@Jens Schauder: How should I change the schema? (I was running drop + create Schema before my slow import session)
Karussell
@Jens True and that's why analyzing the query plan is required.
Pascal Thivent
@Karussell If you are using SchemaExport, I'd activate the logging of the DDL statement to see what is happening. You could also create the index manually for testing purposes.
Pascal Thivent