tags:

views:

297

answers:

3

I am using NHibernate for ORM and have consolidated the loading of lots of entities into one big query.

I am actually loading a word dictionary, around 500K entries, and each word relates to others. Running the loading process in the background could be very tricky in our application, as we would have to manually load an entry that has not been loaded on time, as any word could be asked for at any time. Our only requirements are that all the data be loaded as fast as possible. I also tried using a stateless session, but got an exception that stateless sessions can't fetch collections (for some reason, maybe it has to do with the fact there is no cache for stateless sessions?)

The problem is that although the query takes no more than 25 seconds in SQLServer, it takes well over 3 minutes for ICriteria.List().

I used NHProf to profile the loading process and found that the creation of the entities is a costly affair, which takes up most of the loading time in NHibernate.

Is there anything I could do to reduce this latency? Is the memory allocation expensive, or is it the "filling in" of the data?

Thanks!

A: 

Profiling the creation process (for example with the VS performance analyser) should tell you exactly what is the costly operation. If you have played already with lazy loading tuning then I think the only good solution is to encapsulate the returned list to enable paging an return smaller chunks in a few iterations. I am not sure whether NHibernate support lazy result lists like JPA does (i.e. not loading entities from data reader until needed).

tony.ganchev
+4  A: 

Perhaps you should consider the fact that NHibernate (like most ORMs) is not particularly suited (or intended) for these types of bulk-loading scenarios. How many rows are you trying to load, give or take? What are you trying to do? Pre-populate a cache? Do batch-like processing?

My gut feeling is that you should seriously consider the purpose of your app and choose the underlying technologies accordingly. Perhaps you can shed some light on your intentions/requirements?

EDIT OK, from your comments I understand what it is you're trying to do here. The first thing I'd do is create a simple prototype using raw ADO.NET to load the same data, to get a feel for the best performance attainable using standard data access and in-memory collections. Next, fiddle around with different collection types to see what performs well when populating and searching. If loading data like this is still too slow, it's time to start looking at other methods of loading the data: file-based from a local data file, hydrating pre-serialized objects, some form of fast on-demand loading, etc.

tijmenvdk
Thanks for the quick reply!I am trying to load more than 500K rows, each of which is an entity. Each entity can point towards other entities.I am trying to populate this list as fast as possible with all these inter relations. I am not writing anything to the DB, but after experimenting with lazy loading I found it doesn't suit my needs.The next best thing I could think of is somehow loading the data in a different thread in some sort of producer-consumer pattern, but I have no idea how you do that. Any ideas?Do you have any tips for other technologies?Thanks!
Ido Cohn
I still have no clue on why you'd want to load that many inter-related entities. That makes it hard to come up with suggestions. Loading the data in the background can make sense, but only if your app can start to do whatever it needs to do without all the data in place. In short, tell us what you want to achieve, what your requirements are, maybe then we can give you a proper suggestion or two.
tijmenvdk
I am actually loading a word dictionary, when each word relates to others.The background thing could be very tricky in our application, as we would have to manually load an entry that has not been loaded before. The problem is that any one of those words could be asked for at any time.Our only requirements are that all the data be loaded as fast as possible. I also tried using a stateless session, but got an exception that stateless sessions can't fetch collections...
Ido Cohn
I'd update your original question with this info, that way more people will read it.
tijmenvdk
+3  A: 

Loading 500k entities into an NHibernate session is not a good idea. The session is made to be short lived and hold a relatively small number of entities.

If you want to do this kind of batch processing in NHibernate you should take a look at the StatelessSession instead of the ordinary session. Using a stateless session would most likely drastically improve performance in this scenario. However, when using a stateless session you lose the benefits of the NHibernate first level cache, such as change tracking.

More information about the StatelessSession can be found in this article and in the NH docs at NHForge.

In this scenario I would also recommend that you consider using straight ADO.NET instead of NHibernate. I am not saying that you should switch you whole data access strategy to ADO.NET but you might want to consider using ADO.NET for the batch operations and using NHibernate for the other cases.

Erik Öjebo
Thanks.I tried using a stateless session, but I got the following error:SessionException: "collections cannot be fetched by a stateless session".Why should this be true? Is there any way to circumvent this?
Ido Cohn
If you really need load the entities with their child collections populated then StatelessSession is not a good fit either, since it ignores collections (according to the docs at NHForge). The reason for that is probably that the StatelessSession is implemented at a level much closer to ADO.NET than the ordinary session. It seems like ADO.NET would be a better choice than NHibernate in your scenario.
Erik Öjebo