views:

57

answers:

2

I have a scenario where my application has access to a session for limited time windows, during which it must fetch data from the database into memory, and then only use the in-memory data to serve requests.

The data model is a simple one-to-many association such as:

<class name="com.foo.Road" table="road">
    <id name="oid" column="oid"/>

    <map name="carCountMap" fetch="subselect">
        <key column="road_oid" foreign-key="oid"/>
        <index column="time_oid" type="long" />
        <one-to-many class="com.foo.CarCount" />
    </map>

    <map name="truckCountMap" fetch="subselect">
        <key column="road_oid" foreign-key="oid"/>
        <index column="time_oid" type="long" />
        <one-to-many class="com.foo.TruckCount" />
    </map>
</class>

Now suppose that the car and truck count data exists for several years, which is much more than can fit in memory. Furthermore, I'm only really interested in loading the car counts in the last 3 months.

My question is, what is the best way to load this data using hibernate such that:

  • road.getCarCountMap() returns the set of only the car counts in the last 3 months (may be empty)
  • I don't end up with some crazy cartesian product that takes ages to process
  • No LazyInitializationExceptions are thrown after I close the session

Some things I've tried are:

1. Make the carCountMap collection eager and specify a where attribute on the mapping such as:

<map name="carCountMap" fetch="subselect" lazy="false" where="time_oid > 1000"> (similar for truckCountMap)

This fits in best with the collection semantics that I want, but unfortunately it forces me to hardcode a value, so I can't really refer to the last 3 months. time_oid increases by 1 every day.

2. Define maps as lazy and use an hql query to manually join the 3 tables:

    from Road r
    left outer join fetch r.carCountMap ccm
    left outer join fetch r.truckCoutnMap tcm
    where (ccm.time.oid > :startDate)
      or (tcm.time.oid > :startDate)

The problem with this is that the resulting query returns several millions of rows, whereas it should be 10k roads * 4 measurements per month (weekly) * 3 months = ~120k. This query completes in about an hour, which is ridiculous as approach #1 (which loads the exact same data as far as I'm concerned) completes in 3 minutes.

3. Define maps as lazy and load the roads first with a criteria, then run additional queries to fill in the collection

    List roadList = session.createCriteria(Road.class).list();

    session.getNamedQuery("fetchCcm").setLong("startDate", startDate).list();
    session.getNamedQuery("fetchTcm").setLong("startDate", startDate).list();

    return roadList;

This fires the right queries, but the retrieved car and truck counts don't get attached to the Road objects in roadList. So when I try to access the counts on any Road object, I get a LazyInitializationException.

4. Define maps as lazy, use criteria.list() to load all roads, iterate through all measurement dates in the last 3 months so force those values to be loaded.

I haven't tried this yet because it sounds really clunky, and I'm not convinced it'll get rid of the LazyInitializationException

  • Are there any workarounds to the problems I've run into with these methods?
  • Is there a better method altogether?
+1  A: 

I think your problem actually consists of two parts:

  • How to express subsets of data in your domain model
  • How to fetch the data with required level of performance

Regarding the first part, I think you are abusing your domain model when you try to load subsets of data into fields of Road.

Perhaps it would be better to make relationships between Roads and traffic measurements unidirectional, i.e. remove these maps from Road class. It looks reasonable since you probably don't need all this data at once. Then you can create a DTO (not mapped!) RoadStatistics consisting of Road and these traffic maps, and populate it with any data you want.

What's about the second part of the problem, I think you need to carry out some experiments with pure SQL in order to optimize your queries, and then translate the optimal query into HQL or Criteria. This translation can be done easily if your domain model doesn't restrict the way you load the data (see part one). Perhaps you will need to optimize your database schema by creating some indexes and so on.

axtavt
Thanks for the answer. Loading the statistics separately and then mapping somehow in memory is also an option I hadn't tried. It makes a lot of sense! I think the `<filter>` solution might just work for me though. Do you think that solution still smells of abuse?
oksayt
@oksayt: Solution with filter looks feasible if it provide the required level of performance (i.e. you don't need to optimize the query manually).
axtavt
+2  A: 

After digging around some more, it looks like hibernate filters are the exact solution I needed for this.

They basically provide a construct to have a where attribute on a collection or class, with parameters bound at runtime.

In the mapping file, define the filter and attach it to the collections:

<class name="com.foo.Road" table="road">
    <id name="oid" column="oid"/>

    <map name="carCountMap" fetch="subselect">
        <key column="road_oid" foreign-key="oid"/>
        <index column="time_oid" type="long" />
        <one-to-many class="com.foo.CarCount" />
        <filter name="byStartDate" condition="time_oid > :startDate" />
    </map>

    <map name="truckCountMap" fetch="subselect">
        <key column="road_oid" foreign-key="oid"/>
        <index column="time_oid" type="long" />
        <one-to-many class="com.foo.TruckCount" />
        <filter name="byStartDate" condition="time_oid > :startDate" />
    </map>
</class>

<filter-def name="byStartDate">
    <filter-param name="startDate" type="long"/>
</filter-def>

Then in the dao, enable the filter, bind the parameter and run the query:

session.enableFilter("byStartDate").setParameter("startDate", calculatedStartDateOid);
return session.createCriteria(Road.class).list();
oksayt