I'm experimenting with moving a JDBC webapp to JDO DataNucleus 2.1.1.
Assume I have some classes that look something like this:
public class Position { private Integer id; private String title; }
public class Employee { private Integer id; private String name; private Position position; }
The contents of the Position SQL table really don't change very often. Using JDBC, I read the entire table into memory (with the ability to refresh periodically or at will). Then, when I read an Employee into memory, I simply retrieve the position ID from the Employee table and use it to obtain the in-memory Position instance.
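Stripped down, that hand-rolled cache looks something like this (illustrative names only; resource handling and the refresh trigger are elided, and I'm assuming the obvious constructor and getters on Position):
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class PositionCache {
    // All Positions, keyed by primary key; replaced wholesale by refresh().
    private final Map<Integer, Position> byId = new ConcurrentHashMap<Integer, Position>();

    // Re-read the whole POSITION table (run at startup, on a timer, or on demand).
    public void refresh(Connection conn) throws SQLException {
        Map<Integer, Position> fresh = new HashMap<Integer, Position>();
        Statement stmt = conn.createStatement();
        ResultSet rs = stmt.executeQuery("SELECT ID, TITLE FROM MYSCHEMA.\"POSITION\"");
        while (rs.next()) {
            fresh.put(rs.getInt("ID"), new Position(rs.getInt("ID"), rs.getString("TITLE")));
        }
        byId.clear();
        byId.putAll(fresh);
    }

    // Called while materializing an Employee from its POSITION_ID column.
    public Position get(Integer positionId) {
        return byId.get(positionId);
    }
}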
However, using DataNucleus, if I iterate over all Positions:
Extent<Position> extent = pm.getExtent(Position.class, true);
Iterator<Position> iter = extent.iterator();
while (iter.hasNext()) {
    Position position = iter.next();
    System.out.println(position.toString());
}
And then later, with a different PersistenceManager, iterate over all Employees, obtaining their Position:
Extent<Employee> extent = pm.getExtent(Employee.class, true);
Iterator<Employee> iter = extent.iterator();
while (iter.hasNext()) {
    Employee employee = iter.next();
    System.out.println(employee.getPosition());
}
Then DataNucleus appears to produce SQL joining the two tables when I obtain an Employee's Position:
SELECT A0.POSITION_ID,B0.ID,B0.TITLE FROM MYSCHEMA.EMPLOYEE A0 LEFT OUTER JOIN MYSCHEMA."POSITION" B0 ON A0.POSITION_ID = B0.ID WHERE A0.ID = <1>
My understanding is that DataNucleus will use a cached Position instance when one is available. (Is that correct?) However, I'm concerned that the joins will degrade performance, and I'm not yet far enough along to run benchmarks. Are my fears misplaced? Should I just continue and benchmark? Is there a way to have DataNucleus avoid the join?
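For what it's worth, I believe I have the level-2 cache enabled when building the PersistenceManagerFactory; the setup is roughly this (the class and property names reflect my understanding of DataNucleus 2.1, so treat it as a sketch):
Properties props = new Properties();
props.setProperty("javax.jdo.PersistenceManagerFactoryClass",
        "org.datanucleus.jdo.JDOPersistenceManagerFactory");
// Connection URL, driver, and credentials elided.
props.setProperty("datanucleus.cache.level2.type", "soft"); // keep instances across PersistenceManagers
PersistenceManagerFactory pmf = JDOHelper.getPersistenceManagerFactory(props);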
My .jdo metadata:
<jdo>
<package name="com.example.staff">
<class name="Position" identity-type="application" schema="MYSCHEMA" table="Position">
<inheritance strategy="new-table"/>
<field name="id" primary-key="true">
<column name="ID" jdbc-type="integer"/>
</field>
<field name="title">
<column name="TITLE" jdbc-type="varchar"/>
</field>
</class>
</package>
</jdo>
<jdo>
<package name="com.example.staff">
<class name="Employee" identity-type="application" schema="MYSCHEMA" table="EMPLOYEE">
<inheritance strategy="new-table"/>
<field name="id" primary-key="true">
<column name="ID" jdbc-type="integer"/>
</field>
<field name="name">
<column name="NAME" jdbc-type="varchar"/>
</field>
<field name="position" table="Position">
<column name="POSITION_ID" jdbc-type="int" />
<join column="ID" />
</field>
</class>
</package>
</jdo>
I guess what I'm hoping to be able to do is tell DataNucleus to go ahead and read the POSITION_ID int as part of the default fetch group, and see if the corresponding Position is already cached. If so, then set that field. If not, then do the join later, if called upon. Better yet, go ahead and stash that int ID somewhere, and use it if getPosition() is later called. That would avoid the join in all cases.
I would think that knowing the class and the primary key value would be enough to avoid the naive case, but I don't yet know enough about DataNucleus.
Update: With the helpful feedback I've received, my .jdo is now cleaned up. However, after adding the POSITION_ID field to the default fetch group, I'm still getting a join:
SELECT 'com.example.staff.Employee' AS NUCLEUS_TYPE,A0.ID,A0."NAME",A0.POSITION_ID,B0.ID,B0.TITLE FROM MYSCHEMA.EMPLOYEE A0 LEFT OUTER JOIN MYSCHEMA."POSITION" B0 ON A0.POSITION_ID = B0.ID
I understand why it is doing that: the naive approach always works. DataNucleus might not read all the columns from the result set, returning the cached Position instead, but it is still asking the datastore to access a second table, with everything that entails, including possible disk seeks and reads. The fact that it throws that work away is little consolation.
What I was hoping to do was tell DataNucleus that all Positions will be cached: trust me on that, and if for some reason it finds one that isn't, blame me for the cache miss and (transparently) perform a separate select on the Position table. Even better, pin any Position it does have to fetch because of a miss, so there is never a cache miss on that object again.
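As far as I can tell, the closest JDO-level expression of that "trust me" contract is to warm and pin the level-2 cache myself, something like this (a sketch, assuming the L2 cache is enabled on the PMF):
// Pin every current and future Position in the level-2 cache.
DataStoreCache cache = pmf.getDataStoreCache();
cache.pinAll(true, Position.class);

// Warm the cache once by walking the extent; instances loaded here stay
// pinned and can satisfy later lookups by identity without more SQL.
PersistenceManager pm = pmf.getPersistenceManager();
try {
    Extent<Position> extent = pm.getExtent(Position.class, true);
    Iterator<Position> iter = extent.iterator();
    while (iter.hasNext()) {
        iter.next();
    }
} finally {
    pm.close();
}
Even with that in place, the generated SQL still joins to the Position table whenever an Employee's position field is loaded, which is exactly the part I'm trying to avoid.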
That is what I'm doing now using JDBC, by way of a DAO. One of the reasons for investigating a persistence layer was to ditch these DAOs. It is difficult to imagine moving to a persistence layer that can't move beyond naive fetches resulting in expensive joins.
As soon as Employee has not only a Position but also a Department and other such fields, an Employee fetch touches half a dozen tables, even though all of those objects are already pinned in the cache and are addressable given their class and primary key. In fact, I can implement this myself by changing Employee.position to an Integer, creating an IntIdentity, and passing it to PersistenceManager.getObjectById().
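Concretely, that do-it-yourself version would be something like this (a sketch; getPositionId() is a hypothetical accessor for the raw POSITION_ID value):
// Resolve the Position by identity; with validate=false this can be
// answered from the cache and only falls back to SQL on a genuine miss.
Object oid = new IntIdentity(Position.class, employee.getPositionId());
Position position = (Position) pm.getObjectById(oid, false);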
What I think I'm hearing is that DataNucleus is not capable of this optimization. Is that right? It's fine, just not what I expected.