Hi All,

We have a system which performs a 'coarse search' by invoking an interface on another system which returns a set of Java objects. Once we have received the search results I need to be able to further filter the resulting Java objects based on certain criteria describing the state of the attributes (e.g. from the initial objects return all objects where x.y > z && a.b == c).

The criteria used to filter the set of objects each time are partially user configurable. By this I mean that users will be able to select the values and ranges to match on, but the attributes they can pick from will be a fixed set.

The data sets are likely to contain <= 10,000 objects for each search. The search will be executed manually by the application user base probably no more than 2000 times a day (approx). It's probably worth mentioning that all the objects in the result set are known domain object classes which have Hibernate and JPA annotations describing their structure and relationship.

Possible Solutions

Off the top of my head I can think of 3 ways of doing this:

  1. For each search persist the initial result set objects in our database, then use Hibernate to re-query them using the finer grained criteria.
  2. Use an in-memory Database (such as hsqldb?) to query and refine the initial result set.
  3. Write some custom code which iterates the initial result set and pulls out the desired records.

Option 1

Option 1 seems to involve a lot of toing and froing across a network to a physical Database (Oracle 10g) which might result in a lot of network and disk activity. It would also require the results from each search to be isolated from other result sets to ensure that different searches don't interfere with each other.

Option 2

Option 2 seems like a good idea in principle as it would allow me to do the finer query in memory and would not require the persistence of result data which would only be discarded after the search was complete. Gut feeling is that this could be pretty performant too but might result in larger memory overheads (which is fine as we can be pretty flexible on the amount of memory our JVM gets).

Option 3

Option 3 could be very performant but is something I would like to avoid, as any code we write would require such careful testing that the time taken to achieve something flexible and robust enough would probably be prohibitive.


I don't have time to prototype all 3 ideas so I am looking for comments people may have on the 3 options above, plus any further ideas I have not considered, to help me decide which idea might be most suitable. I'm currently leaning toward option 2 (in memory database) so would be keen to hear from people with experience of querying POJOs in memory too.

Hopefully I have described the situation in enough detail but don't hesitate to ask if any further information is required to better understand the scenario.

Cheers,

Edd

A: 

How complex are the refining criteria? If the majority are quite simple, I'd be tempted to go for option (3) to start with, but make sure it's encapsulated behind a suitable interface so that if you come across something that is too complex or inefficient to code up yourself you can switch to the in-memory DB at that point (either wholesale for all queries, or just for the complex ones if there's an overhead in setting up the temporary tables).
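
To illustrate what I mean by encapsulating it behind an interface, here is a minimal sketch, assuming Java 7 or later. All the names in it (SearchResult, Criteria, ResultRefiner, IteratingRefiner) are made up for illustration and are not from your code base; the two attributes stand in for whatever fixed set your users can pick from.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for one of the JPA-annotated domain classes. */
class SearchResult {
    final String status;
    final int score;

    SearchResult(String status, int score) {
        this.status = status;
        this.score = score;
    }
}

/** Hypothetical user-selected values for a fixed set of attributes. */
class Criteria {
    final int minScore;
    final String status;

    Criteria(int minScore, String status) {
        this.minScore = minScore;
        this.status = status;
    }
}

/** The seam: callers depend on this, not on how the refining is done. */
interface ResultRefiner {
    List<SearchResult> refine(List<SearchResult> coarse, Criteria criteria);
}

/** Option 3: plain iteration over the coarse result set. */
class IteratingRefiner implements ResultRefiner {
    @Override
    public List<SearchResult> refine(List<SearchResult> coarse, Criteria criteria) {
        List<SearchResult> matches = new ArrayList<>();
        for (SearchResult r : coarse) {
            // Equivalent of "score > minScore && status == wanted"
            if (r.score > criteria.minScore && criteria.status.equals(r.status)) {
                matches.add(r);
            }
        }
        return matches;
    }
}

class RefinerDemo {
    public static void main(String[] args) {
        List<SearchResult> coarse = Arrays.asList(
                new SearchResult("OPEN", 5),
                new SearchResult("OPEN", 12),
                new SearchResult("CLOSED", 20));

        // Swapping in a database-backed refiner would only change this line.
        ResultRefiner refiner = new IteratingRefiner();
        List<SearchResult> matches = refiner.refine(coarse, new Criteria(10, "OPEN"));
        System.out.println(matches.size() + " match(es)");
    }
}
```

Because the criteria are passed as plain data rather than as code, an in-memory-DB implementation of the same interface could translate them into a query later without touching the callers.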

pdbartlett
+1  A: 

Options 1 and 2 are quite compatible: once you have implemented one, you can switch to the other by simply reconfiguring persistence.xml (provided the in-memory database is JPA compatible, e.g. JavaDB, Derby, etc.).

Option 3 is re-implementing both third-party software (database) and your own code (existing JPA entities). You also listed its advantages as concerns. It's clearly a less feasible option in your case. I can't think of anything else to promote Option 3 either.

An in-memory database seems more suitable given the use cases and how short-lived the data is. If the requirements evolve into less transient ones, you can switch to Oracle.

grigory
A: 

Option 2 seems good, since you can toggle between 1 and 2 as needed. Option 3 is also restricted if the data size grows in future. Querying the objects directly would imply a greater dependency on the code structure for storage and querying.

It would probably be a good idea to add a caching mechanism (ehcache/memcache) alongside Option 2 and then profile to check the performance difference.

techzen
+1  A: 

If your expressions are not too complex, you can use an expression language to evaluate string queries against your Java objects (POJOs). I can recommend MVEL (http://mvel.codehaus.org).

The idea is that you put your objects into an MVEL context, provide a string query written in MVEL's simple notation, and then evaluate the expression.

Example taken from MVEL site:

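import java.util.HashMap;
import java.util.Map;
import org.mvel2.MVEL;  // package name in MVEL 2.x; adjust for the version you use

Map vars = new HashMap();
vars.put("x", new Integer(5));
vars.put("y", new Integer(10));

// Evaluate the expression string against the variables in the map
Integer result = (Integer) MVEL.eval("x * y", vars);
assert result.intValue() == 50;  // Mind the JDK 1.4 compatible code :)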

Usually expression languages support traversing your object graph (collections) and accessing members in JSP EL style (dot notation).

Also, I can suggest looking at OGNL (google it, I can't add more than one link)

leopoldkot