views:

80

answers:

5

I am working on developing a webapp (Visual JSF, EJB3, Hibernate on GlassFish/MySQL) that works with an existing legacy system.

I have an 'employee' table (with fields such as name (String), location (String), etc.) which is separate from an employee table on the legacy DB2 side. I do all of the webapp processing against my employee table. However, every week I need to schedule a task that goes through all the employees in my table and compares them against the employees in the legacy DB2 table. If an employee's location has changed in the legacy table, I need to update my employee table to reflect the new location.

What would you suggest as the best way to go about doing this?

Currently I read all the employees into an ArrayList, then loop through each employee entity in the list, fetch the corresponding legacy employee instance, compare locations, and update my employee entity if a location change is detected.

Since I have close to 50,000 records in my employee table, the initial build of the ArrayList alone takes around 5 minutes, and the number of employees will only keep increasing.

A: 

Is there a reason why it should be synced only once a week? If not, you might want to spread the operation over the week: do 1/7th of the work every day. You could also consider adding a table on your side to keep track of which record was synced when.
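The daily-slice idea can be sketched in plain Java. This is only an illustration with made-up names; it assumes employee IDs are numeric, and picks a stable 1/7th of them per weekday (`id % 7`, keyed off the day of the week with Sunday as 0):

```java
import java.time.LocalDate;
import java.util.List;
import java.util.stream.Collectors;

public class DailySlice {
    // Returns the subset of employee IDs to sync today. id % 7 selects a
    // stable seventh of the table; the slice index comes from the day of
    // the week (Monday=1 .. Saturday=6, Sunday=7 -> 0 after the modulo).
    public static List<Long> idsForToday(List<Long> allIds, LocalDate today) {
        int slice = today.getDayOfWeek().getValue() % 7;
        return allIds.stream()
                     .filter(id -> id % 7 == slice)
                     .collect(Collectors.toList());
    }
}
```

Each record is then guaranteed to be revisited every seven days, and no single run touches more than roughly a seventh of the table.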

binil
that is a thought. not sure how i would go about doing that as of now. if nothing else works out, i guess i could do that. and i am using a location_history kind of table to keep track of the location changes. thanks.
SibzTer
poor man's batching (sqlite dialect)...

    select id
    from employee
    where id % 7 = cast(strftime('%w', 'now') as integer);
trenton
A: 

I would create a dblink (dblinks do exist on DB2, right?) and do something like:

 select
     a.id, a.location
 from
     empl a, empl@link b
 where
     a.id = b.id
     and a.location <> b.location

Then iterate over the result set, which will contain all the employees whose location has changed.
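If a dblink turns out not to be available, the same anti-join can be done on the Java side once both tables are loaded as id → location maps. A sketch with made-up names, equivalent to the `a.location <> b.location` filter above:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class LocationDiff {
    // Returns the IDs whose location differs between the local and legacy
    // tables -- the in-memory equivalent of joining on id and keeping rows
    // where the two locations are not equal. IDs missing from the legacy
    // side are skipped.
    public static List<Long> changedIds(Map<Long, String> local,
                                        Map<Long, String> legacy) {
        List<Long> changed = new ArrayList<>();
        for (Map.Entry<Long, String> e : local.entrySet()) {
            String legacyLoc = legacy.get(e.getKey());
            if (legacyLoc != null && !legacyLoc.equals(e.getValue())) {
                changed.add(e.getKey());
            }
        }
        return changed;
    }
}
```

The trade-off is memory: both tables have to be read once, but only the changed IDs are then fetched as entities and updated.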

OscarRyz
A: 

If you have the ability to alter the legacy table in any way, you could add a needs_sync column to it. Then, using a trigger or modifying the code that updates the location, set needs_sync = 1 when you do the update. (Add an index on that column, too.)

Then, to find records to update:

select id, location
from legacy.employee
where needs_sync = 1

When you've successfully done the sync:

update legacy.employee
set needs_sync = 0
where needs_sync = 1

Do it all in a transaction to avoid a race condition.

This solution has the advantage of only examining records which have been changed, so it will be efficient at runtime. It does require a change in the legacy schema, which might be painful or impossible to do.

trenton
unfortunately, i cannot modify the legacy schema in any way. but thanks for the reply though
SibzTer
A: 

I'm thinking of using the JPA Query's setMaxResults() and setFirstResult() methods to retrieve employee data in chunks. These methods are used for paginating display data in the UI, so I don't see any reason why I can't do the same here. This way I can process one chunk at a time. I could probably also throw in a queue and an MDB to process the chunks in parallel, since I can't create threads within the EJB container.
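A sketch of the chunking arithmetic (class and query names are assumptions). The real JPA calls, shown in the comment, would consume each offset via setFirstResult()/setMaxResults(); the offset computation itself is plain Java:

```java
import java.util.ArrayList;
import java.util.List;

public class Chunker {
    // Computes the setFirstResult() offsets needed to walk `total` rows
    // in pages of `pageSize`. Each offset would be consumed roughly as:
    //   em.createQuery("select e from Employee e order by e.id")
    //     .setFirstResult(offset)
    //     .setMaxResults(pageSize)
    //     .getResultList();
    // (An explicit ORDER BY keeps the pages stable between queries.)
    public static List<Integer> pageOffsets(int total, int pageSize) {
        List<Integer> offsets = new ArrayList<>();
        for (int first = 0; first < total; first += pageSize) {
            offsets.add(first);
        }
        return offsets;
    }
}
```

At 50,000 rows and a page size of 1,000 that is 50 queries, each of which can be processed and flushed independently instead of holding the whole table in one ArrayList.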

SibzTer
not sure if this is going to work or not. someone pointed out to me that setMaxResults() and setFirstResult() might not be supported properly by db2. need to check that out.
SibzTer
how about if i just manually build the chunks?
SibzTer
A: 

I am thinking of using JMS messages, queues, and MDBs to solve this problem instead. I would send each employee record as a separate message to a queue, and the corresponding MDB would do all the processing and updating for that record. I am thinking I might get more simultaneous processing done that way.
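Inside the container the fan-out would be a JMS queue plus an MDB's onMessage(). The standalone sketch below only illustrates that pattern, with hypothetical names and raw threads standing in for MDBs (which, as noted above, you can't create inside the EJB container):

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.*;
import java.util.function.Function;

public class FanOutSketch {
    // Each "message" is one employee ID; each worker plays the MDB role,
    // pulling IDs off the queue and running the per-record sync. In the
    // container, the queue is a JMS destination and syncOne is the body
    // of onMessage().
    public static Map<Long, String> process(List<Long> ids,
                                            Function<Long, String> syncOne,
                                            int workers) {
        BlockingQueue<Long> queue = new LinkedBlockingQueue<>(ids);
        Map<Long, String> results = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int i = 0; i < workers; i++) {
            pool.submit(() -> {
                Long id;
                while ((id = queue.poll()) != null) {  // drain until empty
                    results.put(id, syncOne.apply(id));
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return results;
    }
}
```

One caveat with the JMS route: 50,000 one-record messages a week adds broker overhead, so batching several IDs per message may be worth measuring.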

SibzTer