I am experienced with Spring but new to Spring Batch. My task is to migrate data from a simple structure in one database to a more complex one in another. The data corresponds to an object hierarchy that I will name like this:
OldParent 1 --> n OldChild // old system
NewParent 1 --> n NewChild // new system
The old database has only two tables. In the new system things get a lot more complex and there are 8 tables, but that is irrelevant for now.
Basically, I would like to use a simple JDBC-based solution with RowMappers that read from OldParent and convert to NewParent.
So here would be a basic configuration snippet:
<batch:job id="migration">
    <batch:step id="convertLegacyData">
        <batch:tasklet>
            <batch:chunk
                reader="parentReader"
                writer="parentWriter"
                commit-interval="200" />
        </batch:tasklet>
    </batch:step>
</batch:job>
In this scenario, the parentReader would also acquire and convert the OldChild objects, probably delegating to childReader / childWriter objects.
The problem is this: there are several hundred thousand parents, and each parent can have anywhere from zero to several million children. A commit-interval counted in parents would therefore not help at all, but I would very much like to have a configurable commit interval.
So another solution would be to make the workflow child-based:
<batch:job id="migration">
    <batch:step id="convertLegacyData">
        <batch:tasklet>
            <batch:chunk
                reader="childReader"
                writer="childWriter"
                commit-interval="200" />
        </batch:tasklet>
    </batch:step>
</batch:job>
In this scenario, the childReader would also have to read OldParent objects and write NewParents, delegating to parentReader and parentWriter objects. The major drawback here is that I would lose all OldParents that have no associated OldChild objects.
The third possible scenario would be to have two different workflows, one for OldParent -> NewParent and one for OldChild -> NewChild. I would have to maintain a mapping table that stores the relationship between OldParent and NewParent ids, but I could use standard configurations, including commit-interval.
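A rough sketch of how this third variant might look, assuming the two workflows are arranged as sequential steps of one job. The step ids are placeholders I made up, and the mapping-table handling inside parentWriter and childWriter is only assumed, not shown:

```xml
<!-- Sketch only: step ids are hypothetical; reader/writer beans are assumed to exist -->
<batch:job id="migration">
    <!-- Step 1: OldParent -> NewParent. parentWriter is assumed to also
         record each (old id, new id) pair in the mapping table. -->
    <batch:step id="convertParents" next="convertChildren">
        <batch:tasklet>
            <batch:chunk
                reader="parentReader"
                writer="parentWriter"
                commit-interval="200" />
        </batch:tasklet>
    </batch:step>
    <!-- Step 2: OldChild -> NewChild. childWriter is assumed to resolve
         the NewParent id through the mapping table. -->
    <batch:step id="convertChildren">
        <batch:tasklet>
            <batch:chunk
                reader="childReader"
                writer="childWriter"
                commit-interval="200" />
        </batch:tasklet>
    </batch:step>
</batch:job>
```

Each step would then have its own commit-interval, counted in parents for the first step and in children for the second, and parents without children would still be migrated by the first step.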
Are there other possibilities? Which of these would you recommend as best practice?