views:

87

answers:

1

I am experienced with Spring, but new to Spring Batch. My task is to migrate data from a simple structure in one database to a more complex one in another. The data corresponds to an object hierarchy that I will label like this:

OldParent 1 --> n OldChild // old system

NewParent 1 --> n NewChild // new system

In the old database there are only two tables; in the new system things get a lot more complex and there are eight tables, but that is irrelevant for now.

Basically, I would like to use a simple JDBC-based solution with RowMappers, reading from OldParent and converting to NewParent.
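For illustration, the conversion itself might look something like this plain-Java sketch. The field names (`id`, `name`, `legacyId`, `displayName`) are made up, and the actual Spring `RowMapper` wiring is omitted:

```java
// Hypothetical domain classes and converter; all field names are invented
// for illustration and do not reflect the real schema.
public final class ParentConverter {

    public static final class OldParent {
        public final long id;
        public final String name;
        public OldParent(long id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    public static final class NewParent {
        public final long legacyId;       // back-reference to the old row
        public final String displayName;
        public NewParent(long legacyId, String displayName) {
            this.legacyId = legacyId;
            this.displayName = displayName;
        }
    }

    // The conversion step a RowMapper / ItemProcessor would delegate to.
    public static NewParent convert(OldParent old) {
        return new NewParent(old.id, old.name);
    }
}
```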

So here would be a basic configuration snippet:

<batch:job id="migration">
    <batch:step id="convertLegacyData">
        <batch:tasklet>
            <batch:chunk
                reader="parentReader"
                writer="parentWriter"
                commit-interval="200" />
        </batch:tasklet>
    </batch:step>
</batch:job>

In this scenario, the parentReader would acquire and convert the OldChild objects, probably delegating to childReader / childWriter objects.

The problem is this: while there are several hundred thousand parents, each parent can have anywhere from zero to several million children, so a commit interval based on parents would not help at all. Still, I would very much like to have a configurable commit interval.

So another solution would be to make the workflow child-based:

<batch:job id="migration">
    <batch:step id="convertLegacyData">
        <batch:tasklet>
            <batch:chunk
                reader="childReader"
                writer="childWriter"
                commit-interval="200" />
        </batch:tasklet>
    </batch:step>
</batch:job>

In this scenario, the childReader would also have to read OldParent objects and write NewParents, delegating to parentReader and parentWriter objects. The major drawback here is that I would lose all OldParents that have no associated OldChild objects.

The third possible scenario would be to have two separate workflows, one for OldParent -> NewParent and one for OldChild -> NewChild. (I would have to maintain a mapping table that stores the relationship between OldParent and NewParent ids, but I could use standard configurations including commit-interval.)
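The id bookkeeping behind that third scenario could be sketched like this: the parent workflow records which NewParent id each OldParent id was migrated to, and the child workflow looks the new id up while converting children. An in-memory `Map` stands in for the mapping table here, and all names are hypothetical:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the OldParent-id -> NewParent-id bookkeeping; a real
// implementation would back this with the mapping table mentioned above.
public final class ParentIdMapping {
    private final Map<Long, Long> oldToNew = new HashMap<>();

    // Called by the parent workflow after writing a NewParent.
    public void register(long oldParentId, long newParentId) {
        oldToNew.put(oldParentId, newParentId);
    }

    // Called by the child workflow to resolve the new foreign key.
    public long newParentIdFor(long oldParentId) {
        Long id = oldToNew.get(oldParentId);
        if (id == null) {
            throw new IllegalStateException(
                    "OldParent " + oldParentId + " has not been migrated yet");
        }
        return id;
    }
}
```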

Are there other possibilities? Which of these would you recommend as best practice?

A: 

Doesn't it have an N-records commit-interval configuration? Doesn't it use something like batch updates (JDBC), so you can configure N-sized batch updates with a commit after each batch?

If it doesn't, I have a hack :)

Make your own java.sql.Connection implementation: one that passes all commands through to the original connection and, in addition, executes a commit after every N-th update... :)

If you're using a database pool, you can wrap that too, so it hands out connections with the hack applied.
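As a rough illustration of that hack, here is a sketch using a dynamic proxy around a `PreparedStatement` rather than a full `Connection` implementation (the class name `AutoCommitEveryN` and the whole approach are made up for illustration; only `java.sql` and `java.lang.reflect` are used):

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.sql.PreparedStatement;

// Sketch of the "commit after every N-th update" hack: wrap a
// PreparedStatement so that every n-th call to executeUpdate() also
// commits the underlying connection. Purely illustrative.
public final class AutoCommitEveryN {

    public static PreparedStatement wrap(Connection connection,
                                         PreparedStatement target,
                                         int n) {
        InvocationHandler handler = new InvocationHandler() {
            private int updates = 0;

            @Override
            public Object invoke(Object proxy, Method method, Object[] args)
                    throws Throwable {
                Object result;
                try {
                    // Pass every call through to the original statement.
                    result = method.invoke(target, args);
                } catch (InvocationTargetException e) {
                    throw e.getCause();
                }
                if ("executeUpdate".equals(method.getName())
                        && ++updates % n == 0) {
                    connection.commit(); // commit after each n-th update
                }
                return result;
            }
        };
        return (PreparedStatement) Proxy.newProxyInstance(
                PreparedStatement.class.getClassLoader(),
                new Class<?>[] { PreparedStatement.class },
                handler);
    }
}
```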

I know it's a bit of a weird proposition... but maybe it's all you need for a one-time migration.

helios
While your 3rd paragraph is certainly a way to do things when no standard configuration option exists, I would like to stick with Spring Batch standards and use the commit-interval property where I can. So I am looking for a solution that can be classified as a Spring Batch best practice, not a hack (even if I agree that it might do for a one-time task).
seanizer