I am working on a project for a company where I need to migrate its previous data into a new system. The two structures are totally different, I'm using Hibernate for database manipulation, and I'm unsure what the best approach to migrating the data would be. Should I use Hibernate, should I write stored procedures, or is there another option? Any bright ideas are welcome.
While I am not sure about "best", I would prefer to use Hibernate or a similar ORM if I were in your position, because you then have an object hierarchy to work with between the two databases. If the schemas are extremely similar, a simple SQL script might be easier. It really depends on the specifics of your situation.
edit: I really need some morning caffeine...
What I was attempting to note was covered by the two people who posted after me: if the schemas are defined, you already have objects, or you are comfortable with the reverse-engineering tools, then Hibernate is an easy way to go. However, if it is a one-time migration, scripts are definitely the better way to go.
Having tried both approaches in the past, I can definitely say that this is not a scenario for which ORM was designed, nor one where it flourishes. You end up having to build two different sets of objects, and it's difficult to gain the efficiency required for a mass migration. The only reason I can think of for using something like hibernate would be if you were building a system which was going to permanently sit in between two systems to integrate them, but it sounds like this is relatively short-term.
I have been extremely happy with the results of a Python script, a bunch of SQL, and some Python objects to transform the data.
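As a minimal sketch of that script-plus-objects approach (all table and column names here are hypothetical), assume the old schema stores a single `full_name` column and the new one splits it into first and last names:

```python
import sqlite3

def migrate(old_conn, new_conn):
    """Copy rows from the old schema into the new one,
    splitting full_name into first/last along the way."""
    rows = old_conn.execute("SELECT id, full_name FROM customer").fetchall()
    for cid, full_name in rows:
        first, _, last = full_name.partition(" ")
        new_conn.execute(
            "INSERT INTO person (legacy_id, first_name, last_name) "
            "VALUES (?, ?, ?)",
            (cid, first, last),
        )
    new_conn.commit()

# demo with in-memory databases standing in for the two systems
old = sqlite3.connect(":memory:")
old.execute("CREATE TABLE customer (id INTEGER, full_name TEXT)")
old.execute("INSERT INTO customer VALUES (1, 'Ada Lovelace')")

new = sqlite3.connect(":memory:")
new.execute(
    "CREATE TABLE person (legacy_id INTEGER, first_name TEXT, last_name TEXT)"
)

migrate(old, new)
print(new.execute("SELECT first_name, last_name FROM person").fetchall())
# → [('Ada', 'Lovelace')]
```

The SQL handles the bulk data movement while the little bit of Python in the middle does the transformation that plain SQL would make awkward.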
Hibernate does not play very well with stored procedures - it's not a natural fit. If your ORM is already mapped out, and you have your 'as-is' and 'to-be' objects set up, use normal Hibernate methods to write data to your new layout.
If you are forced to go the stored-procedure route, you can decide whether to swallow the pain and code your transforms in the stored procedures - that way all your migration scripts stay together.
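Keeping the transform in SQL also lets it run as a single set-based statement rather than row-by-row code, which is usually much faster for bulk loads. A small sketch of the idea (hypothetical table names, with SQLite standing in for a real stored-procedure dialect):

```python
import sqlite3

# one connection holding both "old" and "new" tables, standing in for
# two schemas; in a real migration these might be separate databases
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE old_order (id INTEGER, amount_cents INTEGER);
    CREATE TABLE new_order (legacy_id INTEGER, amount REAL);
    INSERT INTO old_order VALUES (1, 1999), (2, 500);
""")

# the whole transform is one set-based SQL statement, so it can live
# alongside the rest of the migration scripts
conn.execute("""
    INSERT INTO new_order (legacy_id, amount)
    SELECT id, amount_cents / 100.0 FROM old_order
""")
conn.commit()

print(conn.execute(
    "SELECT legacy_id, amount FROM new_order ORDER BY legacy_id"
).fetchall())
# → [(1, 19.99), (2, 5.0)]
```

In a real stored procedure the `INSERT ... SELECT` body would be the same; only the surrounding `CREATE PROCEDURE` syntax differs per vendor.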
Like the other poster said, a scripting language like Python can serve you well here - it has worked well for me too.
Personally, I would probably use an ETL tool like SSIS (if you are going from or to SQL Server) for this; moving large amounts of data is exactly what ETL tools are designed and optimized to do.
I've tried to manage data migration as my code and data representation change through various releases. Each time, I've ended up writing specific SQL to query for objects in the old state and to populate new columns. If there's a straightforward way to manage data migration while viewing everything as an object, I haven't thought of it; so far, the new columns have always had simple interpretations that I could calculate in SQL.
Some of the changes have been simple enough that the code ended up in Java, and others complex enough that I wanted multiple SQL statements, so I ended up embedding them in shell and python (both, for portability) scripts. The scripts are here and [the java code][2] with method names like updateDB2008_4().
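That release-by-release pattern can be sketched as a list of versioned migration functions applied on top of a recorded schema version (the names like `update_2008_4` are hypothetical, echoing the `updateDB2008_4()` style above):

```python
import sqlite3

def update_2008_3(conn):
    conn.execute("ALTER TABLE account ADD COLUMN email TEXT")

def update_2008_4(conn):
    # new column with a simple interpretation we can compute in SQL
    conn.execute("ALTER TABLE account ADD COLUMN email_domain TEXT")
    conn.execute(
        "UPDATE account SET email_domain = substr(email, instr(email, '@') + 1)"
    )

MIGRATIONS = [(3, update_2008_3), (4, update_2008_4)]

def migrate(conn, current_version):
    """Apply, in order, every migration newer than current_version."""
    for version, step in MIGRATIONS:
        if version > current_version:
            step(conn)
            current_version = version
    conn.commit()
    return current_version

# demo: a database already at release 2008_3 (email column exists)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER, email TEXT)")
conn.execute("INSERT INTO account VALUES (1, 'grace@example.com')")

version = migrate(conn, 3)  # only update_2008_4 runs
print(version, conn.execute("SELECT email_domain FROM account").fetchone())
# → 4 ('example.com',)
```

Storing the current version in the database itself (rather than passing it in) makes the script safe to re-run, which is handy when the same migration has to hit dev, staging, and production.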