I've got a DB2 ESE 9.7 non-DPF data warehouse, using data compression, with 20 TB of data. It takes in about 100 million rows a day via loads that run every 10 minutes, and roughly another million rows a day through about 50,000 imports. Additionally, there is a small amount of transactional data associated with those two large data sets.
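To give an idea of the two ingest paths, they look roughly like this (table names, file names and options below are simplified placeholders, not our actual jobs):

    -- every 10 minutes: bulk LOAD of the big feed (rows are not individually logged);
    -- since we can re-run loads for recovery, it's done NONRECOVERABLE
    LOAD FROM /staging/fact_feed.del OF DEL
        INSERT INTO DW.FACT_EVENTS
        NONRECOVERABLE

    -- ~50,000 times a day: small IMPORTs, i.e. fully logged inserts
    IMPORT FROM /staging/trickle.del OF DEL
        COMMITCOUNT 1000
        INSERT INTO DW.IMPORT_DETAIL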
Currently we're using application-level backups and, in the event of a recovery, would rely on loading previously exported summary tables or reloading the 100 million rows a day. But for the sake of the small amount of transactional data and the imports, I'd like online backups.
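As I understand it, getting there means first making the database recoverable (archive logging) and then taking tablespace-level online backups, something along these lines (database name, tablespace names and paths are placeholders):

    -- switch from circular to archive logging
    UPDATE DB CFG FOR DWPROD USING LOGARCHMETH1 DISK:/db2/archlogs/DWPROD/

    -- once the database is recoverable: per-tablespace online backups
    BACKUP DATABASE DWPROD
        TABLESPACE (TS_TRANS, TS_IMPORTS) ONLINE
        TO /backup/DWPROD COMPRESS INCLUDE LOGS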
However, it appears that online, tablespace-level backups require an initial offline backup, and that's the problem: even if I redirect the offline backup to /dev/null, it will take approximately 48 hours of downtime, which is unacceptable, and the same requirement may come up again at some point in the future.
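For reference, the /dev/null trick I mentioned would presumably be just the standard offline backup pointed at the null device (database name is a placeholder):

    -- offline backup whose image is discarded; DB2 still reads the full ~20 TB
    BACKUP DATABASE DWPROD TO /dev/null WITHOUT PROMPTING

Even with the image thrown away, the backup still has to read through the entire database, which is where the roughly 48 hours comes from.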
At some point we'll probably split this up into 8+ database partitions, which would help both this and the index builds during loads. But that may not happen for a while, and it's hard to justify for tasks that shouldn't be necessary in the first place.
EDIT: The reason we didn't initially go with DPF, and why it isn't a driving issue for our queries, is that over 99% of our queries hit summary tables, and the 1% that do have to dip into the detail table taking on 3+ billion rows per month can almost always take advantage of table partitioning, MDC and indexes to scan a much smaller amount of data. What this gets to is that the traditional heuristics about how much data per CPU don't always apply.
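To make that concrete, a simplified, hypothetical version of the detail table's layout (not our real DDL) would be something like:

    CREATE TABLE DW.FACT_EVENTS (
        EVENT_DATE    DATE          NOT NULL,
        REGION_ID     SMALLINT      NOT NULL,
        PRODUCT_ID    INTEGER       NOT NULL,
        METRIC_VALUE  DECIMAL(15,2)
    )
    PARTITION BY RANGE (EVENT_DATE)
        (STARTING '2011-01-01' ENDING '2011-12-31' EVERY 1 MONTH)
    ORGANIZE BY DIMENSIONS (REGION_ID, PRODUCT_ID)
    COMPRESS YES;

    CREATE INDEX DW.IX_FACT_EVENTS
        ON DW.FACT_EVENTS (PRODUCT_ID, EVENT_DATE);

Partition elimination on the date range plus MDC block skipping on the dimensions means those detail queries touch a small slice of the table rather than scanning the whole 3+ billion rows per month.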
Any way to get around that offline backup requirement? Any third-party tools that can help me out? Any other suggestions?