My requirement is to read a certain set of columns from a table. The source table has many columns - around 20-30 numeric columns - and I would like to read only a subset of those from the source table and keep appending their values to the destination table. My DB is Oracle and the programming language is JDBC/Java.

The source table is very dynamic - frequent inserts and deletes happen on it - whereas at the destination table I would like to keep the data for at least 30 days. My setup is as follows:

Database: Oracle
Source table: 20 million rows, 30 columns
Destination table: 300 million rows, 2-3 columns

The columns are all numeric.

I am thinking of not doing a vanilla JDBC open-connection-and-transfer, which might be pretty slow given the size of the tables. Instead I am trying to take a dump of the selected columns of the source table from SQL*Plus, with something like -

SQL> set heading off pagesize 0 feedback off colsep ','
SQL> spool <dump_file>
SQL> select c1, c5, c6 from SRC_Table;
SQL> spool off

And later use SQL*Loader to load the dump into the destination database.
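
The control file I have in mind would be roughly the sketch below. The destination table name is a placeholder (I only know c1, c5, c6 from the select above), and the comma delimiter matches the colsep setting shown earlier; direct path is probably worth trying at these volumes.

-- SQL*Loader control file sketch; file and table names are placeholders
OPTIONS (DIRECT=TRUE)
LOAD DATA
INFILE '<dump_file>'
APPEND
INTO TABLE <dest_table>
FIELDS TERMINATED BY ','
(c1, c5, c6)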

The source table is storing time series data, and the data gets purged/deleted from the source table within 2 days. It is part of an OLTP environment. The destination table has a larger retention period - 30 days of data can be stored there - and it is part of an OLAP environment. So a view on the source table, where the view selects only a set of columns from the source table, does not work in this environment. Any suggestions or review comments on this approach are welcome.

EDIT: My tables are partitioned. The easiest way to copy data is to exchange a partition between tables:

ALTER TABLE <table_name>
  EXCHANGE PARTITION <partition_name>
  WITH TABLE <new_table_name>
  <INCLUDING | EXCLUDING> INDEXES
  <WITH | WITHOUT> VALIDATION
  EXCEPTIONS INTO <schema.table_name>;

but since my source and destination tables have different columns, I think exchange partition will not work.
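
The only variant I can think of - a rough sketch, with the table names as placeholders - would be to first copy just the wanted columns into an intermediate table whose structure matches the destination exactly, and then exchange that staging table with a destination partition:

-- staging table with exactly the destination's structure (names are placeholders)
CREATE TABLE stage_slice AS
SELECT c1, c5, c6 FROM src_table WHERE 1 = 0;

-- direct-path copy of only the wanted columns
INSERT /*+ APPEND */ INTO stage_slice
SELECT c1, c5, c6 FROM src_table;
COMMIT;

-- swap the staged rows into the destination partition (metadata-only)
ALTER TABLE <dest_table>
  EXCHANGE PARTITION <partition_name>
  WITH TABLE stage_slice
  WITHOUT VALIDATION;

But that still pays for a full copy into the staging table, so I am not sure it buys much.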

+1  A: 

The problem seems a little vague, and frankly a little odd. The fact that there are so many columns in a single table, and that you're duplicating data within the database, suggests a hosed database design.

Rather than doing it manually, it sounds like a job for a trigger. Create an insert trigger on the source table to copy the columns to the destination table just after they're inserted.
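
Something like the sketch below - the table and column names are guesses at your schema, so adjust to taste:

-- minimal AFTER INSERT trigger sketch; table and column names are guesses
CREATE OR REPLACE TRIGGER trg_copy_to_dest
AFTER INSERT ON src_table
FOR EACH ROW
BEGIN
  INSERT INTO dest_table (c1, c5, c6)
  VALUES (:NEW.c1, :NEW.c5, :NEW.c6);
END;
/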

Another possibility: since it seems all you want is a slice of the data in your original table, rather than duplicating it (a cardinal sin of database design), create a view which only includes the columns and ranges you want. Then just access that view like any other table.
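
Roughly like this, again with guessed names:

-- a view exposing only the wanted columns; names are guesses
CREATE OR REPLACE VIEW src_slice_v AS
SELECT c1, c5, c6
FROM src_table;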

I'm willing to guess that the root of the problem is that accessing just the information you want in your source table is too slow. This suggests you might be able to fix that with better indexing. Also, your source table is probably just too damn wide.
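
If the rows you need are picked out by a time column, for example, an index leading on that column (the column name below is a guess) might be enough:

-- index sketch; the time column name is a guess
CREATE INDEX src_table_time_ix ON src_table (sample_time);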

Since I'm not an Oracle person, take the syntax above with a grain of salt, but the concept should be sound.

Schwern
Well, it is not a fucked design. It's a concept that's used when one transfers data from an OLTP environment to an OLAP environment. I am storing time series data and the source table has a retention of 2 days - so after 2 days the data is gone - hence the view type of design does not work.
Shamik
Shamik.... TOTALLY AGREE! If copying data were a cardinal sin, we'd not need ETL or replication tools. The REAL cardinal sin is trying to run complex reports against OLTP databases. Those tend to be huge. Huge SQL, huge performance hogs.
Wait, are you pushing data between two different *databases*?
Schwern
Even if it's not, even if they were OLAP and OLTP sets of tables in the same database, it's still not HOSED. You need to allow for the possibility that even those things which are 75% incorrect can still be used to great effect some of the time.
@Mark I beg ignorance of OLAP/OLTP. A lot of detail was missing from the post when I responded. If the OP is stuck with some fixed schema, that's one thing, but when someone presents a solution which involves duplicating data and a very wide table, I immediately take a skeptical eye to the schema.
Schwern
A: 

On a tangential note, you might want to look at Oracle's partitioning here and here.

Partitioning enables tables and indexes to be split into smaller, more manageable components and is a key requirement for any large database with high performance and high availability requirements. Oracle Database 11g offers the widest choice of partitioning methods including interval, reference, list, and range in addition to composite partitions of two methods such as order date (range) and region (list) or region (list) and customer type (list).

  • Faster Performance—Lowers query times from minutes to seconds
  • Increases Availability—24 by 7 access to critical information
  • Improves Manageability—Manage smaller 'chunks' of data
  • Enables Information Lifecycle Management—Cost-efficient use of storage

Partitioning the table into daily partitions would make archiving easier, as described here.
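
For example, daily range partitions on the source table (the time column, partition names and dates here are only illustrative) would let the 2-day purge become a partition drop instead of a big delete:

-- daily range partitioning sketch; column, partition names and dates are illustrative
CREATE TABLE src_table (
  sample_time  DATE NOT NULL,
  c1           NUMBER,
  c5           NUMBER,
  c6           NUMBER
)
PARTITION BY RANGE (sample_time) (
  PARTITION p20100101 VALUES LESS THAN (DATE '2010-01-02'),
  PARTITION p20100102 VALUES LESS THAN (DATE '2010-01-03')
);

-- purging a day is then a quick metadata operation rather than a big DELETE
ALTER TABLE src_table DROP PARTITION p20100101;

On 11g, interval partitioning can create the daily partitions automatically.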

toolkit
Lowers **SOME** query times from minutes to seconds. Queries not using the partition key in the where clause will go in the opposite direction.
Noted. Cheers Mark.
toolkit
My tables are partitioned. The easiest way to copy data is to exchange a partition, but since my source and destination tables have different columns, I think exchange partition would not work. Any comments?
Shamik
That's correct. You can only exchange if EVERYTHING is identical... columns, indexes, compression, etc. (though not the tablespace).
+1  A: 

Shamik, okay, you're loading an OLAP database with OLTP data.

What's the acceptable latency? Does your OLAP need today's data before people come into the office tomorrow morning, or does it need to be closer to real time?

Saying the inserts are "frequent" doesn't mean anything. Some of us are used to thousands of txns/sec - to others, 1/sec is a lot.

And you say there's a lot of data. Same idea. I've read posts where people have HUGE tables with a couple million records. I have tables with hundreds of billions of records. So again, a real number is very helpful.

Do not go with the trigger suggested by Schwern. If you believe your insert volume is large, it means you've probably had issues in that area already. A trigger will just make it worse.

Oracle provides lots of different choices for getting data from OLTP to OLAP. Instead of reinventing the wheel, use something already written. Oracle Streams was BORN to do this exact job. You can roll your own streams using Oracle AQ. You can capture inserted rows without a trigger by using either Database Change Notification or Change Data Capture.

This is an extremely common problem, which is why I've listed 4 technologies designed to solve it.

  • Advanced Queuing
  • Streams
  • Change Data Capture
  • Database Change Notification

Start googling these terms and come back with questions on those. You'll be better off than building your own from the ground up or using triggers.
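
To give a feel for how little code is involved, here is a minimal AQ setup sketch - the queue and queue table names are made up, and I'm using the built-in JMS text payload type just to keep it self-contained:

-- minimal AQ setup sketch; queue and queue table names are made up
BEGIN
  DBMS_AQADM.CREATE_QUEUE_TABLE(
    queue_table        => 'src_change_qt',
    queue_payload_type => 'SYS.AQ$_JMS_TEXT_MESSAGE');
  DBMS_AQADM.CREATE_QUEUE(
    queue_name  => 'src_change_q',
    queue_table => 'src_change_qt');
  DBMS_AQADM.START_QUEUE(queue_name => 'src_change_q');
END;
/

Producers enqueue with DBMS_AQ.ENQUEUE and consumers dequeue with DBMS_AQ.DEQUEUE; the rest is plumbing.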

Hi Mark, thanks for the comments. I was wondering whether Streams is overkill for just one table. But I will do more research on this.
Shamik
The better question is, is it overkill to rebuild streaming for just one table? You probably think it's overkill because you've not used it. It's just some calls to some packages. But if you want to roll your own, start with AQ. That's very easy to set up.