ansaurus

Question

Fast way to replicate a huge database table using java

Answer 1

A:

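1. SELECT * FROM YOUR_TABLE
2. Map the results into an object or data structure.
3. Assign a unique key to each object or data structure.
4. Load the key and object or data structure into a map to act as your cache. Avoid WeakHashMap for this: it holds its keys weakly, so the garbage collector can discard an entry as soon as nothing else references its key. A HashMap (or a ConcurrentHashMap, if other threads read it while it loads) keeps every entry.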
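A minimal JDBC sketch of the four steps, assuming YOUR_TABLE has a unique numeric id column and a name column (both hypothetical, as are the CachedRow and TableLoader names and the fetch size). It uses a forward-only, read-only statement with a fetch size so the driver can stream rows in batches, and a ConcurrentHashMap rather than a WeakHashMap so entries are not silently dropped by the garbage collector.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical row type: assumes YOUR_TABLE has a unique "id" and a "name" column.
record CachedRow(long id, String name) {}

final class TableLoader {
    // Assumed batch size; tune it for your driver and row width.
    static final int FETCH_SIZE = 10_000;

    // Step 1: run the query with a forward-only, read-only cursor and a fetch
    // size, so a driver that supports it can stream rows in batches.
    static Map<Long, CachedRow> load(Connection conn) throws SQLException {
        String sql = "SELECT id, name FROM YOUR_TABLE";
        try (PreparedStatement ps = conn.prepareStatement(
                sql, ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {
            ps.setFetchSize(FETCH_SIZE);
            try (ResultSet rs = ps.executeQuery()) {
                return loadFrom(rs);
            }
        }
    }

    // Steps 2-4: map each row to an object and index it by its unique key.
    static Map<Long, CachedRow> loadFrom(ResultSet rs) throws SQLException {
        Map<Long, CachedRow> cache = new ConcurrentHashMap<>();
        while (rs.next()) {
            CachedRow row = new CachedRow(rs.getLong("id"), rs.getString("name"));
            cache.put(row.id(), row); // strong reference: entry stays until removed
        }
        return cache;
    }
}
```

Whether setFetchSize actually streams rows is driver-specific (the PostgreSQL JDBC driver, for example, only uses a server-side cursor when auto-commit is off), so check your driver's documentation before relying on it for 12M rows.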

I don't see why you need sorting, because your cache should access values by unique key in O(1) time. What is sorting buying you?

Be sure to think about thread safety.
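One way to act on the thread-safety point, sketched under the assumption that the cache is loaded (or fully reloaded) by a single thread and only read afterwards: build the map privately, then publish an immutable copy through a volatile field. ReadOnlyCache is a hypothetical name, not an existing library class.

```java
import java.util.Map;

// Readers see either the old snapshot or the new one, never a half-filled map.
final class ReadOnlyCache<K, V> {
    private volatile Map<K, V> snapshot = Map.of();

    // Called by the loader thread at startup or on a full reload.
    void publish(Map<K, V> freshlyLoaded) {
        // Immutable copy; later changes to freshlyLoaded are not visible.
        // Note: Map.copyOf rejects null keys and values.
        snapshot = Map.copyOf(freshlyLoaded);
    }

    // Safe to call from any number of threads without locking.
    V get(K key) {
        return snapshot.get(key);
    }

    int size() {
        return snapshot.size();
    }
}
```

Readers never lock. The trade-off is that a reload briefly needs memory for both the old and the new snapshot, which matters at this table size.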

I'm assuming that this is a read-only cache and that you're doing this to avoid constant network latency. I'm also assuming that you'll do this once, at startup.

How much data per record? 12M records at 1KB per record means you'll need 12GB of RAM just to hold your cache.
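The estimate above as a small back-of-envelope calculation, assuming a fixed average payload of 1,000 bytes per record (the figure that gives the 12GB quoted). Actual JVM heap usage will be higher, because every object and map entry also carries headers and references; CacheSizeEstimate is a hypothetical helper.

```java
// Back-of-envelope payload estimate for holding the whole table in memory.
// Assumption: fixed average payload per record; JVM object and map-entry
// overhead comes on top of this figure.
final class CacheSizeEstimate {
    static long payloadBytes(long records, long bytesPerRecord) {
        return Math.multiplyExact(records, bytesPerRecord); // throws on overflow
    }

    public static void main(String[] args) {
        long bytes = payloadBytes(12_000_000L, 1_000L);
        System.out.printf("%,d bytes = %.1f GB = %.1f GiB%n",
                bytes, bytes / 1e9, bytes / (double) (1L << 30));
    }
}
```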

duffymo 2010-07-15 00:19:58

12M records isn't really that many for a DBMS, is it? I mean, with indexing and other tricks...

hvgotcodes 2010-07-15 00:26:18

Franz See 2010-07-15 00:27:36

Actually, the problem comes when you join it with other tables and sort for pagination. Our DBA has already optimized everything that could be optimized, but the amount of data to be sorted (for pagination) is so large that each query still takes 2 to 3 minutes.

Franz See 2010-07-15 00:30:05

So are you then going to replicate the other tables in your cache too? And the logic for joining and sorting? Sounds like you're on a slippery slope to implementing your own DBMS in Java...

David Gelhar 2010-07-15 01:01:53

I already have the other data, and I'm going to index it using Lucene (because I need search functionality).

Franz See 2010-07-15 01:04:29

Also, I don't have to store everything in memory. Just as the database doesn't keep everything in memory, the store I'm replicating the data to doesn't keep everything in memory either.

Franz See 2010-07-16 06:38:52

Answer 2

A:

Replicating the data in a cache seems like replicating the functionality of the database.

From reading other comments, I see that you are not doing this to avoid network round trips, but because of costly joins. In many DBMSs you can create temporary tables, like this:

CREATE TEMPORARY TABLE abTable AS SELECT * FROM a, b;

If a and b are large (relatively permanent) tables, you will pay a one-time cost of 2-3 minutes to create the temporary table. However, if you use abTable for many queries, the per-query cost afterwards will be much smaller than running the join every time:

SELECT name, city, ... FROM a, b;

Many database systems also support views, which let you do something like this:

CREATE VIEW abView AS SELECT * FROM a, b;

Changes to the underlying tables a and b will be reflected in abView.

If you really are concerned about network round trips, then you may be able to replicate parts of the database on the local computer.

A good database management system should be able to handle your data needs. So why reinvent the wheel?

emory 2010-07-15 01:13:35

Sorry for the confusion again. I'm not reinventing a caching solution or a search solution. I just need to read the data from the database fast enough, store it in the cache I'm using, and index it with my search solution. Also, although I could do the caching in the database, I'd prefer a cache that scales horizontally (which is why I'm trying to avoid using the RDBMS for caching).

Franz See 2010-07-15 01:20:31

Also, if I'm not mistaken, a VIEW (unlike a materialized VIEW) is just a shortcut for a query, which means the query associated with the view is still executed every time the view is used. Of course, it may be faster due to in-memory caching and fewer disk hits, but I don't think we can rely on that for consistently fast queries.

Franz See 2010-07-15 01:21:43
