tags:

views:

63

answers:

2

Hey everyone

I am currently doing a research project, and I was given with about 400G of data. They are all pairs, and currently I am using mysql's load data into my db. I have looked into quite a lot of options into mysql's optimization for insertion, but it's taking far too long to complete (estimation is 118 days).

And then I looked into the best key,value store around: Tokyo Cabinet. I absolutely love it, it's fast it's light and it's simple. However, it doesn't allow identical keys to occur more than once. This DOES NOT fit my requirement.

( My data is something like <1, 2> <1, 3> <1, 4> <2, 9> ...etc )

Anyone here have something good to recommend? Preferably I want something that has an API for php (cause I will be using a web dev for my internal UI)

Something faster than mysql preferably (The only queries I will run is "SELECT c1 from table WHERE c2=X" or "SELECT c2 FROM table WHERE c1=X")

I have to insert all the data once in order to allow the selects to work. So insertion cannot take more than a week on my size of data.

(Distributed is ok, I have a cluster).

+1  A: 

Have you used LOAD DATA INFILE before? It's the fastest way to load data in MySQL. 118 days for 400GB sounds like you're iterating through doing a bunch of INSERT statements for every row...

Eric
I am using LOAD DATA IN FILE now
disappearedng
Ok I modified the sourcecode for mysql and recompiled my optimized version. Now it's still taking at least 18 days. Any good dbs to recommend?
disappearedng
A: 

Yeah 400 gigs of data is going to make MySQL beg for mercy. You should be using Oracle or Microsoft SQL.

Even then, once you get this data into a DB what do you plan to do with it? Each query is going to take several minutes.

Spencer Ruport
Well you see this is not commercial thing. Several minutes for a query is tolerable.
disappearedng