views: 47

answers: 2

I have a table with 252759 tuples. I would like to use the DataSet object to make my life easier; however, when I try to create a DataSet for my table, I get java.lang.OutOfMemoryError after about 3 seconds.

I have no experience with DataSets. Are there any guidelines on how to use the DataSet object with big tables?
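For context, DataSet usage of this kind typically looks something like the sketch below (the connection details, table name and row processing are placeholders, not the actual code from the question):

import groovy.sql.Sql

def sql = Sql.newInstance('jdbc:hsqldb:mem:testDB', 'sa', '', 'org.hsqldb.jdbcDriver')
def table = sql.dataSet('my_table')   // 'my_table' is a placeholder

// rows() copies the entire result set into a List in memory,
// which is where a large table can exhaust the heap
def rows = table.rows()
rows.each { row ->
    // process row...
}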

+1  A: 

Why not start by giving the JVM more memory?

java -Xms<initial heap size> -Xmx<maximum heap size>

252759 tuples don't sound like anything a machine with 4GB RAM plus some virtual memory couldn't handle in memory.
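For example, to start with a 256 MB heap and let it grow to 2 GB (illustrative values; MyApp stands in for whatever class or script you run):

java -Xms256m -Xmx2g MyApp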

ammoQ
Is it possible to make Groovy retrieve the data lazily? Increasing memory does not scale very well.
Skarab
If you want to do that, you have to use plain JDBC. The way Groovy does it, i.e. copying the whole result set into an ArrayList, is not suitable for lazy retrieval, because Groovy could never know when it is safe to close the underlying ResultSet; since there is no explicit close() method on the list, it would have to stay open until garbage collection (which might not happen anytime soon), tying up resources on the database server.
ammoQ
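To illustrate the plain-JDBC route ammoQ describes, here is a minimal sketch (the connection details, table and column names are assumptions, and the JDBC driver has to be on the classpath):

import java.sql.DriverManager

def conn = DriverManager.getConnection('jdbc:hsqldb:mem:testDB', 'sa', '')
def stmt = conn.createStatement()
stmt.fetchSize = 1000                                   // hint to the driver to fetch rows in chunks
def rs = stmt.executeQuery('select * from my_table')
try {
    while (rs.next()) {
        // process one row at a time; the whole table never has to fit in memory
        def id = rs.getLong('id')
    }
} finally {
    // closing explicitly releases the cursor on the database server
    rs.close()
    stmt.close()
    conn.close()
}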
Thank you, I see that I did not understand the DataSet API. In my case the records in the table contain textual data and 4GB is not enough, so I will move back to JDBC. If I have time I also plan to take a look at GORM (Groovy ORM), which is part of Grails.
Skarab
+1  A: 

Do you really need to retrieve all the rows at once? If not, you could retrieve them in batches of, say, 10000 rows using the approach shown below.

import groovy.sql.Sql

def db = [url:'jdbc:hsqldb:mem:testDB', user:'sa', password:'', driver:'org.hsqldb.jdbcDriver']

def sql = Sql.newInstance(db.url, db.user, db.password, db.driver)

// Order by id so that the last row of each batch really carries the highest id seen so far
String query = "select * from my_table where id > ? order by id limit 10000"

Integer maxId = 0

// Closure that executes the query and returns true if some rows were processed
Closure executeQuery = {
    def oldMaxId = maxId
    sql.eachRow(query, [maxId]) { row ->
        // Code to process each row goes here.....
        maxId = row.id
    }
    return maxId != oldMaxId
}

// Keep fetching batches until a run processes no rows
while (executeQuery()) { }

AFAIK LIMIT is not part of standard SQL (it is best known from MySQL), but most other RDBMSs have an equivalent feature that limits the number of rows returned by a query.
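For example, the query string could be adapted along these lines for other databases (syntax varies, so check your database's documentation):

// SQL:2008 style, supported by e.g. DB2, Oracle 12c+ and PostgreSQL
String standardQuery = "select * from my_table where id > ? order by id fetch first 10000 rows only"

// SQL Server style
String sqlServerQuery = "select top 10000 * from my_table where id > ? order by id"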

Also, I haven't tested (or even compiled) the code above, so handle with care!

Don
I wanted to use DataSet to avoid writing raw SQL.
Skarab
If you want to take this approach, you must use `Sql`. It doesn't seem possible to use non-standard SQL features like `limit` with `DataSet`.
Don
@Skarab: Usually it is a bad idea to pull lots of data out of your database server just to avoid SQL. That data has to be sent over the network, which is much slower than dealing with it on the database server. In effect, you risk adding crippling performance problems to your application by doing things like that.
Chris Lively
@Chris: I create an index on a local disk, so I need to process a large number of tuples outside of my database. The core of my problem was that I was not aware that Groovy's implementation first loads the data into memory; I thought it worked in "stream" mode. I simply did not understand the hidden part of the API ;) and thought I had done something wrong.
Skarab