ansaurus

Question

How do I persist data to disk, and both randomly update it, and stream it efficiently back into RAM?

Answer 1

A:

I think you'd have a lot more success writing something that caches the most active records in memory and queues data changes as a low priority insert into the DB.

I understand this method adds a little extra I/O, but if you're talking about millions of records I think it would still be faster, because any search algorithm you write yourself is going to be greatly outperformed by a full-fledged database engine.

Spencer Ruport 2009-08-13 21:14:43
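A minimal sketch of Spencer Ruport's write-behind idea in plain Java, under stated assumptions (none of these names come from the answer): the "most active records" are held in an access-ordered LinkedHashMap acting as an LRU cache, and the hypothetical dbReader/dbWriter callbacks stand in for real JDBC SELECT/UPDATE code. Changes are only queued until flush() is called, which a low-priority timer or thread could do. A changed record that gets evicted is written through first, so no change is lost.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.BiConsumer;
import java.util.function.Function;

public class WriteBehindCache<K, V> {
    private final Function<K, V> dbReader;   // hypothetical: loads one record on a cache miss
    private final BiConsumer<K, V> dbWriter; // hypothetical: persists one record (e.g. an UPDATE)
    private final Set<K> dirty = new LinkedHashSet<>(); // keys changed since the last flush
    private final LinkedHashMap<K, V> hot;              // the most recently used records

    public WriteBehindCache(int capacity, Function<K, V> dbReader, BiConsumer<K, V> dbWriter) {
        this.dbReader = dbReader;
        this.dbWriter = dbWriter;
        // accessOrder = true makes LinkedHashMap an LRU list: eldest = least recently used
        this.hot = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                if (size() <= capacity) {
                    return false;
                }
                // never drop an unsaved change: write it through before evicting
                if (dirty.remove(eldest.getKey())) {
                    dbWriter.accept(eldest.getKey(), eldest.getValue());
                }
                return true;
            }
        };
    }

    public synchronized V get(K key) {
        V value = hot.get(key);
        if (value == null) {
            value = dbReader.apply(key); // cache miss: fall back to the database
            if (value != null) {
                hot.put(key, value);
            }
        }
        return value;
    }

    public synchronized void put(K key, V value) {
        dirty.add(key);      // remember that this key has an unsaved change
        hot.put(key, value); // the change is queued, not written yet
    }

    /** Writes all queued changes; meant to be called periodically at low priority. */
    public synchronized int flush() {
        int written = 0;
        for (K key : dirty) {
            dbWriter.accept(key, hot.get(key));
            written++;
        }
        dirty.clear();
        return written;
    }

    public static void main(String[] args) {
        Map<Integer, String> db = new HashMap<>(); // stand-in for the real database table
        WriteBehindCache<Integer, String> cache = new WriteBehindCache<>(2, db::get, db::put);
        cache.put(1, "a");
        cache.put(2, "b");
        System.out.println(db);            // {} : both changes are still only queued
        cache.put(3, "c");                 // evicts key 1 (least recently used), writing it first
        System.out.println(db);            // {1=a}
        System.out.println(cache.flush()); // 2
        System.out.println(db);            // {1=a, 2=b, 3=c}
    }
}
```

The methods are synchronized so a background flusher thread can share the cache safely; the trade-off is that flush() blocks readers while it writes, which a real implementation would probably want to avoid.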

Answer 2

A:

You could try Berkeley DB, which is now owned by Oracle. It is available under both open-source and commercial licenses. It uses a key/value model (with an option to create indexes if other kinds of queries are required). There is a pure Java version and a native version with Java bindings.

Michael Barker 2009-08-13 21:17:21

I hope I can find something free; unfortunately Berkeley DB is not, unless I'm willing to GPL my code, which isn't an option.

sanity 2009-08-14 03:43:09

Answer 3

A:

One trick you could use to improve throughput with an RDBMS is to batch your database inserts/updates. For instance, a system I worked on did inserts/updates in batches of up to 3,000. The downside is that the SQL needs to be more complex and needs to be tuned.

Stephen C 2009-08-13 21:35:21

OK, down-voter, why is this a bad idea? I can assure you that it did work.

Stephen C 2009-08-14 04:21:40
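One way to get the batching effect Stephen C describes, without hand-written multi-row SQL, is the standard JDBC batch API (PreparedStatement.addBatch and executeBatch). The sketch below assumes that approach; the class and method names are hypothetical, the batch size is a tuning knob (the answer mentions batches of up to 3,000), and running the whole load in one transaction is an added assumption.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class BatchWriter {
    /**
     * Runs `sql` once per row, sending rows to the database in groups of `batchSize`.
     * Returns the number of executeBatch() round trips made.
     */
    public static int writeInBatches(Connection conn, String sql, List<Object[]> rows, int batchSize)
            throws SQLException {
        boolean oldAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false); // one commit for the whole load, not one per statement
        int roundTrips = 0;
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            int pending = 0;
            for (Object[] row : rows) {
                for (int i = 0; i < row.length; i++) {
                    ps.setObject(i + 1, row[i]); // JDBC parameters are 1-based
                }
                ps.addBatch();
                if (++pending == batchSize) {
                    ps.executeBatch();
                    roundTrips++;
                    pending = 0;
                }
            }
            if (pending > 0) { // send the final partial batch
                ps.executeBatch();
                roundTrips++;
            }
            conn.commit();
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        } finally {
            conn.setAutoCommit(oldAutoCommit);
        }
        return roundTrips;
    }
}
```

For example, writeInBatches(conn, "update TEST set outcome = ? where id = ?", rows, 3000) would issue one executeBatch() per 3,000 rows. How much time that saves depends on the driver and database, so it is worth measuring with your own data.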

Answer 4

A:

http://www.zentus.com/sqlitejdbc/

SQLite database (public domain) with a BSD-licensed JDBC connector. It runs natively on a whole bunch of platforms (OS X, Linux, Windows) and uses emulation on the rest.

Marian 2009-08-13 21:45:42

Answer 5

+2 A:

How about H2? The license should work for you:

You can use H2 for free. You can integrate it into your application (including commercial applications), and you can distribute it.
Files containing only your code are not covered by this license (it is 'commercial friendly').
Modifications to the H2 source code must be published.
You don't need to provide the source code of H2 if you did not modify anything.

I get

1000000 insert in 22492ms (44460.252534234394 row/sec)

100000 updates in 9565ms (10454.783063251438 row/sec)

from

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Random;


/**
 * Rough H2 benchmark: sequential inserts, then random single-row updates.
 *
 * @author clint
 */
public class H2Test {

  static int testrounds = 1000000;

  public static void main(String[] args) {
    try {
      Class.forName("org.h2.Driver");

      // try-with-resources closes the connection and statements even if a query fails
      try (Connection conn = DriverManager.getConnection("jdbc:h2:/tmp/test.h2", "sa", "");
           Statement st = conn.createStatement()) {
        // start from an empty table on every run
        st.execute("DROP TABLE IF EXISTS TEST");
        st.execute("CREATE TABLE IF NOT EXISTS TEST(id INT PRIMARY KEY, browser VARCHAR(64), ip VARCHAR(16), outcome REAL)");
        //st.execute("CREATE INDEX IDXall ON TEST(id, browser, ip, outcome)");

        // insert testrounds rows, one execute() per row
        try (PreparedStatement ps = conn.prepareStatement(
            "insert into TEST (id, browser, ip, outcome) values (?,?,?,?)")) {
          long time = System.currentTimeMillis();
          for (int i = 0; i < testrounds; i++) {
            ps.setInt(1, i);
            ps.setString(2, "firefox");
            ps.setString(3, "000.000.000.000");
            ps.setFloat(4, 0);
            ps.execute();
          }
          long last = System.currentTimeMillis();
          System.out.println(testrounds + " insert in " + (last - time) + "ms ("
              + (testrounds / ((last - time) / 1000d)) + " row/sec)");
        }

        // randomly update 10% of the entries, looked up by primary key
        try (PreparedStatement ps = conn.prepareStatement("update TEST set outcome = 1 where id=?")) {
          Random random = new Random();
          long time = System.currentTimeMillis();
          for (int i = 0; i < testrounds / 10; i++) {
            ps.setInt(1, random.nextInt(testrounds));
            ps.execute();
          }
          long last = System.currentTimeMillis();
          System.out.println((testrounds / 10) + " updates in " + (last - time) + "ms ("
              + ((testrounds / 10) / ((last - time) / 1000d)) + " row/sec)");
        }
      }
    } catch (ClassNotFoundException | SQLException e) {
      e.printStackTrace();
    }
  }

}

Clint 2009-08-13 22:19:25

Answer 6

A:

I'd also take a look to see if there's anything existing based on either EHCache or JCS that might help.

Gwyn Evans 2009-08-13 22:38:01

Answer 7

A:

You can use Apache Derby (or Java DB), which is bundled with the JDK. However, if a DBMS doesn't provide the required speed, you may implement a specific file structure yourself. If only exact-key lookup is required, you may use a hash file. For exact-key lookups the hash file is the fastest file structure (much faster than the general-purpose structures used in databases, such as B-trees and grid files), and it also streams back reasonably efficiently.

2009-08-14 00:59:37
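To make the hash-file suggestion above concrete, here is a minimal sketch in plain Java. Assumptions (not from the answer): keys and values are longs, the number of slots is fixed when the file is created, collisions are resolved by linear probing, and deletes are not supported (they would need tombstones). A lookup or in-place update usually costs one seek, and streaming everything back into RAM is a single sequential read of the file.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.function.BiConsumer;

/** Fixed-slot hash file mapping long keys to long values (linear probing, no deletes). */
public class HashFile implements AutoCloseable {
    private static final int SLOT_BYTES = 1 + 8 + 8; // "used" flag, key, value
    private final File path;
    private final RandomAccessFile file;
    private final long slots;

    public HashFile(File path, long slots) throws IOException {
        this.path = path;
        this.slots = slots;
        this.file = new RandomAccessFile(path, "rw");
        if (file.length() != slots * SLOT_BYTES) {
            // new table: write explicit zeros (setLength leaves extended bytes undefined)
            file.setLength(0);
            byte[] zeros = new byte[SLOT_BYTES * 4096];
            for (long left = slots * SLOT_BYTES; left > 0; left -= zeros.length) {
                file.write(zeros, 0, (int) Math.min(zeros.length, left));
            }
        }
    }

    private long home(long key) {
        long h = key * 0x9E3779B97F4A7C15L; // multiplicative hashing spreads sequential ids
        return Math.floorMod(h ^ (h >>> 32), slots);
    }

    /** Inserts a record or overwrites it in place. */
    public void put(long key, long value) throws IOException {
        long s = home(key);
        for (long i = 0; i < slots; i++, s = (s + 1) % slots) {
            file.seek(s * SLOT_BYTES);
            boolean used = file.readByte() != 0;
            if (!used || file.readLong() == key) {
                file.seek(s * SLOT_BYTES);
                file.writeByte(1);
                file.writeLong(key);
                file.writeLong(value);
                return;
            }
        }
        throw new IOException("hash file is full");
    }

    /** Returns the value for key, or null if it is not stored. */
    public Long get(long key) throws IOException {
        long s = home(key);
        for (long i = 0; i < slots; i++, s = (s + 1) % slots) {
            file.seek(s * SLOT_BYTES);
            if (file.readByte() == 0) {
                return null; // an empty slot ends the probe sequence
            }
            if (file.readLong() == key) {
                return file.readLong();
            }
        }
        return null;
    }

    /** Streams every stored record back with one buffered, sequential pass over the file. */
    public void scan(BiConsumer<Long, Long> sink) throws IOException {
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream(path)))) {
            for (long s = 0; s < slots; s++) {
                boolean used = in.readByte() != 0;
                long key = in.readLong();
                long value = in.readLong();
                if (used) {
                    sink.accept(key, value);
                }
            }
        }
    }

    @Override
    public void close() throws IOException {
        file.close();
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("hashfile", ".dat");
        f.deleteOnExit();
        try (HashFile table = new HashFile(f, 1024)) {
            for (long id = 0; id < 100; id++) {
                table.put(id, id * 10);
            }
            table.put(42, -1);                 // random update, rewritten in place
            System.out.println(table.get(42)); // -1
            long[] sum = {0};
            table.scan((k, v) -> sum[0] += v);
            System.out.println(sum[0]);        // 49079
        }
    }
}
```

The table should be sized well above the expected number of records, since linear probing slows down sharply as the file fills up.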

Answer 8

+1 A:

JDBM is a great embedded database for Java (and not as encumbered by licensing as the Java version of Berkeley DB). It would be worth trying. If you don't need ACID guarantees (i.e. you are OK with the database getting corrupted in the event of a crash), turn off the transaction manager; this significantly increases speed.

Kevin Day 2009-08-14 03:09:47

Answer 9

A:

In the end I decided to log the data to disk as it comes in, and also keep it in memory where I can update it. After a period of time I write the data out to disk and delete the log.

sanity 2009-08-27 18:53:37
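sanity's final design (log each change to disk as it arrives, keep the data in memory where it can be updated, periodically write everything out and delete the log) could be sketched like this. Assumptions not stated in the answer: long keys and values, an append-only log of (key, value) pairs, a snapshot written to a temporary file and renamed into place, and recovery by loading the snapshot and replaying the log. flush() only hands bytes to the operating system; surviving power loss would additionally need something like FileChannel.force.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;
import java.util.HashMap;
import java.util.Map;

/** In-memory map backed by an append-only change log plus periodic snapshots. */
public class LoggedStore implements AutoCloseable {
    private final Path snapshot;
    private final Path log;
    private final Map<Long, Long> data = new HashMap<>();
    private DataOutputStream logOut;

    public LoggedStore(Path dir) throws IOException {
        snapshot = dir.resolve("snapshot.bin");
        log = dir.resolve("changes.log");
        load(snapshot); // state as of the last checkpoint
        load(log);      // then replay the changes logged since then
        // fold the replayed log into a fresh snapshot; this also drops any torn tail record
        writeSnapshot();
        logOut = openEmptyLog();
    }

    private void load(Path file) throws IOException {
        if (!Files.exists(file)) {
            return;
        }
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(file)))) {
            while (true) {
                long key = in.readLong();
                data.put(key, in.readLong());
            }
        } catch (EOFException end) {
            // end of file, or a half-written last record after a crash: stop here
        }
    }

    private DataOutputStream openEmptyLog() throws IOException {
        return new DataOutputStream(new BufferedOutputStream(Files.newOutputStream(log,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)));
    }

    private void writeSnapshot() throws IOException {
        Path tmp = snapshot.resolveSibling("snapshot.tmp");
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(Files.newOutputStream(tmp)))) {
            for (Map.Entry<Long, Long> e : data.entrySet()) {
                out.writeLong(e.getKey());
                out.writeLong(e.getValue());
            }
        }
        // the rename makes the new snapshot replace the old one in a single step
        Files.move(tmp, snapshot, StandardCopyOption.REPLACE_EXISTING, StandardCopyOption.ATOMIC_MOVE);
    }

    /** Logs the change first, then applies it in memory. */
    public void put(long key, long value) throws IOException {
        logOut.writeLong(key);
        logOut.writeLong(value);
        logOut.flush(); // hands the record to the OS
        data.put(key, value);
    }

    public Long get(long key) {
        return data.get(key);
    }

    /** Writes the whole map out, then empties the log. */
    public void checkpoint() throws IOException {
        writeSnapshot();
        logOut.close();
        logOut = openEmptyLog(); // every logged change is now in the snapshot
    }

    @Override
    public void close() throws IOException {
        logOut.close();
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("store");
        try (LoggedStore s = new LoggedStore(dir)) {
            s.put(1, 10);
            s.put(2, 20);
            s.checkpoint(); // snapshot = {1=10, 2=20}, log emptied
            s.put(1, 11);   // only in the log and in memory
        }
        try (LoggedStore s = new LoggedStore(dir)) { // simulate a restart
            System.out.println(s.get(1) + " " + s.get(2)); // 11 20
        }
    }
}
```

If the process dies after the snapshot rename but before the log is emptied, replaying the old log over the new snapshot is harmless, because each record simply overwrites its key with a value the snapshot already has.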

Answer 10

A:

Have you taken a look at Oracle's TimesTen database? It's an in-memory DB that is supposed to be very high-performance. I don't know about costs, licenses, etc., but search for it on Oracle's site; an evaluation download should be available.

andora 2009-09-02 17:38:52
