I need to maintain a large list of pickleable Python objects. The list is too large to fit entirely in RAM, so some database/paging mechanism is required. The mechanism needs to support fast access to close (nearby) areas in the list.

The list should implement all the Python list features, but most of the time I will work sequentially: scan some range in the list and, while scanning, decide whether to insert or pop some nodes at the scanning point.

The list can be very large (2-3 GB) and should not be held in RAM all at once. The nodes are small (100-200 bytes) but can contain various types of data.

A good solution for this could be a BTree, where only the most recently accessed buckets are loaded in RAM.

Using SQL tables is not a good fit, since I would need to implement a complex index-key mechanism. My data is not a table; it's a simple Python list, with the ability to add elements at specific indexes and pop elements from specific positions.

I tried ZODB and zc.blist, which implement a BTree-based list that can be stored in a ZODB database file, but I don't know how to configure them so the above features run in reasonable time. I don't need the multi-threading/transaction features; nothing will touch the database file except my single-threaded program.

Can anyone explain how to configure ZODB/zc.blist so the above features run fast, or suggest a different large-list implementation?

Some quick-and-dirty code that I tried:

import time
import random

NODE_JUMP = 50000    # nodes appended per iteration
NODE_ACCESS = 10000  # nodes touched per iteration

print 'STARTING'

random_bytes = open('/dev/urandom', 'rb')

my_list = list()

nodes_no = 0

while True:
    nodes_no += NODE_JUMP
    start = time.time()
    my_list.extend(random_bytes.read(100) for i in xrange(NODE_JUMP))
    print 'extending to %s nodes took %.2f seconds' % (nodes_no, time.time() - start)

    # pick a random section and modify every node in it
    section_start = random.randint(0, nodes_no - NODE_ACCESS - 1)
    start = time.time()
    for index in xrange(section_start, section_start + NODE_ACCESS):
        # rotate the string
        my_list[index] = my_list[index][1:] + my_list[index][0]

    print 'access to %s nodes took %.2f seconds' % (NODE_ACCESS, time.time() - start)

The output ended with:

extending to 5000000 nodes took 3.49 seconds
access to 10000 nodes took 0.02 seconds
extending to 5050000 nodes took 3.98 seconds
access to 10000 nodes took 0.01 seconds
extending to 5100000 nodes took 2.54 seconds
access to 10000 nodes took 0.01 seconds
extending to 5150000 nodes took 2.19 seconds
access to 10000 nodes took 0.11 seconds
extending to 5200000 nodes took 2.49 seconds
access to 10000 nodes took 0.01 seconds
extending to 5250000 nodes took 3.13 seconds
access to 10000 nodes took 0.05 seconds
Killed (not by me)
A: 

I think ZODB is the tool to use. It will store lots of arbitrary items, and it handles the memory issues for you.

Here is a working example. In this case the objects reference each other, as well as being stored in a BTree keyed by number.

import random
from collections import deque

import ZODB
from ZODB.FileStorage import FileStorage
from ZODB.DB import DB
import transaction
import persistent
import BTrees.OOBTree

def random_string(n=100):
    return ''.join([chr(random.randint(0,95)+32) for i in xrange(n)]) 


class Node(persistent.Persistent):
    def __init__(self, value=None):
        # keep the given value, or generate a random string if none was given
        self.value = value if value is not None else random_string()

    def setNeighbors(self, refs):
        self.p1 = refs[0]
        self.p2 = refs[1]
        self.p3 = refs[2]
        self.p4 = refs[3]


def getTree():
    storage = FileStorage('c:\\test.zdb')
    db = DB(storage)
    connection = db.open()
    root = connection.root()
    if root.has_key('tree'):
        tree = root['tree']
    else:
        tree = BTrees.OOBTree.OOBTree()
        root['tree'] = tree
        transaction.commit()
    return tree


def fillDB():
    tree = getTree()

    # start with some initial objects.
    nodes = deque([Node(), Node(), Node(), Node()])
    node = Node()

    for n in xrange(20000):
        tree[n] = node           # store the node based on a numeric ID
        node.setNeighbors(nodes) # Make the node refer to more nodes.
        node = nodes.popleft()   # maintain our list of 4 upcoming nodes.
        nodes.append(Node())
        if n % 1000 == 0:
            transaction.commit() # Must commit for data to make it to disk.
            print n
    transaction.commit()
    return tree

At this point the tree variable basically works like a dictionary and can be accessed by key. You can also get the keys in a range using tree.keys(min, max), as described in the ZODB BTrees API documentation.
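
For example, here is a minimal sketch of a range scan over the tree that fillDB builds (keys(min, max) and items(min, max) are standard BTrees methods; the key range 1000-1099 is just an example):

tree = getTree()

# iterate over the keys that fall in [1000, 1099] (both ends inclusive)
for key in tree.keys(1000, 1099):
    print key, tree[key].value[:10]

# items(min, max) yields (key, node) pairs for the same range
for key, node in tree.items(1000, 1099):
    node.value = node.value[1:] + node.value[0]  # rotate, as in the question

transaction.commit()  # persist the modified nodes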

You can store each of your lists under a different key in the root object returned by the ZODB. The root object serves as the "gateway" to the ZODB object storage.
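
For instance, a rough sketch of keeping several independent trees under different root keys (the key names 'list_a' and 'list_b' are placeholders, and connection is an open ZODB connection as in getTree above):

root = connection.root()

# each top-level structure lives under its own key in the root mapping
if 'list_a' not in root:
    root['list_a'] = BTrees.OOBTree.OOBTree()
if 'list_b' not in root:
    root['list_b'] = BTrees.OOBTree.OOBTree()

root['list_a'][0] = Node()
root['list_b'][0] = Node()
transaction.commit()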

Thanks to the ZODB, you can also use inter-object references as well as the BTree index. For example:

tree = getTree()

node1 = tree[1]
print node1.p1.p1.p1.p1.p1.p1.p1.p1.p1.value
Joe Koberg
I admit it's a very low-level description. I'll fix the question to make it clearer.
Oren
I don't have any sample code here. I'll try to write something now.
Oren
That's not quite what I meant. The nodes don't have 4 pointers each; there are 4 "accessing-nodes" outside the list (4 in total, not 4 for every node). The new explanation of what I need is better. Is there a way to skip the "commits"?
Oren
* I meant 4 "accessing-pointers".. blegh
Oren
Do you know whether the ZODB tree index nodes (any tree node except the buckets at the bottom of the tree) are always stored in RAM? If they are, then I can use the "deque" trick above to ensure fast sequential access. For example, when I want nodes[314] I can add nodes[315:(315+100)/100*100] to the deque. Then all accesses to nodes[315], nodes[316], nodes[317]... will be very fast.
Oren
But then when you insert a node between 315 and 316, what will the index be? 315.5? (OK, maybe, but you'll hit the floating-point precision limit with enough items.) It's easy to have a doubly-linked list of nodes that just reference each other, and to traverse it and insert/remove nodes - but then some of the list operations become expensive, like len() (unless you keep a count attribute updated) or random access by index.
Joe Koberg
But yes, the ZODB keeps an object cache, and frequently used interior nodes should stay in it. I'm not aware of the cache eviction policy, so I can't answer for sure.
Joe Koberg
You must commit... otherwise how would the system know when to start writing? Without commits, every attribute modification would have to generate an I/O.
Joe Koberg
Oh, that's another thing about ZODB FileStorage... it's append-only storage, so even deletions cause it to grow until you "pack" the file. But you get atomicity and history features "for free".
Joe Koberg
A: 

How about Tokyo Cabinet? It's very fast and simple, like lists, but built for what you want. http://1978th.net/tokyocabinet/

John
Is there a Python version?
Oren
There might be: http://stackoverflow.com/questions/601865/python-table-engine-binding-for-tokyo-cabinet
Jon-Eric
I think he will have the same "index-key" problem mentioned with SQL.
Joe Koberg
+2  A: 

Using zc.blist can bring good results after all. Setting the "cache_size" option when creating the DB controls the size of the data that will remain in RAM, although the RAM actually used can grow larger if you don't call "transaction.commit" often enough. With a large cache_size and frequent transaction.commit calls, the most recently accessed buckets of the BList stay in RAM, giving you fast access to them, while the amount of RAM used doesn't grow too much.

Packing is very expensive, though; with a large hard disk you don't have to do it that often anyway.
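
As a side note, pack() can also keep some recent history. A minimal sketch, assuming db is the ZODB DB instance created in the code below (the seven-day window is just an example):

# discard old object revisions but keep the last 7 days of history;
# db.pack() with no arguments packs away everything but the current data
db.pack(days=7)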

Here is some code to try yourself. Run "top" in the background and change cache_size to see how it affects the amount of RAM used.

import time
import os
import glob
from ZODB import DB
from ZODB.FileStorage import FileStorage
import transaction
from zc.blist import BList

print('STARTING')

random = open('/dev/urandom', 'rb')  # raw random bytes used as element payloads


def test_list(my_list, loops = 1000, element_size = 100):
    print('testing list')
    start = time.time()
    for loop in xrange(loops):
        my_list.append(random.read(element_size))
    print('appending %s elements took %.4f seconds' % (loops, time.time() - start))

    start = time.time()
    length = len(my_list)
    print('length calculated in %.4f seconds' % (time.time() - start,))

    start = time.time()
    for loop in xrange(loops):
        my_list.insert(length / 2, random.read(element_size))
    print('inserting %s elements took %.4f seconds' % (loops, time.time() - start))

    start = time.time()
    for loop in xrange(loops):
        my_list[loop] = my_list[loop][1:] + my_list[loop][0]
    print('modifying %s elements took %.4f seconds' % (loops, time.time() - start))

    start = time.time()
    for loop in xrange(loops):
        del my_list[0]
    print('removing %s elements took %.4f seconds' % (loops, time.time() - start))

    start = time.time()
    transaction.commit()
    print('committing all above took %.4f seconds' % (time.time() - start,))

    del my_list[:loops]
    transaction.commit()

    start = time.time()
    pack()
    print('packing after removing %s elements took %.4f seconds' % (loops, time.time() - start))

# remove leftover database files from previous runs
for filename in glob.glob('database.db*'):
    try:
        os.unlink(filename)
    except OSError:
        pass

# cache_size is the number of objects ZODB keeps in the per-connection cache
db = DB(FileStorage('database.db'), cache_size=2000)

def pack():
    db.pack()

root = db.open().root()

root['my_list'] = BList()

print('inserting initial data to blist')

for loop in xrange(10):
    root['my_list'].extend(random.read(100) for x in xrange(100000))
    transaction.commit()

transaction.commit()

test_list(root['my_list'])
Oren
+1 for posting the working code after finding a solution!
Joe Koberg