views:

186

answers:

2

I have a a simple ordered list that could contain 1 million or more items. There are only a few actions that are done with this list:

  • lookup in a value exist
  • find the index for a value
  • find value for index
  • add a value
  • get number of items in the list

Once a value is added to the list, it never changes. I append items to the list, no insert or delete.

I need to manipulate this big list, and store it persistently. Right now I am using a database Int => String to represent the list, but I think there should me a more efficient way to do that.

I could use memcached, but I think 2 functions are missing:

  • persistent storage
  • find the index for a value
+6  A: 

It appears that you also need a String -> Int mapping table.

In Perl the easiest way to do this is to tie a hash to a DBM File (see man perltie).

Sample code, untested, could almost certainly be improved:

use DB_File;
tie %value2index, 'DB_File', 'value2index';
tie %index2value, 'DB_File', 'index2value';

sub index_count() {
    return scalar %value2index;
}

sub value_exists() {
    my $value = shift;
    return exists($value2index{$value});
}

sub append() {
    my $value = shift;
    if (!value_exits($value)) { # prevent duplicate insertions
        my $index = index_count() + 1;
        $value2index{$value} = $index;
        $index2value{$index} = $value;
    }
}

sub find_index() {
    my $value = shift;
    return $value2index{$value};
}

sub find_value() {
    my $index = shift;
    return $index2value{$index};
}

Don't use this in a multi-threaded environment, there are non-atomic operations here.

Alnitak
I like that better than my solution. It gives you O(1) for everything. It does assume memory overhead is not a problem -- you wind up having to store the data twice, plus the indexes, whereas I store only the data, but at the cost of one of the searches being O(log n) .
SquareCog
there shouldn't be significant memory overhead - the DB files are stored on disk
Alnitak
You are right, I misinterpreted the question.. deleted the irrelevant answer.
SquareCog
Looks good. I'll need it to be thread-safe, I will check on CPAN if a module does that already.
Julien
A: 

How big are your Items? How much memory do you mind using? Are you items unique?

You could probably get away with something like this:

my @list; # This keeps the ordered list
my %keyval; # This maps key to value
my %valkey; # This maps value to key

on each insert you would:

push @list, value;
$valkey{$value} = $#list;
$keyval{$#list} = $value;

And for each of your requirements:

#Existence of a value:
if(exists($valkey{$value}));

#Existence of an index;
if(exists($keyval{$index}));

#value for an index:
$keyval{$index};

#value for a key:
$valkey{$value};

#Size
$#list;
Frosty