I quote from the Tokyo Cabinet docs:

As for database of hash table, each key must be unique within a database, so it is impossible to store two or more records with a key overlaps.

Or does Tokyo Cabinet allow tuple-based keys?

What would be the best way to set up a one-to-many store (like a crawler index, where one keyword maps to many doc IDs)?

~B

A: 

Using the table database (TDB), you can simply store a list of keys in one value as tokens. As long as your keys are valid "tokens" (that is, they contain no separator characters such as spaces or commas), you can list them this way in a single field.

Here's an example using Pyrant's low-level interface:

>>> from pyrant import Tyrant
>>> t = Tyrant()
>>> includes = 5  # code for operation TDBQCSTROR
>>> t['test'] = {'foo': 'abc,def', 'bar': 'abc def', 'quux': 'abcdef'}
>>> t.proto.search([('foo',includes,'abc')])
[u'test']
>>> t.proto.search([('bar',includes,'abc')])
[u'test']
>>> t.proto.search([('quux',includes,'abc')])
[]
>>> t.proto.search([('quux',includes,'abcd')])
[]
>>> t.proto.search([('quux',includes,'abcdef')])
[u'test']

TDBQCSTROR is an operation type which stands for "string includes at least one token in..." (see "tctdbqryaddcond" in Tokyo Cabinet API specs).

Note that both "abc,def" and "abc def" matched the "abc" keyword, but "abcdef" did not, even though "abc" is a substring of "abcdef". This can be used to search keys stored in a single string, e.g.:

t['tokyocabinet'] = {'title': 'Tokyo Cabinet'}
t['primary-key'] = {'title': 'Primary Key'}
t['question1228313'] = {
    'title': 'how to build one to many rows in tokyo cabinet?',
    'tags': 'tokyocabinet, primary-key',
}

(Tags are probably not the best example as they don't need to be references.)
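
To make the token semantics concrete without a running Tokyo Tyrant server, here is a minimal pure-Python sketch that mimics the "at least one token" match. It assumes tokens are separated by whitespace or commas, as the results above suggest. The helper names (tokens, includes_any_token, search) and the sample data are made up for illustration and are not part of Pyrant or Tokyo Cabinet.

```python
import re

def tokens(text):
    """Split a string into tokens on whitespace and commas (assumed delimiters)."""
    return [tok for tok in re.split(r"[\s,]+", text) if tok]

def includes_any_token(value, query):
    """Emulate the 'includes at least one token in' check described above."""
    wanted = set(tokens(query))
    return any(tok in wanted for tok in tokens(value))

def search(table, column, query):
    """Return the primary keys of rows whose column shares a token with query."""
    return sorted(pk for pk, row in table.items()
                  if includes_any_token(row.get(column, ""), query))

# Hypothetical crawler-style data: one document row lists many keywords.
docs = {
    "doc1": {"keywords": "tokyo, cabinet, database"},
    "doc2": {"keywords": "tokyo tyrant"},
    "doc3": {"keywords": "tokyocabinet"},
}

print(search(docs, "keywords", "tokyo"))           # ['doc1', 'doc2']
print(search(docs, "keywords", "cabinet tyrant"))  # ['doc1', 'doc2']
```

Note that "doc3" is not returned for "tokyo", for the same reason "abcdef" did not match "abc" above: only whole tokens count.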

If you are using a TC database of another kind (not TDB), I cannot imagine a valid solution. You may want to ask this question in the related discussion group.

Andy Mikhaylenko
A: 

I'm using Ruby. Based on Andy's answer, I tried this with rufus/tokyo, a Ruby gem for accessing Tokyo Cabinet. It turned out that the condition to use is :stror rather than :includes in rufus/tokyo (Rufus::Tokyo::TableQuery).

I thought posting it might be useful to Ruby users. In the code, I create a one-to-many relationship from a record to things called sectors, referenced by their ids.

Here is what I did:

$ irb
>> require 'rubygems'
=> false
>> require 'rufus/tokyo'
=> true
>> table = Rufus::Tokyo::Table.new("temp3.tct", :mode => "cwf")
=> #<Rufus::Tokyo::Table:0x1006304b0 @db=#<Native Pointer address=0x101c01b40>, @path="temp3.tct">
>> id = table.generate_unique_id
=> 1
>> table[1] = { "name" => "Temp 1", "sector_ids" => "23, 3, 1, 5236, 36" }
=> {"sector_ids"=>"23, 3, 1, 5236, 36", "name"=>"Temp 1"}
>> table_result_set = table.query { | query | query.add 'sector_ids', :includes, "3" }
=> [{"name"=>"Temp 1", "sector_ids"=>"23, 3, 1, 5236, 36", :pk=>"1"}]
>> table[2] = { "name" => "Temp 2", "sector_ids" => "523, 63, 23" }
=> {"sector_ids"=>"523, 63, 23", "name"=>"Temp 2"}
>> table_result_set = table.query { | query | query.add 'sector_ids', :includes, "3" }
=> [{"name"=>"Temp 1", "sector_ids"=>"23, 3, 1, 5236, 36", :pk=>"1"}, {"name"=>"Temp 2", "sector_ids"=>"523, 63, 23", :pk=>"2"}]
>> table_result_set = table.query { | query | query.add 'sector_ids', :stror, "3" }
=> [{"name"=>"Temp 1", "sector_ids"=>"23, 3, 1, 5236, 36", :pk=>"1"}]
>> table_result_set = table.query { | query | query.add 'sector_ids', :stror, "63" }
=> [{"name"=>"Temp 2", "sector_ids"=>"523, 63, 23", :pk=>"2"}]
>> table_result_set = table.query { | query | query.add 'sector_ids', :stror, "2" }
=> []
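
The difference between the two conditions in the session above can be reproduced in a few lines of plain Python. This is only a sketch of the behaviour the results suggest (substring test for :includes, whole-token test for :stror), not rufus/tokyo code; the function names are hypothetical and the token delimiters (whitespace and commas) are an assumption.

```python
import re

# Rows copied from the irb session above (primary key -> sector_ids).
rows = {
    "1": "23, 3, 1, 5236, 36",
    "2": "523, 63, 23",
}

def substring_match(value, needle):
    """Plain substring test, the behaviour the :includes results suggest."""
    return needle in value

def token_match(value, needle):
    """Whole-token test, the behaviour the :stror results suggest."""
    return needle in [tok for tok in re.split(r"[\s,]+", value) if tok]

def select(match, needle):
    """Return the primary keys of the rows accepted by the given matcher."""
    return sorted(pk for pk, value in rows.items() if match(value, needle))

print(select(substring_match, "3"))  # ['1', '2'] because "3" occurs inside "523", "63", "23"
print(select(token_match, "3"))      # ['1']
print(select(token_match, "63"))     # ['2']
print(select(token_match, "2"))      # []
```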
tadatoshi
A: 

As for database of hash table, each key must be unique within a database, so it is impossible to store two or more records with a key overlaps.

Tokyo Cabinet B+ tree databases (BDB) allow duplicate keys:

bool tcbdbputdup(TCBDB *bdb, const void *kbuf, int ksiz, const void *vbuf, int vsiz); 

Using the Ruby API:

TokyoCabinet::BDB.putdup(key, value) -> true|false
TokyoCabinet::BDB.getlist(key) => [value, ...]|nil
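
For the crawler case in the question (one keyword, many doc IDs), the semantics of these two calls can be pictured with a toy in-memory model. The sketch below is plain Python, not the Tokyo Cabinet binding: the class name DupKeyStore and the sample keys are invented for illustration, and it assumes values for a key come back in insertion order.

```python
from collections import defaultdict

class DupKeyStore:
    """Toy in-memory model of a store that allows duplicate keys (not the Tokyo Cabinet API)."""

    def __init__(self):
        self._records = defaultdict(list)

    def putdup(self, key, value):
        """Add another value under key instead of overwriting the existing ones."""
        self._records[key].append(value)
        return True

    def getlist(self, key):
        """Return every value stored under key in insertion order, or None if there are none."""
        values = self._records.get(key)
        return list(values) if values else None

# Hypothetical keyword -> doc ID index.
index = DupKeyStore()
index.putdup("tokyo", "doc1")
index.putdup("tokyo", "doc7")
index.putdup("cabinet", "doc1")

print(index.getlist("tokyo"))    # ['doc1', 'doc7']
print(index.getlist("missing"))  # None
```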
fubra