tags:

views:

109

answers:

6

Hi

I will have alot of similar objects with similar parameters. Example of an object parameters would be something like :

name, boolean, number and list.

The name must be unique value among all the objects while values for boolean, number and list parameters must not.

I could store the data as list of dictionaries i guess. Like that:

list = [
  {'name':'a', 'bool':true, 'number':123, 'list':[1, 2, 3]},
  {'name':'b', 'bool':false, 'number':143, 'list':[1, 3, 5]},
  {'name':'c', 'bool':false, 'number':123, 'list':[1, 4, 5, 18]},
]

What would be the fastest way to check if the unique name exists in the list of dictionaries, before i create another dictionary in that list? Do i have to loop through the list and check what is the value of list[i][name]? What would be fastest and least memory conserving to hold and process that information, assuming, that different similar lists might be simultanously processed in different threads/tasks and that their size could be anywhere between 100 to 100 000 dictionaries per list. Should i store those lists in database instead of memory?

I understand that perhaps i should not be thinking about optimizing (storing the info and threads) before the project is working, so please, answer the unique name lookup question first :)

Thanks, Alan

+6  A: 

If the name is the actual (unique) identifier of each inner data, you could just use a dictionary for the outer data as well:

data = {
  'a' : { 'bool':true, 'number':123, 'list':[1, 2, 3] },
  'b' : { 'bool':false, 'number':143, 'list':[1, 3, 5] },
  'c' : { 'bool':false, 'number':123, 'list':[1, 4, 5, 18] },
}

Then you could easily check if the key exists or not.

Btw. don't name your variables list or dict as that will overwrite the built-in objects.

poke
+1 to using the dictionaries, although if your internal data (`{ 'bool':true, 'number':123, 'list':[1, 2, 3] }`) is always the same in the same order, I would use a list to make it less storage-intensive. so `data = {'a' : [true, 123, [1,2,3]], 'b' : [false, 143, [1,3,5]] }`. Either way, I agree that the easiest way to check would just be `if potential_new_key in data: #do stuff`
nearlymonolith
Awesome! thanks! Didnt read python docs as closely as i could i guess :)
Zayatzz
Oh and of course i will not use variables like list or dict or boolean. I used them here just to give you some idea what kind of data i plan to store.
Zayatzz
A: 

you could use a dictionary of dictionaries using the name parameter as the dictionary key that holds the object as a value

joaquin
A: 

Store the objects in a dictionary with the name as the key:

objects = {'a' : {'bool':true, 'number':123, 'list':[1, 2, 3]},
           'b' : {'bool':false, 'number':143, 'list':[1, 3, 5]},
           'c' : {'bool':false, 'number':123, 'list':[1, 4, 5, 18]}}

This way you ensure that the names are unique since all the keys in the dictionary are unique. Checking is a name is in the dictionary is also easy:

name in objects
Arlaharen
Why not just call `name in objects` rather than `bool(name in objects)`?
Michael Williamson
I added bool() just to make it clearer, but I guess that the opposite was the case. :-) Editing...
Arlaharen
+1  A: 

once you come around to using a dict instead of a list, the fastest way to perform the check that you want is:

if 'newkey' not in items:
    # create a new record

since you want to be able to access these records from multiple threads, I would keep a collection of locks. BTW, this is the sort of thing that you design in the beginning as it's part of the application design and not an optimization.

class DictLock(dict):
    def __init__(self):
        self._lock = threading.Lock()

    def __getitem__(self, key):
        # lock to prevent two threads trying to create the same
        # entry at the same time. Then they would get different locks and
        # both think that they could access the key guarded by that lock
        with self._lock:
            if key not in self.iterkeys():
                self[key] = threading.Lock()
            return super(DictLock, self).__getitem__(key)

now if you want to modify your items, you can use the locks to keep it safe.

locks = DictLock()

with locks['a']:
    # modify a.

or to insert a new element

with locks['z']:
    #we are now the only ones (playing by the rules) accessing the 'z' key
    items['z'] = create_new_item()
aaronasterling
Thanks. The newkey in dict is a way to go for me. And thanks about the lock thing. I wont be using same dict in multiple threads though. I meant to use different dicts in different threads. One thread will still work on only the same dictionary.I will still need a way to keep track of the amount of different threads, depending on how big dictionaries are beeing processed, but thats another problem entirely. Thanks again!
Zayatzz
A: 

What you want is an "intrusive" dictionary - something that looks for keys inside values. Unfortunately, I don't know of any implementation in Python. Boost's multi_index comes close.

Arkadiy
A: 

If you don't want to change the data structure you have, then you can use the following. Otherwise, poke's answer is the way to go.

>>> my_list = [
...   {'name':'a', 'bool':True, 'number':123, 'list':[1, 2, 3]},
...   {'name':'b', 'bool':False, 'number':143, 'list':[1, 3, 5]},
...   {'name':'c', 'bool':False, 'number':123, 'list':[1, 4, 5, 18]},
... ]
>>> def is_present(data, name):
...     return any(name == d["name"] for d in data)
... 
>>> is_present(my_list, "a")
True
>>> is_present(my_list, "b")
True
>>> is_present(my_list, "c")
True
>>> is_present(my_list, "d")
False

If you pass any an iterable, it returns True if any one of its elements are True.

(name == d["name"] for d in data) creates a generator. Each time somebody (in this case, any) requests the next element, it does so by getting the next element, d, from data and transforms it by the expression name == d["name"]. Since generators are lazy i.e. the transformation is done when the next element is requested, this should use relatively little memory (and should use the same amount of memory regardless of the size of the list).

Michael Williamson