Hi All:

We've got the following code sample:

```python
big_static_data = {
    "key1": {
        "subkey1": "subvalue1",
        # ...
    },
    "key2": ...,  # contents elided
    # ...
}


class StaticDataEarlyLoad:
    def __init__(self):
        self.static_data = big_static_data  # keep a reference on the instance
        # other init

    def handle_use_id(self, id):
        return complex_handle(self.static_data, id)  # complex_handle is defined elsewhere

    # ...


class StaticDataLazyLoad:
    def __init__(self):
        # static data is not stored on the instance
        # other init
        pass

    def handle_use_id(self, id):
        return complex_handle(big_static_data, id)  # use the module-level dict directly

    # ...
```

As the code above shows, calling *handle_use_id* on an instance of each class may have different performance characteristics.

IMO, early load will load the data when the instance is created, and the data will stay in memory until the instance is garbage-collected. With lazy load, the static data won't be loaded until we call the *handle_use_id* method. Am I right? (I'm not very familiar with Python's internals, so I'm not sure how long an instance lives before it is garbage-collected.) And if I'm right, early load means a big memory requirement, while lazy load means we have to load the data every time the method is invoked (a big overhead?).

Ours is a web-based project, so which approach is best? (*handle_use_id* will be invoked very frequently.)

Thanks.

A:

In your example, StaticDataLazyLoad (once the syntax of its `__init__` is corrected) won't make a big difference compared with StaticDataEarlyLoad.

"big_static_data" is initialized ("loaded") when the module is imported. It will immediately require some memory, no matter whether an instance of your classes is created or not.

An instance of StaticDataEarlyLoad will just create a new reference to big_static_data, not a new copy.

Thus, a lookup in StaticDataEarlyLoad starts from the local scope: Python looks up "self" (a fast local-variable access) and then the attribute "static_data" on that instance.

A lookup in StaticDataLazyLoad will not find "big_static_data" in the local scope, so Python looks it up in the module's global scope and finds it there. Both routes end in a hash-table lookup, so the size of the global scope does not meaningfully slow things down. Which route is marginally faster depends on the interpreter version, and the difference is likely negligible compared with the work done in complex_handle.
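To check this on a given interpreter, the two variants can be timed with the standard `timeit` module. This is only a sketch: the 10,000-entry dict and the one-line `complex_handle` are hypothetical stand-ins for the question's data and handler, so the absolute numbers say nothing about the real workload.

```python
import functools
import timeit

# Hypothetical stand-in data; the question's real dict is not shown.
big_static_data = {f"key{i}": {"subkey": i} for i in range(10_000)}


def complex_handle(data, id):
    # Hypothetical stand-in handler: a single dict lookup.
    return data.get(id)


class StaticDataEarlyLoad:
    def __init__(self):
        self.static_data = big_static_data  # a reference, not a copy

    def handle_use_id(self, id):
        return complex_handle(self.static_data, id)  # local "self" + attribute lookup


class StaticDataLazyLoad:
    def handle_use_id(self, id):
        return complex_handle(big_static_data, id)  # global-name lookup


early = StaticDataEarlyLoad()
lazy = StaticDataLazyLoad()

if __name__ == "__main__":
    for name, obj in (("early", early), ("lazy", lazy)):
        call = functools.partial(obj.handle_use_id, "key42")
        seconds = timeit.timeit(call, number=200_000)
        print(f"{name}: {seconds:.3f} s for 200,000 calls")
```

On a typical CPython build the two timings should come out close to each other; if they differ, the gap is per call and tiny next to any real `complex_handle`.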

resi
Thanks. But is there a better approach to improve performance? If I put the big static data into a class and then use it in the two approaches from the post (early load and lazy load), is early load good for speed but heavier on memory, and lazy load the opposite?
Tower Joo
If big_static_data is loaded inside a method (StaticDataEarlyLoad.__init__ or StaticDataLazyLoad.handle_use_id), no memory will be used for it before that method is invoked. This version of StaticDataLazyLoad will have performance drawbacks if the data is loaded on every method invocation. You might want to consider a combination of both classes, where the data is loaded only once, most likely during the first invocation of handle_use_id.
resi
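The "load once, on first use" combination described in the comment above could look roughly like this. It is a minimal sketch: `load_big_static_data`, `load_count` and `StaticDataLoadOnce` are hypothetical names, and the loader just returns a tiny dict in place of whatever expensive work (reading a file, querying a DB) produces the real data.

```python
# Hypothetical loader and handler: stand-ins for however the real data is produced.
load_count = 0


def load_big_static_data():
    """Pretend to do expensive work and count how often it runs."""
    global load_count
    load_count += 1
    return {"key1": {"subkey1": "subvalue1"}}


def complex_handle(data, id):
    # Stand-in for the question's complex_handle.
    return data.get(id)


class StaticDataLoadOnce:
    _static_data = None  # class attribute: shared by all instances, empty until first use

    @classmethod
    def _get_static_data(cls):
        # Load on the first call only; later calls reuse the cached dict.
        if cls._static_data is None:
            cls._static_data = load_big_static_data()
        return cls._static_data

    def handle_use_id(self, id):
        return complex_handle(self._get_static_data(), id)


a = StaticDataLoadOnce()
b = StaticDataLoadOnce()
print(load_count)               # 0: creating instances loads nothing
print(a.handle_use_id("key1"))  # {'subkey1': 'subvalue1'}
print(b.handle_use_id("key1"))  # same result, served from the cache
print(load_count)               # 1: the loader ran exactly once
```

In a multi-threaded web server, two threads making the very first call at the same moment could both run the loader. The result is still correct, but guarding the check with a `threading.Lock` would avoid the duplicate work.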
@Tower Joo: Correct: early load is good for speed but uses more memory. Late load is bad for speed but uses less memory. What more do you want to know? What's your question? Do you want some magical "solution" that breaks this fundamental law of software?
S.Lott
A:

big_static_data is created once at the beginning of the file (at least in the code that you show).

This consumes memory.

When you create an instance of StaticDataEarlyLoad, its static_data attribute (StaticDataEarlyLoad().static_data) is a reference to big_static_data. It consumes a very minor amount of memory: it merely points at the same dictionary that big_static_data points to. No copy of big_static_data is made; there is no real "loading" going on.

When a StaticDataEarlyLoad instance gets garbage-collected, a little memory is freed, but big_static_data remains.
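A quick way to see that no copy is made is to compare identities and then mutate through the attribute. A minimal sketch with a tiny stand-in dict:

```python
big_static_data = {"key1": {"subkey1": "subvalue1"}}


class StaticDataEarlyLoad:
    def __init__(self):
        self.static_data = big_static_data  # binds a second name to the same dict


instance = StaticDataEarlyLoad()
print(instance.static_data is big_static_data)  # True: same object, no copy

# A change made through the instance is visible through the module-level name,
# which only happens because both names point at one dict.
instance.static_data["key3"] = "added via instance"
print(big_static_data["key3"])  # added via instance

del instance                 # dropping the instance drops only its reference
print(len(big_static_data))  # 2: the dict itself is still alive
```

The shared reference also means a mutation made through any instance affects every user of big_static_data, so it is safest to treat the data as read-only.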

StaticDataLazyLoad does much the same thing, but doesn't create an attribute static_data. It just references big_static_data directly. The difference in memory consumption between StaticDataEarlyLoad and StaticDataLazyLoad is very minor. And there will be essentially no difference in speed.

It is always best to make explicit what a class depends upon. StaticDataEarlyLoad depends on big_static_data. Therefore, you should define

```python
class StaticDataEarlyLoad:
    def __init__(self, static_data):
        self.static_data = static_data
```

And initialize instances with StaticDataEarlyLoad(big_static_data).

There is essentially no difference in speed between this definition and the one you posted. Putting dependencies into the call signature of __init__ is just a good idea for the sake of organization, and after all you are using Python's OOP for good control of organization, right?
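A minimal sketch of the explicit-dependency version in use. The `None` default is an assumption added here, not part of the suggestion above: it keeps the dependency visible in the signature while still letting callers write `StaticDataEarlyLoad()` without knowing about the data. The plain dict lookup stands in for `complex_handle`.

```python
big_static_data = {"key1": {"subkey1": "subvalue1"}}


class StaticDataEarlyLoad:
    def __init__(self, static_data=None):
        # Assumed variant: fall back to the module-level dict when no data is passed.
        self.static_data = big_static_data if static_data is None else static_data

    def handle_use_id(self, id):
        # Stand-in for complex_handle(self.static_data, id): a plain dict lookup.
        return self.static_data.get(id)


production = StaticDataEarlyLoad(big_static_data)                 # explicit dependency
default = StaticDataEarlyLoad()                                   # same data, no argument
test_double = StaticDataEarlyLoad({"key1": {"subkey1": "fake"}})  # e.g. in a unit test

print(production.handle_use_id("key1"))   # {'subkey1': 'subvalue1'}
print(default.handle_use_id("key1"))      # {'subkey1': 'subvalue1'}
print(test_double.handle_use_id("key1"))  # {'subkey1': 'fake'}
```

Passing a small dict, as in `test_double`, is where the explicit signature pays off: tests can exercise the class without touching the real data.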

unutbu
Thanks. You suggested changing the `__init__` signature, which I disagree with, at least in this use case. This file (or class) is just for fetching data, and users of this class don't **need** to know much about what the data is. They only care about the methods they can use to get the right data (a bit like a DB).
Tower Joo