views:

187

answers:

4

I am writing a Python application in the field of scientific computing. Currently, when the user works with the GUI and starts a new physics simulation, the interpreter immediately imports several necessary modules for this simulation, such as Traits and Mayavi. These modules are heavy and take too long to import, and the user has to wait ~10 seconds before he can continue, which is bad.

I thought of something that might remedy this. I'll describe it; perhaps someone has already implemented it, in which case please give me a link. If not, I might do it myself.

What I want is a separate thread that will import modules asynchronously. It will probably be a subclass of threading.Thread.

Here's a usage example:

importer_thread = ImporterThread()
importer_thread.start()

# ...

importer_thread.import_('Mayavi')
importer_thread.import_('Traits')
# import_ is a thread-safe method that puts the module name
# into a queue, which the thread consumes in an infinite loop.
# (The method cannot be named "import", since that is a reserved word.)

# ...

# When the user actually needs the modules:
import Mayavi, Traits
# If they were already loaded by importer_thread, we're good.
# If not, we'll just have to wait as usual.

So do you know of anything like this? If not, do you have any suggestions about the design?

+2  A: 

The problem with this is that the imports must still complete before they are usable. Depending on when they're first used, the application could still have to block for 10 seconds before it could start up anyway. Much more productive would be to profile the modules and figure out why they take so long to import.

Ignacio Vazquez-Abrams
1. The assumption is that at least a few seconds pass between the time the user starts the program and the time he starts a simulation. 2. Profiling a third-party module like Mayavi would be a mega-project. There are many people who know its internals much better than I do, and I believe they have already tried to make it load fast, so I doubt it's a good idea for me to try to outdo them. Also, I may use dozens of different third-party modules.
cool-RR
Posit 1 will not hold up in e.g. a batch processor. Posit 2 can be aided by replacing `__builtins__.__import__()` with a function that logs import start and end times in order to give a better idea of what needs to be looked at.
Ignacio Vazquez-Abrams
A: 

"the user works with the GUI and starts a new physics simulation"

Not really clear. Does "works with the GUI" mean double click? Double click what? Some wxWidgets GUI application? Or IDLE?

If so, what does "starts a new physics simulation" mean? Click a button somewhere else? A GUI button to bring up a panel where they write code? Or do they import a script they wrote off line?

Why is the import happening before the simulation starts? How long does a simulation take? What does the GUI show?

I suspect that there's a way to be much, much lazier in doing the big imports. But from the description, it's hard to determine if there's a point in time where the import doesn't matter as much to the user.

Threads don't help much. What helps is rethinking the UI experience.

S.Lott
You know how you start Photoshop, and then you can start a new image? Same here. You start the program, and then through some menu items or buttons, you start a new simulation. (It's wxPython-based.) It doesn't bring up a panel to write code. The simulation is already ready to start. (It's a ready-made simpack.) So the third party modules must be fully loaded when creating the simulation.
cool-RR
-1. Sometimes you need to read between the lines and not be so hard-headed. It wasn't that hard to figure out what he meant. Also this is more of a comment than an answer.
FogleBird
@FogleBird. I find that reading between the lines and making other assumptions are the root cause of many problems. I prefer to ask and get the details because the mistakes are based on faulty assumptions and reading between the lines at requirements or design time. I find that hard-headed stupidity is the way to get the assumptions out into the open where they can be fixed.
S.Lott
When "creating" or "running"? Creating doesn't involve writing code? But somehow requires the modules to be loaded? I'm still perfectly unclear on what's going on.
S.Lott
He has some heavy 3rd party modules that aren't used until the user performs some action in the app. Obviously there will be some duration of time between the app being launched and the user performing said action. If you can import those modules in the background early, without blocking the UI, that would be good. Why is that so hard to understand?
FogleBird
@FogleBird: Repeating the broad, vague overview isn't helpful. Further, the question says "works with GUI" and "starts simulation". Words I don't understand. It does not say "launch" anywhere, a word you added. "user performing 'said' action" may not be a single click. Or perhaps it is. I can't tell. I still don't understand the *actual* sequence of actions into which a slow process can be inserted in a way that's less disturbing to the users. You can claim it's simple -- and perhaps it is -- but I doubt it's simple. The point is to find something the author overlooked.
S.Lott
+1  A: 

Why not just do this when the app starts?

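import threading

def background_imports():
    # Importing here fills sys.modules, so a later import of the
    # same module in the main thread returns almost immediately.
    import Traits
    import Mayavi

# A daemon thread never keeps the interpreter alive at exit.
thread = threading.Thread(target=background_imports, daemon=True)
thread.start()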
FogleBird
Yeah, that will work. I just want something a bit more flexible. I think I'll take your advice and try to implement it myself. (The reason I asked is to avoid reinventing the wheel, if possible.)
cool-RR
More flexible in what way?
FogleBird
To be honest I'm not sure. But for example, I might want the program to scan the library of simpacks, and pre-import the modules necessary to them. But that's just an example, I'll implement it and then see how it goes.
cool-RR
+1  A: 

The general idea is good, but the Python/GUI session might not be all that responsive while the background thread is importing away; unfortunately, import inherently and inevitably "locks up" Python substantially (it's not just the GIL, there's specific extra locking for imports).

Still worth trying, as it might make things a bit better -- it's also very easy, since Queues are intrinsically thread-safe and, besides a Queue's put and get, all you need is basically an __import__. Still, don't be surprised if this doesn't help enough and you still need extra oomph.
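A rough sketch of that Queue-plus-import design, under a few assumptions: it uses `importlib.import_module` in place of a bare `__import__` call, the method names `request` and `wait` are invented for illustration, and `json` stands in for a heavy package. It is one possible shape for the asker's `ImporterThread`, not a tested library.

```python
import importlib
import queue
import sys
import threading


class ImporterThread(threading.Thread):
    """Imports modules named on a queue, one at a time, in the background."""

    def __init__(self):
        super().__init__(daemon=True)  # never blocks interpreter exit
        self._names = queue.Queue()    # Queue.put/get are thread-safe
        self.errors = {}               # module name -> exception raised

    def request(self, name):
        """Queue a module name for background import (safe from any thread)."""
        self._names.put(name)

    def run(self):
        while True:
            name = self._names.get()   # blocks until a name is queued
            try:
                importlib.import_module(name)
            except Exception as exc:
                # Remember the failure; the later real import will raise again.
                self.errors[name] = exc
            finally:
                self._names.task_done()

    def wait(self):
        """Block until every queued name has been processed."""
        self._names.join()


importer = ImporterThread()
importer.start()
importer.request("json")                # stand-in for a heavy package
importer.request("no_such_module_xyz")  # failures are recorded, not raised
importer.wait()

import json  # already in sys.modules, so this returns quickly
print(sorted(importer.errors))
```

The caveat above still applies: while the background thread is inside an import, the main thread may stall on the import lock and the GIL, so this mostly helps when the imports finish before the user needs them.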

If you have some drive that's intrinsically very fast, but with limited space, such as a "RAM drive" or a particularly snappy solid-state one, it may be worth keeping the needed packages in a .tar.bz2 (or other form of archive) and unpacking it onto the fast drive at program start (that's essentially just I/O and so it won't lock things up badly -- I/O operations rapidly release the GIL -- and also it's especially easy to delegate to a subprocess running tar xjf or the like).

If some of the import slowness is due to a huge number of .py/.pyc/.pyo files, it's worth trying to keep those (in .pyc form only, not as .py) in a zipfile and import them from there (but that only helps with the I/O overhead, depending on your OS, filesystem, and drive; it doesn't help with delays due to loading huge DLLs or executing initialization code in packages at load time, which I suspect are likelier culprits for the slowness).
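To illustrate the zipfile route, here is a small sketch using the standard library's `zipfile.PyZipFile`, which compiles a source file and stores the bytecode in the archive. The module name `simpack_demo` and the temporary paths are invented for the example; a real bundle would hold the actual packages.

```python
import os
import sys
import tempfile
import zipfile

workdir = tempfile.mkdtemp()
source = os.path.join(workdir, "simpack_demo.py")  # hypothetical module
archive = os.path.join(workdir, "bundle.zip")

# Write a tiny module to disk so there is something to bundle.
with open(source, "w") as f:
    f.write("ANSWER = 42\n")

# writepy() compiles the source and adds the bytecode to the archive.
with zipfile.PyZipFile(archive, "w") as bundle:
    bundle.writepy(source)
    print(bundle.namelist())

# Putting the archive on sys.path lets the normal import machinery
# (zipimport) load modules from inside it.
sys.path.insert(0, archive)
import simpack_demo

print(simpack_demo.ANSWER, simpack_demo.__file__)
```

Note that zipimport cannot load compiled extension modules (.pyd/.so) from an archive, which matches the caveat above about DLLs.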

You could also consider splitting the application up with multiprocessing -- again using Queues (but of the multiprocessing kind) to communicate -- so that both imports and some heavy computations are delegated to a few auxiliary processes and thus made asynchronous (this may also help fully exploiting multiple cores at once). I suspect this may unfortunately be hard to arrange properly for visualization tasks (such as those you're presumably doing with mayavi) but it might help if you also have some "pure heavy computation" packages and tasks.
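As a sketch of the multiprocessing variant, assuming the heavy work can be phrased as picklable tasks and results: the heavy import happens only in the process that runs the computation, and the two sides talk through `multiprocessing.Queue` objects. The function names are invented, and `math.factorial` is a stand-in for a real "pure heavy computation" package.

```python
import multiprocessing as mp


def heavy_computation(n):
    # Stand-in for a heavy package: imported only where this function runs.
    import math
    return math.factorial(n)


def worker(tasks, results):
    """Child-process loop: take tasks until a None sentinel arrives."""
    for n in iter(tasks.get, None):
        results.put((n, heavy_computation(n)))


def run_in_worker(numbers):
    """Send numbers to one auxiliary process and collect the answers."""
    tasks, results = mp.Queue(), mp.Queue()
    proc = mp.Process(target=worker, args=(tasks, results), daemon=True)
    proc.start()
    for n in numbers:
        tasks.put(n)
    tasks.put(None)  # sentinel: tells the worker to exit its loop
    # Drain the results before joining so the child can flush its queue.
    answers = dict(results.get(timeout=30) for _ in numbers)
    proc.join()
    return answers


if __name__ == "__main__":
    # The guard matters on platforms that start children by re-importing
    # the main module.
    print(run_in_worker([5, 10]))
```

In a GUI program the parent would poll `results` from a timer or idle handler instead of blocking on `get`, so the event loop keeps running while the worker imports and computes.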

Alex Martelli
You were right. I implemented it and it barely helped. Even though the importing was on a separate thread, the main thread was very unresponsive and the GUI was frozen. I think the other solutions you suggested that involved moving files around would be too complex. I'm already using `multiprocessing`, I don't think it could help with this. Is there no other way? Maybe being able to pause the import process in a few points so the wxPython loop could get some control and keep the UI sort-of responsive? Do you think that this is an inherent weakness of Python?
cool-RR
If you do the `import`s in processes different than the one handling the GUI, then the GUI's responsiveness cannot be impacted -- so I'm totally nonplussed by your claim that you "don't think it could help with" keeping the GUI responsive. How could it **fail** to help?!
Alex Martelli
I agree that doing the importing in a separate process will free up the GUI process completely, but I need to have the module available in the GUI process. Currently Mayavi is the heaviest, and that is definitely needed in the GUI process. Are you suggesting there's some way to import the module in a separate process and somehow make it available on the GUI process?
cool-RR