Here is how I would do this (all code untested):
Step 1: You need to create the algorithms. The Dataset could look like this:
class Dataset(object):
def __init__(self, dataset):
self.dataset = dataset
def __iter__(self):
for x in self.dataset:
yield x
Notice that you make an iterator out of it, so you iterate over it one item at a time. There's a reason for that, you'll see later:
Another algorithm could look like this:
class Multiplier(object):
def __init__(self, previous, multiplier):
self.previous = previous
self.multiplier = multiplier
def __iter__(self):
for x in previous:
yield x * self.multiplier
Step 2
Your user would then need to make a chain of this somehow. Now if he had access to Python directly, you can just do this:
dataset = Dataset(range(100))
multiplier = Multiplier(dataset, 5)
and then get the results by:
for x in multiplier:
print x
And it would ask the multiplier for one piece of data at a time, and the multiplier would in turn as the dataset. If you have a chain, then this means that one piece of data is handled at a time. This means you can handle huge amounts of data without using a lot of memory.
Step 3
Probably you want to specify the steps in some other way. For example a text file or a string (sounds like this may be web-based?). Then you need a registry over the algorithms. The easiest way is to just create a module called "registry.py" like this:
algorithms = {}
Easy, eh? You would register a new algorithm like so:
from registry import algorithms
algorithms['dataset'] = Dataset
algorithms['multiplier'] = Multiplier
You'd also need a method that creates the chain from specifications in a text file or something. I'll leave that up to you. ;)
(I would probably use the Zope Component Architecture and make algorithms components and register them in the component registry. But that is all strictly speaking overkill).