In mod_wsgi I send the headers by calling start_response(), but all the page content is passed by yield/return. Is there a way to pass the page content in a similar fashion to start_response()? Using the return/yield statement is very restrictive when it comes to working with chunked data.

E.g.

def Application():

    b = buffer()

    [... page code ...]

    while True:
        out = b.flush()    
        if out:
            yield out

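class buffer:

    def __init__(self):
        self.b = ['']
        self.l = 0

    def add(self, s):
        s = str(s)
        self.l += len(s)
        self.b.append(s)

    def flush(self):
        # Hand back the buffered text only once more than 1000 bytes have piled up.
        if self.l > 1000:
            out = ''.join(self.b)
            self.__init__()
            return out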

I want the buffer to output the content as the page loads, but only once enough of it has piled up (in this example, 1000 bytes).

+2  A: 

No, but I don't think it is restrictive. Maybe you could paste some example code showing the restriction you have in mind, and we can help.

To work with chunked data you just yield the chunks:

def application(environ, start_response):
    start_response('200 OK', [('Content-type', 'text/plain')])
    yield 'Chunk 1\n'
    yield 'Chunk 2\n'
    yield 'Chunk 3\n'
    # Yielding can be delegated to another generator.
    for chunk in chunk_data_generator():
        yield chunk

def chunk_data_generator():
    yield 'Chunk 4\n'
    yield 'Chunk 5\n'


EDIT: Based on the comments you gave, here is an example of piling data up to a certain length before sending it on:

BUFFER_SIZE = 10 # 10 bytes for testing. Use something bigger
def application(environ, start_response):
    start_response('200 OK', [('Content-type', 'text/plain')])
    buffer = []
    size = 0
    for chunk in chunk_data_generator():
        buffer.append(chunk)
        size += len(chunk)
        if size > BUFFER_SIZE:
            for buf in buffer:
                yield buf
            buffer = []
            size = 0
    # Send whatever is left once the generator is exhausted.
    for buf in buffer:
        yield buf

def chunk_data_generator():
    yield 'Chunk 1\n'
    yield 'Chunk 2\n'
    yield 'Chunk 3\n'
    yield 'Chunk 4\n'
    yield 'Chunk 5\n'
nosklo
Well, it's restrictive because all the yielding takes place in application() and can't be embedded anywhere else. start_response() can be placed anywhere in the application. If I wanted to run an output buffer that piles up data to a certain length before outputting to WSGI, this becomes very hard without getting threads involved.
Ian
I just gave you an example where you can put the yielding somewhere else. No threads involved. It is still not clear what exactly the limitation is. Add some example pseudocode to your question showing how you'd like it to work, so we can either make it work or say why it can't.
nosklo
I added an example, but the problem with my example is that the buffer only starts yielding results *after* the page code has completed execution, which in the end is the same as if I had simply done a regular (non-chunked) response.
Ian
After seeing your example, this helps a bit. I think one of my current problems is actually that I'm using sys.stdout to populate the buffer, whereas I should simply be iterating/yielding to the buffer instead.
Ian
So I'm guessing the only way to do it is like your example, where I yield strings (in the page code) and the buffer collects them through iteration? Or is there a way to use the print() command?
Ian
Portable WSGI applications should not use print or sys.stdout
Miles
Why not? You can always overwrite sys.stdout. What would be wrong with that?
Ian
@Ian: The standard prevents that. See the PEP. Why do you want to use print? What's wrong with yield?
nosklo
@nosklo: I prefer print() because it can be called from anywhere. With yield I need to constantly do iterations, and as I make things more abstract and use functions that call other functions, I lose the ability to yield.
Ian
+1  A: 

It is possible for your application to "push" data to the WSGI server:

Some existing application framework APIs support unbuffered output in a different manner than WSGI. Specifically, they provide a "write" function or method of some kind to write an unbuffered block of data, or else they provide a buffered "write" function and a "flush" mechanism to flush the buffer.

Unfortunately, such APIs cannot be implemented in terms of WSGI's "iterable" application return value, unless threads or other special mechanisms are used.

Therefore, to allow these frameworks to continue using an imperative API, WSGI includes a special write() callable, returned by the start_response callable.

New WSGI applications and frameworks should not use the write() callable if it is possible to avoid doing so.

http://www.python.org/dev/peps/pep-0333/#the-write-callable
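
As a rough sketch of what the quoted passage describes, the callable returned by start_response() can be handed to code elsewhere in the application, which then pushes data imperatively. The helper name render_page is hypothetical, and byte-string literals are used on the assumption that the code should also run under Python 3 servers:

```python
def application(environ, start_response):
    # start_response() returns the legacy write() callable described in the PEP.
    write = start_response('200 OK', [('Content-Type', 'text/plain')])
    render_page(write)
    # The whole body was already pushed through write(), so return an empty iterable.
    return []


def render_page(write):
    # Hypothetical page code: it can live in any function that receives write().
    write(b'Chunk 1\n')
    write(b'Chunk 2\n')
```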

But it isn't recommended.

Generally speaking, applications will achieve the best throughput by buffering their (modestly-sized) output and sending it all at once. This is a common approach in existing frameworks such as Zope: the output is buffered in a StringIO or similar object, then transmitted all at once, along with the response headers.

The corresponding approach in WSGI is for the application to simply return a single-element iterable (such as a list) containing the response body as a single string. This is the recommended approach for the vast majority of application functions, that render HTML pages whose text easily fits in memory.

http://www.python.org/dev/peps/pep-0333/#buffering-and-streaming
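
A minimal sketch of that recommended approach, assuming a small HTML page that fits in memory (the page fragments are made up for illustration): collect the pieces in a list, join them once, and return a single-element list.

```python
def application(environ, start_response):
    # Collect the page fragments in memory first.
    parts = []
    parts.append(b'<html><body>')
    parts.append(b'<p>Hello</p>')
    parts.append(b'</body></html>')
    body = b''.join(parts)

    # The full length is known up front, so Content-Length can be set.
    start_response('200 OK', [
        ('Content-Type', 'text/html'),
        ('Content-Length', str(len(body))),
    ])
    # Single-element iterable holding the whole response body.
    return [body]
```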

Miles
+1  A: 

If you don't want to change your WSGI application itself to partially buffer response data before sending it, then implement a WSGI middleware that wraps your WSGI application and performs that task.
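
One way such a middleware might look is sketched below. The class name BufferingMiddleware, the min_size parameter and demo_app are all hypothetical; the wrapper simply coalesces the chunks the inner application yields until at least min_size bytes have piled up.

```python
class BufferingMiddleware(object):
    """Hypothetical middleware that coalesces small chunks before sending."""

    def __init__(self, app, min_size=1000):
        self.app = app
        self.min_size = min_size

    def __call__(self, environ, start_response):
        result = self.app(environ, start_response)
        try:
            pending = []
            size = 0
            for chunk in result:
                pending.append(chunk)
                size += len(chunk)
                if size >= self.min_size:
                    # Enough data has piled up: pass it on as one block.
                    yield b''.join(pending)
                    pending = []
                    size = 0
            if pending:
                # Send whatever is left when the application is done.
                yield b''.join(pending)
        finally:
            # WSGI requires close() to be called on the wrapped iterable if present.
            close = getattr(result, 'close', None)
            if close is not None:
                close()


def demo_app(environ, start_response):
    # Hypothetical application that yields many tiny chunks.
    start_response('200 OK', [('Content-Type', 'text/plain')])
    for _ in range(5):
        yield b'abcd'


application = BufferingMiddleware(demo_app, min_size=10)
```

Note that this sketch only buffers what the wrapped application yields or returns; anything it sends through the write() callable would bypass the buffer.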

Graham Dumpleton