I am creating a System.IO.Stream implementation to use as a filter via HttpContext.Response.Filter. I would like to know whether every call to Write(byte[], int, int) is guaranteed to contain only whole character byte sequences, or whether a single character (for example, the 4 bytes of a UTF-32 character) could be split across calls.

public override void Write(byte[] buffer, int offset, int count) {
    // Here `e' is a reference to `ctx.Response.ContentEncoding'
    // from the original context.
    char[] chars = e.GetChars(buffer, offset, count);
    //... Stream processing logic here.
}

My testing so far with UTF-32 suggests that each call always contains only whole character byte sequences, but I wanted confirmation before relying on that assumption.

If the bytes of a character can be split across calls to Write, what is the best way to handle it? I was thinking of determining the byte width of a single character in my constructor, as shown below, and using it to check whether the length of each write is divisible by that value. This is fairly trivial to implement, but it is not an approach I would like to rely on.

// Here `e' is a reference to `ctx.Response.ContentEncoding'
// from the original context.
// `charLen' will yield 4 for a utf-32 encoding.
charLen = e.GetByteCount(new char[] { ' ' });
+2  A: 

Streams don't know whether they're dealing with character data or binary data. It's up to the filter, or potentially the StreamWriter that is likely to wrap your stream, to decide whether it writes out a whole character at a time.

I would personally expect StreamWriter to only ever write out full characters, but I don't think I'd rely on it. I can't see anything guaranteeing that behaviour.

I suggest you use a System.Text.Decoder (obtained by calling Encoding.GetDecoder) and use that to maintain appropriate state. Indeed, that's exactly what it's designed for :) See the linked docs for more details.
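
As a rough illustration of the Decoder suggestion, here is a minimal sketch of a write-only filter that keeps one Decoder for its whole lifetime. The class name DecodingFilterStream and the Seen property are hypothetical, made up for this example; the sketch only collects the decoded characters and passes the original bytes through unchanged, whereas a real filter would do its character processing where the comment indicates.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical filter: the class name and the Seen property are illustrative only.
public class DecodingFilterStream : Stream
{
    private readonly Stream inner;     // the stream being wrapped, e.g. the previous Response.Filter
    private readonly Decoder decoder;  // holds the bytes of a partial character between Write calls
    private readonly StringBuilder seen = new StringBuilder();

    public DecodingFilterStream(Stream inner, Encoding encoding)
    {
        this.inner = inner;
        this.decoder = encoding.GetDecoder();
    }

    // Everything decoded so far (kept only so the sketch has something observable).
    public string Seen { get { return seen.ToString(); } }

    public override void Write(byte[] buffer, int offset, int count)
    {
        // flush: false tells the decoder that more bytes may follow, so an
        // incomplete trailing character is held back rather than replaced.
        char[] chars = new char[decoder.GetCharCount(buffer, offset, count, false)];
        int n = decoder.GetChars(buffer, offset, count, chars, 0, false);

        // ... character processing would go here ...
        seen.Append(chars, 0, n);

        inner.Write(buffer, offset, count);
    }

    public override void Flush() { inner.Flush(); }
    public override bool CanRead { get { return false; } }
    public override bool CanSeek { get { return false; } }
    public override bool CanWrite { get { return true; } }
    public override long Length { get { throw new NotSupportedException(); } }
    public override long Position
    {
        get { throw new NotSupportedException(); }
        set { throw new NotSupportedException(); }
    }
    public override int Read(byte[] buffer, int offset, int count) { throw new NotSupportedException(); }
    public override long Seek(long offset, SeekOrigin origin) { throw new NotSupportedException(); }
    public override void SetLength(long value) { throw new NotSupportedException(); }
}
```

Assuming an ASP.NET context named ctx, as in the question, this could be installed with something like ctx.Response.Filter = new DecodingFilterStream(ctx.Response.Filter, ctx.Response.ContentEncoding).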

Jon Skeet
Hi Jon, thanks for the reply. I thought Encoding.GetChars just called through to the encoding's GetDecoder().GetChars? What I'm concerned about, though, is this: if one byte of a character is missing from the current call and only arrives at the start of the next call to `Write()`, how would I know? No Decoder could tell either. If the stream that's wrapping my stream (currently only the default stream in the chain) is writing binary, then that's another story altogether, as my stream only handles character processing.
Brett Ryan
Encoding.GetChars will create a *new* decoder - you want to maintain one between calls, so that if one write finishes halfway through a character, it will keep the "spare" bytes for use in the next call.
Jon Skeet
(Just to clarify - Encoding.GetChars will *logically* create a new decoder... the implementation may not actually do this of course. The point is that the Encoding isn't stateful, whereas Decoder is.)
Jon Skeet
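
The stateless/stateful distinction described in the comments above can be sketched as follows. The class and method names are hypothetical, made up for illustration; both methods decode the same bytes in two pieces, split at an arbitrary index, once with Encoding.GetChars and once with a single Decoder shared across both calls.

```csharp
using System;
using System.Text;

// Hypothetical helper: names are illustrative only.
public static class SplitCharacterDemo
{
    // Each Encoding.GetChars call stands alone, so the bytes of a character
    // that straddles the split cannot be joined back together.
    public static string DecodeStateless(Encoding e, byte[] bytes, int split)
    {
        return new string(e.GetChars(bytes, 0, split))
             + new string(e.GetChars(bytes, split, bytes.Length - split));
    }

    // One Decoder is reused for both pieces; with flush: false it keeps the
    // leftover bytes from the first call and completes the character in the second.
    public static string DecodeStateful(Encoding e, byte[] bytes, int split)
    {
        Decoder d = e.GetDecoder();
        char[] chars = new char[e.GetMaxCharCount(bytes.Length)];
        int n = d.GetChars(bytes, 0, split, chars, 0, false);
        n += d.GetChars(bytes, split, bytes.Length - split, chars, n, false);
        return new string(chars, 0, n);
    }
}
```

For example, with the 4 UTF-32 bytes of "A" split after 2 bytes, DecodeStateful returns the original character, while DecodeStateless does not.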
Fantastic Jon, thank you very much for clarifying that for me. I was just starting to realise this through MSDN, which reinforced what you had said. It wouldn't be too much of a change for me to obtain a decoder/encoder reference; this is my first filter implementation and I wanted to ensure I didn't end up with any nasty surprises later :) Your help is greatly appreciated Jon.
Brett Ryan