Why do we use the flush parameter with the Encoder.GetBytes method?

+5 Q:

This link explains the Encoder.GetBytes method, including a bool parameter called flush. The explanation of flush is:

true if this encoder can flush its state at the end of the conversion; otherwise, false. To ensure correct termination of a sequence of blocks of encoded bytes, the last call to GetBytes can specify a value of true for flush.

But I didn't understand what flush does; maybe I am drunk or something :). Can you explain it in more detail, please?

+2 A:

Internally the Encoder would be implemented with a buffer. This buffer may need to be flushed (cleared) in order to end the conversion correctly or to prepare the Encoder for the next conversion.

Here is one explanation of buffer flushing.

The exact usage of the flush parameter is described here:

true to clear the internal state of the encoder after the conversion; otherwise, false.

Oded 2010-10-04 18:30:15

+6 A:

Suppose you receive data over a socket connection. You will receive a long text as several byte[] blocks.

It is possible that 1 Unicode character occupies 2+ bytes in a UTF-8 stream and that it is split over 2 byte blocks. Decoding the 2 byte blocks separately (and concatenating the strings) would then corrupt that character.

So you can only specify flush=true on the last block. And of course, if you only have 1 block then that is also the last.

Tip: Use a TextReader and let it handle this problem for you.
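To make the split-character case concrete, here is a minimal sketch, assuming the text is UTF-8 and that the blocks arrive as a byte[][] (the class and method names are made up for illustration). It compares decoding each block on its own with decoding through a single Decoder that passes flush=true only on the last block.

```csharp
using System;
using System.Text;

public static class SplitDecodeDemo
{
    // Decodes every block on its own. A character whose bytes are split
    // across two blocks cannot be reassembled this way.
    public static string DecodeIndependently(byte[][] blocks)
    {
        var sb = new StringBuilder();
        foreach (byte[] block in blocks)
        {
            sb.Append(Encoding.UTF8.GetString(block));
        }
        return sb.ToString();
    }

    // Decodes through one stateful Decoder. Incomplete trailing bytes are
    // kept until the next call; flush is true only for the last block.
    public static string DecodeWithState(byte[][] blocks)
    {
        Decoder decoder = Encoding.UTF8.GetDecoder();
        var sb = new StringBuilder();
        for (int i = 0; i < blocks.Length; i++)
        {
            byte[] block = blocks[i];
            bool isLast = i == blocks.Length - 1;
            char[] chars = new char[Encoding.UTF8.GetMaxCharCount(block.Length)];
            int count = decoder.GetChars(block, 0, block.Length, chars, 0, isLast);
            sb.Append(chars, 0, count);
        }
        return sb.ToString();
    }

    public static void Main()
    {
        // The euro sign is E2 82 AC in UTF-8; split it after its first byte.
        byte[][] blocks =
        {
            new byte[] { 0x41, 0xE2 },
            new byte[] { 0x82, 0xAC, 0x42 },
        };

        Console.WriteLine(DecodeIndependently(blocks) == "A\u20ACB"); // False
        Console.WriteLine(DecodeWithState(blocks) == "A\u20ACB");     // True
    }
}
```

A StreamReader over the socket's stream does the same bookkeeping internally, which is why the tip above is usually the simpler route.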

Edit

The mirror problem (that was actually asked: GetBytes) is slightly harder to explain.

Using flush=true leaves the encoder in the same state as calling Encoder.Reset() after GetBytes(...). It clears the 'state' of the encoder, including trailing characters at the end of the previous data block, such as an unmatched high surrogate.

The basic idea is the same: when converting from string to blocks of bytes, or vice versa, the blocks are not independent.

Henk Holterman 2010-10-04 18:35:40

`GetBytes()` is for going from chars to bytes, not vice versa. :)

bzlm 2010-10-04 18:38:09

@bzlm, Ok, mirror problem.

Henk Holterman 2010-10-04 18:39:03

@Henk I think this mirroring is exactly what is confusing the OP. Perhaps you would care to answer the actual question. :)

bzlm 2010-10-04 18:40:32

+2 A:

Flushing will reset the internal state of the encoder instance used to encode the text into bytes. Why does it need internal state, you ask? Well, to quote MSDN:

The flush parameter is useful for flushing a high-surrogate at the end of a stream that does not have a low-surrogate. For example, the Encoder created by UTF8Encoding.GetEncoder uses this parameter to determine whether to write out a dangling high-surrogate at the end of a character block.

Hence, if you're making multiple GetBytes() calls, you would want to flush the internal state at the end to terminate any character sequences that need terminating, but only at the end, since terminating sequences might otherwise be introduced in the middle of words.
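As an illustration of "flush only at the end", here is a small sketch, assuming UTF-8 output and input that arrives as several char[] blocks (the class, method, and parameter names are hypothetical). A surrogate pair is deliberately split across two blocks, so the encoder has to remember the high surrogate between calls.

```csharp
using System;
using System.IO;
using System.Text;

public static class FlushEncodeDemo
{
    // Encodes the blocks with a single Encoder instance.
    // flushEveryBlock == false: flush is true only on the last call.
    // flushEveryBlock == true: every call flushes, which breaks split pairs.
    public static byte[] EncodeInBlocks(char[][] blocks, bool flushEveryBlock)
    {
        Encoder encoder = Encoding.UTF8.GetEncoder();
        var output = new MemoryStream();
        for (int i = 0; i < blocks.Length; i++)
        {
            char[] block = blocks[i];
            bool flush = flushEveryBlock || i == blocks.Length - 1;
            byte[] bytes = new byte[Encoding.UTF8.GetMaxByteCount(block.Length)];
            int count = encoder.GetBytes(block, 0, block.Length, bytes, 0, flush);
            output.Write(bytes, 0, count);
        }
        return output.ToArray();
    }

    public static void Main()
    {
        // U+1F600 is the surrogate pair D83D DE00; split it across two blocks.
        char[][] blocks =
        {
            new[] { 'h', 'i', '\uD83D' },
            new[] { '\uDE00' },
        };
        string expected = BitConverter.ToString(Encoding.UTF8.GetBytes("hi\U0001F600"));

        string flushedAtEnd = BitConverter.ToString(EncodeInBlocks(blocks, false));
        string flushedAlways = BitConverter.ToString(EncodeInBlocks(blocks, true));

        Console.WriteLine(flushedAtEnd == expected);  // True
        Console.WriteLine(flushedAlways == expected); // False
    }
}
```

With flush=false the dangling high surrogate stays in the encoder's state and is combined with the low surrogate from the next block. Flushing after the first block forces the encoder to deal with the lone surrogate immediately, so the pair can never be encoded as one character.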

Note that this may be a purely theoretical problem these days. And, you'd be better off using higher-level wrappers anyway. If you do, being drunk will not be a problem.

bzlm 2010-10-04 18:36:55
