I can't speak for the other formats, but UTF-8 shouldn't be too hard.
Just look at the first byte of the chunk you grabbed and work out where you are from there.
This table is taken from Wikipedia (binary, hex, decimal, meaning):
00000000-01111111 00-7F 0-127 US-ASCII (single byte)
10000000-10111111 80-BF 128-191 2nd, 3rd, or 4th byte of a multi-byte sequence (continuation byte)
11000000-11000001 C0-C1 192-193 Start of a 2-byte sequence, but code point <= 127 (overlong, so invalid in UTF-8)
11000010-11011111 C2-DF 194-223 Start of a 2-byte sequence
11100000-11101111 E0-EF 224-239 Start of a 3-byte sequence
11110000-11110100 F0-F4 240-244 Start of a 4-byte sequence
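If the byte is in the 2nd group (80-BF), you landed in the middle of a character and missed its start, so skip forward until you hit a byte that is not in that group. Bytes in the 3rd group (C0-C1), like F5-FF, never appear in valid UTF-8, so seeing one means the data is corrupt or not UTF-8 at all. If the byte is in the 1st, 4th, 5th, or 6th group, you are at the start of a character, and the group tells you how many bytes it occupies. Proceed accordingly from there.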
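
As a rough illustration of the table above, here is a minimal Python sketch. The function names utf8_byte_kind and first_boundary are made up for this example and are not from any library. It only classifies bytes and finds the first byte that can start a character; it does not validate the rest of the sequence.

```python
def utf8_byte_kind(b: int) -> str:
    """Classify one byte value (0-255) by its role in UTF-8.

    The ranges mirror the table: ASCII, continuation byte,
    2/3/4-byte lead byte, or a value that valid UTF-8 never contains.
    """
    if b <= 0x7F:
        return "ascii"         # 00-7F: single-byte character
    if b <= 0xBF:
        return "continuation"  # 80-BF: 2nd, 3rd, or 4th byte of a sequence
    if b <= 0xC1:
        return "invalid"       # C0-C1: would be an overlong 2-byte encoding
    if b <= 0xDF:
        return "lead2"         # C2-DF: start of a 2-byte sequence
    if b <= 0xEF:
        return "lead3"         # E0-EF: start of a 3-byte sequence
    if b <= 0xF4:
        return "lead4"         # F0-F4: start of a 4-byte sequence
    return "invalid"           # F5-FF: not used by UTF-8


def first_boundary(chunk: bytes) -> int:
    """Return the index of the first byte that is not a continuation byte.

    That is where the first complete character in the chunk can begin.
    Returns len(chunk) if every byte is a continuation byte.
    Invalid bytes are not skipped; the caller decides how to handle them.
    """
    for i, b in enumerate(chunk):
        if utf8_byte_kind(b) != "continuation":
            return i
    return len(chunk)


# Example: cut a UTF-8 string in the middle of the 2-byte character "é".
data = "héllo €".encode("utf-8")
chunk = data[2:]                 # starts on the second byte of "é"
start = first_boundary(chunk)    # skip the orphaned continuation byte
text = chunk[start:].decode("utf-8")
```

In well-formed UTF-8 you never need to skip more than three bytes to reach a boundary, since the longest sequence is four bytes. The same check applies at the end of a chunk: if the last lead byte promises more bytes than the chunk contains, the final character is cut off and you need to read more data before decoding it.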