views:

1211

answers:

5

Hey everyone,

I have a set of 6 bits that represent a 7-bit ASCII character. How can I get the correct 7-bit ASCII code out of the 6 bits I have? Just append a zero and do a bitwise OR?

Thanks for your help.

Lennart

+7  A: 

ASCII is inherently a 7-bit character set, so what you have is not "6-bit ASCII". What characters make up your character set? The simplest decoding approach is probably something like:

char From6Bit( char c6 ) {
    // array of all 64 characters that appear in your 6-bit set
    static const char SixBitSet[] = { 'A', 'B', ... };
    // mask to the low 6 bits so the index stays within 0..63
    return SixBitSet[ c6 & 0x3F ];
}

A footnote: 6-bit character sets were quite popular on old DEC hardware, some of which, like the DEC-10, had a 36-bit architecture where 6-bit characters made some sense.

anon
Heh, maybe it's the opposite of the equally non-standard "Extended ASCII", entitled "Contracted ASCII".
dreamlax
+2  A: 

You must tell us what your 6-bit character set looks like; I don't think there is a single standard.

The easiest way to do the reverse mapping would probably be to just use a lookup table, like so:

static const char sixToSeven[] = { ' ', 'A', 'B', ... };

This assumes that space is encoded as (binary) 000000, capital A as 000001, and so on.

You index into sixToSeven with one of your six-bit characters, and get the corresponding 7-bit character back.
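
As a rough, complete version of that idea, here is a minimal sketch. The table contents are purely hypothetical (space at 0, 'A' to 'Z' at 1 to 26, then digits and some punctuation) and the names `six_to_seven` and `from_six_bit` are made up for illustration; you would substitute the real layout of your encoding.

```c
/* Hypothetical 64-entry table: space at 0, 'A'..'Z' at 1..26, then digits
   and punctuation. This layout is an assumption, not a standard; replace
   it with the actual character set of your 6-bit encoding. */
static const char six_to_seven[] =
    " ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789.,:;!?'\"()+-*/=<>[]{}#$%&@^";

/* Map one 6-bit code to its 7-bit ASCII character. */
char from_six_bit(unsigned char c6)
{
    /* Mask to the low 6 bits so the index always stays within 0..63. */
    return six_to_seven[c6 & 0x3F];
}
```

Using a string literal keeps the table compact; its 64 characters plus the terminating NUL give an array of 65 bytes, which is an easy thing to check when you fill in your own set.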

unwind
There are (were) actually multiple 6-bit character standards. DEC has been mentioned. There was even originally a 5-bit character encoding standard: http://tamilelibrary.org/teli/history1.html
Robert Fraser
A: 

If I were to give you the value of a single bit, and I claimed it was taken from Windows XP, could you reconstruct the entire OS?

You can't. You've lost information. There is no way to reconstruct that, unless you have some knowledge about what was lost. If you know that, say, the most significant bit was chopped off, then you can set that to zero, and you've reconstructed at least half the characters correctly.

If you know how 'a' and 'z' are represented in your 6-bit encoding, you might be able to guess at what was removed by comparing them to their 7-bit representations.
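
One way to test such a guess is sketched below. It checks only a single hypothesis, namely that the 6-bit code is the ASCII code minus a constant offset; other schemes (a dropped bit in the middle, an arbitrary table) would need a different check. The function name `guess_offset` is hypothetical.

```c
/* Given two characters whose 6-bit codes are known, test the hypothesis
   "ASCII = 6-bit code + constant". Returns the constant if both pairs
   agree on it, or -1 if the hypothesis does not hold. */
int guess_offset(unsigned char six_a, char ascii_a,
                 unsigned char six_b, char ascii_b)
{
    int off_a = ascii_a - six_a;
    int off_b = ascii_b - six_b;

    /* Both known pairs must imply the same non-negative offset. */
    return (off_a == off_b && off_a >= 0) ? off_a : -1;
}
```

For example, if 'A' is stored as 33 and 'Z' as 58, both pairs imply an offset of 32, so `guess_offset(33, 'A', 58, 'Z')` returns 32. If the two pairs disagree, the function returns -1 and the encoding is something other than a plain offset.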

jalf
+1  A: 

The only recent 6-bit code I'm aware of is base64. This uses four 6-bit printable characters to store three 8-bit values (6x4 = 8x3 = 24 bits).

The 6-bit values are drawn from the characters:

ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/

which are the values 0 thru 63. Four of these (say UGF4) are used to represent three 8-bit values.

UGF4 = 010100 000110 000101 111000
     = 01010000 01100001 01111000
     = Pax

If this is how your data is encoded, there are plenty of snippets around that will tell you how to decode it (and many languages have the encoder and decoder built in, or in an included library). Wikipedia has a good article on Base64.
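
To make the UGF4 example concrete, here is a minimal sketch that decodes one group of four base64 characters into three bytes. It assumes the standard alphabet shown above and does not handle '=' padding or whitespace; the names `b64_alphabet` and `b64_decode_quad` are made up for illustration.

```c
#include <string.h>

/* Standard base64 alphabet: the position of a character is its 6-bit value. */
static const char b64_alphabet[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Decode one group of four base64 characters into three bytes.
   Returns 0 on success, -1 if a character is not in the alphabet.
   Padding ('=') is deliberately not handled in this sketch. */
int b64_decode_quad(const char in[4], unsigned char out[3])
{
    unsigned long bits = 0;

    for (int i = 0; i < 4; i++) {
        /* strchr would also match the terminating NUL, so reject it first. */
        const char *p = in[i] ? strchr(b64_alphabet, in[i]) : NULL;
        if (p == NULL)
            return -1;
        /* Append the 6-bit value to the 24-bit accumulator. */
        bits = (bits << 6) | (unsigned long)(p - b64_alphabet);
    }

    /* Split the 24 bits into three 8-bit values, most significant first. */
    out[0] = (unsigned char)((bits >> 16) & 0xFF);
    out[1] = (unsigned char)((bits >> 8) & 0xFF);
    out[2] = (unsigned char)(bits & 0xFF);
    return 0;
}
```

Calling `b64_decode_quad("UGF4", out)` fills `out` with the bytes 'P', 'a', 'x', matching the worked example. For real data you would normally use a library decoder that also deals with padding and invalid input.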

If it's not base64, then you'll need to find out the encoding scheme. Some older schemes used other lookup methods or shift-in/shift-out (SI/SO) codes to choose a page within a character set, but I think that was more for choosing extended (e.g., Japanese DBCS) characters than normal ASCII characters.

paxdiablo
+1  A: 

I can't imagine why you'd be getting old DEC-10/20 SIXBIT, but if that's what it is, then just add 32 (decimal). SIXBIT took the ASCII characters starting with space (32), so just add 32 to the SIXBIT character to get the ASCII character.
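
In code, that is a one-liner. The sketch below assumes each SIXBIT character has already been extracted into its own byte (unpacking them from 36-bit words is a separate step); the name `sixbit_to_ascii` is made up for illustration.

```c
/* Convert one DEC SIXBIT code (0..63) to ASCII (32..95).
   Assumes the code has already been unpacked into a single byte. */
char sixbit_to_ascii(unsigned char c6)
{
    /* Mask to 6 bits, then shift the range up so that 0 becomes space. */
    return (char)((c6 & 0x3F) + 32);
}
```

So SIXBIT 0 becomes a space, 33 becomes 'A', and 63 becomes '_'. Note that lowercase letters are not representable in this scheme, since the result never exceeds 95.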

John Saunders