views:

300

answers:

5

I have found one example in Data and Communication Networking book written by Behrouza Forouzan regarding upper- and lowercase letters which differ by only one bit in the 7 bit code.

For example, character A is 1000001 (0x41) and character a is 1100001 (0x61).The difference is in bit 6, which is 0 in uppercase letters and 1 in lowercase letters. If we know the code for one case, we can easily find the code for the other by adding or subtracting 32 in decimal, or we can just flip the sixth bit.

What does all this mean?

I have found myself very confused with all these things. Could someone provide examples of how these things really work out?

+1  A: 

take a look, the 6th bit = 32, so if you flip it you subract or add 32

Bit value
1   1
2   2
3   4
4   8
5   16
6   32 (32 = hex 20)

Now if you look here http://asciitable.com/, you can see the ascii table for all the characters and will notice that A = 65 and a = 97

SQLMenace
Thanks for your answer but here you have missed something "a" is 97 not 96.96 is for Grave Accent i.e. `Sir, can you explain me all these stuff using a simple example and how actually these things are worked out.
Vibhakar SInha
I think this is a simple as it gets, the 6th bit means 32 so if it is on or off the difference is 32,which is the same difference between upper case and lower case characters
SQLMenace
Thanks Sir, it means when we have to convert lowercase to uppercase then we have to subtract the number by 32 and for uppercase to lowercase we have to add 32.Am i right, Sir.
Vibhakar SInha
Even better, to convert to lowercase, you do a bitwise OR with the value 32. To convert to upper case, do a bitwise AND with the value 223. Bitwise and's and or's are more efficient for the CPU then addition and subtraction. Also, the bit logic prevents you from needing to check the value (eliminating an IF check).
G Mastros
@G Mastros - Very good point about bit arithmetic
Emtucifor
A: 

http://asciitable.com/

0x61 is hexadecimal for 97 = a
0x41 is hexadecimal for 65 = A

So subtracting/adding decimal 32 is indeed the way to convert to uppercase/lowercase.

Z is 90 = 0b1111010    = 0x5A
z is 122 = 0b1011010   = 0x7A

Which is a difference of 0b01000000 in binary or 0x20 or 32 in decimal.

Thus switching the 6th bit changes case.

Gazler
+2  A: 

This relationship between the upper case and lower case letters was deliberate. When the ASCII code was formulated, computer hardware was primitive and software needed to conserve every byte. Flipping a single bit takes very little hardware or code to accomplish.

Mark Ransom
+7  A: 

Let's use a case that you'll find more familiar: base 10.

  1. Suppose we have a base 10 computer, where each 10bit stores a value from 0 to 9, and a 10byte is 10 10bits long, so that each byte can store 10,000,000,000 values (0 through 9,999,999,999).

  2. You wish to assign letters to particular positions in a 10byte so that this computer can communicate text data with other computers. One way you could do this would be like so:

    0000000101 A    0000000201 a
    0000000102 B    0000000202 b
    0000000103 C    0000000203 c
    0000000104 D    0000000204 d
    0000000105 E    0000000205 e
    0000000106 F    0000000206 f
    0000000107 G    0000000207 g
    0000000108 H    0000000208 h
    0000000109 I    0000000209 i
    0000000110 J    0000000210 j
    0000000111 K    0000000211 k
    0000000112 L    0000000212 l
    0000000113 M    0000000213 m
    0000000114 N    0000000214 n
    0000000115 O    0000000215 o
    0000000116 P    0000000216 p
    0000000117 Q    0000000217 q
    0000000118 R    0000000218 r
    0000000119 S    0000000219 s
    0000000120 T    0000000220 t
    0000000121 U    0000000221 u
    0000000122 V    0000000222 v
    0000000123 W    0000000223 w
    0000000124 X    0000000224 x
    0000000125 Y    0000000225 y
    0000000126 Z    0000000226 z
    
  3. Do you see that each lower case letter differs from the upper case letter by only a single 10bit digit, in the 3rd column from the right? It didn't have to be designed this way. It was just convenient, because then any time we want to adjust the case of a letter we can just modify one of the digits (10bits) without caring what the rest of the number is or bothering with 26 different transformations when we can just do a single one.

  4. Now, in base 2 it is exactly the same, but instead of each bit representing 0-9, it can only represent 0-1. Using 8 2-bits gives us only 256 possible combinations, 0-255. Showing the ASCII codes for the upper and lower case letters in binary looks like this:

    01000001 A        01100001 a
    01000010 B        01100010 b
    01000011 C        01100011 c
    01000100 D        01100100 d
    01000101 E        01100101 e
    01000110 F        01100110 f
    01000111 G        01100111 g
    01001000 H        01101000 h
    01001001 I        01101001 i
    01001010 J        01101010 j
    01001011 K        01101011 k
    01001100 L        01101100 l
    01001101 M        01101101 m
    01001110 N        01101110 n
    01001111 O        01101111 o
    01010000 P        01110000 p
    01010001 Q        01110001 q
    01010010 R        01110010 r
    01010011 S        01110011 s
    01010100 T        01110100 t
    01010101 U        01110101 u
    01010110 V        01110110 v
    01010111 W        01110111 w
    01011000 X        01111000 x
    01011001 Y        01111001 y
    01011010 Z        01111010 z
    

    Just the same as before, they differ by only one 2bit digit, in the 6th column from the right.

  5. Just to fill things out a little, here's an example of what those binary values mean. Let's take the one for G. To understand what value 01001001 is in decimal:

     Pos:   7  6  5  4  3  2  1  0
     Bit:   0  1  0  0  0  1  1  1
     Val: 128 64 32 16  8  4  2  1
    Mult:   0 64  0  0  0  4  2  1
     Add: 64 + 4 + 2 + 1 = 71, which is the ASCII code for G.
    

    Doing the same thing for the letter G in the special base 10 system I constructed above (I left out the values for 4 - 9, but each higher one is 10 times the previous):

      Pos: 9   8   7   6   5   4   3   2   1   0
    10Bit: 0   0   0   0   0   0   0   1   0   7
      Val:                      1000 100  10   1
     Mult: 0   0   0   0   0   0   0 100   0   7
      Add: 100 + 7 = 107, which is my special 10ASCII code for G.
    

    Look back at the "Val" row for binary. Do you see that starting from the right, each value is double the previous one? Doubling each time we get 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, and so on. This is how a binary digit's position determines its value, just like a decimal digit's position determines its value with powers of 10: 1, 10, 100, 1000, 10000, 100000, and so on.

    I realize this seems silly because all I did was convert 107 to 107... but 107 isn't just a number, it is a shorthand form for:

    1 hundreds + 0 tens + 7 ones.
    

    Another way we could represent that is

    1x10^2 + 0x10^1 + 7x10^0.
    

    Similarly, 01000111 isn't just a binary number, it is a shorthand form for

    0x2^7 + 1x2^6 + 0x2^5 + 0x2^4 + 0x2^3 + 1x2^2 + 1x2^1 + 1x2^0
    

    Which is what I already showed you:

    0 + 64 + 0 + 0 + 0 + 4 + 2 + 1
    = 64 + 4 + 2 + 1
    = 71
    

I hope this is sufficient for you to understand a little bit more about binary and ASCII codes.

Note 1: The reason for 8 bits in a byte instead of 2 as you might think is that back in the early days of computing, it was decided that 8 is a much more useful number of bits, as a 2 bit 2byte would only encode 4 values. To transmit the upper and lower case letters of the alphabet alone would require 3 bytes! There is nothing inherent in binary that forces the choice of 8 bytes, except that 8 is also a power of 2 which makes a lot of the math involved in working with binary information simpler and things align on edges better. If they had chosen 6 bits per byte, I am sure that something somewhere would have come out awkward and not making good use of the full range of values available.

Note 2: Also, my system of 10 bits in a 10byte is actually very impractical. A more reasonable size would be maybe 5 10bits per 10byte, encoding 100,000 values. It is not a power of 10 but at least 10 is evenly divisible by it which would undoubtedly be useful.

Emtucifor
01010111 01010100 01000110 00111111
Lukman
@Lukman - Is that a criticism or a compliment? :)
Emtucifor
+2  A: 

In order to add or subtract 32, you first must know whether the character is greater or less than 'A'.

When this book was written, the programming languages most people were using did not have Strings, or .equalsIgnoreCase. This was pre-i18n, and when a business had a server, you would telnet to it (like xterm), and get a command line menu. What he's describing, was typically used to create a nice case-insensitive menu for your users, taking advantage of the numeric layout of the ascii table.

It can be very fast, because there are bit-wise assembler instructions to do the math in either direction, regardless of whether the characters are already upper or lowercase.

c = c | 32 // to uppercase

c = c & (1+2+4+8+16+ 0 +64+128) // to lowercase

Say you had a Java-like language, without objects or the standard libs. Your networking author is prompting you to code like this:

    public static void main()
    {
        println("What would you like to do?");
        println("Inventory (inv)");
        println("Reports (rep)");

        char[] ca = readUserInput();        
        for (int i = 0; i < ca.length; i++)
            ca[i] = ca[i] | 32;  // convert to uppercase, by ensuring bit 32 is set

        if (compareInput(ca, "INV") == true)
            doInventory();
    }

Have you tried searching Google, and sometimes capitalized a person's name?

Brian Maltzan