COBOL really only has two data types: Numbers and strings.
The layout of each field in a COBOL record is precisely specified by a PICTURE (usually abbreviated PIC) clause. The most common ones are:

- PIC X for strings. PIC X(100) means a 100-byte string.
- PIC 9 for numbers, optionally with S (sign) or V (implicit decimal point). For example, PIC S9(7)V99 means a signed number with 7 digits to the left of the implicit decimal point and 2 digits to the right.
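To make the PIC notation concrete, here is a minimal Python sketch that expands repeat counts like 9(7) and reports the digit count, scale, and sign. The names parse_pic and Pic are hypothetical, and the sketch only handles the simple X, 9, S, and V forms mentioned above (no editing symbols such as Z or commas).

```python
import re
from dataclasses import dataclass


@dataclass
class Pic:
    kind: str     # "X" for alphanumeric, "9" for numeric
    digits: int   # total characters (X) or total digits (9)
    scale: int    # digits after the implied decimal point (V)
    signed: bool  # True if the picture starts with S


def _expand(s: str) -> str:
    # Turn repeat counts like 9(7) into 9999999.
    return re.sub(r"(.)\((\d+)\)", lambda m: m.group(1) * int(m.group(2)), s)


def parse_pic(pic: str) -> Pic:
    """Parse a simple PIC string such as X(100) or S9(7)V99 (hypothetical helper)."""
    s = _expand(pic.upper().replace(" ", ""))
    if s and set(s) == {"X"}:
        return Pic("X", len(s), 0, False)
    signed = s.startswith("S")
    if signed:
        s = s[1:]
    if "9" not in s or not re.fullmatch(r"9*V?9*", s):
        raise ValueError(f"unsupported picture: {pic}")
    int_part, _, frac_part = s.partition("V")
    return Pic("9", len(int_part) + len(frac_part), len(frac_part), signed)
```

For example, parse_pic("S9(7)V99") reports 9 digits with a scale of 2, which is the information the storage formats below need.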
Numeric fields can have a USAGE clause to optimize their storage. The most common usages are DISPLAY, COMP, and COMP-3.
DISPLAY stores each digit as a character. For example, PIC 9(4) VALUE 123 stores the number as if it were the string "0123", and PIC 9(4)V99 VALUE 123.45 stores it as "012345". Note that the decimal point is not actually stored.
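A rough Python sketch of the unsigned DISPLAY digit layout: shift the value left by the number of V digits, then left-pad with zeros. display_digits is a hypothetical helper; it returns the characters only (not EBCDIC bytes), and it raises an error in cases where a COBOL MOVE would instead truncate high-order digits.

```python
from decimal import Decimal


def display_digits(value, digits: int, scale: int = 0) -> str:
    """Return the digit string an unsigned PIC 9(n)V9(m) DISPLAY field holds.

    digits is the total digit count (n + m); scale is m, the digits after V.
    The decimal point itself is never stored.
    """
    scaled = Decimal(str(value)).scaleb(scale)  # move the implied decimal point
    if scaled < 0 or scaled != scaled.to_integral_value():
        raise ValueError("value does not fit an unsigned field with this scale")
    text = str(int(scaled))
    if len(text) > digits:
        raise ValueError("value has too many digits for the field")
    return text.zfill(digits)
```

With this sketch, display_digits(123, 4) gives "0123" and display_digits("123.45", 6, 2) gives "012345", matching the two examples above.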
This is an inefficient format in that it uses 8 bits for each digit, where 4 would suffice. But it does have an "optimization" for signed numbers: instead of taking an extra byte, the sign is stored in the high nybble of the last byte. Normally, EBCDIC digits all have a high nybble of F, so 0123 is F0 F1 F2 F3. But -0123 is F0 F1 F2 D3; the D indicates negative. C means positive, and F means unsigned (i.e., positive). (Similar formats are used in ASCII versions of COBOL, but they are not as standardized.)
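Decoding the EBCDIC bytes of such a field is mostly nybble arithmetic. The sketch below (decode_zoned is a hypothetical name) accepts only the C, D, and F sign nybbles described above; real data may contain other values, so treat it as a starting point. The scale argument is the number of digits after V, since the decimal point is not stored.

```python
from decimal import Decimal


def decode_zoned(data: bytes, scale: int = 0) -> Decimal:
    """Decode an EBCDIC zoned-decimal (USAGE DISPLAY) numeric field.

    Each byte holds one digit in its low nybble. The high nybble is
    normally F; on the last byte it carries the sign instead.
    """
    if not data:
        raise ValueError("empty field")
    digits = []
    for b in data:
        low = b & 0x0F
        if low > 9:
            raise ValueError(f"invalid digit nybble in byte {b:02X}")
        digits.append(str(low))
    sign_nybble = data[-1] >> 4
    if sign_nybble == 0xD:            # D = negative
        sign = -1
    elif sign_nybble in (0xC, 0xF):   # C = positive, F = unsigned
        sign = 1
    else:
        raise ValueError(f"unexpected sign nybble {sign_nybble:X}")
    return sign * Decimal("".join(digits)).scaleb(-scale)
```

For example, the bytes F0 F1 F2 D3 decode to -123, and F0 F1 F2 F3 F4 F5 with scale=2 decode to 123.45.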
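COMP-3 is packed binary-coded decimal with a trailing sign nybble: each byte holds two digits, except that the low nybble of the last byte holds the sign (C, D, or F, as with DISPLAY). PIC 9(3) COMP-3 VALUE 123 becomes the two bytes 12 3F.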
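A minimal Python sketch of packing and unpacking COMP-3, assuming the sign conventions above (C positive, D negative, F unsigned). encode_packed and decode_packed are hypothetical helpers; digits is the total digit count in the PIC and scale is the number of digits after V.

```python
from decimal import Decimal


def decode_packed(data: bytes, scale: int = 0) -> Decimal:
    """Decode a COMP-3 (packed decimal) field."""
    if not data:
        raise ValueError("empty field")
    nybbles = data.hex().upper()              # b"\x12\x3F" -> "123F"
    digits, sign = nybbles[:-1], nybbles[-1]
    if not digits.isdigit() or sign not in ("C", "D", "F"):
        raise ValueError(f"not valid packed decimal: {nybbles}")
    value = Decimal(digits).scaleb(-scale)    # re-insert the implied decimal point
    return -value if sign == "D" else value


def encode_packed(value, digits: int, scale: int = 0, signed: bool = False) -> bytes:
    """Encode a value as PIC [S]9(n) COMP-3."""
    scaled = Decimal(str(value)).scaleb(scale)
    if scaled != scaled.to_integral_value():
        raise ValueError("value has more decimal places than the field allows")
    n = int(scaled)
    if n < 0 and not signed:
        raise ValueError("negative value in an unsigned field")
    body = str(abs(n)).zfill(digits)
    if len(body) > digits:
        raise ValueError("value has too many digits for the field")
    sign = ("D" if n < 0 else "C") if signed else "F"
    nybbles = body + sign
    if len(nybbles) % 2:                      # pad to whole bytes with a leading 0 nybble
        nybbles = "0" + nybbles
    return bytes.fromhex(nybbles)
```

With these helpers, encode_packed(123, 3) produces the bytes 12 3F from the example above. Because each digit takes one nybble plus one nybble for the sign, a field with n digits occupies n // 2 + 1 bytes; PIC S9(7)V99 COMP-3, for instance, takes 5 bytes.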
COMP (or BINARY) is native binary format, just like short, int, or long in C. Which of those sizes a field gets depends on the number of digits in its PIC.
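A small Python sketch of reading and writing COMP fields. It assumes IBM mainframe conventions: big-endian byte order, and 2, 4, or 8 bytes for 1 to 4, 5 to 9, or 10 to 18 digits respectively. Other compilers and compiler options may choose sizes or byte order differently, so check yours. comp_size, decode_comp, and encode_comp are hypothetical helpers.

```python
from decimal import Decimal


def comp_size(digits: int) -> int:
    """Bytes used by a PIC [S]9(n) COMP field under the assumed IBM rule."""
    if digits < 1 or digits > 18:
        raise ValueError("expected 1 to 18 digits")
    if digits <= 4:
        return 2   # halfword, like short
    if digits <= 9:
        return 4   # fullword, like int
    return 8       # doubleword, like long long


def decode_comp(data: bytes, signed: bool = False, scale: int = 0) -> Decimal:
    """Decode a big-endian COMP field; scale is the number of digits after V."""
    return Decimal(int.from_bytes(data, "big", signed=signed)).scaleb(-scale)


def encode_comp(value: int, digits: int, signed: bool = False) -> bytes:
    """Encode an (already scaled) integer as a big-endian COMP field.

    This does not enforce the PIC digit limit; to_bytes only raises
    OverflowError if the value does not fit in the chosen byte size.
    """
    return value.to_bytes(comp_size(digits), "big", signed=signed)
```

For example, encode_comp(123, 3) gives the two bytes 00 7B, and the halfword FF 85 read as signed decodes to -123.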