+4  Q: 

COBOL Data types

Hi,

I'm confused about COBOL data types. Interviewers often ask for the difference between COMP-3 and COMP. What exactly is the difference? What do usage modes mean in COBOL, and how do they relate to data types?

Thanks and Regards, Manasi Kulkarni

+6  A: 

USAGE in COBOL describes how a data item is to be used. A few examples of USAGE are:

  • DISPLAY. This identifies an item that may be printed on a terminal or report. It may or may not be a number (e.g. it could be a text value). The description of the DISPLAY item is given by the PICture clause. For example, PIC 9(5) USAGE DISPLAY describes a 5-digit number that may be displayed (printed). USAGE DISPLAY is often left off because it is implied when no USAGE is given.
  • INDEX. This identifies an item used as an index into a table (OCCURS).
  • COMP-something (COMP, COMP-3 and so on) indicates that the data item is to be used in arithmetic operations (i.e. it is a number of some type).

There are various types of numeric item. Two of the most commonly used numeric data types are:

  • COMPUTATIONAL or COMP. This is equivalent to BINARY
  • COMPUTATIONAL-3 or COMP-3. This is equivalent to PACKED-DECIMAL

COMP (BINARY) data items are generally the most efficient way to perform calculations on data items that represent integer values.

COMP-3 (PACKED-DECIMAL) data items are used in COBOL because they maintain a fixed number of decimal places. All computations lead to a result having the prescribed number of decimal places. This is particularly useful in accounting-type operations. Floating point numbers make the number of digits after the decimal point variable (i.e. the decimal point can "float"), which is not the way financial amounts are usually represented.
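To make the fixed-versus-floating distinction concrete, here is a minimal Python sketch (not COBOL) that mimics a field with two decimal places, such as PIC S9(7)V99 COMP-3. The helper name `add_fixed` is made up for illustration, and the truncating rounding mode is an assumption that mirrors COBOL arithmetic without the ROUNDED phrase.

```python
from decimal import Decimal, ROUND_DOWN


def add_fixed(a: str, b: str, places: int = 2) -> Decimal:
    """Add two decimal amounts and keep exactly `places` decimal places.

    Hypothetical helper: extra digits are truncated (ROUND_DOWN), which is
    assumed here to match COBOL arithmetic without the ROUNDED phrase.
    """
    quantum = Decimal(1).scaleb(-places)  # places=2 gives Decimal('0.01')
    return (Decimal(a) + Decimal(b)).quantize(quantum, rounding=ROUND_DOWN)


fixed_total = add_fixed("0.10", "0.20")
float_total = 0.10 + 0.20

print(fixed_total)          # 0.30
print(float_total == 0.3)   # False: binary floating point cannot hold 0.1 exactly
```

The fixed-point result always carries the same number of decimal places, which is the property the answer describes for COMP-3 fields.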

You can find a complete list of COMPutational items for IBM Enterprise COBOL here

One of the problems many programmers have when beginning with COBOL is understanding that a COMP item is great for doing math but cannot be displayed (printed) until it is converted into a DISPLAYable item through a MOVE statement. If you MOVE a COMP item into a report or onto a screen it will not present very well. It needs to be moved into a DISPLAY item first.

The other thing that you may want to research a bit more is the relationship between the PICture and the USAGE when defining variables in COBOL. Here is a link to a very good introductory COBOL Tutorial from the University of Limerick.

NealB
How many COMP types of variables do we have? I was under the impression we only have COMP and COMP-3, where COMP is binary storage and COMP-3 is packed decimal storage. From the replies to my question I gather that these data types differ in storage, that is, in the memory needed to store the data. What is COMP-5?
Manasi
@Manasi For IBM Enterprise COBOL there are 5 distinct COMPUTATIONAL items, COMP-1 through COMP-5. I provided a link to the IBM manual describing these in my original post - you should review it. Note that some computational types have multiple names (e.g. COMP/BINARY and COMP-3/PACKED-DECIMAL). Each COBOL vendor supports a similar set of COMP-x items (there may be vendor differences in the way rounding, precision and truncation are handled). Some vendors (e.g. RM) provided a COMP-6 item. COMP-5 is a native binary format having 2, 4, or 8 bytes of storage.
NealB
+1  A: 

COBOL really only has two data types: Numbers and strings.

The layout of each field in a COBOL record is precisely specified by a PICTURE (usually abbreviated PIC) clause. The most common ones are:

  • PIC X for strings. PIC X(100) means a 100-byte string.
  • PIC 9 for numbers, optionally with S (sign) or V (implicit decimal point). For example, PIC S9(7)V99 means a signed number with 7 digits to the left of the implicit decimal point and 2 digits to the right.

Numeric fields can have a USAGE clause to optimize their storage. The most common USAGEs are DISPLAY, COMP, and COMP-3.

DISPLAY stores each digit as a character. For example, PIC 9(4) VALUE 123 stores the number as if it were the string "0123". And PIC 9(4)V99 VALUE 123.45 stores it as "012345". Note that the decimal point is not actually stored.

This is an inefficient format in that it requires 8 bits to represent each digit. But it does have an "optimization" for signed numbers by using half of the last byte to store the sign. Normally, EBCDIC digits all have a high nybble of F, so 0123 is F0 F1 F2 F3. But -0123 is F0 F1 F2 D3; the D indicates negative. C means positive, and F means unsigned (i.e., positive). (Similar formats are used in ASCII versions of COBOL, but not as standardized.)
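The byte layout described above can be reproduced with a small Python sketch. The helper names `zoned_decimal` and `hexdump` are made up for illustration, and the sketch assumes the EBCDIC convention from this answer (zone nibble F, with C or D in the last byte for signed fields). The implied decimal point is handled by passing the already scaled integer, e.g. 12345 for 123.45 in a V99 field.

```python
def hexdump(data: bytes) -> str:
    """Format bytes as space-separated upper-case hex pairs."""
    return " ".join(f"{b:02X}" for b in data)


def zoned_decimal(value: int, digits: int, signed: bool = False) -> bytes:
    """Encode an integer as EBCDIC zoned decimal (USAGE DISPLAY).

    Hypothetical helper: one byte per digit, zone nibble F, and for signed
    fields the zone of the last byte is replaced by C (+) or D (-).
    """
    if value < 0 and not signed:
        raise ValueError("unsigned field cannot hold a negative value")
    text = str(abs(value)).zfill(digits)
    if len(text) > digits:
        raise ValueError("value does not fit in the field")
    out = bytearray(0xF0 | int(ch) for ch in text)
    if signed:
        sign_zone = 0xD0 if value < 0 else 0xC0
        out[-1] = sign_zone | (out[-1] & 0x0F)
    return bytes(out)


print(hexdump(zoned_decimal(123, 4)))                # F0 F1 F2 F3
print(hexdump(zoned_decimal(-123, 4, signed=True)))  # F0 F1 F2 D3
print(hexdump(zoned_decimal(12345, 6)))              # F0 F1 F2 F3 F4 F5
```

The last call corresponds to PIC 9(4)V99 VALUE 123.45: only the six digits are stored, not the decimal point.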

COMP-3 is binary-coded decimal with trailing sign nybble. PIC 9(3) COMP-3 VALUE 123 becomes the two bytes 12 3F.
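A rough Python sketch of the same packing, for anyone who wants to check a layout by hand. The helper name `packed_decimal` is made up for illustration; the sign nibbles (F unsigned, C positive, D negative) are the ones described in this answer.

```python
def packed_decimal(value: int, digits: int, signed: bool = False) -> bytes:
    """Encode an integer as packed decimal (COMP-3).

    Hypothetical helper: one digit per nibble, followed by a sign nibble.
    A leading zero nibble is added when needed to fill whole bytes.
    """
    if value < 0 and not signed:
        raise ValueError("unsigned field cannot hold a negative value")
    text = str(abs(value)).zfill(digits)
    if len(text) > digits:
        raise ValueError("value does not fit in the field")
    if len(text) % 2 == 0:
        text = "0" + text  # digits plus the sign nibble must total an even count
    if not signed:
        sign = "F"
    else:
        sign = "D" if value < 0 else "C"
    return bytes.fromhex(text + sign)


print(packed_decimal(123, 3).hex())                # 123f
print(packed_decimal(-123, 3, signed=True).hex())  # 123d
print(len(packed_decimal(123, 4)))                 # 3
```

The resulting length is always digits // 2 + 1 bytes, so a 4-digit field needs 3 bytes even though one nibble is unused.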

COMP or BINARY is native binary format, just like short, int, or long in C.
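For comparison with the decimal formats, here is a short Python sketch of a binary field using the standard `struct` module. The helper name `comp_bytes` is made up for illustration, and the default of big-endian byte order is an assumption that matches IBM mainframes; a COMP-5 or native field on a little-endian machine would use the other order.

```python
import struct


def comp_bytes(value: int, size: int, big_endian: bool = True) -> bytes:
    """Encode a signed integer as a 2-, 4- or 8-byte two's complement field.

    Hypothetical helper: big-endian by default, as assumed for mainframe COMP.
    """
    code = {2: "h", 4: "i", 8: "q"}[size]
    prefix = ">" if big_endian else "<"
    return struct.pack(prefix + code, value)


print(comp_bytes(123, 2).hex())                    # 007b
print(comp_bytes(123, 2, big_endian=False).hex())  # 7b00
print(comp_bytes(-123, 4).hex())                   # ffffff85
```

Unlike DISPLAY and COMP-3, the stored bytes bear no visible relation to the decimal digits, which is why such a field has to be converted before it is printed.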

dan04
Thanks for the answers. I want to know the factors for deciding which data type suits which scenario. For example, the memory consumption of each data type is different: COMP will take, I guess, 4 bytes of memory, and COMP-3 takes (digits/2)+1 bytes.
Manasi
COMP uses the smallest data type that will hold all the digits, but the size often has to be a power of two. So, if 16-, 32-, and 64-bit types are available, then 1-4 digits take 2 bytes, 5-9 digits take 4 bytes, and 10-18 digits take 8 bytes. This makes COMP-3 the smaller format for fields with 1, 5, or 10-13 digits.
dan04
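The size comparison in the comment above is easy to check with a few lines of Python. This is only a sketch under the stated assumptions (COMP limited to 2, 4, or 8 bytes, and COMP-3 taking digits // 2 + 1 bytes); the function names are made up for illustration.

```python
def comp_size(digits: int) -> int:
    """Bytes for a COMP field, assuming only 2-, 4- and 8-byte binaries exist."""
    if 1 <= digits <= 4:
        return 2
    if 5 <= digits <= 9:
        return 4
    if 10 <= digits <= 18:
        return 8
    raise ValueError("digits must be between 1 and 18")


def comp3_size(digits: int) -> int:
    """Bytes for a COMP-3 field: one nibble per digit plus a sign nibble."""
    return digits // 2 + 1


# Digit counts where packed decimal is strictly smaller than binary.
comp3_smaller = [d for d in range(1, 19) if comp3_size(d) < comp_size(d)]
print(comp3_smaller)  # [1, 5, 10, 11, 12, 13]
```

For every other digit count the binary field is the same size or smaller.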
As for deciding which data type to use, I wouldn't know. I don't actually *write* COBOL. I'm a C++ programmer who had to learn to read COBOL layouts in order to pass data to programs on our mainframe.
dan04
+1  A: 

As the other reply suggests, COMP means big-endian binary. COMP-3 is packed decimal, which means one decimal digit is mapped to each nibble.

I am not sure the previous reply got the issue around precision correct though.

PIC S9(9)V9(9) COMP and PIC S9(9)V9(9) COMP-3

Have exactly the same precision. That is part of the ANSI85 standard. It is the job of the compiler and runtime to ensure that the binary representation in the COMP has the appropriate transformations placed upon it to ensure exactly the same results are achieved as would be if usage was display or COMP-3.

IBM mainframe computers have packed decimal calculations in hardware. This is very helpful, because the conversion of decimal to binary scales as n squared, where n is the length of the number. This means that COMP-3 is very often the fastest format on the mainframe, but it is less likely to be the fastest on distributed systems. However, this again is not always the case. For example, the Micro Focus native COBOL solution will tend to be faster in COMP-3 than COMP-5 for very large decimal precision (>18 digits), but the reverse otherwise. The Managed COBOL system from Micro Focus is almost always fastest in COMP (actually, COMP-5 is the best; it is similar to COMP but uses the hardware's endianness rather than enforcing a big-endian memory layout).

Finally, may I suggest that for intermediate values and general mathematics, the newer data definitions binary-long and binary-double are a better choice, because then the compiler can make the decisions about how to store and optimize for you.

For more on COBOL on distributed and Managed COBOL check out this knol: http://knol.google.com/k/alex-turner/micro-focus-managed-cobol/2246polgkyjfl/4 and also feel free to look up cobol on facebook :)

alex turner
You should be warned that there is variance among COBOL compilers with respect to how precision is handled for binary data types, particularly when truncation occurs. Understanding how intermediate results in complex calculations are managed is also a fairly complex subject. For example, see [Intermediate Results](http://publibfp.boulder.ibm.com/cgi-bin/bookmgr/BOOKS/IGY3PG31/APPENDIX1.1?DT=20060329003636) as they apply to IBM Enterprise COBOL.
NealB
+1 for mentioning that IBM mainframe computers have packed decimal calculations in hardware. Packed decimal was originally an IBM enhancement to COBOL.
Gilbert Le Blanc
For COMP-3, COMP and COMP-5 there should not be any variance, because their intermediate results are defined in the standard. However, different compiler vendors do have their own extensions. COMP-1 and COMP-2 are poorly defined, though.
alex turner
@alex turner. Agreed for COMP-3 and COMP-5, but beware of COMP/BINARY items because, at least for IBM Enterprise COBOL, the TRUNC(BIN/OPT/STD) compiler option affects how truncation is managed.
NealB
+1  A: 

As for deciding which data type to use, it can be made very complicated - BUT - a simple set of guidelines is:

DISPLAY and edited zoned decimal should only be used for displaying numerics in a report or sysout. Move COMP and COMP-3 fields to a DISPLAY/edited field before putting them in a report or sending them to sysout.

COMP - has the fastest calculation speed for integers

COMP-3 (PACKED Decimal) - should be used when decimal positions should be maintained.

COMP and COMP-3 fields can be used together in calculations. The compiler will determine, based on its rules, which field type is converted (under the covers) so that both operands share a single common numeric data type.