The input file has records such as: 8712351,8712353,8712353,8712354,8712356,8712352,8712355,8712352,8712355

Using COBOL I need to remove duplicates from the above file and write to an output file. I wrote simple logic to read records and write to an output file.

Where do I need to put the logic for removing duplicates (say, 8712353 and 8712352) from the above file? Here is the program:

   IDENTIFICATION DIVISION.
   PROGRAM-ID. RemoveDup.
   ENVIRONMENT DIVISION.
   INPUT-OUTPUT SECTION.
   FILE-CONTROL.
   SELECT INPUTFILEDUP ASSIGN TO 'C:\Cobol\INPUTFILEDUP.txt'
           ORGANIZATION IS LINE SEQUENTIAL.
   SELECT OUTFILEDUP ASSIGN TO 'C:\Cobol\OUTFILEDUP.txt'
               ORGANIZATION IS LINE SEQUENTIAL.

   DATA DIVISION.

   FILE SECTION.
   FD INPUTFILEDUP.
   01 INPUTFILEDUPREC.
       88 EOFINPUTFILEDUP    VALUE HIGH-VALUES.
       02 INPUTFILEID        PIC 9(07).

   FD  OUTFILEDUP.
   01 OUTFILEDUPREC         PIC 9(07).

   WORKING-STORAGE SECTION.
   77 WS-VARIABLE            PIC 9(09).
   77 REC-NOT-MATCH          PIC 9(01).
   77 CUR-VARIABLE           PIC 9(09).

   PROCEDURE DIVISION.
   BEGIN.
       OPEN INPUT  INPUTFILEDUP
       OPEN OUTPUT OUTFILEDUP

       READ INPUTFILEDUP
           AT END SET EOFINPUTFILEDUP TO TRUE
       END-READ
       PERFORM UNTIL EOFINPUTFILEDUP
           *> Every record is copied as-is; there is no duplicate check yet.
           WRITE OUTFILEDUPREC FROM INPUTFILEID
           READ INPUTFILEDUP
               AT END SET EOFINPUTFILEDUP TO TRUE
           END-READ
       END-PERFORM

       CLOSE INPUTFILEDUP
       CLOSE OUTFILEDUP
       STOP RUN.
+2  A: 

When Organization is Sequential, the record deleted is the last record read. The Delete statement is valid only when the last operation against the file is a successful Read statement. If not, the Delete returns a File Status value of 43. Because a Delete cannot return File Status values beginning with a 2 when the file is Open with Sequential Access, coding Invalid Key on such a Delete is not allowed.

When Dynamic or Random access is selected for the file, the Delete statement, like the Rewrite, becomes a little less restrictive. The record being deleted need not have been previously read. Simply fill in the primary Key information in the record description for the file and issue the Delete statement. If the record does not exist, a File Status of 23 is returned and an Invalid Key condition exists.

From page 274 of Sams Teach Yourself COBOL in 24 Hours (which I have just dusted down from off my bookshelf).

So in your case you'll presumably set up your records to be sorted by INPUTFILEID, keep a record as you go through of occurrences of a given INPUTFILEID past its first occurrence, and Delete accordingly (after you have written it to your output file).

davek
+1  A: 

If you sort the file with an external sort before reading it in the COBOL program, you can remove the duplicates with the SORT keyword EQUALS. If you sort the file before the COBOL program runs but do not drop the duplicates, a simple IF statement and a save field will let you skip the dups.

Set up an INPUTFILEID-SAVE field. Right after the read: IF INPUTFILEID equals INPUTFILEID-SAVE, read again; if not, write the record. After the write, move INPUTFILEID to INPUTFILEID-SAVE. You will have to break up the current PERFORM to do this.

If you do not fully understand what I am saying, just let me know and I will help you change the code.

Vince Manso
A: 

Hi Vince, I sorted the input file in ascending order as: 8712351,8712353,8712353,8712354,8712356,8712352,8712355,8712352,8712355 and it worked. The modified code is below.

But suppose my file is not in either ascending or descending order: where do I need to write the sort logic before removing the dups? Could you please update my code below for this? I tried, but was not successful when the input file structure is like:

8712351,8712353,8712353,8712354,8712356,8712352,8712355,8712352,8712355

   IDENTIFICATION DIVISION.
   PROGRAM-ID. RemoveDup2.
   ENVIRONMENT DIVISION.
   INPUT-OUTPUT SECTION.
   FILE-CONTROL.
   SELECT INPUTFILEDUP ASSIGN TO 'C:\Cobol\INPUTFILEDUP.txt'
           ORGANIZATION IS LINE SEQUENTIAL.
   SELECT OUTFILEDUP ASSIGN TO 'C:\Cobol\OUTFILEDUP.txt'
               ORGANIZATION IS LINE SEQUENTIAL.

   DATA DIVISION.

   FILE SECTION.
   FD INPUTFILEDUP.
   01 INPUTFILEDUPREC.
       88 EOFINPUTFILEDUP    VALUE HIGH-VALUES.
       02 INPUTFILEID        PIC 9(07).

   FD  OUTFILEDUP.
   01 OUTFILEDUPREC         PIC 9(07).

   WORKING-STORAGE SECTION.
   77 WS-VARIABLE            PIC 9(09) VALUE ZERO.
   77 REC-NOT-MATCH          PIC 9(01).
   77 CUR-VARIABLE           PIC 9(7) VALUE ZERO.

   PROCEDURE DIVISION.
   BEGIN.
       OPEN INPUT  INPUTFILEDUP
       OPEN OUTPUT OUTFILEDUP

       READ INPUTFILEDUP
           AT END SET EOFINPUTFILEDUP TO TRUE
       END-READ
       PERFORM UNTIL EOFINPUTFILEDUP
           *> WS-VARIABLE holds the last ID written, so this check
           *> only catches duplicates that sit next to each other.
           IF INPUTFILEID NOT EQUAL TO WS-VARIABLE
               MOVE INPUTFILEID TO WS-VARIABLE
               WRITE OUTFILEDUPREC FROM INPUTFILEID
           ELSE
               DISPLAY "DUPLICATE FOUND " INPUTFILEID
           END-IF
           READ INPUTFILEDUP
               AT END SET EOFINPUTFILEDUP TO TRUE
           END-READ
       END-PERFORM

       CLOSE INPUTFILEDUP
       CLOSE OUTFILEDUP
       STOP RUN.
Sanjana
+3  A: 

Finally it worked. Here is the code:

   IDENTIFICATION DIVISION.
   PROGRAM-ID. RemoveDup2.
   ENVIRONMENT DIVISION.
   INPUT-OUTPUT SECTION.
   FILE-CONTROL.
   SELECT INPUTFILEDUP ASSIGN TO 'C:\Cobol\INPUTFILEDUP.txt'
           ORGANIZATION IS LINE SEQUENTIAL.
   SELECT OUTFILEDUP ASSIGN TO 'C:\Cobol\OUTFILEDUP.txt'
               ORGANIZATION IS LINE SEQUENTIAL.
   SELECT WorkFile ASSIGN TO "WORK.TMP".

   DATA DIVISION.

   FILE SECTION.
   FD INPUTFILEDUP.
   01 INPUTFILEDUPREC.
       88 EOFINPUTFILEDUP    VALUE HIGH-VALUES.
       02 INPUTFILEID        PIC 9(07).

   FD  OUTFILEDUP.
   01 OUTFILEDUPREC         PIC 9(07).

   SD WorkFile.
   01 WORKREC.
      02 WINPUTFILEID       PIC 9(07).

   WORKING-STORAGE SECTION.
   77 WS-VARIABLE            PIC 9(09) VALUE ZERO.
   77 REC-NOT-MATCH          PIC 9(01).
   77 CUR-VARIABLE           PIC 9(7) VALUE ZERO.

   PROCEDURE DIVISION.
   BEGIN.
       *> Sort the input file in place so that duplicates become adjacent.
       SORT WorkFile ON ASCENDING KEY WINPUTFILEID
           USING INPUTFILEDUP GIVING INPUTFILEDUP

       OPEN INPUT  INPUTFILEDUP
       OPEN OUTPUT OUTFILEDUP

       READ INPUTFILEDUP
           AT END SET EOFINPUTFILEDUP TO TRUE
       END-READ
       PERFORM UNTIL EOFINPUTFILEDUP
           *> Write an ID only when it differs from the last one written.
           IF INPUTFILEID NOT EQUAL TO WS-VARIABLE
               MOVE INPUTFILEID TO WS-VARIABLE
               WRITE OUTFILEDUPREC FROM INPUTFILEID
           ELSE
               DISPLAY "DUPLICATE FOUND    " INPUTFILEID
           END-IF
           READ INPUTFILEDUP
               AT END SET EOFINPUTFILEDUP TO TRUE
           END-READ
       END-PERFORM

       CLOSE INPUTFILEDUP
       CLOSE OUTFILEDUP

       STOP RUN.
Sanjana
A: 

COBOL! Someone is older than me! I used to know a doctor who had a double specialty, urology and ob/gyn. He had a special interest in, and a busy practice with, high-risk pregnancy at the best teaching hospital in the world, Mac. He started life as a COBOL programmer, but left because of the stress.

Harviej
A: 

Amazing ! Up for COBOL use as a scripting language ;)

FYI, the solution in Python would be:

   open( outfile, 'w' ).write( ','.join( sorted( set( open( infile ).read().split(',') ))))

This is for a small dataset (say, < 100 million values) where using a hashtable is possible.

peufeu
Deserves to be a comment, but FTR I upvoted.
RCIX
+1  A: 

sort is the standard tool for OS-level jobs like this, which keeps things DRY: use -t to set the field separator and -u to keep only unique lines. It's written in C.

LarsOn
A: 

The suggested fix is still not what I want. Rather than writing sorting logic, I need the search to work like this: pick the first record, search the entire file, and if there is a duplicate, display it; otherwise write the record to the output file. Then move the pointer to the next record, search the file again from start to end, and if there is a duplicate, display it; otherwise write to the file. Is there any code logic for this?

sanjana
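
One possible way to get the no-sort behaviour described above is sketched below. Note that it swaps in a different technique from the one asked for: instead of rescanning the file for every record, it keeps an in-memory table of the IDs already written and searches that table, which gives the same result for the first occurrence of each ID (written) and for later occurrences (displayed). The program name RemoveDup3, the WS-* names, the paragraph names and the 10000-entry limit are all assumptions made for illustration, and the `*>` comments assume a compiler that accepts COBOL 2002 inline comments.

```cobol
   IDENTIFICATION DIVISION.
   PROGRAM-ID. RemoveDup3.
   ENVIRONMENT DIVISION.
   INPUT-OUTPUT SECTION.
   FILE-CONTROL.
   SELECT INPUTFILEDUP ASSIGN TO 'C:\Cobol\INPUTFILEDUP.txt'
           ORGANIZATION IS LINE SEQUENTIAL.
   SELECT OUTFILEDUP ASSIGN TO 'C:\Cobol\OUTFILEDUP.txt'
           ORGANIZATION IS LINE SEQUENTIAL.

   DATA DIVISION.
   FILE SECTION.
   FD INPUTFILEDUP.
   01 INPUTFILEDUPREC.
       88 EOFINPUTFILEDUP    VALUE HIGH-VALUES.
       02 INPUTFILEID        PIC 9(07).

   FD OUTFILEDUP.
   01 OUTFILEDUPREC          PIC 9(07).

   WORKING-STORAGE SECTION.
   *> Assumed limit: keep WS-MAX-IDS and the OCCURS count in step.
   01 WS-MAX-IDS             PIC 9(05) VALUE 10000.
   01 WS-SEEN-COUNT          PIC 9(05) VALUE ZERO.
   01 WS-IDX                 PIC 9(05) VALUE ZERO.
   01 WS-FOUND-FLAG          PIC X     VALUE 'N'.
       88 ID-FOUND           VALUE 'Y'.
       88 ID-NOT-FOUND       VALUE 'N'.
   01 WS-SEEN-TABLE.
       02 WS-SEEN-ID         PIC 9(07) OCCURS 10000 TIMES.

   PROCEDURE DIVISION.
   BEGIN.
       OPEN INPUT  INPUTFILEDUP
       OPEN OUTPUT OUTFILEDUP

       READ INPUTFILEDUP
           AT END SET EOFINPUTFILEDUP TO TRUE
       END-READ
       PERFORM UNTIL EOFINPUTFILEDUP
           PERFORM CHECK-SEEN
           IF ID-FOUND
               DISPLAY "DUPLICATE FOUND " INPUTFILEID
           ELSE
               PERFORM REMEMBER-AND-WRITE
           END-IF
           READ INPUTFILEDUP
               AT END SET EOFINPUTFILEDUP TO TRUE
           END-READ
       END-PERFORM

       CLOSE INPUTFILEDUP
       CLOSE OUTFILEDUP
       STOP RUN.

   *> Linear search of the IDs written so far.
   CHECK-SEEN.
       SET ID-NOT-FOUND TO TRUE
       PERFORM VARYING WS-IDX FROM 1 BY 1
               UNTIL WS-IDX > WS-SEEN-COUNT OR ID-FOUND
           IF WS-SEEN-ID (WS-IDX) = INPUTFILEID
               SET ID-FOUND TO TRUE
           END-IF
       END-PERFORM.

   *> First occurrence: record it in the table, then write it out.
   REMEMBER-AND-WRITE.
       IF WS-SEEN-COUNT >= WS-MAX-IDS
           DISPLAY "TABLE FULL, INCREASE WS-MAX-IDS AND OCCURS"
           CLOSE INPUTFILEDUP
           CLOSE OUTFILEDUP
           STOP RUN
       END-IF
       ADD 1 TO WS-SEEN-COUNT
       MOVE INPUTFILEID TO WS-SEEN-ID (WS-SEEN-COUNT)
       WRITE OUTFILEDUPREC FROM INPUTFILEID.
```

With the sample records from the question, this should write 8712351, 8712353, 8712354, 8712356, 8712352 and 8712355 in their original order and display the second 8712353, 8712352 and 8712355 as duplicates. The linear search makes the work grow with the square of the number of distinct IDs, so for large files the sort-based version is likely to be the better choice.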