I have a file containing both duplicate and non-duplicate records. The file is already sorted by a key. I want to separate the duplicate records from the non-duplicate ones: a duplicate record should be moved to a duplicate file, and a non-duplicate record to a valid file.

I am using COBOL, and a duplicated record can occur more than twice.

A: 

For each record in the sorted file: read it and examine the sort key.

If it differs from the previous record's key (or from the initial value, for the first record), write the record to the "valid" file and remember its key to compare the next record against.

If it's not, write the record to the "duplicate" file.
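The control-break logic above, sketched in Python for illustration (the key-extraction function is a stand-in for whatever your record layout defines):

```python
def split_records(sorted_records, key):
    """Partition an already-sorted stream of records: the first
    occurrence of each key goes to 'valid', every later occurrence
    with the same key goes to 'dupes'."""
    valid, dupes = [], []
    prev_key = None              # stands in for the initialized compare field
    for rec in sorted_records:
        k = key(rec)
        if k != prev_key:        # new key: first occurrence -> valid file
            valid.append(rec)
            prev_key = k         # remember this key for the next comparison
        else:                    # same key as previous record -> duplicate file
            dupes.append(rec)
    return valid, dupes
```

A single saved key is enough because the input is sorted: all records with the same key are adjacent, so any repeat of the previous key must be a duplicate.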

Albert Visser
A: 

You could use the COBOL SORT verb with an OUTPUT PROCEDURE. The output procedure can compare the previous record's (saved) key with the current record's key and write the current record to the valid or duplicate file. This avoids a separate pass to read the sorted file. Here's an example of using an output procedure:

http://web.sxu.edu/~rogers/cobol/sort3.html
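The same sort-then-partition-in-one-pass idea, sketched in Python for illustration using `itertools.groupby` (the key function here is hypothetical):

```python
from itertools import groupby

def sort_and_split(records, key):
    """Sort the records, then walk the sorted stream once: the first
    record of each key group is valid, the rest are duplicates."""
    valid, dupes = [], []
    for _, group in groupby(sorted(records, key=key), key=key):
        group = list(group)
        valid.append(group[0])    # first occurrence of this key
        dupes.extend(group[1:])   # any further occurrences are duplicates
    return valid, dupes
```

Like the COBOL output procedure, this consumes the sorted output directly instead of writing it to an intermediate file and reading it back.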

Dave Smith
A: 

There's a way to do this that involves no code writing at all, and it will probably run faster than anything you would write. The IBM ICETOOL utility is a wrapper for DFSORT and can do this quite easily. Here's a sample job step that will put all duplicate records on DUPES and the non-duplicates on NODUPES. If you want to uniquify the duplicate file, run it through a simple sort with SUM FIELDS=NONE afterward.

//ICE  EXEC PGM=ICETOOL
//TOOLIN   DD  *
 SELECT FROM(INFILE) TO(NODUPES) ON(2,10,CH) NODUPS DISCARD(DUPES)
/*
//INFILE   DD DSN=MY.INFILE,DISP=SHR
//NODUPES  DD DSN=MY.OUTFILE.NODUPES,DISP=(NEW,CATLG,DELETE),LIKE=MY.INFILE
//DUPES    DD DSN=MY.OUTFILE.DUPES,DISP=(NEW,CATLG,DELETE),LIKE=MY.INFILE
//TOOLMSG  DD SYSOUT=*
//SYSUDUMP DD SYSOUT=*

This assumes your sort key is in position 2, for a length of 10. This program should be available in your standard LPA load datasets, so no JOBLIB/STEPLIB override is needed.

Jeff Shattock
A: 

If you're using SYNCSORT, or another sort product that supports it, you can accomplish both tasks in a single sort pass by specifying:

SUM FIELDS=NONE,XSUM

To capture your duplicate records, you will need to include an additional DD allocation:

//SORTXSUM DD DSN=...

The normal SORTOUT will contain the unique records, and SORTXSUM will contain the duplicates that were eliminated.

MikeC