This COBOL question piqued my interest because of how much effort it seemed to take to do what looks like a simple task.

A sloppy Python one-liner to remove duplicate lines from a file could be as simple as:

print set(open('testfile.txt').read().split('\n'))

How does removing duplicates in the same file structure as above in your language of choice compare to COBOL?

A: 

Haskell

Your Python example assumes a single word per line and does not preserve the order of the words in the input. The same holds for this code (this reads from stdin, though):

import Data.Set (fromList, toList)

main :: IO ()
main = interact (unlines . toList . fromList . lines)

interact applies a single function to standard input. lines splits the input at newlines. Data.Set.fromList converts a list to a set (thus removing the duplicates). Data.Set.toList converts the set back to a list. unlines puts the newlines back.

Alternatively, we can use Data.List.nub, which is much less efficient (quadratic in the number of lines) but keeps the lines in their original order:

import Data.List (nub)

main :: IO ()
main = interact (unlines . nub . lines)
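
For comparison with the question's Python, an order-preserving counterpart to nub might look like the sketch below. It assumes Python 3.7 or later, where dict keeps insertion order; the name unique_in_order is made up for illustration and is not part of any library.

```python
def unique_in_order(lines):
    """Drop duplicate lines, keeping the first occurrence of each in input order."""
    # dict keys are unique and (Python 3.7+) remember insertion order,
    # so this runs in roughly linear time, unlike the quadratic nub.
    return list(dict.fromkeys(lines))


sample = ["pear", "apple", "pear", "fig", "apple"]
print(unique_in_order(sample))  # ['pear', 'apple', 'fig']
```

Unlike the set-based versions, this keeps "pear" ahead of "apple" because that is the order in which they first appear.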
Stephan202
A: 

Unix shell

sort -u file

sort -u sorts the lines and keeps a single copy of each, so it does exactly this task. uniq also removes duplicates, but only when the duplicate lines are adjacent. Like the Python example, sort -u does not preserve the input order; uniq on its own does, at the cost of missing duplicates that are not next to each other.
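
The difference between the two tools can be sketched in Python, the question's language. This is only an approximation: the helper names uniq_adjacent and sort_unique are hypothetical, and sorted() compares by code point, whereas sort's ordering depends on the locale.

```python
from itertools import groupby


def uniq_adjacent(lines):
    """Roughly what uniq does: collapse runs of identical neighbouring lines."""
    return [key for key, _run in groupby(lines)]


def sort_unique(lines):
    """Roughly what sort -u does: one copy of each line, in sorted order."""
    return sorted(set(lines))


lines = ["b", "a", "a", "b"]
print(uniq_adjacent(lines))  # ['b', 'a', 'b']
print(sort_unique(lines))    # ['a', 'b']
```

The second "b" survives uniq_adjacent because it is not next to the first one, which is why uniq is usually run on sorted input.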

Welbog
A: 

SWI-Prolog

Uses read_line_to_codes/2 to read lines from stdin, one at a time. Duplicate lines are removed using list_to_set/2. The resulting list is written back using writef/2, again one line at a time.

read_lines([H|T]) :-
  read_line_to_codes(user_input, H), H \= end_of_file, read_lines(T).
read_lines([]).

write_lines([]).
write_lines([H|T]) :-
  writef("%s\n", [H]), write_lines(T).

main :-
  read_lines(X), list_to_set(X, Y), write_lines(Y).
Stephan202
A: 

Java

Uses an InputStreamReader and BufferedReader to read lines from standard input. The lines are stored in a HashSet, after which they are written to standard output using a for-each loop. Because a HashSet has no defined iteration order, the output order is unspecified.

import java.io.*;
import java.util.*;

public class Uniq {
  public static void main(String[] args) throws IOException {
    BufferedReader r = new BufferedReader(new InputStreamReader(System.in));
    Set<String> s = new HashSet<String>();

    for (String l; (l = r.readLine()) != null; s.add(l));

    for (String l : s) {
      System.out.println(l);
    }
  }
}
Stephan202
A: 

Groovy

A direct translation of the Python version.

println new HashSet(new File('testfile.txt').readLines()).join('\n')
ataylor
A: 

Python

  1. Python 2.x: reads all lines from the standard input and removes duplicates by putting them in a set. After that the lines are joined together again and printed to the standard output.

    import sys 
    print ''.join(set(sys.stdin.readlines())),
    
  2. Python 3.x: reads all lines from the standard input and removes duplicates by putting them in a set. After that the lines are joined together again and printed to the standard output.

    import sys 
    print(''.join(set(sys.stdin.readlines())), end='')
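
One edge case with the readlines() approach: each line keeps its terminator, so a final line without a trailing newline compares as different from an earlier identical line. A minimal sketch of the effect, assuming Python 3 (unique_lines is a made-up helper name):

```python
import io


def unique_lines(text):
    """Return the set of distinct lines, ignoring line terminators."""
    # splitlines() strips the terminators, so "a\n" and a final "a" match.
    return set(text.splitlines())


text = "a\nb\na"  # the last line has no trailing newline

# readlines() yields 'a\n', 'b\n' and 'a', which are three distinct strings.
print(len(set(io.StringIO(text).readlines())))  # 3
print(len(unique_lines(text)))                  # 2
```

When printing the result of unique_lines, the newlines have to be added back, for example with '\n'.join(...).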
    
Stephan202