
Given the following list of presidents, do a top-ten word count in the smallest program possible:

INPUT FILE

    Washington
    Washington
    Adams
    Jefferson
    Jefferson
    Madison
    Madison
    Monroe
    Monroe
    John Quincy Adams
    Jackson
    Jackson
    Van Buren
    Harrison 
    DIES
    Tyler
    Polk
    Taylor 
    DIES
    Fillmore
    Pierce
    Buchanan
    Lincoln
    Lincoln 
    DIES
    Johnson
    Grant
    Grant
    Hayes
    Garfield 
    DIES
    Arthur
    Cleveland
    Harrison
    Cleveland
    McKinley
    McKinley
    DIES
    Teddy Roosevelt
    Teddy Roosevelt
    Taft
    Wilson
    Wilson
    Harding
    Coolidge
    Hoover
    FDR
    FDR
    FDR
    FDR
    Dies
    Truman
    Truman
    Eisenhower
    Eisenhower
    Kennedy 
    DIES
    Johnson
    Johnson
    Nixon
    Nixon 
    ABDICATES
    Ford
    Carter
    Reagan
    Reagan
    Bush
    Clinton
    Clinton
    Bush
    Bush
    Obama

To start it off, here is a bash version in 97 characters:

cat input.txt | tr " " "\n" | tr -d "\t " | sed 's/^$//g' | sort | uniq -c | sort -n | tail -n 10

Output:

      2 Nixon
      2 Reagan
      2 Roosevelt
      2 Truman
      2 Washington
      2 Wilson
      3 Bush
      3 Johnson
      4 FDR
      7 DIES

Break ties as you see fit! Happy Fourth!
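For comparison, here is a readable, non-golfed sketch of the same top-ten count in Python. It is an illustration rather than one of the thread's entries: the name `top_ten` is hypothetical, words are split on whitespace and compared case-sensitively, and ties are broken alphabetically, which is just one arbitrary choice.

```python
from collections import Counter

# The question's input, re-flowed onto fewer lines; splitting on any
# whitespace makes the original line layout irrelevant.
PRESIDENTS = """
Washington Washington Adams Jefferson Jefferson Madison Madison Monroe Monroe
John Quincy Adams Jackson Jackson Van Buren Harrison DIES Tyler Polk Taylor DIES
Fillmore Pierce Buchanan Lincoln Lincoln DIES Johnson Grant Grant Hayes
Garfield DIES Arthur Cleveland Harrison Cleveland McKinley McKinley DIES
Teddy Roosevelt Teddy Roosevelt Taft Wilson Wilson Harding Coolidge Hoover
FDR FDR FDR FDR Dies Truman Truman Eisenhower Eisenhower Kennedy DIES
Johnson Johnson Nixon Nixon ABDICATES Ford Carter Reagan Reagan
Bush Clinton Clinton Bush Bush Obama
"""


def top_ten(text, n=10):
    """Return the n most frequent words as (word, count) pairs.

    Words are whitespace-separated and case-sensitive, so "DIES" and
    "Dies" are counted separately. Ties are broken alphabetically.
    """
    counts = Counter(text.split())
    # Highest count first; equal counts fall back to alphabetical order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:n]


if __name__ == "__main__":
    for word, count in top_ten(PRESIDENTS):
        # Right-align the count, loosely imitating `uniq -c` output.
        print(f"{count:7d} {word}")
```

With this tie-break the sketch reports DIES (6), FDR (4), Bush and Johnson (3 each), then the first six two-count names in alphabetical order; the shell pipelines in this thread break ties differently.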

For those of you who care, more information on presidents can be found here.

+7  A: 

vim 60

    :1,$!tr " " "\n"|tr -d "\t "|sort|uniq -c|sort -n|tail -n 10
ojblass
+1 - It's not often that vim gets in on Code Golf.
Chris Lutz
+1 F*CK YEAH !!!
Tom
`:1,$!` can be replaced by `:%!`, can't it?
ephemient
+2  A: 

Perl: 90 (114 including perl, the command-line switches, the single quotes and the filename)

perl -nle'$h{$_}++for split/ /;END{$i++<=10?print"$h{$_} $_":0for reverse sort{$h{$a}cmp$h{$b}}keys%h}' input.txt
Alan Haggai Alavi
I thought for sure Perl would be number one...
ojblass
A few easy tricks shrink the whole command to 84: perl -ne'$_{$_.$/}++for+split}print+(sort{$_{$b}<=>$_{$a}}keys%_)[0..9];{' input.txt
ephemient
+2  A: 

Here's a compressed version of the shell script. It relies on the observation that, for a reasonable interpretation of the input data (no leading or trailing blanks), the second 'tr' and the 'sed' command in the original do not change the data (verified by inserting 'tee out.N' at suitable points and checking that the output file sizes are identical). The shell needs fewer spaces than humans do, and using cat instead of input redirection wastes space.

tr \  \\n<input.txt|sort|uniq -c|sort -n|tail -10

This weighs in at 50 characters including newline at end of script.

With two more observations (pulled from other people's answers):

  1. tail on its own is equivalent to 'tail -10', and
  2. in this case, numeric and alpha sorting are equivalent,

this can be shrunk by a further 7 characters (to 43 including trailing newline):

tr \  \\n<input.txt|sort|uniq -c|sort|tail

Using 'xargs -n1' (with no command prefix given) instead of 'tr' is extremely clever; it deals with leading, trailing and multiple embedded spaces (which this solution does not).

Jonathan Leffler
Good work... always different ways to think about stuff...
ojblass
+11  A: 

A shorter shell version:

xargs -n1 < input.txt | sort | uniq -c | sort -nr | head

If you want case-insensitive ranking, change uniq -c into uniq -ci.
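A sketch of the same idea in Python, assuming "case-insensitive" means folding each word with `str.casefold()` before counting. The function name `count_words` and the sample string are hypothetical; the sample is chosen because the question's list spells the word both "DIES" and "Dies". Note that `uniq -ci` only merges adjacent lines, so it relies on the preceding `sort` placing the variants next to each other; a hash-based count like this has no such requirement, and it reports the folded spelling rather than an original one.

```python
from collections import Counter


def count_words(text, ignore_case=False):
    """Count whitespace-separated words, optionally folding case first."""
    words = text.split()
    if ignore_case:
        # casefold() is a more aggressive lower() intended for caseless matching
        words = [w.casefold() for w in words]
    return Counter(words)


sample = "Kennedy DIES FDR Dies Taylor DIES"
print(count_words(sample))                    # DIES and Dies counted separately
print(count_words(sample, ignore_case=True))  # both fold to "dies"
```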

Slightly shorter still, if you're happy with the ranking being reversed and readability impaired by the lack of spaces. This clocks in at 46 characters:

xargs -n1<input.txt|sort|uniq -c|sort -n|tail

(You could strip this down to 38 if you were allowed to rename the input file to simply "i" first.)

Observing that, in this special case, no word occurs more than 9 times, we can shave off 3 more characters by dropping the '-n' argument from the final sort:

xargs -n1<input.txt|sort|uniq -c|sort|tail

That takes this solution down to 43 characters without renaming the input file. (Or 35, if you do.)
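To see why dropping `-n` is only safe while every count is a single digit, here is a small Python illustration. It compares plain string sorting with numeric sorting on hypothetical "count word" lines with unpadded counts; real `uniq -c` output right-aligns its counts with leading blanks, which changes the details, so treat this as a sketch of the general hazard rather than a model of `sort` itself.

```python
# Hypothetical "count word" lines with unpadded counts.
lines = ["9 Bush", "10 DIES", "2 Nixon"]

# Lexical order compares character by character, so "10" sorts before "2".
lexical = sorted(lines)

# Numeric order compares the leading count as an integer, in the spirit of `sort -n`.
numeric = sorted(lines, key=lambda s: int(s.split()[0]))

print(lexical)  # ['10 DIES', '2 Nixon', '9 Bush']
print(numeric)  # ['2 Nixon', '9 Bush', '10 DIES']

# With single-digit counts (and no tied counts) the two orders agree.
single = ["3 Bush", "6 DIES", "2 Nixon"]
print(sorted(single) == sorted(single, key=lambda s: int(s.split()[0])))  # True
```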

Using xargs -n1 to split the file into one word per line is preferable to the tr \ \\n solution, because the latter creates lots of blank lines. That makes the tr solution incorrect: it misses out Nixon and reports a blank string occurring 256 times, and a blank string is not a "word".
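Python's `str.split` shows the same difference. With an explicit single-space separator, which is effectively what `tr` does, every leading blank produces an empty field; with no argument it splits on runs of whitespace and discards leading and trailing blanks, which is closer to what `xargs -n1` does here. The sample line is modeled on the question's input, whose names appear to be indented by four spaces (see the J session's output further down).

```python
line = "    John Quincy Adams"

# Splitting on one literal space: each leading blank yields an empty field,
# much like `tr ' ' '\n'` turning each blank into an empty output line.
print(line.split(" "))  # ['', '', '', '', 'John', 'Quincy', 'Adams']

# Splitting on runs of whitespace drops the empty fields entirely.
print(line.split())     # ['John', 'Quincy', 'Adams']

# Summed over many indented lines, these empty fields become a spurious "word".
blanks = sum(part == "" for part in line.split(" "))
print(blanks)           # 4
```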

Stig Brautaset
outstanding... you are really sick...
ojblass
Using xargs is clever - it works even if the data is laced with leading and trailing blanks. And it's a good observation that 'tail' alone prints the last ten lines of output (I'd forgotten); that saves 4 more characters.
Jonathan Leffler
+1 for clever use of xargs
Laurence Gonsalves
Is there a way to cheat and use xargs inside of vim and lose the filename length?
ojblass
I cheated... forgive me
ojblass
+7  A: 

Vim 36

:%s/\W/\r/g|%!sort|uniq -c|sort|tail
Josef
Nice work there...
ojblass
Given this input it works, but doesn't this run the danger of sorting the numbers lexically?
ojblass
absolutely. but /given this input/, I can do without those extra three characters :)
Josef
You can lose 4 characters because 'tail' is equivalent to 'tail -10' or 'tail -n10'.
Jonathan Leffler
ha! of course, thx. removed it and put numeric sorting back in for ojblass
Josef
hmmm you're missing a colon... I hate when I miss a colon
ojblass
(: alright, put colon back in
Josef
The '-n' (and space) isn't necessary in this example because all the counts are single digits and lexical <==> numeric sorting when that's the case.
Jonathan Leffler
that's what I had originally until the OP complained but I agree with you so I removed it
Josef
I love this, but the xargs guy gets it... you get, um, second place... Happy Fourth!
ojblass
This is clever, but unfortunately it gives the same wrong output as the tr solution: it creates 256 blank lines and erroneously shows them as a word, while missing "Nixon" from the results.
Stig Brautaset
+2  A: 

My best try with Ruby so far, 166 chars:

h = Hash.new
File.open('f.l').each_line{|l|l.split(/ /).each{|e|h[e]==nil ?h[e]=1:h[e]+=1}}
h.sort{|a,b|a[1]<=>b[1]}.last(10).each{|e|puts"#{e[1]} #{e[0]}"}

I am surprised that no one has posted a crazy J solution yet.

Justanotheraspiringdev
you could replace the first line with h = {}
Cuervo's Laugh
Also, you can replace the .each_line bit, giving you this for the first line: File.open('f.1').each {|l|l.split(/ /).each{|e|h[e]==nil ?h[e]=1:h[e]+=1}} That saves you 4 characters.
Cuervo's Laugh
+12  A: 

C#, 153:

Reads in the file at p and prints results to the console:

File.ReadLines(p)
    .SelectMany(s=>s.Split(' '))
    .GroupBy(w=>w)
    .OrderBy(g=>-g.Count())
    .Take(10)
    .ToList()
    .ForEach(g=>Console.WriteLine(g.Count()+"|"+g.Key));

If merely producing the list but not printing to the console, it's 93 characters.

6|DIES
4|FDR
3|Johnson
3|Bush
2|Washington
2|Adams
2|Jefferson
2|Madison
2|Monroe
2|Jackson
Jason
My major complaint with Java and C# is their verbosity... could you shorten it up with some equivalent of using...?
ojblass
That's pretty neat and tidy, I thought. And at least semi-comprehensible, at least compared with the Perl version.
Jonathan Leffler
'ReadLines' should be 'ReadAllLines'. Also, it can be a little shorter if you remove .ToList() as you can foreach over the IEnumerable that Take returns. So: foreach (var v in File.ReadAllLines(p) .SelectMany(s => s.Split(' ')) .GroupBy(w => w) .OrderBy(g => -1 * g.Count()) .Take(10)) Console.WriteLine(v.Count() + "|" + v.Key);
JulianR
@JulianR: ReadLines seems to be valid as such a function is listed under .NET 4.0: http://msdn.microsoft.com/en-us/library/dd383503%28VS.100%29.aspx
Ahmad Mageed
@Ahmad - Weird. So what's the difference between ReadAllLines and ReadLines, and why did they add this in .NET 4.0? Is the only distinction that one returns IEnumerable<string> and the other string[]?
JulianR
@JulianR: See Lippert here: http://blogs.msdn.com/ericlippert/archive/2008/09/22/arrays-considered-somewhat-harmful.aspx
Jason
Jonathan: code golf is not about readability; it is sport
Alexandr Ciornii
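The tie order in the C# output above (Washington, Adams, Jefferson, ... in order of first appearance) follows from two documented LINQ behaviors: `GroupBy` yields groups in the order their keys first appear, and `OrderBy` is a stable sort. A rough Python analogue, assuming Python 3.7+ where `dict` (and therefore `Counter`) preserves insertion order and `sorted` is stable; the function name and word list are hypothetical:

```python
from collections import Counter


def top_by_first_appearance(words, n=10):
    """Mimic GroupBy(w => w).OrderBy(g => -g.Count()).Take(n).

    Counter remembers the order in which keys were first seen, and
    sorted() is stable, so equal counts stay in first-appearance order.
    """
    counts = Counter(words)
    return sorted(counts.items(), key=lambda kv: -kv[1])[:n]


words = "Washington Washington Adams Jefferson Jefferson Johnson Adams Johnson Johnson".split()
for word, count in top_by_first_appearance(words):
    print(f"{count}|{word}")
```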
+2  A: 

Python 3.1 (88 chars)

import collections
collections.Counter(open('input.txt').read().split()).most_common(10)
SilentGhost
Counter resides in collections, not itertools. This also does not print the output, and it's in reverse order compared with the output in the original question.
truppo
Yeah, that was a typo. But I see no point in satisfying every whim of the OP. Why is it in ascending and not descending order? It prints when run in the interpreter.
SilentGhost
code golf doesn't allow using modules
Alexandr Ciornii
hmmm, says who?
SilentGhost
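Following up on the comment thread, here is a minimal Python 3 sketch that keeps the `collections.Counter` approach but returns printable lines in the same ascending order as the question's shell output. The function name `show_top` is hypothetical; the file name `input.txt` comes from the question, and the file-reading usage is left in a comment so the sketch runs on its own.

```python
from collections import Counter


def show_top(text, n=10):
    """Return "count word" lines for the n most common words, least common first."""
    top = Counter(text.split()).most_common(n)  # most common first
    # reversed() flips it to the ascending order of the question's output
    return [f"{count} {word}" for word, count in reversed(top)]


# Usage with the question's file (not run here):
#     with open("input.txt") as f:
#         print("\n".join(show_top(f.read())))
print("\n".join(show_top("Bush FDR Bush FDR FDR Obama", n=2)))
```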
+2  A: 

Python 2.6, 104 chars:

l=open("input.txt").read().split()
for c,n in sorted(set((l.count(w),w) for w in l if w))[-10:]:print c,n
truppo
you don't need re there at all. have a look at my answer.
SilentGhost
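The Python 2.6 entry above uses a `print` statement, so it will not run under Python 3. A straightforward port of the same approach (a set of `(count, word)` pairs, sorted ascending, last ten kept) could look like the following; the function name `last_ten` is hypothetical, and it takes a word list instead of opening `input.txt` so that the sketch is self-contained. It keeps the original's quadratic `list.count` calls, which are fine for a list this small.

```python
def last_ten(words):
    """Port of the Python 2.6 entry: (count, word) pairs, ascending, last ten."""
    # list.count scans the whole list for each word: O(n^2), fine for ~90 words.
    pairs = set((words.count(w), w) for w in words if w)
    return sorted(pairs)[-10:]


words = "Bush Clinton Clinton Bush Bush Obama".split()
for c, n in last_ten(words):
    print(c, n)
```

With the question's file, the call would be `last_ten(open("input.txt").read().split())`.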
+2  A: 

vim 38 and works for all input

:%!xargs -n1|sort|uniq -c|sort -n|tail
ojblass
+2  A: 

The lack of AWK is disturbing.

xargs -n1<input.txt|awk '{c[$1]++}END{for(p in c)print c[p],p|"sort|tail"}'

75 characters.

If you want to get a bit more AWKy, you can forget xargs:

awk -v RS='[^a-zA-Z]' /./'{c[$1]++}END{for(p in c)print c[p],p|"sort|tail"}' input.txt
Nick Presta
get your awk on! er, that's kind of scary... Happy Fourth!
ojblass
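The second AWK version splits records on every non-letter. A multi-character RS is unspecified by POSIX awk, though gawk treats it as a regular expression, and the /./ pattern skips the empty records that runs of separators leave behind. The same splitting rule can be sketched in Python with `re.split`; the function name `letter_words` is hypothetical.

```python
import re


def letter_words(text):
    """Split on every non-letter and drop the empty pieces, like awk's /./ filter."""
    pieces = re.split(r"[^a-zA-Z]", text)
    return [piece for piece in pieces if piece]  # runs of separators leave ""


text = "    Harrison DIES\n    Tyler\n"  # modeled on the indented input lines
print(letter_words(text))  # ['Harrison', 'DIES', 'Tyler']
```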
+5  A: 

Haskell, 102 characters (wow, so close to matching the original):

import List
(take 10.map snd.sort.map(\(x:y)->(-length y,x)).group.sort.words)`fmap`readFile"input.txt"

J, only 55 characters:

10{.\:~~.(,.~[:<"0@(+/)=/~);;:&.><;._2[1!:1<'input.txt'

(I've yet to figure out how to elegantly perform text manipulations in J... it's much better at array-structured data.)


   NB. read the file
   <1!:1<'input.txt'
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------...
|    Washington     Washington     Adams     Jefferson     Jefferson     Madison     Madison     Monroe     Monroe     John Quincy Adams     Jackson     Jackson     Van Buren     Harrison DIES     Tyler     Polk     Taylor DIES     Fillmore     Pierce     ...
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------...
   NB. split into lines
   <;._2[1!:1<'input.txt'
+--------------+--------------+---------+-------------+-------------+-----------+-----------+----------+----------+---------------------+-----------+-----------+-------------+-----------------+---------+--------+---------------+------------+----------+----...
|    Washington|    Washington|    Adams|    Jefferson|    Jefferson|    Madison|    Madison|    Monroe|    Monroe|    John Quincy Adams|    Jackson|    Jackson|    Van Buren|    Harrison DIES|    Tyler|    Polk|    Taylor DIES|    Fillmore|    Pierce|    ...
+--------------+--------------+---------+-------------+-------------+-----------+-----------+----------+----------+---------------------+-----------+-----------+-------------+-----------------+---------+--------+---------------+------------+----------+----...
   NB. split into words
   ;;:&.><;._2[1!:1<'input.txt'
+----------+----------+-----+---------+---------+-------+-------+------+------+----+------+-----+-------+-------+---+-----+--------+----+-----+----+------+----+--------+------+--------+-------+-------+----+-------+-----+-----+-----+--------+----+------+---...
|Washington|Washington|Adams|Jefferson|Jefferson|Madison|Madison|Monroe|Monroe|John|Quincy|Adams|Jackson|Jackson|Van|Buren|Harrison|DIES|Tyler|Polk|Taylor|DIES|Fillmore|Pierce|Buchanan|Lincoln|Lincoln|DIES|Johnson|Grant|Grant|Hayes|Garfield|DIES|Arthur|Cle...
+----------+----------+-----+---------+---------+-------+-------+------+------+----+------+-----+-------+-------+---+-----+--------+----+-----+----+------+----+--------+------+--------+-------+-------+----+-------+-----+-----+-----+--------+----+------+---...
   NB. count repetitions
   |:~.(,.~[:<"0@(+/)=/~);;:&.><;._2[1!:1<'input.txt'
+----------+-----+---------+-------+------+----+------+-------+---+-----+--------+----+-----+----+------+--------+------+--------+-------+-------+-----+-----+--------+------+---------+--------+---------+----+------+-------+--------+------+---+------+------...
|2         |2    |2        |2      |2     |1   |1     |2      |1  |1    |2       |6   |1    |1   |1     |1       |1     |1       |2      |3      |2    |1    |1       |1     |2        |2       |2        |1   |2     |1      |1       |1     |4  |2     |2     ...
+----------+-----+---------+-------+------+----+------+-------+---+-----+--------+----+-----+----+------+--------+------+--------+-------+-------+-----+-----+--------+------+---------+--------+---------+----+------+-------+--------+------+---+------+------...
|Washington|Adams|Jefferson|Madison|Monroe|John|Quincy|Jackson|Van|Buren|Harrison|DIES|Tyler|Polk|Taylor|Fillmore|Pierce|Buchanan|Lincoln|Johnson|Grant|Hayes|Garfield|Arthur|Cleveland|McKinley|Roosevelt|Taft|Wilson|Harding|Coolidge|Hoover|FDR|Truman|Eisenh...
+----------+-----+---------+-------+------+----+------+-------+---+-----+--------+----+-----+----+------+--------+------+--------+-------+-------+-----+-----+--------+------+---------+--------+---------+----+------+-------+--------+------+---+------+------...
   NB. sort
   |:\:~~.(,.~[:<"0@(+/)=/~);;:&.><;._2[1!:1<'input.txt'
+----+---+-------+----+------+----------+------+---------+------+-----+------+--------+-------+-------+---------+-------+--------+-----+----------+-------+---------+-----+---+-----+------+----+------+----+------+-----+-------+----+------+-----+-------+----...
|6   |4  |3      |3   |2     |2         |2     |2        |2     |2    |2     |2       |2      |2      |2        |2      |2       |2    |2         |2      |2        |2    |1  |1    |1     |1   |1     |1   |1     |1    |1      |1   |1     |1    |1      |1   ...
+----+---+-------+----+------+----------+------+---------+------+-----+------+--------+-------+-------+---------+-------+--------+-----+----------+-------+---------+-----+---+-----+------+----+------+----+------+-----+-------+----+------+-----+-------+----...
|DIES|FDR|Johnson|Bush|Wilson|Washington|Truman|Roosevelt|Reagan|Nixon|Monroe|McKinley|Madison|Lincoln|Jefferson|Jackson|Harrison|Grant|Eisenhower|Clinton|Cleveland|Adams|Van|Tyler|Taylor|Taft|Quincy|Polk|Pierce|Obama|Kennedy|John|Hoover|Hayes|Harding|Garf...
+----+---+-------+----+------+----------+------+---------+------+-----+------+--------+-------+-------+---------+-------+--------+-----+----------+-------+---------+-----+---+-----+------+----+------+----+------+-----+-------+----+------+-----+-------+----...
   NB. take 10
   10{.\:~~.(,.~[:<"0@(+/)=/~);;:&.><;._2[1!:1<'input.txt'
+-+----------+
|6|DIES      |
+-+----------+
|4|FDR       |
+-+----------+
|3|Johnson   |
+-+----------+
|3|Bush      |
+-+----------+
|2|Wilson    |
+-+----------+
|2|Washington|
+-+----------+
|2|Truman    |
+-+----------+
|2|Roosevelt |
+-+----------+
|2|Reagan    |
+-+----------+
|2|Nixon     |
+-+----------+
ephemient
J == Klingon????? How can you possibly maintain that code?
ojblass
I suppose the most obvious problem is that the symbol stream is meaningless without knowing J's character set and vocabulary... but it's not too bad aside from that. Are there any languages that promote maintainable one-liners?
ephemient
well, my answer provides by far the most readable solution to this problem.
SilentGhost
For the reverse order that OP uses, replace 10{.\:~ with 10{:/:~
ephemient
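The heart of the J solution is `=/~`, which compares every word with every word; summing that table gives a count for each word, the counts are stitched next to the words, `~.` removes duplicate rows, `\:~` sorts descending and `10{.` keeps the first ten. A rough Python transcription of that idea (quadratic, like the J table) for readers who don't speak J; the function name and sample are hypothetical:

```python
def j_style_top(words, n=10):
    """Rough Python transcription of the J entry's counting idea."""
    # =/~ : a table with True wherever word i equals word j
    table = [[a == b for b in words] for a in words]
    # +/ : summing each column gives the count for that word occurrence
    counts = [sum(column) for column in zip(*table)]
    # pair each count with its word; the set drops duplicate rows like ~.
    rows = set(zip(counts, words))
    # \:~ sorts the rows descending; 10{. keeps the first n
    return sorted(rows, reverse=True)[:n]


words = "Bush Johnson Bush Johnson FDR".split()  # hypothetical sample
print(j_style_top(words))  # [(2, 'Johnson'), (2, 'Bush'), (1, 'FDR')]
```

Because whole rows are sorted descending, ties come out in reverse alphabetical order, matching the J output above (Johnson before Bush, Wilson before Washington).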
+2  A: 

A revision of the previous entry, which should save 10 characters:

h = {}
File.open('f.1').each {|l|l.split(/ /).each{|e|h[e]==nil ?h[e]=1:h[e]+=1}}
h.sort{|a,b|a[1]<=>b[1]}.last(10).each{|e|puts"#{e[1]} #{e[0]}"}
Cuervo's Laugh
I had no idea how to format the code in a comment response; this is not an entry but support for the previous entrant's code.
Cuervo's Laugh
+2  A: 

Perl 86 characters

94, if you count the input filename.

perl -anE'$_{$_}++for@F;END{say"$_{$_} $_"for@{[sort{$_{$b}<=>$_{$a}}keys%_]}[0..10]}' test.in

If you don't care how many results you get, then it's only 75, excluding the filename.

perl -anE'$_{$_}++for@F;END{say"$_{$_} $_"for sort{$_{$b}<=>$_{$a}}keys%_}' test.in
Brad Gilbert
Ah, -E requires Perl 5.10.
ephemient
+2  A: 

Ruby 66B

puts (a=$<.read.split).uniq.map{|x|"#{a.count x} "+x}.sort.last 10
matyr
+2  A: 

Ruby

115 chars

w = File.read($*[0]).split
w.uniq.map{|x| [w.select{|y|x==y}.size,x]}.sort.last(10).each{|z| puts "#{z[1]} #{z[0]}"}
+2  A: 

Windows Batch File

This is obviously not the smallest solution, but I decided to post it anyway, just for fun. :) NB: the batch file uses a temporary file named $ for storing temporary results.

Original uncompressed version with comments:

@echo off
setlocal enableextensions enabledelayedexpansion

set infile=%1
set cnt=%2
set tmpfile=$
set knownwords=

rem Calculate word count
for /f "tokens=*" %%i in (%infile%) do (
  for %%w in (%%i) do (

    rem If the word hasn't already been processed, ...
    echo !knownwords! | findstr "\<%%w\>" > nul
    if errorlevel 1 (

      rem Count the number of the word's occurrences and save it to a temp file
      for /f %%n in ('findstr "\<%%w\>" %infile% ^| find /v "" /c') do (
        echo %%n^|%%w >> %tmpfile%
      )

      rem Then add the word to the known words list
      set knownwords=!knownwords! %%w
    )
  )
)

rem Print top 10 word count
for /f %%i in ('sort /r %tmpfile%') do (
  echo %%i
  set /a cnt-=1
  if !cnt!==0 goto end
)

:end
del %tmpfile%
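The counting step above runs findstr "\<word\>" piped into find /v "" /c, which counts the lines that contain the word rather than its occurrences; for this input the two agree because no name repeats within a line. A Python sketch of that line-based count next to an occurrence count shows where they would differ. It approximates findstr's `\<` and `\>` word anchors with `\b`, and the function names and sample lines are hypothetical.

```python
import re


def line_count(word, lines):
    """Count lines containing word as a whole word (like findstr piped to find /c)."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b")
    return sum(1 for line in lines if pattern.search(line))


def occurrence_count(word, lines):
    """Count every whole-word occurrence, even several on one line."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b")
    return sum(len(pattern.findall(line)) for line in lines)


lines = ["John Quincy Adams", "Adams", "Adams Adams"]  # hypothetical input
print(line_count("Adams", lines))        # 3
print(occurrence_count("Adams", lines))  # 4
```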

Compressed & obfuscated version, 317 characters:

@echo off&setlocal enableextensions enabledelayedexpansion&set n=%2&set l=
for /f "tokens=*" %%i in (%1)do for %%w in (%%i)do echo !l!|findstr "\<%%w\>">nul||for /f %%n in ('findstr "\<%%w\>" %1^|find /v "" /c')do echo %%n^|%%w>>$&set l=!l! %%w
for /f %%i in ('sort /r $')do echo %%i&set /a n-=1&if !n!==0 del $&exit /b

This can be shortened to 258 characters if echo is already off and command extensions and delayed variable expansion are on:

set n=%2&set l=
for /f "tokens=*" %%i in (%1)do for %%w in (%%i)do echo !l!|findstr "\<%%w\>">nul||for /f %%n in ('findstr "\<%%w\>" %1^|find /v "" /c')do echo %%n^|%%w>>$&set l=!l! %%w
for /f %%i in ('sort /r $')do echo %%i&set /a n-=1&if !n!==0 del $&exit /b

Usage:

> filename.bat input.txt 10 & pause

Output:

6|DIES
4|FDR
3|Johnson
3|Bush
2|Wilson
2|Washington
2|Truman
2|Roosevelt
2|Reagan
2|Nixon
Helen
MY EYES !!!1 11
Tom
I went looking back at this post and I am kind of glad I did. You did a very nice job doing this in a really backward language.
ojblass