views:

921

answers:

15

Last year (2009), the Google Code Jam featured an interesting problem as the first problem in Round 1B: Decision Tree

As the problem seemed tailored for Lisp-like languages, we spontaneously had an exciting codegolf here on SO, in which a few languages managed to solve the problem in fewer characters than any Lisp variety, using quite a number of different techniques.

This year's Round 1B Problem A (File Fix-it) also seems tailored for a particular family of languages, Unix shell scripts. So continuing the "1B-A tradition" would be appropriate. :p But which language will end up with the shortest code? Let us codegolf and see!

Problem description (adapted from official page):

You are given T test cases. Each test case contains N lines that list the full path of all directories currently existing on your computer. For example:

/home
/home/awesome
/home/awesome/wheeeeeee
/home/awesome/wheeeeeee/codegolfrocks
/home/thecakeisalie

Next, you are given M lines that list the full path of directories you would like to create. They are in the same format as the previous examples. You can create a directory using the mkdir command, but you can only do so if the parent directory already exists. For example, to create the directories /pyonpyon/fumufumu/yeahyeah and /pyonpyon/fumufumu/yeahyeahyeah, you would need to use mkdir four times:

mkdir /pyonpyon
mkdir /pyonpyon/fumufumu
mkdir /pyonpyon/fumufumu/yeahyeah
mkdir /pyonpyon/fumufumu/yeahyeahyeah

For each test case, return the number of times you have to call mkdir to create all the directories you would like to create.

Input

Input consists of a text file whose first line contains the integer T, the number of test cases. The rest of the file contains the test cases.

Each test case begins with a line containing the integers N and M, separated by a space.

The next N lines contain the path of each directory currently existing on your computer (not including the root directory /). This is a concatenation of one or more non-empty lowercase alphanumeric strings, each preceded by a single /.

The following M lines contain the path of each directory you would like to create.

Output

For each case, print one line containing Case #X: Y, where X is the case number and Y is the solution.

Limits

1 ≤ T ≤ 100.

0 ≤ N ≤ 100.

1 ≤ M ≤ 100.

Each path contains at most 100 characters.

Every path appears only once in the list of directories already on your computer, or in the list of desired directories. However, a path may appear on both lists, as in example case #3 below.

If a directory is in the list of directories already on your computer, its parent directory will also be listed, with the exception of the root directory /.

The input file is at most 100,000 bytes long.

Example

Larger sample test cases may be downloaded here.

Input:

3
0 2
/home/sparkle/pyon
/home/sparkle/cakes
1 3
/z
/z/y
/z/x
/y/y
2 1
/moo
/moo/wheeeee
/moo

Output:

Case #1: 4
Case #2: 4
Case #3: 0

Code Golf

Please post your shortest code in any language that solves this problem. Input and output may be handled via stdin and stdout or by other files of your choice. Please include a disclaimer if your code has the potential to modify or delete existing files when executed.

Winner will be the shortest solution (by byte count) in a language with an implementation existing prior to the start of Round 1B 2010. So while you are free to use a language you've just made up in order to submit a 0-byte solution, it won't count, and you'll probably get downvotes. ^_^

Standings

+2  A: 

Bash, 175 169 168 135 130 128

WARNING: Be sure to run in an empty directory, as this will wipe out its contents first thing per test.

read t
for((;x++<t;));do
rm -r *
read n m
for((i=n+m;i--;));do
read d
mkdir -p .$d
done
echo Case \#$x: $[`find|wc -l`-n-1]
done
JB
Man that's dangerous. Thanks for the warning!
KalEl
@JoeyA is that $[ syntax documented anywhere? Great addition to the golfer's toolbox, but I can't find it mentioned in my bash manpage.
JB
@JB: it is the deprecated version of $((...)). It will be removed at some point.
Dan Andreatta
+8  A: 

Golfscript, 74 69 65

Now on a single line!
This solution is 63 characters, but will not run in a reasonable amount of time for test cases with thousands of paths (over 8 minutes), so I have opted to not count it.

n%(~,{'Case #'\)@': '\(~1$+@[]@{\('/':!%@\{[n!++:!]|}/}*,@-n@}/

The faster 65 character solution:

n%(~,{'Case #'\)@': '\(~1$+@[]@{\('/':!%@\{[n!++:!]+}/.&}*,@-n@}/

Kudos to marcog for the algorithm!

Nabb
I actually have a test case this fails on (haven't checked others) :P -- I don't seem able to put code in comments, so it's a string..."1\n2 3\n/home/home/home\n/home/home/sparkle\n/home/sparkle/home\n/home/home/home/sparkle\n/home/sparkle/sparkle\n"Answer should be 4.
jessicah
@jessicah That is invalid input according to the question - "If a directory is in the list of directories already on your computer, its parent directory will also be listed, with the exception of the root directory /". If `/home/home` is listed as an existing directory, then `/home` must be listed as well.
Anurag
sorries, yes, i forgot about that =/
jessicah
+7  A: 

Python, 193 175 173 171 166 165 164 156 151 149 147 146 145

r=raw_input;c=0;r()
while 1:N,M=map(int,r().split());d={};c+=1;exec"e=r()\nwhile e:d[e]=e,_=e.rsplit('/',1)\n"*(N+M);print'Case #%d:'%c,len(d)-N

This solution throws an EOFError, but since this is output to stderr it is within the rules. Since the output we're interested in all goes to stdout, we can pipe that and ignore stderr.

You might read the above (or some of the other posts) and think it shouldn't work. There's a bit of a trick to why, and I will explain this here. If you read the question carefully, it tells us that for each directory in the first list, all it's parent directories are also included in the list (e.g. if /home/marcog is present, so is /home) [1]. The question also guarantees us each list will have no duplicates [2]. This lets us throw all directories in the first list into the same set into which we throw the directories from the second list. Since the question guarantees no duplicates per list, we know that the first list will result in exactly N entries in the set. Therefore we can get the final answer by subtracting N from the size of the final set.

[1] "The next N lines each give the path of one directory that already exists on your computer. This list will include every directory already on your computer other than the root directory."

[2] "No path will appear twice in the list of directories already on your computer, or in the list of directories you wish to create."

marcog
`e.rpartition('/')` can be replaced by `e.rsplit('/',1)`
J.F. Sebastian
@jf sorry, looking at revision history I trampled on your edits. I've put it back, thanks!
marcog
Marco, I don't know enough Perl to be sure, but it appears that this solution assumes that N is the same as the number of existing directories. That happened to be the case for the first three test cases above, so I've added a test case above which breaks that. I think it breaks this solution as well.
CPerkins
@CPerkins Your test case is invalid. Read note [1] in my answer.
marcog
Marco, I don't follow. I don't see why that comes close to invalidating the test case I gave.
CPerkins
@CPerkins read http://groups.google.com/group/google-code/browse_thread/thread/ed370c7c4e156813
marcog
Holy kamolikers. You're right, and I'm wrong. Thanks for the correction. This quite shocking to me (not being wrong, that's happened before). But I got this problem right in the codejam, for both small and large datasets, so I figured I understood the requirements. But I managed to solve it while misunderstanding part of the criteria.
CPerkins
@CPerkins, You can safely ignore the constraint and just take the number of directories after the first N inserts, which fortunately for this golf happens to be N. I also only fount out afterwards :)
JPvdMerwe
+2  A: 

C#, 489 442 400 398

Just for fun, no chance of ever being very short of course. Count is after removing insignificant white space, which I left in here for readability.

namespace System
{
    class P
    {
        static void Main(string[]a)
        {
            int c = 0, n, m, d, l = 1;
            var f = IO.File.ReadAllLines(a[0]);

            while (c < int.Parse(f[0]))
            {
                var o = f[l++].Split(' ');
                n = int.Parse(o[0]);
                m = int.Parse(o[1]);
                var p = new Collections.Generic.HashSet<string>();

                while (n-- > 0)
                    p.Add(f[l++]);

                while (m-- > 0)
                    for (o = f[l++].Split('/'), d = 0; d++ < o.Length; )
                        if (p.Add(string.Join("/", o, 0, d)))
                            n++;

                Console.Write("Case #{0}: {1}\n", ++c, n);
            }
        }
    }
}

This latest version has a quirk. It mistakenly counts the root directory as having to be created once, to compensate for variable n being -1 at the start of the loop, instead of the desired 0. It works because the root directory is guaranteed not to be in N.

Thorarin
This yields 4,5,1 for the example test case above (just like my own solution to GCJ did). I still wonder why because my solution was correct for GCJ.
Joey
A: 

Java, 284

import java.util.*;class F{static{Scanner s=new Scanner(System.in);int
t=s.nextInt(),i=0,n,j;while(i++<t){Set x=new
HashSet();n=s.nextInt();j=s.nextInt();while(j-->-n){String b="";for(String
c:s.next().split("/"))x.add(b+=c+'/');}System.out.println("Case #"+i+": "+(x.size()-n-1));}}}

Note: it outputs an error message but still works correctly.

aditsu
Credits to marcog for algo and further shortening
aditsu
+2  A: 

PostScript

182 212 247 262 278 bytes

1[([){exch}/C{concatstrings}/R{(%lineedit)run}>>begin 1
R{(: )[(Case #)3{=only}repeat<<>>R 1 index add{()<<R]{99 string
cvs C<C>C dup 3 index[<>put}forall pop}repeat[length[sub =}for

Usage: $ gs -q -dNODISPLAY -dNOPROMPT thisfile.ps <input >output

KirarinSnow
+3  A: 

Lua solution, 172 bytes:

r,n=io.read,"*n"for i=1,r(n)do a,c,s=r(n),0,{}for i=1,a+r()do k=""for x in r():gmatch"[^/]+"do k=k.."/"..x c=c+(s[k]or 1);s[k]=0 end end print("Case #"..i..": "..(c-a))end
Kristofer
More golfing: remove variable `n` and call `r"*n"` instead (4 chars), change assignment from `a,c,s=r"*n",0,{}` to `a=r"*n"c=0 s={}` (1 char), remove semicolon (1 char)
gwell
A: 

Haskell: 299

import Data.List
import Text.Printf
a[]=[]
a(x:z)=(r-n):next where{;[n,m]=map read$words x;t=n+m;r=length$nub$concatMap(f.reverse)$take t z;next=a$drop t z;f""=[];f y=y:f z where(a,b:z)=span(/='/')y}
main=do{z<-getContents;putStr$unlines[printf"Case #%d: %d"x v|(x::Int,v)<-zip[1..]$a$tail$lines z]}

This requires GHC's -XScopedTypeVariables switch.

Readable version:

import Data.List
import Text.Printf

a [] = []
a (x:xs) = (r-n) : next where
    [n,m] = map read $ words x
    t = n+m
    r = length $ nub $ concatMap (f . reverse) $ take t xs
    next = a $ drop t xs
    f "" = []
    f y = y : f bs where
        (a,b:bs) = span (/= '/') y

cleanUp a = unlines [printf "Case #%d: %d" x v | (x::Int,v::Int) <- zip [1..] a]

main = do
    z<-getContents
    putStr$cleanUp$a$tail$lines z
Joey Adams
+2  A: 

Ruby, 151 149 144

Direct translation to Ruby of marcog's Python solution:

gets.to_i.times{|i|n,m=gets.split.map &:to_i
d={}
(n+m).times{gets.strip.split('/').inject{|a,p|d[a+='/'+p]=a}}
puts"Case ##{i+1}: #{d.size-n}"}
J.F. Sebastian
A: 

PyonScript

158 159 bytes

1({[([){exch}/R{(%lineedit)run}>>begin R{(: )[(Case #)3{=only}repeat<<>>R 1
index +{<><<R]{str cat<C>cat dup 3 index[<>put}forall pop}repeat[length[-
=}for}1)

Usage: $ pyon thisfile.pyon <input >output

Based on the PostScript solution.

Since PyonScript development is still in progress, this code works on the implementation as it existed at the start of Round 1B 2010: http://github.com/KirarinSnow/PyonScript

KirarinSnow
+2  A: 

Haskell, 218

import Data.List
main=interact$(!1).tail.lines
(l:j)!i=let{[n,m]=map read$words l;(p,k)=splitAt(n+m)j}in"Case #"++show i++": "++show(length(nub$(>>=s)p)-n)++"\n"++k!(i+1)
s(_:p)=map(flip take p)$elemIndices '/'(p++"/")

Similar (but way shorter :P) to the other Haskell solution.

Ends up in error, and that's expected. Whether or not that follows the rules is a bit more prone to debate than for other solutions. The output and error streams are indeed not mixed up. But if stdout is buffered, the results are never sent. That's ok for interactive use (then just copy-paste console output to a file), but it mostly rules out redirection. To make it short, ./filefixit <input 2>/dev/null does the trick.

The problem can be avoided by inserting line noise in line 3: []!_="" (8 more bytes, 226 in total)

For clarity, the exact same semantics with full indentation and meaningful identifiers:

import Data.List
main = interact $ testsStartingAt 1 . tail . lines
testsStartingAt _   []   = "" -- this line optional
testsStartingAt i (l:ls) = testOutput ++ testsStartingAt (i+1) ls'
    where [n,m]       = map read $ words l
          (paths,ls') = splitAt (n+m) ls
          testResult  = length (nub $ concatMap splitPath paths) - n
          testOutput  = "Case #" ++ show i ++ ": " ++ show testResult ++ "\n"
splitPath (_:p) = map (flip take p) $ elemIndices '/' (p ++ "/")
JB
+3  A: 

Perl, 111 106 100

perl -naE 'BEGIN{<>}++$i;($n,$m,%d)=@F;for(1..$n+$m){$_=<>;$d{$`}++while/\w\K\b/g}say"Case #$i: ",keys(%d)-$n'

As ever with golf programs dependent on interpreter command-line options, the options' bytecount difference with respect to the default has been added to the actual code length.

Indented, commented

#! perl -na      # activate line mode and autosplit
BEGIN { <> }     # skip test case count

# now for each test case:

++$i;            # increment test counter
($n,$m,%d) = @F; # read N and M;
                 # clear out directory hash

for (1..$n+$m) { # for each expected pathname:
  $_ = <>;          # read it
  $d{$`}++          # add to hash...
    while /\w\K\b/g # ...all branches from root
}

say "Case #$i: ", keys(%d) - $n

Most of the magic is the branch-from-root extraction. The trick is to uses a regex to locate the interesting cutting points (namely: before each slash, and the end of the string), but to extract the string using Perl's $PREMATCH (dollar-backtick; markdown won't let me include that properly) instead of the usual pattern-matching facilities.

\b locates a word boundary, which resolves to all words' (directories') start and end. We only want their end, so we prepend a \w. But that would cut the last character from the path, which is a problem if we get both /foo/bar and /foo/baz in the same dataset. So we exclude said character from the match with \K. None of this would be necessary if Perl had a \>-like (Emacs regexes) metacharacter.

As a stand-alone program (106)

for$i(1..<>){($n,$m,%d)=split$",<>;
for(1..$n+$m){$_=<>;$d{$`}++while/\w\K\b/g}
say"Case #$i: ",keys(%d)-$n}

Newlines for legibility; none of them is necessary or counted in.

It uses features found only in the latest versions of Perl, so run with perl -M5.010 or later.

JB
A: 

AWK - 123 121 chars

Another adaptation of marcog python version. Run with awk -F'[ \]' -f fixit.awk

function p(){if(c++>1)print"Case #"c-2": "length(k)-n}
/\//{for(y=i=1;i<NF;)k[y=y"/"$++i];next}
{p();n=$1;delete k}
END{p()}

To run it without the -F option, it grows by 15 chars, since it then needs this part:

BEGIN{FS=" |/"}
Dan Andreatta
Couldn't your FS setting be included in the third line, for just 9 characters? Also, since you're not using the default one, I'm of the opinion any supplementary characters to `awk -f fixit.awk` should be counted in.
JB
A: 

Scala, 190 189

Yet another version of marcog's solution, this time in Scala. Runs with scala, but would need to be put into a class to allow compilation with scalac.

for(c←1 to readInt){val I=readLine split" "map(_.toInt)
var d=Set("")
for(i←1-I(0)to I(1)){var a="";for(w←readLine split"/"){a+="/"+w;d+=a}}
printf("Case #%d: %d\n",c,d.size-I(0)-2)}
gmanuell
A: 

J - 205 159 bytes

n=:_1".{.
l=:3 :0
('Case #: ',":((#@:(~.@:,/@:([\"2)@:([;.1"1)@:(i.@:+/@:n{}.)))-{.@:n)y) 1!:2<2
y=.((+i.@:((#y)&-))1&+(+/n y)){y
)
l^:(n d)}.d=:[;._2(1!:1<3)

run:

script.ijs < gcj-input

Still, could be a lot better :/

Eelvex
Seems like waaay too much whitespace and parentheses for J... :D
KirarinSnow
Very true... I suppose I was too tired :P:)Thanks.
Eelvex