views:

122

answers:

3

On Server Fault, How to list symbolic link chains? (not my question) talks about listing all the symbolic links and following them. To make this doable, let's consider a single directory at first.

I want to write a short utility that does this. It looks easy to put pairs from symbolic links into a hash and then process the hash.

But then I might have something like:

ls -l
total 0
lrwxrwxrwx 1 pjb pjb 1 2010-02-23 08:48 a -> b
lrwxrwxrwx 1 pjb pjb 1 2010-02-23 08:48 b -> c
lrwxrwxrwx 1 pjb pjb 1 2010-02-23 09:03 c -> a
lrwxrwxrwx 1 pjb pjb 1 2010-02-23 09:17 trap -> b
lrwxrwxrwx 1 pjb pjb 1 2010-02-23 09:17 x -> y
lrwxrwxrwx 1 pjb pjb 1 2010-02-23 09:17 y -> b

where it is obvious that a->b->c is a loop, and that trap points into a loop, but to know x points into a loop I need to follow a bit.

One hash representation is:

a => b
b => c
c => a
trap => b
x => y
y => b

But the reverse representation is better for marking loops to bad starting points, once I know what the loops are.

So here's some questions:

  • Is a hash the best structure to represent symbolic links?
  • what's the best way to separate the graph of the file system to tell the loopy components from the tree components to the twig with a loop type pieces?
  • Is there a better algorithm than manually searching for all the loops from all the starting points?
  • From a graph-theory perspective -- is this sort of thing in the CPAN already? If not, what are some good helper modules?
+2  A: 

Have a look at the CPAN module File::Spec::Link. The resolve method says that it traverses a link repeatedly to find the linked target.

The resolve method of the module has this to say:

resolve($link)
  Returns the non-link ultimately linked to by $link, by repeatedly calling linked. Returns undef if the link can not be resolved

I had used this module to find a target of symbolic link whose target was in turn a symlink and so on. But I am not sure if this detects the cyclic symbolic links.

sateesh
+7  A: 

There's a Graph module on CPAN that you might use as in the following:

#! /usr/bin/perl

use warnings;
use strict;

use Graph;

my $g = Graph->new;
my $dir = @ARGV ? shift : ".";

opendir my $dh, $dir or die "$0: opendir $dir: $!";
while (defined(my $name = readdir $dh)) {
  my $path = $dir . "/" . $name;

  if (-l $path) {
    my $dest = readlink $path;
    die "$0: readlink $path: $!" unless defined $dest;

    $g->add_edge($name => $dest);
  }
  else {
    $g->add_vertex($name);
  }
}

my @cycle = $g->find_a_cycle;
if (@cycle) {
  $" = ' -> '; #" # highlighting error
  print "$0: $dir: at least one cycle: @cycle\n";
}
else {
  print "$0: $dir: no cycles\n";
}

For example, in a directory similar in structure to the one in your question, the output is

$ ../has-cycle 
../has-cycle: .: at least one cycle: c -> a -> b
Greg Bacon
Thanks for posting this. I intend to look at Graph for some other needs that I have, and to brush up on this sort of stuff.
Paul
@Paul You're welcome! I'm glad you found it beneficial.
Greg Bacon
A: 

You need to store more than just the name of the link. Either grab the inode number (if your FS supports that) or some other unique aspect. If one doesn't exist, then consider creating your own, maybe by checksumming the name/create/last-modified date. Either way, you need some way to uniquely identify each link. I've seen some utilities that simply put a limit on the number of links (between 8 and 255) and declare anything that exceeds this limit a loop, but I always considered that as "taking the cheap way out". :)

TMN