I'm really not loving Perl but I've got to use it for my current task.

Here is my problem... I have three strings that make up elements of a full directory path (Windows, but it needs to work on *nix too). For example...

$root = "c:\\checkout\\omega";
$client = "abc\\mainline";
$build = "omega\\abc\\mainline\\host\\make";

I want to combine these to make a full path e.g.

"c:\\checkout\\omega\\abc\\mainline\\host\\make"

But there is overlap between the $build string and the $root and/or $client string. How can I combine these to get the full path and ignore the overlap? $client can probably be ignored in this example, but there are other cases where $build may overlap $client but not $root.

I can think of lots of horrible messy ways to implement it, but I assume (perhaps wrongly) that there is a simple, clean and maybe even elegant way of doing this, since Perl is mainly about text manipulation.

Some kind of string OR operation maybe. I foolishly tried...

($root . $client) | $build

But it is a bitwise operation and the result is junk!

A: 

Here's one way to do it:

  1. use single quotes to preserve the backslashes (if backslashes are what you want);

  2. my $fullpath = join "", $root, $client, $build;

join is the 'glue' between the strings - in this case, empty or nothing.

The above gives:

c:\checkout\omegaabc\mainlineomega\abc\mainline\host\make

so if you need backslashes between the strings, use join "\\" instead which will give:

c:\checkout\omega\abc\mainline\omega\abc\mainline\host\make

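In double-quoted strings, a backslash escapes the character that follows it, so each literal backslash has to be written as \\. In single-quoted strings, backslashes are kept literally (except before another backslash or a single quote).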

Then you can easily convert the backslashes to (*nix) forward slashes, but that's another process. With Perl, always put use strict; at the top of your script, which will helpfully point out potential glitches when you run it from the command line.
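The quoting and join points above can be sketched as follows. This is only an illustration using the strings from the question; the variable names $single, $double and $unix are made up for the example, and the slash conversion uses tr///, which is one of several ways to do it.

```perl
use strict;
use warnings;

# Single quotes keep backslashes as typed; double quotes need each one doubled.
my $single = 'c:\checkout\omega';
my $double = "c:\\checkout\\omega";
print "same string\n" if $single eq $double;

my $root   = 'c:\checkout\omega';
my $client = 'abc\mainline';
my $build  = 'omega\abc\mainline\host\make';

# Join with one backslash as the glue ('\\' is a single backslash).
my $fullpath = join '\\', $root, $client, $build;
print "$fullpath\n";

# Convert backslashes to forward slashes, character for character.
(my $unix = $fullpath) =~ tr{\\}{/};
print "$unix\n";
```

Note that this only concatenates; the overlapping part of the path is still repeated in the result.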

Dave Everitt
+2  A: 

The fact you have three paths is confusing me a little, but if you want to find overlap between two, you could use a back reference in a regular expression.

For example:

$root = "c:\\checkout\\omega";
$build = "omega\\abc\\mainline\\host\\make";    

# Concatenate Strings
$path = "$root\\$build";
print "original ",$path,"\n";

# Look for overlap using a backreference
$path =~ /^.*(.+)\1.*$/;
print "overlap ",$1,"\n";

# Do a substitution to remove the overlap
$path =~ s/^(.*)(.+)\2(.*)$/\1\2\3/;
print "new ",$path,"\n";

This will produce the following output:

original c:\checkout\omega\omega\abc\mainline\host\make
overlap omega\
new c:\checkout\omega\abc\mainline\host\make
Dave Webb
It's not pretty but it works :)
Pev
The regex for finding overlap would be better as /(.+)\1/. Regexes will match anywhere in the string by default, so starting them with (non-capturing) ^.* or ending with .*$ is just meaningless noise. Depending on your regex implementation, those can also slow things down substantially by forcing backtracking, but I'm pretty sure the Perl implementation is optimized to avoid that performance hit.
Dave Sherohman
Come to think of it... You don't need to repeat the search in your final regex. Changing it to s/\Q$1\E// will probably work fine - the duplicated piece is in $1 from the previous regex, so just replace that with nothing (the \Q...\E quoting keeps the backslash in $1 from being read as a regex escape). As long as you don't include a /g modifier, it will replace only the first instance of the repeated text. Or do it all in one shot with s/(.+)\1/$1/. (Using \1 in the replace portion of s/// "is better written as $1", per "use warnings".)
Dave Sherohman
A: 

I don't know whether there are any rules to your structure (e.g. $client seems superfluous in your example), but if there are, then you can do things like this:

my $root  = 'c:\checkout\omega'; 
my $build = 'omega\abc\mainline\host\make';

# $root + $build minus first node
my $file = join '\\', $root, ( split /\\/, $build, 2 )[1];
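Dropping the first node works when exactly one directory overlaps. If the amount of overlap varies, a rough generalisation is to compare whole path components. The sketch below is an assumption-laden illustration, not part of the answer above: merge_overlap is a hypothetical helper name, it accepts either slash as a separator on input, and it joins with a backslash unless told otherwise.

```perl
use strict;
use warnings;

# Hypothetical helper: append $tail to $head, skipping any leading
# components of $tail that repeat the trailing components of $head.
sub merge_overlap {
    my ($head, $tail, $sep) = @_;
    $sep = '\\' unless defined $sep;

    my @h = split m{[\\/]}, $head;
    my @t = grep { length } split m{[\\/]}, $tail;

    # Try the longest possible overlap first, then shrink.
    my $max = @h < @t ? @h : @t;
    for (my $n = $max; $n > 0; $n--) {
        my $tail_of_head = join "\0", @h[ @h - $n .. $#h ];
        my $head_of_tail = join "\0", @t[ 0 .. $n - 1 ];
        if ($tail_of_head eq $head_of_tail) {
            splice @t, 0, $n;    # drop the repeated components
            last;
        }
    }
    return join $sep, @h, @t;
}

my $root   = 'c:\checkout\omega';
my $client = 'abc\mainline';
my $build  = 'omega\abc\mainline\host\make';

# Fold the three pieces together, left to right.
print merge_overlap( merge_overlap($root, $client), $build ), "\n";
```

Because it compares whole components, a directory such as 'mmake' is never mistaken for a repeated character.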

/I3az/

draegtun
A: 

I can think of lots of horrible messy ways to implement it but I assume (perhaps wrongly) that there is a simple, clean and maybe even elegant way of doing this

Yep. Take a look at the Path::Class module. From your question it is not entirely clear what you are trying to do, but Path::Class lets you manipulate paths in a cross-platform way.

innaM
+4  A: 

The regex below is better suited for eliminating duplicate path sequences.

qr{ ( 
      [\\/]  # 1. starts with a path break
      .+?    # 2. whatever
    )
    \1       # whatever was captured in the previous group 
             # it forces us to backtrack on #2 until we have duplicates
             # it will necessarily have a path break at the beginning
  }x;

The regex provided by Dave Webb works as long as there are no repeated letters in the path. Just make the last node 'mmake' and it breaks.

I get:

original c:\checkout\omega\abc\mainline\omega\abc\mainline\host\mmake
overlap m
new c:\checkout\omega\abc\mainline\omega\abc\mainline\host\make

You want the repetition to be directory names, not characters.

Also a simple substitution is all that's needed. Chances are when you see ^.* or .*$ in a regex, it's not needed. And it isn't needed any more in this one.

In fact all of this can be done with:

$path =~ s/([\\\/]+.+?)\1/$1/;

Replace something and its duplicate with that something.
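One caveat with the one-liner: the repeated text does not have to end at a directory boundary, so a path like c:\ab\abc would be shortened to c:\abc. A possible variation, sketched here under the assumption that only whole directory names should count as duplicates, adds a lookahead; dedupe_path is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: collapse one repeated run of directory components.
sub dedupe_path {
    my ($path) = @_;
    $path =~ s{
        ( [\\/] .+? )        # a separator followed by one or more characters
        \1                   # the same text again, immediately after
        (?= [\\/] | \z )     # the repeat must end at a separator or end of string
    }{$1}x;
    return $path;
}

print dedupe_path('c:\checkout\omega\omega\abc\mainline\host\make'), "\n";
print dedupe_path('c:\ab\abc'), "\n";    # left unchanged
```

Like the original substitution, this removes only the first duplicated run; add /g if several separate repeats are expected.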

File::Spec

By the way, File::Spec is the accepted way to concatenate directories in a platform-independent fashion:

my $path = File::Spec->catfile( $root, $client, $build );
$path =~ s/([\\\/]+.+?)\1/$1/;

I have a minor pet peeve with File::Spec, though. I like using / for directories, and Perl works with / in a Windows environment. As long as I stay within the confines of Perl, I never have to separate paths with the escape character (as in the C family of languages). File::Spec forces backslashes to be consistent with the Windows platform.

However, if that's what you're looking for, that's probably more reason to use it.

Axeman
Using / doesn't force backslashes... You can use other punctuation as your regex delimiters. e.g., s|(/.+)\1|$1|, s{(/.+)\1}{$1}, etc.
Dave Sherohman
@Dave Sherohman: I clarified that `File::Spec` forces backslashes.
Axeman