views:

281

answers:

10

So I've had a simple ucwords function for Perl which I've had a while, and wanted to expand it, this is what I've come up with, is this the way I should be building my functions to handle optional parameters?

Original:

sub ucwords{
    $str = @_[0];
    $str = lc($str);
    $str =~ s/\b(\w)/\u$1/g;
    return $str;
}

Extended:

sub ucwords{
    if(@_[0] ne undef){#make sure some argument was passed
        @overloads = (0,1,2,3);
        $str = @_[0];
        if(@_[1] eq undef || @_[1] eq 0){ #default is to lowercase all but first
            $str = lc($str);
            $str =~ s/\b(\w)/\u$1/g;
            return $str;
        }else{ #second parameters
            if(!grep $_ eq @_[1], @overloads){ die("No overload method of ucwords() takes ".@_[1]." as second parameter."); }
            if(@_[1] eq 1){ $str =~ s/\b(\w)/\u$1/g;} #first letter to upper, remaining case maintained
            if(@_[1] eq 2){ $str = lc($str); $str =~ s/(\w)\b/\u$1/g;} #last letter to upper, remaining to lower
            if(@_[1] eq 3){ $str =~ s/(\w)\b/\u$1/g;} #last letter to upper, remaining case maintained
            return $str;
        }
    }else{
        die("No overload method of ucwords() takes no arguments");
    }
}

Psy

+21  A: 

In one word: NO!

Let's look at:

sub ucwords{
    $str = @_[0];
    $str = lc($str);
    $str =~ s/\b(\w)/\u$1/g;
    return $str;
}

First and foremost, you are not using strict. Use it. It is for your own good.

Second, you are not using warnings. Use it. It is for your own good. For example, the first element of @_ should be referred to using $_[0] and not @_[0].

Third, you should get in the habit of reading the FAQ list occasionally before re-inventing the wheel again: See How do I capitalize all the words on one line?

If you think this is harsh, consider the fact that when called as:

print ucwords("FRED AND BARNEY'S LODGE"), "\n";

your code outputs

Fred And Barney'S Lodge

which is the example given in that question.

Further, having a function that does more than one thing, chooses what it does on the basis of mystery numbers and does none of those things right is not a good design strategy.

You should instead have multiple functions, named in a way that can be understood by a casual reader of your code, each of which does only one thing and does that right.

Finally, the extended version of your function (without saying anything about the wisdom of writing such a function) can be better written as:

# untested code follows

use Carp;

{
    my %modes = map {$_ => undef} 0 .. 3;
    sub ucwords{
        croak 'No arguments passed' unless @_;

        my ($str, $mode) = @_;
        $mode = 0 unless defined $mode;

        croak "Invalid mode: '$mode'" unless exists $modes{$mode};

        if ( $mode == 0 ) {
            $str = lc($str);
            $str =~ s/\b(\w)/\u$1/g;
        }
        elsif ( $mode == 1 ) {
            $str =~ s/\b(\w)/\u$1/g;        
        }
        elsif ( $mode == 2 ) {
            $str = lc($str); 
            $str =~ s/(\w)\b/\u$1/g;        
        }
        else {
            $str =~ s/(\w)\b/\u$1/g;
        }

        return $str;
    }
}

See also Why use if-else if in C++?

Sinan Ünür
I think you should have answered twice in this case. Once with your original answer and once with the current version. You would have earned two upvotes from me in that case.
innaM
I agree it should be another answer, but I don't know if I would have upvoted the first one, as just saying to use strict and warnings isn't really pointing out the other problems with the function. Either way, +1 for this one. : )
NateDSaint
It should go without saying... but as the person who maintains scripts in my job, I'm going to say it... If you were to use something like the extended function above - DOCUMENT it. Name your modes, if only in the comments.
Rini
Also, just in case no one reads the FAQ entry I referred to, use `Text::Autoformat` http://search.cpan.org/perldoc/Text::Autoformat for this and other formatting needs.
Sinan Ünür
@Robert P: I rolled back your change. `3` is a valid mode.
Sinan Ünür
Whups, misinterpreted what he was doing. Nevermind!
Robert P
+4  A: 

May be you will find Params::Validate usefull. It can be used to validate params by various rules. Here is how it may look in your case:

## somewhere is the start of the module
use Params::Validate qw(:all);

sub ucwords {
    ## this line helps to undestand which parameter should be passed to function
    my ($string, $algorithm_id) = @_; 

    ## make sure that 2 scalar parameters passed
    validate_pos(@_, {'type' => SCALAR}, {'type' => SCALAR});

    ## main code here
}
Ivan Nevostruev
+9  A: 

Don't use the $foo ne undef construct. Operators in Perl are what's known as "context sensitive". By using certain operators, you introduce certain contexts. ne, eq, lt, gt, le, ge are all "string" operators, treating the scalars on either side as strings, whereas ==, !=, <, >, <=, >= are numeric operators, treating the object on either side as a number.

However, if you're testing for undef, it really doesn't make sense that something undefined is a number or a string, so they have an operator just for that kind of test: defined

You can test if something is defined simply by doing

if (defined $foo) {
    # my cool logic on $foo here
}
Robert P
+5  A: 

This may just be my opinion, and your coding style is totally up to you, but personally I find a lot of value in assigning the arguments to variables right off the bat, and rather than wrapping the "business" portion of your subroutine in an if block, I'd have the function croak before that. For example:

use Carp;

sub ucwords {
    my $str = shift;
    defined($str) 
        or croak 'No overload method of ucwords() takes no arguments';
    #the rest of your logic
}
NateDSaint
That's a great point. I'd edit the original but that makes your comment seem pointless. I'll add an edit to it.
NateDSaint
@NateDSaint: I took the liberty of editing your post and deleting my comment. Feel free to rollback if you don't like the changes.
Sinan Ünür
Probably less misleading this way. Thanks!
NateDSaint
Agreed; I'm always suspicious of action-at-a-distance, so I always grab local copies of special variables e.g. $1, $_ immediately. Pretty much the only place I'd use $_ directly is in a one-line map or grep.
Ether
+3  A: 

die

die, like other perl builtins, does not need, and generally should not have parentheses. However, die has a big brother that most people use these days, called

croak

Do:

use Carp;

and then

croak "My error here!";

Croak works just like die, but generally adds more useful information to the error message than die does, such as the line that the error occurred on relative to the caller.

Robert P
+2  A: 

Array Indexing

Array access, like other things in Perl, is context sensitive. Think of the sigil that gets attached to the name as a 'reminder' to you about what you're trying to access or use at the moment. Whenever you see the $, that means you're trying to get a single scalar value. Whenever you see a @, that means you're accessing a list, and % of course means a key/value hash pair. So, when you access your array like this:

@_[1]

You're asking for a list, containing a single element. This feature lets you get multiple values from an array at once, but when only accessing one value, it cause problems in some contexts, such as assignment. So, when accessing a single array element, you want to always use the scalar context:

$_[1]
Robert P
+5  A: 

Perl's switch statement: given/when

Perl, as of 5.10 and above, has a fantastic switch statement built in, called [given]. This is roughly equivalent to the switch statement in C, but much more versatile. To enable this feature, you need to add a line at the top of your script:

use 5.010;

This enables all the perl 5.10 features, including switch (and say, which works like print but automatically adds a "\n" at the end.) You can use it like this:

my $foo = get_foo();
my $nothing = 0;
given($foo) {
    when (undef)  { say "got an undefined value!"; }
    when ([1,3,5,6,8]) { say "was 1, 3, 5, 6, or 8"; }
    when (/^abc/) { say "was a string starting with abc"; }
    when ($_ == 4) { say "It was 4!!!"; }
    when ($_ > 100) { say "Greater than 100"; }
    default { $nothing = 1; }
}

The variable passed to given automatically gets put into $_ inside the given code, allowing you to compare against it. So, in your case, it would look like this (fixing the @[] to $[] issue):

given ($_[1]) {
    when ($_ == 1) { $str =~ s/\b(\w)/\u$1/g }
    when ($_ == 2) { $str = lc($str); $str =~ s/(\w)\b/\u$1/g }
    when ($_ == 3) { $str =~ s/(\w)\b/\u$1/g; }
    default { croak "No overloaded method of ucwords() takes '$_'." }
}
Robert P
+3  A: 

@_ Unpacking

Generally, you always want to unpack @_ before you do any other processing in your subroutine. This makes it much, much clearer to users, other maintainers and yourself in the future about how to use your sub. By using @_ directly, it's very hard to figure out what needs to be passed, just from the arguments given. They don't have any meaningful names, making it even harder to determine their purpose, and you have magic constants everywhere - normally a bad thing overall!

Your best bet is to get the variables into meaningfully named scalars right away, before you do anything else.

For one argument subroutines, a common solution is to use shift. This pulls the first element of an array off, and returns it (kind of like the opposite of pop.) If not given an array, and you're in a subroutine, it pulls it from the @_ array. So, you can do

sub mysub {
    my $foo = shift;
}

for any one argument subroutine.

However, what if you have more? List context assignment, to the rescue! It's possible to assign many variables at once, using a list assignment. You can do

sub myothersub {
   my ($foo, $bar, $baz) = @_;
}

And $foo, $bar, and $baz will be assigned the value in the 0, 1, and 2 indexes of @_, respectively. Well, what happens if there isn't anything in the 0, 1 or 2 index? They still get assigned - they become undef! Then you can check for undef as mentioned elsewhere in this question.

Robert P
+1  A: 

I strongly dislike overly-smart functions. An overly-smart function is one whose behavior is totally changed by its parameters. Look at yours, they almost don't share any code except parameter handling. Anyway if I would do some similar this I would write something like this:

use Carp;

{
    my %ucwords = (
     0 => sub {
      my $str = lc( shift() );
      $str =~ s/\b(\w)/\u$1/g;
      return $str;
     },
     1 => sub {
      my $str = shift;
      $str =~ s/\b(\w)/\u$1/g;
      return $str;
     },
     2 => sub {
      $str = lc( shift() );
      $str =~ s/(\w)\b/\u$1/g;
      return $str;
     },
     3 => sub {
      my $str = shift;
      $str =~ s/(\w)\b/\u$1/g;
      return $str;
     }
    );

    sub ucwords {
     my ( $str, $mode ) = @_;
     croak "No overload method of ucwords() takes no arguments"
      unless defined $str;
     $mode = 0 unless defined $mode;
     my $code = $ucwords{$mode};
     croak "Invalid mode: '$mode'" unless defined $code;
     goto \&$code;
    }
}
Hynek -Pichi- Vychodil
+1  A: 

Something that's been hinted at but not directly addressed in other answers is the use of numeric modes, a convention alien to Perl held over from C. Quick, without looking at the code what does mode #3 do? Hell, looking at the code what does mode #3 do?

Perl has efficient and easy to use strings. Use them. Give your modes names that have something to do with what its doing. Something like... first, last, recase_first, recase_last. They don't have to be totally descriptive, lower_case_then_uc_last_letter would be too long to type, but enough give something for the human brain to hook onto and associate.

But really these are four subroutines. Mode flags are red flags, especially when most of your code winds up inside an if/else statement.

Schwern