views:

3775

answers:

24

One of my colleagues recently interviewed some candidates for a job and one said they had very good Perl experience.

Since my colleague didn't know Perl, he asked me for a critique of some code written (off-site) by that potential hire, so I had a look and told him my concerns (the main one was that it originally had no comments and it's not like we gave them enough time).

However, the code works so I'm loathe to say no-go without some more input. Another concern is that this code basically looks exactly how I'd code it in C. It's been a while since I did Perl (and I didn't do a lot, I'm more a Python bod for quick scripts) but I seem to recall that it was a much more expressive language than what this guy used.

I'm looking for input from real Perl coders, and suggestions for how it could be improved (and why a Perl coder should know that method of improvement).

You can also wax lyrical about whether people who write one language in a totally different language should (or shouldn't be hired). I'm interested in your arguments but this question is primarily for a critique of the code.

The spec was to successfully process a CSV file as follows and output the individual fields:

User ID,Name , Level,Numeric ID
pax, Pax Morgan ,admin,0
gt,"  Turner, George" rubbish,user,1
ms,"Mark \"X-Men\" Spencer","guest user",2
ab,, "user","3"

The output was to be something like this (the potential hire's code actually output this):

User ID,Name , Level,Numeric ID:
   [User ID]
   [Name]
   [Level]
   [Numeric ID]
pax, Pax Morgan ,admin,0:
   [pax]
   [Pax Morgan]
   [admin]
   [0]
gt,"  Turner, George  " rubbish,user,1:
   [gt]
   [  Turner, George  ]
   [user]
   [1]
ms,"Mark \"X-Men\" Spencer","guest user",2:
   [ms]
   [Mark "X-Men" Spencer]
   [guest user]
   [2]
ab,, "user","3":
   [ab]
   []
   [user]
   [3]

Here is the code they submitted:

#!/usr/bin/perl

# Open file.

open (IN, "qq.in") || die "Cannot open qq.in";

# Process every line.

while (<IN>) {
    chomp;
    $line = $_;
    print "$line:\n";

    # Process every field in line.

    while ($line ne "") {
        # Skip spaces and start with empty field.

        if (substr ($line,0,1) eq " ") {
            $line = substr ($line,1);
            next;
        }

        $field = "";
        $minlen = 0;

        # Detect quoted field or otherwise.

        if (substr ($line,0,1) eq "\"") {
            $line = substr ($line,1);
            $pastquote = 0;
            while ($line ne "") {
                # Special handling for quotes (\\ and \").

                if (length ($line) >= 2) {
                    if (substr ($line,0,2) eq "\\\"") {
                        $field = $field . "\"";
                        $line = substr ($line,2);
                        next;
                    }
                    if (substr ($line,0,2) eq "\\\\") {
                        $field = $field . "\\";
                        $line = substr ($line,2);
                        next;
                    }
                }

                # Detect closing quote.

                if (($pastquote == 0) && (substr ($line,0,1) eq "\"")) {
                    $pastquote = 1;
                    $line = substr ($line,1);
                    $minlen = length ($field);
                    next;
                }

                # Only worry about comma if past closing quote.

                if (($pastquote == 1) && (substr ($line,0,1) eq ",")) {
                    $line = substr ($line,1);
                    last;
                }
                $field = $field . substr ($line,0,1);
                $line = substr ($line,1);
            }
        } else {
            while ($line ne "") {
                if (substr ($line,0,1) eq ",") {
                    $line = substr ($line,1);
                    last;
                }
                if ($pastquote == 0) {
                    $field = $field . substr ($line,0,1);
                }
                $line = substr ($line,1);
            }
        }

        # Strip trailing space.

        while ($field ne "") {
            if (length ($field) == $minlen) {
                last;
            }
            if (substr ($field,length ($field)-1,1) eq " ") {
                $field = substr ($field,0, length ($field)-1);
                next;
            }
            last;
        }

        print "   [$field]\n";
    }
}
close (IN);
+9  A: 

Does it work? Did he write it in an acceptable period of time? Do you think it's maintainable?

If you can answer me these questions three, they you may pass the bridge of death (*).

thijs
Laden or unladen swallow? ... Yes, yes and borderline.
paxdiablo
aaaaaaaaaaaaaarghh.. (you just fell into the ravine)
thijs
Python fans (Monty, not Guido, style) should never be allowed to communicate with each other - it always devolves into quoting entire movies :-)
paxdiablo
+8  A: 

I think the biggest problem is that he or she didn't show any knowledge of regex. And that is key in Perl.

The question is, can they learn? There is so much to look for in a candidate past this piece of code.

Artem Russakovskii
It's actually down to three people, all who met the core requirements (including keenness and ability to learn). We're just looking for differentiators between them now. That's a good point about regex.
paxdiablo
+8  A: 

I'd say his code is an adequate solution. It works, doesn't it? And there's an advantage to maintainability by writing "longhand" instead of in as few characters of code as you can.

The motto of Perl is "There's More Than One Way To Do It." Perl doesn't really get on your case about coding style, as some languages do (I like Python too, but you've got to admit that people can get kind of snobbish when evaluating whether code is "pythonic" or not).

Bill Karwin
there is a difference between having maintainable code and having code that spells out every little bit. If you are maintaining perl code, you should know perl enough not to need it written out like that. I'd have a hard time maintaining it as it is, because, well, it doesn't look like normal perl.
Ape-inago
+34  A: 

It isn't dreadfully idiomatic Perl, but it isn't completely dreadful Perl either (though it could be much more compact).

Two warning bells - the shebang line doesn't include '-w' and there is neither 'use strict;' nor 'use warnings;'. This is very old-style Perl; good Perl code uses both warnings and strict.

The use of old-style file handles is no longer recommended, but it isn't automatically bad (it could be code written more than 10 years ago, perhaps).

The non-use of regular expressions is a bit more surprising. For example:

# Process every field in line.
while ($line ne "") {
    # Skip spaces and start with empty field.

    if (substr ($line,0,1) eq " ") {
        $line = substr ($line,1);
        next;
    }

That could be written:

while ($line ne "") {
    $line =~ s/^\s+//;

This chops off all leading spaces using a regex, without making the code iterate around the loop. A good deal of the rest of the code would benefit from carefully written regular expressions too. These are a characteristically Perl idiom; it is surprising to see that they are not being used.

If efficiency was the proclaimed concern (reason for not using regexes), then the questions should be "did you measure it" and "what sort of efficiency are you discussing - machine, or programmer"?

Working code counts. More or less idiomatic code is better.

Also, of course, there are modules Text::CSV and Text::CSV_XS that could be used to handle CSV parsing. It would be interesting to enquire whether they are aware of Perl modules.


There are also multiple notations for handling quotes within quoted fields. The code appears to assume that backslash-quote is appropriate; I believe Excel uses doubled up quotes:

"He said, ""Don't do it"", but they didn't listen"

This could be matched by:

$line =~ /^"([^"]|"")*"/;

With a bit of care, you could capture just the text between the enclosing quotes. You'd still have to post-process the captured text to remove the embedded doubled up quotes.

A non-quoted field would be matched by:

$line =~ /^([^,]*)(?:,|$)/;

This is enormously shorter than the looping and substringing shown.


Here's a version of the code, using the backslash-double quote escape mechanism used in the code in the question, that does the same job.

#!/usr/bin/perl -w

use strict;

open (IN, "qq.in") || die "Cannot open qq.in";

while (my $line = <IN>) {
    chomp $line;
    print "$line\n";

    while ($line ne "") {
        $line =~ s/^\s+//;
        my $field = "";
        if ($line =~ m/^"((?:[^"]|\\.)*)"([^,]*)(?:,|$)/) {
            # Quoted field
            $field = "$1$2";
            $line = substr($line, length($field)+2);
            $field =~ s/""/"/g;
        }
        elsif ($line =~ m/^([^,]*)(?:,|$)/) {
            # Unquoted field
            $field = "$1";
            $line = substr($line, length($field));
        }
        else {
            print "WTF?? ($line)\n";
        }
        $line =~ s/^,//;
        print "   [$field]\n";
    }
}
close (IN);

It's under 30 non-blank, non-comment lines, compared with about 70 in the original. The original version is bigger than it needs to be by some margin. And I've not gone out of my way to reduce this code to the minimum possible.

Jonathan Leffler
Well, it's under 30 now, but when you have to go back to add /x after the team review it will balloon again. Then, after /x makes it look messy, you move the regexes out of the way by putting them into scalars with qr//, you add a bit more. But, maybe you get some of that back when you use \G so you don't have to modify $line, but then nobody remembers how \G works. :)
brian d foy
I'd be worried if someone made a 30-character regex extend over 30 lines with /x -- I'm sure it could be done, but it wouldn't be more readable (not at that extreme). But I agree - the compactness is a variable quantity (or quality).
Jonathan Leffler
now make it work on CSV with line feeds in quoted fields, go on, dare you to. ;). And it has to work with 2G csv files.
Kent Fredric
Not using regexes - at that point, it becomes a task for one of the CSV modules.
Jonathan Leffler
+32  A: 

I would argue writing C in Perl is a much better situation than writing Perl in C. As is often brought up on the SO podcast, understanding C is a virtue that not all developers (even some good ones) have nowadays. Hire them and buy a copy of Perl Best Practices for them and you will be set. After best practices a copy of Intermediate Perl and they could work out.

Copas
I wish i could +2 or something. this is sound advice.
Ape-inago
I agree that C is a part of a good foundation for any developer.
Akers
+3  A: 

Maybe ask him to write more versions of the same code? When in doubt about hiring, ask more questions to candidate.

sbidwai
+12  A: 

This is a case where you need to follow up with the programmer. Ask him why he wrote it this way.

There may be a very good reason.. perhaps this needed to follow the same behavior as existing code and therefore he did a line by line translation on purpose for full compatability. If so, give him points for a decent explaination.

Or perhaps he doesn't know Perl, so he learned it that afternoon to answer the question. If so, give him points for fast and nimble learning skills.

The only disqualifying comment may be "I always program Perl this way. I don't understand that regexp stuff."

SPWorley
+2  A: 

The fact that he hasn't used a single piece of regex in the code should make you ask him a lot of questions about why he did write like that.

Maybe he's Jamie Zawinski or a fan and he didn't want to have more problems?

I'm not necessarily saying that the whole parsing should be a huge amount of unreadable CSV parsing regex like ("([^"]*|"{2})*"(,|$))|"[^"]*"(,|$)|[^,]+(,|$)|(,) or one of the many similar regex around, but at least to traverse the lines or instead of using substring().

Vinko Vrsalovic
+121  A: 

I advise people to never hire Perl programmers, or C programmers, or Java programmers, and so on. Just hire good people. The programmers who I've hired to write Perl were also skilled in various other languages. I hired them because they were good programmers, and good programmers can deal with multiple languages.

Now, that code does look a lot like C, but I think it's fine Perl too. If you're hiring a good programmer, with a little Perl practice under his belt he'll catch up just fine. People are complaining about the lack of regexes, which would make things simpler in ancillary areas, but I wouldn't wish on anyone a regex solution on parsing that dirty CSV data. I wouldn't want to read it or maintain it.

I often find that the reverse problem is more troublesome: hire a good programmer who writes good Perl code, but the rest of the team only knows the basics of Perl and can't keep up. This has nothing to do with poor formatting or bad structure, just a level of skill with advanced topics (e.g. closures).


Things are getting a bit heated in this debate, so I think I should explain more about how I deal with this sort of thing. I don't see this as a regex / no-regex problem. I wouldn't have written the code the way the candidate did, but that doesn't really matter.

I write quite a bit of crappy code too. On the first pass, I'm usually thinking more about structure and process than syntax. I go back later to tighten that up. That doesn't mean that the candidate's code is any good, but for a first pass done in an interview I don't judge it too harshly. I don't know how much time he had to write it and so on, so I don't judge it based on something I would have had a long time to work on. Interview questions are always weird because you can't do what you'd really do for real work. I'd probably fail a question about writing a CSV parser too if I had to start from scratch and do it in 15 minutes. Indeed, I wasted more than that today being a total bonehead with some code.

I went to look at the code for Text::CSV_PP, the Pure Perl cousin to Text::CSV_XS. It uses regular expressions, but a lot of regular expressions that handle special cases, and in structure isn't that different from the code presented here. It's a lot of code, and it's complicated code that I hope I never have to look at again.

What I tend to disfavor are interview answers that only address the given input. That's almost always the wrong thing to do in the real world where you have to handle cases that you may not have discovered yet and you need the flexibility to deal with future issues. I find that missing from a lot of answers on Stackoverflow too. The thought process of the solution is more telling to me. People become skilled at a language more easily than they change how they think about things. I can teach people how to write better Perl, but I can't change their wetware for the most part. That comes from scars and experience.

Since I wasn't there to see the candidate code the solution or ask him follow-up questions, I won't speculate on why he wrote it the way he did. For some of the other solutions I've seen here, I could be equally harsh in an interview.

A career is a journey. I don't expect everyone to be a guru or to have the same experiences. If I write-off people because they don't know some trick or idiom, I'm not giving them the chance to continue their journey. The candidate's code won't win any prizes, but apparently it was enough to get him into the final three for consideration for an offer. The guy got up there and tried, did much better than a lot of code I've seen in my life, and that's good enough for me.

brian d foy
Your last point contradicts your initial argument. If you have a good Perl programmer and you hire good programmers to your team, they should have no problems becoming good Perl programmers, and having a rock star one to lead them can only produce faster and better results.
Artem Russakovskii
My last point contradicts nothing. A rock star doesn't mean they are effective communicators or mentors, or that they aren't assholes to people who they think are inferior. A good programmer isn't the same thing as a good team member.
brian d foy
Brian: there's a problem here: namely, your opinion that the mess above is easier to maintain than a clear regular expression – just because it's ostensibly clean code. Clean code isn't everything: the above is 100% pure visual clutter that hides the semantics in a lot of syntax noise.
Konrad Rudolph
True CSV almost certainly can't be parsed by a regex that anyone would ever be able to understand after writing it. Although it is tempting to believe that every problem solved with Perl needs a big regex, there really is more than one way to do it. brian d foy (note no caps) is right on point here.
RBerteig
I agree with Konrad - imho a regex, even a highly complex one, is an outright better solution to this, the readability problem (if it is indeed a problem, which I doubt: I find the if/else logic much more jarring) is solved by a comment block explaining it.
annakata
If someone as respected as brian is within the Perl community says that that's decent Perl code I'm inclined to agree. With that said, I wouldn't rate the developer as "very good." While there's nothing wrong with the code (particularly the types of things novices tend to do) it isn't idiomatic and it contains some outdated constructs. The applicant is competent at Perl but clearly not an advanced user.
Michael Carman
RBerteig: sorry but I completely disagree. CSV formats, even complicated ones, are still very easy to parse, almost naturally using regular expressions. Also, regular expressions in Perl can span multiple lines, contain formatting and comments. All in all, this allows for concise, efficient and well-maintainable code.
Konrad Rudolph
Konrad, you're just wrong. Properly parsing CSV requires keeping track of the parse state as you go (am I in a quoted field, did I just see an escape). That sort of stuff makes for messy or (near-)impossible regexes. I say near-impossible because Perl does in fact have features for maintaining state inside regexes.Nonetheless, the proper way to parse CSV in Perl is to do it a character at a time. That said, I've asked similar questions in an interview, and by far the best answer is "I would use Text::CSV(_XS) from CPAN."
Dave Rolsky
Dave: sorry, it's the other way round: you are wrong. Parsing quoted strings with escaped quote marks in regex is trivial, no extra state has to be maintained: /"(\\"|[^"])*"/ (= either we've got an escaped quote, or any character except a quote; multiple times). It really is that easy. By the way, nothing against using a module here. But someone who doesn't even use Perl features for stripping the input off whitespace is clearly nowhere near professional in Perl.
Konrad Rudolph
Everything in that script is a Perl feature.
brian d foy
@Konrad, it sounds simple, but if you've ever written one, you'll see quickly regex become impossibly complicated, especially when you discover quoted fields can have line feeds in them. For instance, the OP's parser will have problems with these because it works linewise, and will consider \n an end of a line. So you cant process it linewise, so you have 2 options: a regex on a possibly huge string representing the whole file, or a state machine.
Kent Fredric
Kent: aside from the fact that the Perl code in question also doesn't represent a state machine: there's a 1:1 correlation between regex and state machines. In general, there's no reason to prefer the latter to the former. Additionally, I don't even say that everything should be written in one huge regex, only that for the current case it's rather trivial. I really want to stress this, it's not complicated at all. And certainly adapting a manually implemented state machine isn't any easier than adapting a regex.
Konrad Rudolph
Konrad: since regexes are so easy to use to parse CSV, why not just post one?
jrockway
This answer is a non sequitur: The OP didn't ask a philosophical question about who can blossom into being a great Perl programmer, the OP was talking about a hiring decision for a current Perl programmer. This code is not the code of someone who is "very good" at Perl. Granted, it does show a basic understanding of Perl's substr function; it does not demonstrate understanding of modularization, data structures or even Perl idiom. For code written "off site" it shows only the bare minimum of producing an answer and thus should be a NO HIRE for someone who claims "real" experience.
jaredor
@jaredor: even if the person claims "real experience", they might just not have the actual experience to judge their experience. That doesn't mean that they can't quickly learn or jump to the next level. The NO HIRE police seem to be motivated more on revenge at some past wrong than anything else. My answer, although not directly aimed at the asked question, was apparently the answer needed. Notice that it was accepted. I try to answer the questions people have, not the questions they ask. The need to ask the question means the people don't have the grasp of the situation to ask the right one.
brian d foy
@'brian d foy' Notice that "Accepted" is not a synonym for "Correct"; per the pop-up that appeared when I downvoted this "best answer", it can still benefit from criticism. I do think you've been given too much of a pass on how relevant your observations are to this hiring situation. In general, I mostly agree with you on hiring programmers, but the OP was talking about input NOW for hiring NOW. In particular, I think that this answer didn't honor the technical reviewer's instincts, which were clearly that something was amiss. Reframing the question was not a service in this instance.
jaredor
@jaredor: think what you like, but it's still the answer the OP liked the most. Maybe you didn't like it, and that's fine. You aren't everyone, you're not the OP, and you aren't most people who voted on this. Your minority opinion is noted and filed.
brian d foy
+5  A: 

I wouldn't accept the candidate. He or she isn't comfortable with Perl's idioms, which will result in suboptimal code, less work efficieny (all those unnecessary lines have to be written!) and a inablilty to read code written by an experienced Perl coder (who of course uses regexes etc. at large).

But it works...

ammoQ
Those are several unwarranted assumptions based on such little evidence.
brian d foy
brian: If you hire someone, you must base your judgement on whatever information you have. I might be wrong, but it's better to dismiss a good programmer than to hire a bad one.
ammoQ
+1 for your "But it works..." link. So true.
jrockway
@ammoQ: you don't have to hire someone with limited information. That's why you should have good interviewers and follow up on references. In this case, apparently both are lacking.
brian d foy
brian: Well, you shouldn't hire someone with limited information, but on the other hand, you probably don't want to spend lots of money to thoroughly investigate a candidate who already looks bad at the initial screening. But of course it depends on the number of candidates for the opening. The more you have, the more you dismiss.
ammoQ
A: 

An obvious question might be, if you don't use Perl at your company in the first place, does it matter how pretty his Perl code is?

I'm not sure the elegance of his Perl code says much about his skills in whatever language you're actually using.

jalf
That's a good point, except for the fact that we are using Perl. This position is a backfill for someone who had to, shall we say, leave in a hurry. I may have confused you. I haven't done Perl for a while, but my colleagues group does use it.
paxdiablo
Ah, fair enough then. :) Since you and the interviewer apparently don't use perl, I assumed it just wasn't relevant for the job.
jalf
+21  A: 

I have to (sort of) disagree with most views expressed here.

Since the code in question could be expressed much more compact and maintainable in idiomatic Perl, you really need to pose the question how much time the candidate spend developing this solution and how much time would have been spent by someone halfway proficient using idiomatic Perl.

I think you'll find that this coding style may be a huge waste of time (and thus the company's money).

I don't argue that every Perl programmer needs to grok the language – that, unfortunately, would be far-fetched – but they should know enough to not spend ages re-implementing core language features in their code over and over again.

EDIT Looking at the code again, I've got to be more drastic: although the code looks very clean, it's actually horrible. Sorry. This isn't Perl. Do you know the saying “you can program Fortran in any language”? Yes, you can. But you shouldn't.

Konrad Rudolph
+1 for obvious reasons
ammoQ
You can't do anything great in a language like this until you are ready to let go of the familiar and embrace the constructs fully
Tom Leys
You can pretty much stop reading the original code after the first line (the 'open', not the shebang). Why open a file when it will do just fine to use <>? Why use a bare file-handle? Why die without $! in the error message? This is not the code of an experienced perl programmer.
William Pursell
+22  A: 

No use strict/use warnings, systematic use of substr instead of regexp, no use of modules. This is definitely not someone who has "very good Perl experience". At least not for real-life Perl projects. Like you, I you suspect, that it's probably a C programmer with a basic knowledge of Perl.

That doesn't mean that they can't learn, especially as there are other Perl people around. It does seem to mean that they overstated their qualification for the job though. A few more questions about how exactly they acquired that very good Perl experience would be in order.

mirod
+21  A: 

I don't care whether he used regular expressions or not. I also don't care whether his Perl looks like C or not. The question that really matters is: is this good Perl? And I'd say it's not:

  1. He didn't use use strict
  2. He didn't enable warnings.
  3. He's using the old-fashioned two-argument version of open.
  4. The "open file" comment hurts and gives me the impression that code he usually writes doesn't contain any comments.
  5. The code is hard to maintain
  6. Was he allowed to use CPAN modules? A good Perl programmer would look at that option first.
innaM
I think the answer to #6 is (or was perceived to be) "no." The spec is so simple and pointless that I think this is a more advanced version of the FizzBuzz question. Personally, I would have submitted two versions: One that showed that I had the necessary knowledge to solve the problem myself (hand-rolled CSV parsing) and one that showed how I would really do it in a production environment (leveraging CPAN).
Michael Carman
you conclude too much from the "open file" comment. I often outline what I want to write by putting those sorts of comments in place first, then filling in the code. I get the steps down, then I code.
brian d foy
I'd put 7. Didn't use Moose
Kent Fredric
and 8. He didn't use any modules. Period. Seriously? Perl user?
Kent Fredric
+68  A: 

His code is a little verbose. Perl is all about modules, and avoiding them makes your life hard. Here is an equivalent to what you posted that I wrote in about two minutes:

 #!/usr/bin/env perl

 use strict;
 use warnings;

 use Text::CSV;

 my $parser = Text::CSV->new({
     allow_whitespace   => 1,
     escape_char        => '\\',
     allow_loose_quotes => 1,
 });

 while(my $line = <>){
     $parser->parse($line) or die "Parse error: ". $parser->error_diag;
     my @row = $parser->fields;
     print $line;
     print "\t[$_]\n" for @row;
 }
jrockway
The only correct answer for parsing CSV in Perl is to use a module. CSV is nasty, and it's easy to make little mistakes. Let someone else deal with it (they already have).
Dave Rolsky
This is imho the only correct answer. Wastes less time, and will actually work. And strangers reading your code wont die from sheer confusion when they have to debug your CSV parser later ( because they will ).
Kent Fredric
Anybody who claimed to know Perl well, should know about Text::CSV and regex's, that would be my only concern. Did he ask if he could use a library?
Chris Huang-Leaver
Perhaps the exercise was esp. to implement it from scratch?
Albert
To be fair to the poor sucker who got this sprung on him at an interview. Was the test set in such a way that it was clear using modules was OK? Was the environment set up well enough that he could find and use modules like Text::cvs? (This is a common perl problem -- you get told perl is installed and then you find out that only the interpeter is installed, all the "standard" modules are missing and cpan can't get past the corporate firewall).
James Anderson
+5  A: 

Just the initial block indicates that he has missed the fundamentals about Perl.

    while ($line ne "") {
    # Skip spaces and start with empty field.

    if (substr ($line,0,1) eq " ") {
        $line = substr ($line,1);
        next;
    }

That should at least be written using a regular expression to remove leading white space. I like the answer from jrockway best, modules rock. Though I would have used regular expressions to do it, something like.

#!/usr/bin/perl -w
#
# $Id$
#

use strict;

open(FD, "< qq.in") || die "Failed to open file.";
while (my $line = <FD>) {
    # Don't like chomp.
    $line =~ s/(\r|\n)//g;
    # ".*?[^\\\\]"  = Match everything between quotations that doesn't end with
    # an escaped quotation, match lazy so we will match the shortest possible.
    # [^",]*?       = Match strings that doesn't have any quotations.
    # If we combine the two above we can match strings that contains quotations
    # anywhere in the string (or doesn't contain quotations at all).
    # Put them together and match lazy again so we can match white-spaces
    # and don't include them in the result.
    my $match_field = '\s*((".*?[^\\\\]"|[^",]*?)*)\s*';
    if (not $line =~ /^$match_field,$match_field,$match_field,$match_field$/) {
        die "Invalid line: $line";
    }
    # Put values in nice variables so we don't have to deal with cryptic $N
    # (and can use $1 in replace).
    my ($user_id, $name, $level, $numeric_id) = ($1, $3, $5, $7);
    print "$line\n";
    for my $field ($user_id, $name, $level, $numeric_id) {
        # If the field starts with a quotation,
        # strip everything after the first unescaped quotation.
        $field =~ s/^"(.*?[^\\\\])".*/$1/g;
        # Now fix all escaped variables (not only quotations).
        $field =~ s/\\(.)/$1/g;
        print "   [$field]\n";
    }
}
close FD;
Johan Soderberg
From a lot of the posts it seems many are against using regexp, but looking at the sample data and the result I think regexp are good. Since the sample isn't a valid csv-file Text::CSV will break on 'gt," Turner, George " rubbish,user,1'. It will display: [gt] [" Turner] [George " rubbish] [user] [1]It should however strip the ' rubbish' text from the string. Go with regexp, live a little!
Johan Soderberg
+5  A: 

Forgive this guy. I would not have dared to parse CSV with a regex even though it can be done.

The DFA in structured code is more obvious than the regex here and DFA -> regex translation is nontrivial and prone to stupid mistakes.

Joshua
+2  A: 

As a non Perl (?programmer?), I have to say, that is probably the most legible Perl I have ever read! :)

Hiring someone over something like a scripting language that can be learned in days to weeks (if it's a worthwhile scripting language!) seems highly flawed in the first place.

Personally I would probably hire this person for different reasons. The code is well structured and reasonably well commented. Language specifics can easily be taught later.

Serapth
http://p3rl.org/Moose , makes Perl *really* readable IMO.
Kent Fredric
http://xoa.petdance.com/Stop_saying_"script"http://www.perl.com/lpt/a/57#Perl_is_just_for_one_liners_ca
Ape-inago
-1 for being wrong about scripting languages and language idioms.
Sean McMillan
+8  A: 

One of my colleagues recently interviewed some candidates for a job and one said they had very good Perl experience.

If this person thinks he has very good Perl experience and he writes Perl like this, he is probably a victim of the Dunning-Kruger effect.

So, that's a no-hire.

Unknown
A: 

Hmm, I don't see anything in the request that quotes should be removed, and words should be removed. The input file had the word " rubbish", and it's not in the output.

I've seen CSV files, exported with quotes, would expect those same quotes back. If your specification had been to remove quotes and extraneous words past quotes, maybe this work would be required.

I'd watch that, and the verbosity. Look for somebody lazier (compliment in Perl).

open (IN, "csv.csv");
while (<IN>) {
    #print $_;
    chomp;
    @array = split(/,/,$_);
    print "[User Id] =  $array[0]  [Name] = $array[1]  [Level] =  $array[2] [Numeric ID] = $array[3]\n";    
}
chopppen5
The word 'rubbish' in the data appeared in a stretch of data like: `,"George Turner" rubbish,`. This is ill-formed CSV data. CSV is supposed to have either plain text between commas or quoted text between commas; this has some of both. One legitimate way to treat malformed data is to ignore the invalid part (hence 'George Turner' as chosen by the answer); another is to consume it and treat the result as legitimate (hence 'George Turner rubbish', as you seem to prefer). The tag 'rubbish' suggests that the former was appropriate; the issue can be discussed with the interviewer.
Jonathan Leffler
A: 

If the code works, it works. I wish s/he would have programmed Fortran in Perl, like a real man/women.

Arrieta
+1  A: 

The crucial point here is - naturally after assuring that it works as expected - whether the code is maintainable.

  • Did you understand it?
  • Would you feel comfortable fixing a bug in it?

Perl programs have a tendency for looking like what a cat types by accident when walking on the keyboard. If this person knows how to write readable Perl code that fits to the team, this is actually a good thing.

Then again, you may want to teach him about regular expressions, but only carefully :-)

Thorbjørn Ravn Andersen
+2  A: 

Not only does the code suggest that the candidate doesn't really know Perl, but all those lines that say $line = substr ($line,1) are dreadful in any language. Try parsing a long line (say a few thousand fields) using that type of approach and you will see why. It just goes to show the sort of problem that Joel Spolsky discussed in this post.

itub
A: 

Code looks clean and readable. For that size, it does not require that much comments (perhaps none at all.) It's not just about good comments, but also good code, and the later is more important than the former.

If we were looking at a more complex/larger piece of code, I would say that comments are needed. But for that (specially the way it was written - well written), I don't think so.

I think it is unfair and vain to put doubt on the applicant given the piece of code submitted by him/her is quite acceptable and did the job.

luis.espinal