views:

1028

answers:

14

Which of the following regular expressions cannot be transformed into one using just the base operators of | and * (Kleene star) and grouping ( () ) *

/adt[ui]t+\S?/ 
/b[^e]ac?h+\s\s\sBALL/i 
/he{5}[Ly]+o\bwo[^r]L(?!d)/ 
/ab?c+(de{1,3})/

Would this be a fair question to administer for a Senior Programmer position?

+23  A: 

In my opinion no: since it relies on very specific knowledge, and appears to be an attempt to catch the interviewee out.

Pack up your negative lookahead assertions and go back to the drawing board.

Personally I prefer questions which allow a range of abilities to respond in a correct way, and don't rely on the interviewee remembering one obscure fact.

For instance, "explain the following series of regular expressions".

On the other hand, they don't let me interview much these days.

Alex Brown
Also, it's a puzzle question: One correct answer. I'd rather see how somebody went about solving something than have them take a multiple choice exam.
Eric
I agree. If regex experience is a requirement for the job (in that they must have it from day 1 and can't read up on it for day 2), I guess this would be good. However, I don't think a question as detailed or arbitrary as this makes sense--you don't learn much about the candidate regardless of their answer. If they get it right, kudos...but what have they really proven? If they get it wrong, well maybe the question was too narrow/etc.
Michael Haren
In this case, (?!d) == ($|a|b|c|e|f|...)
Greg Bacon
@gbacon: In isolation, you're right but in the presence of other elements, (like the "g" flag), it's very difficult to duplicate negative lookahead behaviour
Adrian Pronk
It isn't really a puzzle question as much as it could be a thinking question. If you know regex syntax, and you know, or are told what the base RE operators are, it *could* be a good way to watch a person think through a problem that is new to them. Basically you have to reason that various portions of the regex can or cannot be converted to the more limited base operators. I am not saying it is a good question, but it could be if done right.
Robert Lamb
@gbacon: Maybe the question was to fool the interviewee :P
rFactor
It's been a while since I looked at RegExps at this level, but can the regexp with the lookahead actually be identified by an FSA?
Uri
+1  A: 

That looks pretty reasonable, assuming that you give them a couple minutes (and a pen and paper, perhaps.) I'm no regex guru -- and not a "senior programmer" in any sense -- but I can figure that out.

Some people might get caught up on some of the less common regex syntax -- for example, I wouldn't know what the "?!" operator was off the top of my head except I had to use it a week ago. It's pretty easy to go look at a reference sheet every time you have to make a non-trivial regex, so I can understand if someone doesn't know the syntax by heart.

That said, you could probably find a simpler problem that more directly tests whether the candidate is somewhat familiar with regular expressions (which is really the thing you care about, right?)

mquander
"but I can figure that out"... humour me then, what's the answer?
jpinto3912
He gave the answer, ?!.I would be far less impressed that someone knew that negative lookahead assertions were the exception than if I saw them quickly build some completely straightforward answers to trivial regexp problems. Use of regexps and indeed unix pipe commands in general is so generally useful I would hope that all engineers knew it, and if not I would try to inculcate it after they joined.
Alex Brown
On the other hand, I think a Senior programmer is one with 1-2 years experience. I would not require this knowledge of such a person.
Alex Brown
Whoa. I thought a senior programmer is one with 10+ years of experience. I guess that explains why I expect them to know regular expressions : P
mquander
+2  A: 

Well, I think it wouldn't really be a fair question for you to ask, since you admit you don't know the answer yourself; it might be, if the candidate had regular expressions on their resume, and if the person interviewing understood the question well enough.

However: I personally choose not to use "puzzle" questions that depend strongly on specific knowledge; I try to ask more general questions, that let me determine HOW a candidate approaches solving a problem. People can learn new technologies, but bringing the right attitude to solving a problem is much harder to learn / teach.

McWafflestix
+3  A: 

Not all programming involves the use of complex regular expressions. For example, I doubt you would find any in the Linux kernel. So your question only makes sense if you state what kind of Senior Programmer you would be asking the interview question.

anon
The programming might not, but regular expressions are a godsend when searching and editing in a large codebase; I would think it was pretty weird if you were an experienced programmer and didn't have a reasonable command of regex.
mquander
A reasonable command yes, expert command of the subject, no.
anon
+2  A: 

I'm the person that many come to for help with regular expressions, and in many years of doing this, I've so far never had to use look-ahead. So unless a reference is provided, this is a trick question that won't tell you much about the programmer's skills. Also, this is highly specific to whichever language regex uses this specific syntax for lookaheads.

Eddie
+1  A: 

I'll say it's a valid question iff you are trying to see what kind of questions the developer asks to attempt to resolve the ambiguities in your question.

Bonus points for also knowing regular expressions well enough to solve.

Triptych
+2  A: 

I would ask myself what exactly I am trying to find out by asking this question. If person cannot answer, does it mean that he is a bad programmer? If he answers, does it mean he is a good programmer? I still would ask this question during the interview, but I would be more interested in how person thinks.

Georgy Bolyuba
+4  A: 

It'd be fair if you asked for a Regex guru and somebody turned up claiming to be one... but the "average" Senior Programmer - I'd be more interested in what projects they'd worked on, how they tackled the problems at hand, what in their opinion caused the project to fail/succeed, what they learnt from the success/failure and so on.

Leonard H Martin
+4  A: 

The phrase "Kleene star" is a tipoff that someone is talking about regular languages in the theoretical CS sense, i.e., those that can be recognized by deterministic finite automata. Did the same developer also come up with a question involving the pumping lemma?

Help us answer your question by providing more context. Based on your comment, I assume thorough understanding of regular expressions is not critical in your organization, so is this a straightforward test of knowledge? Are you trying to gauge the candidate's ability to gather information and reason abstractly? Willingness to step outside the box and answer None Of The Above? Did you solicit these interview questions? If so, did you provide guidelines, or was it an "I need 3 potential interview questions from everyone by COB today, and and please submit along with a TPS-report coversheet!" assignment?

Greg Bacon
+2  A: 

I guess your company must code only in regular expressions for that to be a good senior interview question.

JaredPar
+3  A: 

@jpinto3912 - OK:

/adt[ui]t+\S?/                == adt(u|i)tt*(\S|)

// this one is a pain due to the insensitive casing, but is do-able
/b[^e]ac?h+\s\s\sBALL/i == 
  (B|b)((a|A)|(b|B)|(c|C)|(d|D)|(f|F)|...|(z|Z)|..special characters..)(a|A)(c|C|)
  (h|H)(h|H)*\s\s\s(b|B)(a|A)(l|L)(l|L)

// fail on negative lookahead
/he{5}[Ly]+o\bwo[^r]L(?!d)/   == heeeee(L|y)(L|y)*o\bwo(all characters not r)L 


/ab?c+(de{1,3})/              == a(b|)cc*d(e|ee|eee)

@candybar: I'm no expert, so if someone said they had a mastery of regex, I think this is a great test to see if they know their stuff. Ultimately, you're going to see what your senior developer is made of. How they go about solving problems and how they think. As a life preserver, give them a regex cheat sheet to go with the question, and a reasonable amount of time to work through it (took me ~15 minutes.)


Edit :: Made the corrections as suggested; props to you guys for noticing them! If anyone else sees an error feel free to correct the answer.

Gavin Miller
`he{5}` is not `hehehehehe` but `heeeee`. And `adt[ui]t+\S?` can be written as `adt(u|i)tt*(\S|)`.
Gumbo
Fail on negative lookahead, you don't get the job.
Alex Brown
Oh, and `[Ly]+` is not `(LL*|yy*)` but `(L|y)(L|y)*`
Gumbo
I'm just bitter, since I didn't get the answer tag, but 'man ascii' did.
Alex Brown
Also, de{1,3} is not (de|dede|dedede) but is (de|dee|deee)
Eddie
A: 

It's not really a question of "is the question fair" but how much weight do you give to the question and consequently the answer. What is the purpose of the question?

1) Is it to see if they know regular expressions to this level of detail (theoretically because the job demands it)?

2) Is it to see if they are able to figure out tough problems when presented them? (Always good to know about potential candidates.)

3) Or is it simply designed to see how much effort they are willing to put into something before they give up (or simply guess)? If they're not willing to work in an interview then chances are they're not going to want to do much work outside the interview either.

Or is it for one of a half dozen other reasons that you might ask a similar type of question. You're trying to find out as much as you can about your candidates. Your questions simply help you do that. What you ask and how you weight the answer should fit your needs and not the candidates. This way you'll be able to get what your looking for since your paying for'em.

+1  A: 

I know the question is answered, but for those that are interested...

The base operators of a regular expression are union, concatenation, and Kleene star (zero or more copies of). So the question basically asks if the regex syntax gets beyond this.

[ab] is just (a+b), ie, union.

a? is just (a+epsilon) where epsilon is the empty string.

a+ is just aa*.

a{5} is just aaaaa.

e{1,3} is just (e+ee+eee).

[^e] is just (a+b+c+d+f+...+x+y+z) which is messy, and might also need to include capital letters depending on your alphabet.

/i (ignore case) is messy but do-able, it pretty much doubles the size of things.

shortcuts like \s are just the union of each of the possibilities

?!d I would have to look up.

Mostly you cannot do stuff like reverses, or look ahead or look back or look for a variable number of things (a constant number or constant range is ok though).

Robert Lamb
Let’s take Unicode as the alphabet ;)
Gumbo
OUCH! I guess it could be worse...
Robert Lamb
A: 

The question is pointless for interviewing a programmer you're hiring to write real-world code, because nobody in the real world still uses bare bones regex flavors with just the features that you mentioned.

If you want to put regex puzzles into your interview, provide a regex and ask the candidate to write an exhaustive list of possible matches as the first question, and provide an exhaustive list of matches asking for the regex as the second question. That's what real-world programmers do.

Jan Goyvaerts