tags:

views:

800

answers:

5

I was asked this question in an interview for an internship, and probably the first solution I suggested was to try and use a regular expression (I usually am a little stumped in interviews). Something like this

(?P<str>[a-zA-Z]+)(?P<n>[0-9]+)

I thought it would match the strings and store them in the variable "str" and the numbers in the variable "n". How, I was not sure of. :-(

So it matches strings of type "a1b2c3", but a problem here is that it also matches strings of type "a1b". Could anyone suggest a solution to deal with this problem?

Also, is there any other regular expression that could solve this problem?

+18  A: 

how about:

while ($line =~ s/^([a-z]+)(\d+)//i)
{
    print $1 x $2;
}
tster
using regex is too easy in your language, in java i would have to write a lot more.
01
Go go gadget Perl!
tster
@01: Then maybe you are using the wrong language? Or maybe you should not use regexes in your language? Or maybe you need a better regex library in your language.
Jonathan Leffler
Don't most languages now have Perl-style regular expression libraries?
Barry Brown
Could technically be done in one single regex: `$line =~ s/([a-z]+)(\d+)/print $1 x $2;""/ieg;` Also, can be done without destroying the original string, if you change the loop condition to `$line =~ /([a-z]+)(\d+)/ig`
Chris Lutz
+6  A: 

Answering your question directly:

  • No, regular expressions match text and don't print anything, so there is no way to do it solely using regular expressions.

The regular expression you gave will match one string/number pair; you can then print that repeatedly using an appropriate mechanism. The Perl solution from @tster is about as compact as it gets. (It doesn't use the names that you applied in your regex; I'm pretty sure that doesn't matter.)

The remaining details depend on your implementation language.

Jonathan Leffler
+28  A: 
Pavel Shved
Beautiful images; interesting commentary. I'm just not sure about 'the problem in the subject is essentially irregular'. It is very regular - but not doable just with regular expressions.
Jonathan Leffler
the regular expression in my answer is using only a regular grammar, no extensions. It's just that it also loops and prints in Perl.
tster
@Jonathan Leffler: thanks for pointing it out, I'll use another word. @tster: I'll edit my answer with a scrutiny of your solution and why it's not "using regular grammar".
Pavel Shved
+1 extremely informative
tster
+1 for an incredibly detailed answer. Wow.
TrueWill
It should be remembered that all real computers are finite state automata, as they have a finite amount of storage. So though in a theoretical sense a FSA is "incapable" of certain things, in practical terms it is quite capable enough for our purposes. The analogy here is that you could write a regex that was *long enough*, containing enough repetitions for your practical needs. But it would be less convenient than using a mix of regex and procedural looping in the host language. So it's not a matter of things being possible/impossible, but just convenient/inconvenient.
Daniel Earwicker
@Earwicker: we apply computer science not only to reason about theoretical possibility. We apply it to rule out the approach ("use regular expressions **only**") by reasoning that it contradicts the nature of concepts we consider. So I'd like to make matters you're talking about as a comment, to appearing of which I actually was looking forward. :)
Pavel Shved
+1. Thanks a lot for such an informative post.
Siddhant
@Earwicker, a computer is not a Finite State Automata as it certainly has a place to store information outside of its internal states. A computer is more accurately a Linear Bounded Automata (LBA), that is, a Turing Machine with a non-infinite tape.
Falaina
@Falaina: Turing machine with finite tape *is* a finite state automaton. In my PC there is only 200 Gb of "tape" (i.e. harddrive), and this makes it a FSA. That's what Earwicker was talking about.
Pavel Shved
/applaud - extremely well done sir
annakata
+4  A: 

Nope, this is your basic 'trick question' - no matter how you answer it that answer is wrong unless you have exactly the answer the interviewer was trained to parrot. See the workup of the issue given by Pavel Shved - note that all invocations have 'not' as a common condition, the tool just keeps sliding: Even when it changes state there is no counter in that state

I have a rather advanced book by Kenneth C Louden who is a college prof on the matter, in which it is stated that the issue at hand is codified as "Regex's can't count." The obvious answer to the question seems to me at the moment to be using the lookahead feature of Regex's ...

Probably depends on what build of what brand of regex the interviewer is using, which probably depends of flight-dynamics of Golf Balls.

Nicholas Jordan
I disagree with your 'trick question' assessment. As I read the question, at least, the interviewer only specified the input and desired output. Using a regex to get from point A to point B was the OP's idea, not a condition imposed by the interviewer.
Dave Sherohman
@Dave: Noted, and correct. I missed the how to get there was OP's idea: OP states "I was asked this question in an interview for an internship" from which I inferred that OP I was asked this question in an interview for an internship - the fact that I flamed the interviewer comes from industrial environments that I work in where 304 Stainless wraps such interviewers with protective gear that is toothless and cannot be used in the manner directed.If you're a "Pro From Dover" it is worth the effort, too often it is the Master Crypto Thesis candidate vs someone who actually runs a shop.
Nicholas Jordan
+1  A: 

Nice answers so far. Regular expressions alone are generally thought of as a way to match patterns, not generate output in the manner you mentioned.

Having said that, there is a way to use regex as part of the solution. @Jonathan Leffler made a good point in his comment to tster's reply: "... maybe you need a better regex library in your language."

Depending on your language of choice and the library available, it is possible to pull this off. Using C# and .NET, for example, this could be achieved via the Regex.Replace method. However, the solution is not 100% regex since it still relies on other classes and methods (StringBuilder, String.Join, and Enumerable.Repeat) as shown below:

string input = "aa67bc54c9";
string pattern = @"([a-z]+)(\d+)";
string result = Regex.Replace(input, pattern, m =>
        // can be achieved using StringBuilder or String.Join/Enumerable.Repeat
        // don't use both
        //new StringBuilder().Insert(0, m.Groups[1].Value, Int32.Parse(m.Groups[2].Value)).ToString()
        String.Join("", Enumerable.Repeat(m.Groups[1].Value, Int32.Parse(m.Groups[2].Value)).ToArray())
         + Environment.NewLine // comment out to prevent line breaks
        );
Console.WriteLine(result);

A clearer solution would be to identify the matches, loop over them and insert them using the StringBuilder rather than rely on Regex.Replace. Other languages may have compact idioms to handle the string multiplication that doesn't rely on other library classes.

To answer the interview question, I would reply with, "it's possible, however the solution would not be a stand-alone 100% regex approach and would rely on other language features and/or libraries to handle the generation aspect of the question since the regex alone is helpful in matching patterns, not generating them."

And based on the other responses here you could beef up that answer further if needed.

Ahmad Mageed
Excellent, if the interviewer can understand it.
Nicholas Jordan