views:

272

answers:

7

This is a question in my paper test today, the function signature is

int is_match(char* pattern,char* string)

The pattern is limited to only ASCII chars and the quantification * and ?, so it is relatively simple. is_match should return 1 if matched, otherwise 0.

so, how to do, i am really dizzy on this?

thanks in advance!

A: 

try to make a list of interesting test cases:

is_match("dummy","dummy") should return true;

is_match("dumm?y","dummy") should return true;

is_match("dum?y","dummy") should return false;

is_match("dum*y","dummy") should return true;

and so on ...

then see how to make the easier test pass, then the next one ...

siukurnin
You trying to teach them TDD at the same time? ;) Why not post some simple instructions on how to use Google Test, while your at it: http://code.google.com/p/googletest/
Merlyn Morgan-Graham
Plz correct me if Im wrong but...shouldn't `s_match("dum?y","dummy")` return **true**. `m?` matches the letter **m** zero or more times right?
giddy
@giddy: Zero or one times, see e.g. [here](http://www.regular-expressions.info/reference.html). `*` would mean *"zero or more times"*.
Georg Fritzsche
Ack - "you're". I guess it's bed time...
Merlyn Morgan-Graham
+1  A: 

Cheat. Use #include <boost/regex/regex.hpp>.

Benoit
would be a perfect answer for a real world project : 1) never trust your boss when he says "we will never have a use for all other regex stuff, just ? and *, promised" 2) don't reinvent warm water when you have hot water on the tap. My answer was based on the asumption this is homework ...
siukurnin
Also, the questio is tagged C as well as C++.
JeremyP
+1  A: 

Didn't test this, actually code it, or debug it, but this might get you a start...

for each character in the pattern
  if pattern character after the current one is *
    // enter * state
    while current character from target == current pattern char, and not at end
      get next character from target
    skip a char from the pattern
  else if pattern character after the current one is ?
    // enter ? state
    if current character from target == current pattern char
      get next char from target
    skip a char from the pattern
  else
    // enter character state
    if current character from target == current pattern character
      get next character from target
    else
      return false
return true
Merlyn Morgan-Graham
One important thing missing here is the ability to backtrack, otherwise the regex `ab?bc` couldn't match `abc`.
Tim Pietzcker
-1 Add a stack here.
Basilevs
@Tim: I attempted to solve that by matching the `?` forward - "if the pattern character after the current one is ?"
Merlyn Morgan-Graham
But that doesn't work. The `b?` matches the `b`, then the next `b` can't match the `c` and the regex fails. There has to be a way to remember that the `b?` was optional, and to exercise that option. That's probably what Basilevs means by "add a stack here" - you need to remember all positions where there are several ways of applying the current regex token and try all possible combinations before declaring failure.
Tim Pietzcker
@Tim: I think there's some ambiguity to what I meant by "get next" vs "skip a char" vs "current". Maybe I'll just rewrite the darn thing in C =P I am fairly certain you don't need a stack, as long as you're willing to live with hacks that won't apply to more complicated features (that we currently aren't implementing) in the pattern.
Merlyn Morgan-Graham
@Tim: Oh, I get what you're saying now... Maybe I should get off my butt and go read a book on compilers, like I've been wanting to :)
Merlyn Morgan-Graham
+2  A: 

See This Question for a solution you can not submit. See this paper for a description of how to implement a more readable one.

AShelly
wow... everyone of those answers' code makes my eyes bleed.. then Im unable to cry because I feel so insignificant and small!
giddy
A: 

The full power of regular expressions and finite state machines are not needed to solve this problem. As an alternative there is a relatively simple dynamic programming solution.

Let match(i, j) be 1 if it is possible to match the the sub-string string[i..n-1] with the sub-pattern pattern[j, m - 1], where n and m are the lengths of string and pattern respectively. Otherwise let match(i, j) be 0.

The base cases are:

  • match(n, m) = 1, you can match an empty string with an empty pattern;

  • match(i, m) = 0, you can't match a non-empty string with an empty pattern;

The transition is divided into 3 cases depending on whether the current sub-pattern starts with a character followed by a '*', or a character followed by a '?' or just starts with a character with no special symbol after it.

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int is_match(char* pattern, char* string)
{
  int n = strlen(string);
  int m = strlen(pattern);

  int i, j;
  int **match;

  match = (int **) malloc((n + 1) * sizeof(int *));
  for(i = 0; i <= n; i++) {
    match[i] = (int *) malloc((m + 1) * sizeof(int));
  }

  for(i = n; i >= 0; i--) {
    for(j = m; j >= 0; j--) {
      if(i == n && j == m) {
        match[i][j] = 1;
      }
      else if(i < n && j == m) {
        match[i][j] = 0;
      }
      else {
        match[i][j] = 0;
        if(pattern[j + 1] == '*') {
          if(match[i][j + 2]) match[i][j] = 1;
          if(i < n && pattern[j] == string[i] && match[i + 1][j]) match[i][j] = 1;
        }
        else if(pattern[j + 1] == '?') {
          if(match[i][j + 2]) match[i][j] = 1;
          if(i < n && pattern[j] == string[i] && match[i + 1][j + 2]) match[i][j] = 1;
        }
        else if(i < n && pattern[j] == string[i] && match[i + 1][j + 1]) {
          match[i][j] = 1;
        }
      }
    }
  }

  int result = match[0][0];

  for(i = 0; i <= n; i++) {
    free(match[i]);
  }

  free(match);

  return result;
}

int main(void)
{
  printf("is_match(dummy, dummy)  = %d\n", is_match("dummy","dummy"));
  printf("is_match(dumm?y, dummy) = %d\n", is_match("dumm?y","dummy"));
  printf("is_match(dum?y, dummy)  = %d\n", is_match("dum?y","dummy"));
  printf("is_match(dum*y, dummy)  = %d\n", is_match("dum*y","dummy")); 

  system("pause");

  return 0;
}

The time complexity of this approach is O(n * m). The memory complexity is also O(n * m) but with a simple modification can be reduced to O(m).

Ivanvi
But this is not C++! :)
Basilevs
+2  A: 

Here is recursive extendable implementation. Tested for first order of pattern complexity.

#include <string.h>
#include <string>
#include <vector>
#include <iostream>

struct Match {
  Match():_next(0) {}
  virtual bool match(const char * pattern, const char * input) const {
    return !std::strcmp(pattern, input);
  }
  bool next(const char * pattern, const char * input) const {
    if (!_next) return false;
    return _next->match(pattern, input);
  }
  const Match * _next;
};

class MatchSet: public Match {
  typedef std::vector<Match *> Set;
  Set toTry;
public:
  virtual bool match(const char * pattern, const char * input) const {
    for (Set::const_iterator i = toTry.begin(); i !=toTry.end(); ++i) {
      if ((*i)->match(pattern, input)) return true;
    }
    return false;
  }
  void add(Match * m) {
    toTry.push_back(m);
    m->_next = this;
  }
  ~MatchSet() {
    for (Set::const_iterator i = toTry.begin(); i !=toTry.end(); ++i)
      if ((*i)->_next==this) (*i)->_next = 0;
  }
};

struct MatchQuestion: public Match  {
  virtual bool match(const char * pattern, const char * input) const {
    if (pattern[0] != '?')
      return false;
    if (next(pattern+1, input))
      return true;
    if (next(pattern+1, input+1))
      return true;
    return false;
  }
};

struct MatchEmpty: public Match {
  virtual bool match(const char * pattern, const char * input) const {
    if (pattern[0]==0 && input[0]==0)
      return true;
    return false;
  }
};

struct MatchAsterisk: public Match {
  virtual bool match(const char * pattern, const char * input) const {
    if (pattern[0] != '*')
      return false;
    if (pattern[1] == 0) {
      return true;
    }
    for (int i = 0; input[i] != 0; ++i) {
      if (next(pattern+1, input+i))
        return true;
    }
    return false;
  }
};

struct MatchSymbol: public Match {
  virtual bool match(const char * pattern, const char * input) const {
    // TODO: consider cycle here to prevent unnecessary recursion
    // Cycle should detect special characters and call next on them
    // Current implementation abstracts from that
    if (pattern[0] != input[0])
      return false;
    return next(pattern+1, input+1);
  }
};

class DefaultMatch: public MatchSet {
  MatchEmpty empty;
  MatchQuestion question;
  MatchAsterisk asterisk;
  MatchSymbol symbol;
public:
  DefaultMatch() {
    add(&empty);
    add(&question);
    add(&asterisk);
    add(&symbol);
  }
  void test(const char * p, const char * input) const {
    testOneWay(p, input);
    if (!std::strcmp(p, input)) return;
    testOneWay(input, p);
  }
  bool testOneWay(const char * p, const char * input) const {
    const char * eqStr = " == ";
    bool rv = match(p, input);
    if (!rv) eqStr = " != ";
    std::cout << p << eqStr << input << std::endl;
    return rv;
  }

};


int _tmain(int argc, _TCHAR* argv[])
{
  using namespace std;

  typedef vector<string> Strings;
  Strings patterns;

  patterns.push_back("*");
  patterns.push_back("*hw");
  patterns.push_back("h*w");
  patterns.push_back("hw*");

  patterns.push_back("?");
  patterns.push_back("?ab");
  patterns.push_back("a?b");
  patterns.push_back("ab?");

  patterns.push_back("c");
  patterns.push_back("cab");
  patterns.push_back("acb");
  patterns.push_back("abc");

  patterns.push_back("*this homework?");
  patterns.push_back("Is this homework?");
  patterns.push_back("This is homework!");
  patterns.push_back("How is this homework?");

  patterns.push_back("hw");
  patterns.push_back("homework");
  patterns.push_back("howork");

  DefaultMatch d;
  for (unsigned i = 0; i < patterns.size(); ++i)
    for (unsigned j =i; j < patterns.size(); ++j)
      d.test(patterns[i].c_str(), patterns[j].c_str());

    return 0;
}

If something is unclear, ask.

Basilevs
I appreciate any design comments.
Basilevs
The `*` matches the previous symbol zero-or-more times, and `?` matches the previous symbol one-or-more times. In several places here, you have `?` and `*` with no preceding symbol. Is this intentional?
Merlyn Morgan-Graham
* matches any amount of any symbols, ? matches zero or one symbol
Basilevs
Now that I think about it, bash matches ? as exactly one symbol, I should've probably do the same
Basilevs
A: 

Simple recursive implementation. It's slow but easy to understand:

int is_match(char *pattern, char *string)
{
    if (!pattern[0]) {
        return !string[0];
    } else if (pattern[1] == '?') {
        return (pattern[0] == string[0] && is_match(pattern+2, string+1))
            || is_match(pattern+2, string);
    } else if (pattern[1] == '*') {
        size_t i;
        for (i=0; string[i] == pattern[0]; i++)
            if (is_match(pattern+2, string+i)) return 1;
        return 0;
    } else {
        return pattern[0] == string[0] && is_match(pattern+1, string+1);
    }
}

Hope I got it all right.

R..
i think the for loop should be for(i=0;string[i]==pattern[0];i++);(note the trailing ";"),did you make a typo?
Tracy
actually the else if(pattern[1]=='*') branch could be replace by for( i = 0 ; string[i]==pattern[0];i++) ;//note this semi-colon return is_match(pattern+2,string+i);so how do you think?
Tracy
@Tracy: nope. Then "x*xxxxxy" will fail to match "xxxxxy". As I said my answer is slow but simple because handling these cases correctly is not simple unless you're willing to convert the pattern to a state machine.
R..