Find minimum window width in string x that contains all characters in string y.

E.g.

String x = "coobdafceeaxab"
String y = "abc"

Answer should be 5 because the shortest substring in x that contains all three letters in y is "bdafc".

I can think of a naive solution with complexity O(n^2 log m), where n = len(x) and m = len(y). Can anyone think of a better solution? Thanks.

Update: come to think of it, if I change my std::set to tr1::unordered_map, I can cut the complexity down to O(n^2), because insertion and deletion should both be O(1) on average.

+1  A: 

This is my solution in C++, just for reference.

Update: I originally used std::set and have now switched to tr1::unordered_map to cut the complexity down to O(n^2). The two implementations are otherwise very similar, so to keep this post short I list only the improved one.

#include <algorithm> // std::min
#include <iostream>
#include <tr1/unordered_map>
#include <string>

using namespace std;
using namespace std::tr1;

typedef tr1::unordered_map<char, int> hash_t;

// Returns min substring width in which sentence contains all chars in word
// Returns sentence's length + 1 if not found
size_t get_min_width(const string &sent, const string &word) {
    size_t min_size = sent.size() + 1;
    hash_t char_set; // char set that word contains
    for (size_t i = 0; i < word.size(); i++) {
        char_set.insert(hash_t::value_type(word[i], 1));
    }
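    // Compare with addition instead of subtraction: this avoids size_t
    // underflow when word is longer than sent, and includes the last start.
    for (size_t i = 0; i + char_set.size() <= sent.size(); i++) {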
        hash_t s = char_set;
        // Stop once the window starting at i can no longer beat min_size
        for (size_t j = i; j < min(i + min_size, sent.size()); j++) {
            s.erase(sent[j]);
            if (s.empty()) {
                size_t size = j - i + 1;
                if (size < min_size) min_size = size;
                break;
            }
        }
    }
    return min_size;
}

int main() {
    const string x = "coobdafceeaxab";
    const string y = "abc";
    cout << get_min_width(x, y) << "\n";
}
grokus
I think you should put your reference solution in the question, not as an answer.
Albin Sunnanbo
Well, it is maybe *the* answer
ring0
+4  A: 

This is how I would do it:
Create a hash table for all the characters from string Y. (I assume all characters are different in Y).

First pass:
Start from the first character of string X.
Update the hash table: for example, for key 'a' store its location (say 1).
Keep doing this until you have seen all characters from Y (until every key in the hash table has a value).
If you see a character again, store its newer location and discard the older one.

Once the first pass is done, take the smallest and the biggest value from the hash table.
That's the minimum window observed so far.

Now go to the next character in string X, update the hash table, and see if you get a smaller window.

Algorithm complexity: O(n), one pass
Space: O(k), where k is the number of distinct characters in Y


Edit:

Let's take an example here:
String x = "coobdafceeaxab"
String y = "abc"

First initialize a hash table from characters of Y.
h[a] = -1
h[b] = -1
h[c] = -1

Now start from the first character of X.
The first character is c, so h[c] = 0.
The second character (o) is not in the hash table, so skip it.
..
The fourth character is b, so h[b] = 3.
..
The sixth character is a, so h[a] = 5.
Now every key in the hash table has a value.
The smallest value is 0 (for c) and the largest is 5 (for a), so the minimum window so far is 6 (indices 0 to 5).
The first pass is done.

Take the next character. f is not in the hash table, so skip it.
The next character is c, so update the hash table: h[c] = 7.
Find the new window: the smallest value is 3 (for b) and the largest is 7 (for c).
The new window is 3 to 7, which has width 5.

Keep doing this until the last character of string X.

I hope it's clear now.
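
The walk-through can be sketched in C++ roughly as follows. This is only an illustration under some assumptions: the function name min_window_last_seen is made up, -1 marks a character not seen yet (as in the example), and 0 is returned when no window exists. The smallest index is found by scanning the whole table, so this plain version costs O(m) per matching character rather than O(1).

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical helper: minimum window width, or 0 if x never contains
// every character of y.
std::size_t min_window_last_seen(const std::string& x, const std::string& y) {
    const long NOT_SEEN = -1;
    std::unordered_map<char, long> last;      // h[c] in the walk-through
    for (char c : y) last[c] = NOT_SEEN;
    if (last.empty()) return 0;

    std::size_t missing = last.size();        // keys that still hold -1
    std::size_t best = 0;
    for (std::size_t j = 0; j < x.size(); ++j) {
        auto it = last.find(x[j]);
        if (it == last.end()) continue;       // not a character of y: skip
        if (it->second == NOT_SEEN) --missing;
        it->second = static_cast<long>(j);    // keep only the newest index
        if (missing > 0) continue;            // first pass not finished yet

        // The largest stored index is always j; scan for the smallest one.
        long lo = it->second;
        for (const auto& kv : last) lo = std::min(lo, kv.second);
        const std::size_t width = j - static_cast<std::size_t>(lo) + 1;
        if (best == 0 || width < best) best = width;
    }
    return best;
}
```

Calling min_window_last_seen("coobdafceeaxab", "abc") follows the same steps as the example: the window is 0 to 5 after the first pass and shrinks to 3 to 7 when the second c is read.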


Edit

There are some concerns about finding the max and min values in the hash table.
We can maintain a sorted linked list alongside the hash table, with each hash entry pointing to its node in the list.
Whenever an element of the linked list changes, its hash entry must be re-mapped to the new node.
Both of these operations are O(1).

Total space would be m + m.
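
One possible way to get those O(1) operations, sketched in C++. The names (min_window_linked, order, where) are illustrative and not part of the answer above, and 0 is returned when no window exists. The sketch relies on the fact that indices only grow while scanning X: appending each new index at the tail keeps the list sorted with no sorting step, so the minimum is always the head and the maximum is always the tail. The hash operations are O(1) on average, not in the worst case.

```cpp
#include <cstddef>
#include <iterator>
#include <list>
#include <string>
#include <unordered_map>
#include <unordered_set>

// Hypothetical helper: minimum window width, or 0 if no window exists.
std::size_t min_window_linked(const std::string& x, const std::string& y) {
    const std::unordered_set<char> wanted(y.begin(), y.end());
    if (wanted.empty()) return 0;

    std::list<std::size_t> order;   // last-seen indices, oldest first
    // For each character of y seen so far: its node inside `order`.
    std::unordered_map<char, std::list<std::size_t>::iterator> where;
    std::size_t best = 0;

    for (std::size_t j = 0; j < x.size(); ++j) {
        if (wanted.count(x[j]) == 0) continue;           // not in y: skip
        auto it = where.find(x[j]);
        if (it != where.end()) order.erase(it->second);  // unlink old index
        order.push_back(j);                              // newest goes last
        where[x[j]] = std::prev(order.end());            // re-map to new node
        if (where.size() == wanted.size()) {             // every key has a value
            const std::size_t width = j - order.front() + 1;
            if (best == 0 || width < best) best = width;
        }
    }
    return best;
}
```

Erasing a std::list node through a stored iterator does not invalidate the iterators of the other nodes, which is why the hash table can safely keep them. The extra list is the second m in the m + m space estimate.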

Jack
I don't actually understand what you are describing. Can you take my example above and illustrate how you arrive at the correct answer?
grokus
@Jack, +1 for you, however I must say each time you look up min and max values in your hash, it takes O(m), so your overall complexity is O(nm), not O(n) as you advertised.
grokus
Well, there is a way: you can maintain an ordered list that keeps track of the first and last occurrences and map it to the hash. Whenever you change a key/value, update its entry in the list through the hash. This is O(1) thanks to the hash.
Jack
Maintain a linked list and map it with the hash. Total complexity will be O(n), and space will be m + m.
Jack
Check out my answer for implementations in Java and C++ of this method.
Sheldon L. Cooper
+1 for solution and having ordered list of indexes - I was thinking pretty much the same approach.
DK
@Jack, I will accept this answer if I can convince myself that all the insert/delete/min/max operations are O(1). I'm not a data structure or algorithms guru, so could you explain how you maintain your hash and sorted linked list at the same time with O(1) operations? Perhaps write some pseudocode to illustrate? Or if you can point me to some online resource, that would be good too.
grokus
+1  A: 

Here's my solution in C++:

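#include <map>
#include <set>
#include <string>
#include <vector>
using namespace std;

// Returns the minimum window width; assumes x contains every character of y.
int min_width(const string& x, const set<char>& y) {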
  vector<int> at;
  for (int i = 0; i < x.length(); i++)
    if (y.count(x[i]) > 0)
      at.push_back(i);

  int ret = x.size();
  int start = 0;
  map<char, int> count;

  for (int end = 0; end < at.size(); end++) {
    count[x[at[end]]]++;
    while (count[x[at[start]]] > 1)
      count[x[at[start++]]]--;
    if (count.size() == y.size() && ret > at[end] - at[start] + 1)
      ret = at[end] - at[start] + 1;
  }
  return ret;
}

Edit: Here's an implementation of Jack's idea. It's the same time complexity as mine, but without the inner loop that confuses you.

int min_width(const string& x, const set<char>& y) {
  int ret = x.size();
  map<char, int> index;
  set<int> index_set;

  for (int j = 0; j < x.size(); j++) {
    if (y.count(x[j]) > 0) {
      if (index.count(x[j]) > 0)
        index_set.erase(index[x[j]]);
      index_set.insert(j);
      index[x[j]] = j;
      if (index.size() == y.size()) {
        int i = *index_set.begin();
        if (ret > j-i+1)
          ret = j-i+1;
      }
    }
  }
  return ret;
}

In Java it can be implemented nicely with LinkedHashMap:

static int minWidth(String x, HashSet<Character> y) {
    int ret = x.length();
    Map<Character, Integer> index = new LinkedHashMap<Character, Integer>();

    for (int j = 0; j < x.length(); j++) {
        char ch = x.charAt(j);
        if (y.contains(ch)) {
            index.remove(ch);
            index.put(ch, j);
            if (index.size() == y.size()) {
                int i = index.values().iterator().next();
                if (ret > j - i + 1)
                    ret = j - i + 1;
            }
        }
    }
    return ret;
}

All operations inside the loop take constant time (assuming hashed elements disperse properly).
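
The std::map and std::set used in the C++ versions above cost O(log m) per operation. A minimal sketch of the first (two-pointer) version with the counts kept in a plain array, so that each step is O(1): it assumes 8-bit characters, the name min_window_counts is made up, and unlike the code above it returns 0 when there is no solution.

```cpp
#include <array>
#include <cstddef>
#include <string>

// Hypothetical helper: minimum window width, or 0 if no window exists.
std::size_t min_window_counts(const std::string& x, const std::string& y) {
    std::array<bool, 256> wanted{};          // wanted[c]: c occurs in y
    std::size_t need = 0;                    // distinct characters of y
    for (unsigned char c : y) {
        if (!wanted[c]) { wanted[c] = true; ++need; }
    }
    if (need == 0) return 0;

    std::array<std::size_t, 256> count{};    // occurrences inside the window
    std::size_t have = 0, start = 0, best = 0;
    for (std::size_t end = 0; end < x.size(); ++end) {
        const unsigned char c = static_cast<unsigned char>(x[end]);
        if (wanted[c] && count[c]++ == 0) ++have;
        if (have < need) continue;           // window is not complete yet

        // Drop leading characters that are unwanted or occur again later
        // in the window. start only moves forward, so this inner loop does
        // at most x.size() steps over the whole run.
        while (true) {
            const unsigned char s = static_cast<unsigned char>(x[start]);
            if (!wanted[s]) ++start;
            else if (count[s] > 1) { --count[s]; ++start; }
            else break;
        }
        const std::size_t width = end - start + 1;
        if (best == 0 || width < best) best = width;
    }
    return best;
}
```

For x = "coobdafceeaxab" and y = "abc" this gives 5, the same window "bdafc" as in the question.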

Sheldon L. Cooper
The time complexity is linear in the size of the string "x" (assuming a constant size alphabet). This implementation assumes there's always a solution.
Sheldon L. Cooper
@Sheldon, I tested your code and it does produce the correct result; however, you have a while loop inside a for loop, so the complexity looks like O(n^2) to me.
grokus
No, it's O(n). The while loop executes at most n iterations OVERALL, since "start" is incremented at each iteration.
Sheldon L. Cooper
In other words, you're claiming that the code works but the while loop iterates more than n times overall. That implies the variable "start" would reach a value greater than n, so the vector "at" would be accessed outside its boundaries. Hence, the code is broken. That is a contradiction, and it comes from assuming that the while loop iterates more than n times overall.
Sheldon L. Cooper
Only looking at the first loop, this code is already O(n log m).
Nabb
> The time complexity is linear in the size of the string "x" (**assuming a constant size alphabet.**)
Sheldon L. Cooper
You can always implement "count" as a hash map or an array, if you don't want to assume that.
Sheldon L. Cooper
+1 for you, you should use tr1::unordered_map instead of map or set.
grokus