tags:

views:

91

answers:

4

I have two strings. How can I determine whether the first string is composed only of letters given by the second string?

For example:

A = abcd
B = abbcc

should return false, since d is not in the second string.

A = aab
B = ab

should return true.

If the program most of the time returns false, how can I optimize this program? If it returns true most of the time, how can I optimize it then?

+1  A: 

> If the program is always return false, how to optimize this program?

return false

> If it is always return true, how to optimize it?

return true

EDIT: Seriously, it's a good question what algorithm optimizes for the case of failure, and what algorithm optimizes for the case of success. I don't know what algorithm strstr uses, it may be a generally-good algorithm which isn't optimal for either of those assumptions.

Maybe you'll catch someone here who knows offhand. If not, this looks like a good place to start reading:Exact String Matching Algorithms

profjim
wow! beat me by 2 seconds.
Alexander Gessler
Perhaps this is a facetious answer, but I don't think this is really what @skydoor was asking. The example and question title indicates pretty clearly that he wants to determine if all of a particular string's characters appear in another string.
John Feminella
+3  A: 

[How do I] determine [if] the first string [is composed of characters that appear in] the second string?

Here's a quick algorithm:

  1. Treat the first and second strings as two sets of characters S and T.
  2. Perform the set difference S - T. Call the result U.

If U is nonempty, return false. Otherwise, return true.

John Feminella
@skydoor, you should clarify whether order is important: do the chars of B have to appear in A in the same sequence they appear in B? Or just anywhere.
profjim
@profjim: Yes, that's a good point. If order is important, you can still do a "set" difference, but now you have to check the elements of the set in a specific order.
John Feminella
You have essentially reworded the question without answering it
BlueRaja - Danny Pflughoeft
+5  A: 

Sort both strings. Then go through A, and have a pointer going through B. If the character in A is the same as what the B pointer points to, keep looking through A. If the character in A is later in the alphabet than what the B pointer points to, advance the B pointer. If the character in A is earlier in the alphabet than what the B pointer points to, return false. If you run out of A, return true.

David Thornley
+2  A: 

Here is one simple way.

def isComposedOf(A, B):
    bset = set(B)
    for c in A:
        if c not in bset:
            return False
    return True

This algorithm walks each string once, so it runs in O(len(A) + len(B)) time.

When the answer is yes, you cannot do better than len(A) comparisons even in the best case, because no matter what you must check every letter. And in the worst case one of the characters in A is hidden very deep in B. So O(len(A) + len(B)) is optimal, as far as worst-case performance is concerned.

Similarly: when the answer is no, you cannot do better than len(B) comparisons even in the best case; and in the worst case the character that isn't in B is hidden very deep in A. So O(len(A) + len(B)) is again optimal.

You can reduce the constant factor by using a better data structure for bset.

You can avoid scanning all of B in some (non-worst) cases where the answer is yes by building it lazily, scanning more of B each time you find a character in A that you haven't seen before.

Jason Orendorff
Depending on the implementation of set (is 'in' O(n)?) you may want to delete the character from bset after each success.
Andrew Jaffe
That would give the wrong answer for the second example in the question (A="aab", B="ab"). Anyway, `set` is a hash set, so queries are O(1) except in when there are extremely many hash collisions.
Jason Orendorff