+1  A: 

Sort both of the strings into the same order, and then linearly scan through both sequences at the same time comparing each item, if it isn't the same, increment the sequence with the lower value character. It should be O(N log N) at a guess.

Mr Samuel
sorting adds more overhead and complexity. Plus it might be an expensive operation if the string is long.
Sagar V
I think O(NlogN + MlogM) which is near enough the same thing especially if you define N > M (where N and M are the string lengths).
Chris
@Sagar V: possibly but if the other option is having to scan through the whole string each time you want to find a character in it then it migth be quicker in the long run...
Chris
+1  A: 

Off the top of my head I'd split both strings into a character array and sort them. I'd then go through one array and for each character check if its in the other. Having both arrays sorted means that you only need to loop through each array once when getting your result. I'm not sure if the overhead of sorting makes it any more efficient though.

Things that I have considered might be relevant are:

1) duplicate characters in a string - dismissed because off the top of my head I couldn't think of a nice efficient way of filtering out duplicates.

2) Different cases might want to be considered equal (eg A and a are considered the same) - not asked for but a toupper or tolower will solve that.

Chris
A: 

In ruby, algorithm O(len(s1) + len(s2)) using a hash table

require 'jcode'
def toto(s1, s2)
    contains={}
    s2.each_char { |x| contains[x]=1 }
    res=""
    s1.each_char do |x|
        if contains[x]==1
            res<<=x
        end
    end
    return res
end

puts toto("abcde","cdef")
yogsototh
A: 

Use a hash map where the keys are the characters of the first string and the values are an int starting at zero. Then just look up each character from S2 and if the lookup succeeds then increment the value. Setting up the hash map is O(n) and each lookup is O(1). Worst case performance is about O(2n) which equivalent to O(n).

Just for reference, you could you the Boost unordered_map container in the real world.

Robert S. Barnes
+4  A: 

Use some sort of set data structure: Fill the set with each character of S1. Then check for each character in S2 if it is in the set.

Depending on the implementation, both insertion and lookup on the set do only cost O(1).

Gumbo
Damn, you must have answered while I was doing my second answer. How can I not give you a +1 when you've said the same thing. :)
Chris
@Joey: Thanks for the remark although you have deleted your comment.
Gumbo
A: 
var commonChars = S1.Intersect(S2);

EDIT: This is from System.Linq - and it is an algorithm - it's a single, one-line algorithm. :-)

Enigmativity
That's not an algorithm, that's a library function. I doubt he'd get away with this in an interview :)
Tim Pietzcker
Language / platform information would be useful
Tx3
@`Tim Pietzcker` - I would hope at an interviewer would recognise that a developer knew the most efficient way to code the solution. Efficient for the developer, but not necessarily the code execution, as that is where the biggest cost for an employer is.
Enigmativity
Most efficient way to code the solution? Sure. Does it demonstrate his power for creative thinking? No. Hence the tag "interview-questions".If someone whips up their own intersect function when one exists, however, they deserve to be whipped themselves.
Christian Mann
A: 

Another method suggests itself to me now... In pseudo code since its easier to explain:

for each (character in stringA)
{
    charHashTable[character] = 1

}
for each (character in stringB)
{
    if (charHashTable[character]==1)
        if (!OutputArray.Contains(character))
            OutputArray.Add(character)
}

I'm not 100% sure of the performance characteristics of putting stuff into hash arrays but it occured to me that this might be a better way to efficiently find if a character is in one array than sorting it and doing it that way. If anybody can comment on the performance benefits of this method over the sorting both strings one I'd be very interested.

Chris
Sorting strings should be O(NlogN), whereas hash map inserts/lookups are O(1), so the hash maps are faster. Still, O() analysis only becomes meaningful for dominant (i.e. very large) values of N. If the keys are a small set of contiguous values, then direct array indexing is much faster and more space efficient than a hash lookup, which consists of generating a hash number from the key (some maths transformations, bit shifting, and/or XORing) + having a sparse table of values to keep collisions to a manageable level.
Tony
A: 

If you know the number of unique characters that the strings are made up of (e.g.: If using only lower case characters then a-z means 26 unique symbols) then have an array of that size using the symbols as the index. Iterate over each string and for each of it's characters increment the corresponding array location (say you find b then a[1]++). In the end count the number of array positions which have a value of 2. Of course this solution has more space complexity.

Edit: Count the number of array positions that have 2 or more as the value. (Thanks to Chris for pointing out the flaw in the earlier answer)

Sagar V
Wouldn't that fail in the case that string a was "aa" and string b was "bb" since both would have a value of 2 in your result array?
Chris
Oh Yes! Thanks for pointing that out. I will edit my answer.
Sagar V
I'm still not sure that it works in the example I gave before. Both of them would have values of 2 for a and b but neither of them are common characters...
Chris
+5  A: 

It's reasonable to assume the set of characters is small and finite compared to the potential string length. So, handling 8-bit characters for example (in roughly-C++ pseudo-code):

bool in_a[256] = { false };
bool in_b[256] = { false };
for (int i = 0; i < a.size(); ++i)
    in_a[a[i]] = true;
for (int i = 0; i < b.size(); ++i)
    in_b[b[i]] = true;
// and in_a and in_b
for (int i = 0; i < b.size(); ++i)
    if (in_a[i] && in_b[i])
    {
        std::cout << i;
        if (isprint(i)) std::cout << '\'' << (char)i << '\'';
        std::cout << ' ';
    }
std::cout << '\n';

Note that using a hash table rather than an array is a huge waste of time (unless handling say a 32-bit character representation).

An implementation of the simplification suggested in Yordan's comment follows. This avoids the final loop from 0..255 and needs only one tracking array. The order of results is unsorted.

#include <vector>
#include <iostream>

int main()
{
    std::string a = "abdcdefg";
    std::string b = "byfdz";
    bool track[256] = { false };
    for (int i = 0; i < a.size(); ++i)
        track[a[i]] = true;
    for (int i = 0; i < b.size(); ++i)
        if (track[b[i]])
        {
            track[b[i]] = false; // don't match again
            std::cout << (int)b[i];
            if (isprint(b[i])) std::cout << " \'" << (char)(b[i]) << "\'";
            std::cout << ' ';
        }
    std::cout << '\n';
}
Tony
+1: Good answer. For character strings, anything else is overkill. The `isprint` carefulness is appreciable. BTW: (1) for `in_a` and `in_b` you could use `std::bitset<256>`. (2) In the third `for` loop, why it is `i < b.size()`? Won't it be `i < in_a.size()`?
ArunSaha
Thanks. 1) bitset<256> is great for saving space: best when the lifetime outlives the function and there are lots of such sets, but bool[] will typically be faster (most CPUs will be slower doing lots of bit-shifting, ORs and ANDs, while bool will typically occupy an 'independent' byte), and as in_a and in_b are effectively temporaries on the stack - and not particularly big - I've optimised performance over space. 2) `i` is used to index into both in_a and in_b, so the choice was arbitrary. It would be slightly safer to "nail" one to the other ala `bool in_b[sizeof in_a]`, but confusing?
Tony
@Tony For just solving the problem, I do not think you need the second array ( in_b[] ). The loop over b[] can also check in in_a[] and print the result.
Yordan Pavlov
Yordan: good idea - implementation added to the answer. Thanks.
Tony