Example :
S1 : abcde S2: cdef
Answer : cde
Example :
S1 : abcde S2: cdef
Answer : cde
Sort both of the strings into the same order, and then linearly scan through both sequences at the same time comparing each item, if it isn't the same, increment the sequence with the lower value character. It should be O(N log N) at a guess.
Off the top of my head I'd split both strings into a character array and sort them. I'd then go through one array and for each character check if its in the other. Having both arrays sorted means that you only need to loop through each array once when getting your result. I'm not sure if the overhead of sorting makes it any more efficient though.
Things that I have considered might be relevant are:
1) duplicate characters in a string - dismissed because off the top of my head I couldn't think of a nice efficient way of filtering out duplicates.
2) Different cases might want to be considered equal (eg A and a are considered the same) - not asked for but a toupper or tolower will solve that.
In ruby, algorithm O(len(s1) + len(s2)) using a hash table
require 'jcode'
def toto(s1, s2)
contains={}
s2.each_char { |x| contains[x]=1 }
res=""
s1.each_char do |x|
if contains[x]==1
res<<=x
end
end
return res
end
puts toto("abcde","cdef")
Use a hash map where the keys are the characters of the first string and the values are an int starting at zero. Then just look up each character from S2 and if the lookup succeeds then increment the value. Setting up the hash map is O(n) and each lookup is O(1). Worst case performance is about O(2n) which equivalent to O(n).
Just for reference, you could you the Boost unordered_map container in the real world.
Use some sort of set data structure: Fill the set with each character of S1. Then check for each character in S2 if it is in the set.
Depending on the implementation, both insertion and lookup on the set do only cost O(1).
var commonChars = S1.Intersect(S2);
EDIT: This is from System.Linq
- and it is an algorithm - it's a single, one-line algorithm. :-)
Another method suggests itself to me now... In pseudo code since its easier to explain:
for each (character in stringA)
{
charHashTable[character] = 1
}
for each (character in stringB)
{
if (charHashTable[character]==1)
if (!OutputArray.Contains(character))
OutputArray.Add(character)
}
I'm not 100% sure of the performance characteristics of putting stuff into hash arrays but it occured to me that this might be a better way to efficiently find if a character is in one array than sorting it and doing it that way. If anybody can comment on the performance benefits of this method over the sorting both strings one I'd be very interested.
If you know the number of unique characters that the strings are made up of (e.g.: If using only lower case characters then a-z means 26 unique symbols) then have an array of that size using the symbols as the index. Iterate over each string and for each of it's characters increment the corresponding array location (say you find b then a[1]++). In the end count the number of array positions which have a value of 2. Of course this solution has more space complexity.
Edit: Count the number of array positions that have 2 or more as the value. (Thanks to Chris for pointing out the flaw in the earlier answer)
It's reasonable to assume the set of characters is small and finite compared to the potential string length. So, handling 8-bit characters for example (in roughly-C++ pseudo-code):
bool in_a[256] = { false };
bool in_b[256] = { false };
for (int i = 0; i < a.size(); ++i)
in_a[a[i]] = true;
for (int i = 0; i < b.size(); ++i)
in_b[b[i]] = true;
// and in_a and in_b
for (int i = 0; i < b.size(); ++i)
if (in_a[i] && in_b[i])
{
std::cout << i;
if (isprint(i)) std::cout << '\'' << (char)i << '\'';
std::cout << ' ';
}
std::cout << '\n';
Note that using a hash table rather than an array is a huge waste of time (unless handling say a 32-bit character representation).
An implementation of the simplification suggested in Yordan's comment follows. This avoids the final loop from 0..255 and needs only one tracking array. The order of results is unsorted.
#include <vector>
#include <iostream>
int main()
{
std::string a = "abdcdefg";
std::string b = "byfdz";
bool track[256] = { false };
for (int i = 0; i < a.size(); ++i)
track[a[i]] = true;
for (int i = 0; i < b.size(); ++i)
if (track[b[i]])
{
track[b[i]] = false; // don't match again
std::cout << (int)b[i];
if (isprint(b[i])) std::cout << " \'" << (char)(b[i]) << "\'";
std::cout << ' ';
}
std::cout << '\n';
}