tags:

views:

673

answers:

6

Algo: (considering language used will be c#)

  1. Convert both strings into char array
  2. take the smaller array and generate a hash table for it with key as the character and value 0
  3. Now Loop through the other array and increment the count in hash table if that char is present in it.
  4. Now take out all char for hash table whose value is > 0.
  5. These are intersection values.

This is an O(n), solution but is uses extra space, 2 char arrays and a hash table

Can you guys think of better solution than this?

+1  A: 

You don't need to 2 char arrays. The System.String data type has a built-in indexer by position that returns the char from that position, so you could just loop through from 0 to (String.Length - 1). If you're more interested in speed than optimizing storage space, then you could make a HashSet for the one of the strings, then make a second HashSet which will contain your final result. Then you iterate through the second string, testing each char against the first HashSet, and if it exists then add it the second HashSet. By the end, you already have a single HashSet with all the intersections, and save yourself the pass of running through the Hashtable looking for ones with a non-zero value.

EDIT: I entered this before all the comments on the question about not wanting to use any built-in containers at all

scwagner
sounds good..but we may still want to iterate through the hashset to get those chars and convert it into string.
Learner
+2  A: 

Haven't tested this, but here's my thought:

  1. Quicksort both strings in place, so you have an ordered sequence of characters
  2. Keeping an index into both strings, compare the "next" character from each string, pick and output the first one, incrementing the index for that string.
  3. Continue until you get to the end of one of the strings, then just pull unique values from the rest of the remaining string.

Won't use additional memory, only needs the two original strings, two integers, and an output string (or StringBuilder). As an added bonus, the output values will be sorted too!

Part 2: This is what I'd write (sorry about the comments, new to stackoverflow):

private static string intersect(string left, string right)
{
  StringBuilder theResult = new StringBuilder();

  string sortedLeft = Program.sort(left);
  string sortedRight = Program.sort(right);

  int leftIndex = 0;
  int rightIndex = 0;

  //  Work though the string with the "first last character".
  if (sortedLeft[sortedLeft.Length - 1] > sortedRight[sortedRight.Length - 1])
  {
    string temp = sortedLeft;
    sortedLeft = sortedRight;
    sortedRight = temp;
  }

  char lastChar = default(char);
  while (leftIndex < sortedLeft.Length)
  {
    char nextChar = (sortedLeft[leftIndex] <= sortedRight[rightIndex]) ? sortedLeft[leftIndex++] : sortedRight[rightIndex++];

    if (lastChar == nextChar) continue;

    theResult.Append(nextChar);
    lastChar = nextChar;
  }

  //  Add the remaining characters from the "right" string
  while (rightIndex < sortedRight.Length)
  {
    char nextChar = sortedRight[rightIndex++];
    if (lastChar == nextChar) continue;

    theResult.Append(nextChar);
    lastChar = nextChar;
  }
  theResult.Append(sortedRight, rightIndex, sortedRight.Length - rightIndex);

  return (theResult.ToString());
}

I hope that makes more sense.

Phil
You could do better than quicksort, knowing that the data is characters, at least for extremely large data sets. If you are dealing with really hugestrings, you could do a library sort, and get O(n) performance, by basically creating a 255 char array.
Jeremybub
I didn't got "Keeping an index into both strings, compare the "next" character from each string, pick and output the first one, incrementing the index for that string."Can you shed some more light
Learner
I meant something like this: char lastChar = default(char); while (leftIndex < sortedLeft.Length) { char nextChar = (rightIndex >= sortedRight.Length) || (sortedLeft[leftIndex] <= sortedRight[rightIndex]) ? sortedLeft[leftIndex++] : sortedRight[rightIndex++]; if (lastChar == nextChar) continue; theResult.Append(nextChar); lastChar = nextChar; }Hope that's clear. I've actually written it now, works OK EXCEPT I discovered that an in-place sort isn't possible with a .NET String - bummer.
Phil
Hmm. Can't read that. I'll post it again as an answer.
Phil
unknown: "by basically creating a 255 char array..."AFAIK, C# strings are UTF16, so a counting sort is not that easy to implement. With byte-sized chars, a bool[255] (or even 8 ints) would be a useful solution to the original problem, with minimal overhead.
fishlips
+5  A: 

How about this ...

var s1 = "aabbccccddd";
var s2 = "aabc";

var ans = s1.Intersect(s2);
JP Alioto
We have a winner!
Jeremybub
While that's the easy way, the OP is asking for an algorithm to do it directly.
Ahmad Mageed
@Ahmad: if you want the code, just use Reflector to see how Intersect is implemented. :)
JP Alioto
@JP: true, this answer and Reflector both crossed my mind initially :)
Ahmad Mageed
Or just look it up: "When the object returned by this method is enumerated, Intersect enumerates first, collecting all distinct elements of that sequence. It then enumerates second, marking those elements that occur in both sequences. Finally, the marked elements are yielded in the order in which they were collected." - http://msdn.microsoft.com/en-us/library/bb460136.aspx
rmoore
The OP states that they'd prefer to do this "without using extra space". The Intersect method creates and populates a Set behind the scenes (ie, it uses extra space). Having said that, this is an improvement on the original algorithm: it uses a Set rather than a Hashtable, it gets rid of the char arrays, and it removes the final "cleanup" loop altogether.
LukeH
A: 

here's how I would do this. It's still O(N) and it doesn't use a hash table but instead one int array of length 26. (ideally)

  1. make an array of 26 integers, each element for a letter of the alphebet. init to 0's.
  2. iterate over the first string, decrementing one when a letter is encountered.
  3. iterate over the second string and take the absolute of whatever is at the index corresponding to any letter you encounter. (edit: thanks to scwagner in comments)
  4. return all letters corresponding to all indexes holding value greater than 0.

still O(N) and extra space of only 26 ints.

of course if you're not limited to only lower or uppercase characters your array size may need to change.

Victor
how will this work for s1 = "aahjdjdddl"s2 = "hhaahyyyyys"
Learner
continuing from above example, c[a] = -2 after first passthen during pass of s2, we will flip the sign for first occurrence of 'a', and we will not flip the sign for 2nd occurrence of 'a' as it is already +ve........
Learner
instead of "flipping the sign" in step 3, using Math.Abs would always keep the number positive, even in even-number-occurring repeating character counts for s2.
scwagner
Here instead of using array of ints we can use array of bools, with each index being the char ascii value, so we will require bool array of size 256.
Learner
It may not be an issue for the OP, but to cater for all possible unicode characters your array would need to be **a lot** bigger, which means that using a HashSet or similar would probably require less space overall.
LukeH
an array of bools wouldn't work here because you'd need one value for letters you've never seen, one value for one's you've seen in the first string, and another for the letters in both. so byte would probably be the most space efficient. but you'd have to make sure that when decrementing that you don't overflow.
Victor
A: 

"with each letter represented at most once"

I'm assuming that this means you just need to know the intersections, and not how many times they occurred. If that's so then you can trim down your algorithm by making use of yield. Instead of storing the count and continuing to iterate the second string looking for additional matches, you can yield the intersection right there and continue to the next possible match from the first string.

rmoore
A: 
Jaydeep