ansaurus

Question

Implement an algorithm that takes two strings as input, and returns the intersection of the two, with each letter represented at most once. Now speed it up. Now test it.

Answer 1

+1 A:

You don't need to 2 char arrays. The System.String data type has a built-in indexer by position that returns the char from that position, so you could just loop through from 0 to (String.Length - 1). If you're more interested in speed than optimizing storage space, then you could make a HashSet for the one of the strings, then make a second HashSet which will contain your final result. Then you iterate through the second string, testing each char against the first HashSet, and if it exists then add it the second HashSet. By the end, you already have a single HashSet with all the intersections, and save yourself the pass of running through the Hashtable looking for ones with a non-zero value.

EDIT: I entered this before all the comments on the question about not wanting to use any built-in containers at all

scwagner 2009-07-07 05:22:00

sounds good..but we may still want to iterate through the hashset to get those chars and convert it into string.

Learner 2009-07-07 05:26:56

Answer 2

+2 A:

Haven't tested this, but here's my thought:

Quicksort both strings in place, so you have an ordered sequence of characters
Keeping an index into both strings, compare the "next" character from each string, pick and output the first one, incrementing the index for that string.
Continue until you get to the end of one of the strings, then just pull unique values from the rest of the remaining string.

Won't use additional memory, only needs the two original strings, two integers, and an output string (or StringBuilder). As an added bonus, the output values will be sorted too!

Part 2: This is what I'd write (sorry about the comments, new to stackoverflow):

private static string intersect(string left, string right)
{
  StringBuilder theResult = new StringBuilder();

  string sortedLeft = Program.sort(left);
  string sortedRight = Program.sort(right);

  int leftIndex = 0;
  int rightIndex = 0;

  //  Work though the string with the "first last character".
  if (sortedLeft[sortedLeft.Length - 1] > sortedRight[sortedRight.Length - 1])
  {
    string temp = sortedLeft;
    sortedLeft = sortedRight;
    sortedRight = temp;
  }

  char lastChar = default(char);
  while (leftIndex < sortedLeft.Length)
  {
    char nextChar = (sortedLeft[leftIndex] <= sortedRight[rightIndex]) ? sortedLeft[leftIndex++] : sortedRight[rightIndex++];

    if (lastChar == nextChar) continue;

    theResult.Append(nextChar);
    lastChar = nextChar;
  }

  //  Add the remaining characters from the "right" string
  while (rightIndex < sortedRight.Length)
  {
    char nextChar = sortedRight[rightIndex++];
    if (lastChar == nextChar) continue;

    theResult.Append(nextChar);
    lastChar = nextChar;
  }
  theResult.Append(sortedRight, rightIndex, sortedRight.Length - rightIndex);

  return (theResult.ToString());
}

I hope that makes more sense.

Phil 2009-07-07 05:22:44

You could do better than quicksort, knowing that the data is characters, at least for extremely large data sets. If you are dealing with really hugestrings, you could do a library sort, and get O(n) performance, by basically creating a 255 char array.

Jeremybub 2009-07-07 05:29:23

I didn't got "Keeping an index into both strings, compare the "next" character from each string, pick and output the first one, incrementing the index for that string."Can you shed some more light

Learner 2009-07-07 05:36:30

I meant something like this: char lastChar = default(char); while (leftIndex < sortedLeft.Length) { char nextChar = (rightIndex >= sortedRight.Length) || (sortedLeft[leftIndex] <= sortedRight[rightIndex]) ? sortedLeft[leftIndex++] : sortedRight[rightIndex++]; if (lastChar == nextChar) continue; theResult.Append(nextChar); lastChar = nextChar; }Hope that's clear. I've actually written it now, works OK EXCEPT I discovered that an in-place sort isn't possible with a .NET String - bummer.

Phil 2009-07-07 06:17:05

Hmm. Can't read that. I'll post it again as an answer.

Phil 2009-07-07 06:17:42

unknown: "by basically creating a 255 char array..."AFAIK, C# strings are UTF16, so a counting sort is not that easy to implement. With byte-sized chars, a bool[255] (or even 8 ints) would be a useful solution to the original problem, with minimal overhead.

fishlips 2009-07-07 06:55:09

Answer 3

+5 A:

How about this ...

var s1 = "aabbccccddd";
var s2 = "aabc";

var ans = s1.Intersect(s2);

JP Alioto 2009-07-07 05:28:15

We have a winner!

Jeremybub 2009-07-07 05:29:56

While that's the easy way, the OP is asking for an algorithm to do it directly.

Ahmad Mageed 2009-07-07 05:30:16

@Ahmad: if you want the code, just use Reflector to see how Intersect is implemented. :)

JP Alioto 2009-07-07 05:38:39

@JP: true, this answer and Reflector both crossed my mind initially :)

Ahmad Mageed 2009-07-07 05:40:51

Or just look it up: "When the object returned by this method is enumerated, Intersect enumerates first, collecting all distinct elements of that sequence. It then enumerates second, marking those elements that occur in both sequences. Finally, the marked elements are yielded in the order in which they were collected." - http://msdn.microsoft.com/en-us/library/bb460136.aspx

rmoore 2009-07-07 05:57:47

The OP states that they'd prefer to do this "without using extra space". The Intersect method creates and populates a Set behind the scenes (ie, it uses extra space). Having said that, this is an improvement on the original algorithm: it uses a Set rather than a Hashtable, it gets rid of the char arrays, and it removes the final "cleanup" loop altogether.

LukeH 2009-07-07 08:54:04

Answer 4

A:

here's how I would do this. It's still O(N) and it doesn't use a hash table but instead one int array of length 26. (ideally)

make an array of 26 integers, each element for a letter of the alphebet. init to 0's.
iterate over the first string, decrementing one when a letter is encountered.
iterate over the second string and take the absolute of whatever is at the index corresponding to any letter you encounter. (edit: thanks to scwagner in comments)
return all letters corresponding to all indexes holding value greater than 0.

still O(N) and extra space of only 26 ints.

of course if you're not limited to only lower or uppercase characters your array size may need to change.

Victor 2009-07-07 05:31:13

how will this work for s1 = "aahjdjdddl"s2 = "hhaahyyyyys"

Learner 2009-07-07 05:40:02

continuing from above example, c[a] = -2 after first passthen during pass of s2, we will flip the sign for first occurrence of 'a', and we will not flip the sign for 2nd occurrence of 'a' as it is already +ve........

Learner 2009-07-07 05:43:11

instead of "flipping the sign" in step 3, using Math.Abs would always keep the number positive, even in even-number-occurring repeating character counts for s2.

scwagner 2009-07-07 05:43:43

Here instead of using array of ints we can use array of bools, with each index being the char ascii value, so we will require bool array of size 256.

Learner 2009-07-07 08:15:28

It may not be an issue for the OP, but to cater for all possible unicode characters your array would need to be **a lot** bigger, which means that using a HashSet or similar would probably require less space overall.

LukeH 2009-07-07 09:13:52

an array of bools wouldn't work here because you'd need one value for letters you've never seen, one value for one's you've seen in the first string, and another for the letters in both. so byte would probably be the most space efficient. but you'd have to make sure that when decrementing that you don't overflow.

Victor 2009-07-07 15:15:37

Answer 5

A:

"with each letter represented at most once"

I'm assuming that this means you just need to know the intersections, and not how many times they occurred. If that's so then you can trim down your algorithm by making use of yield. Instead of storing the count and continuing to iterate the second string looking for additional matches, you can yield the intersection right there and continue to the next possible match from the first string.

rmoore 2009-07-07 05:49:04

Answer 6

A:

Jaydeep 2009-10-31 12:56:01

ansaurus

tags:

views:

answers:

Implement an algorithm that takes two strings as input, and returns the intersection of the two, with each letter represented at most once. Now speed it up. Now test it.

related questions