views:

353

answers:

8

Suppose I have:

  1. Toby
  2. Tiny
  3. Tory
  4. Tily

Is there an algorithm that can easily create a list of common characters in the same positions in all these strings? (in this case the common characters are 'T' at position 0 and 'y' at position 3)

I tried looking at some of the algorithms used for DNA sequence matching but it seems most of them are just used for finding common substrings regardless of their positions.

+3  A: 

Finding a list of characters that are common in ALL strings at a certain position is trivially simple. Just iterate on each string for each character position 1 character position at a time. If any string's character is not the match of it's closest neighbor string's character, then the position does not contain a common character.

For any i = 0 to length -1... Once you find Si[x] != Si+1[x] you can skip to the next position x+1.

Where Si is the ith string in the list. And [x] is the character at position x.

Brian R. Bondy
+1  A: 

Some generic code that has pretty poor performance O(n^2)

str[] = { "Toby", "Tiny", "Tory", "Tily" };
result = null;
largestString = str.getLargestString(); // Made up function
str.remove(largestString)
for (i = 0; i < largestString.length; i++) {
   hits = 0;
   foreach (str as value) {
      if (i < value.length) {
         if (value.charAt(i) == largestString.charAt(i))
            hits++;
      }
   }
   if (hits == str.length)
      result += largestString.charAt(i);
}
print(str.items);
Josh Smeaton
+1  A: 

I can't think of anything especially optimized.

You can do something like this, which shouldn't be too hard:

  //c# -- assuming your strings are in a List<string> named Names
  int shortestLength = Names[0].Length, j;
  char[] CommonCharacters;
  char single;

  for (int i = 1; i < Names.Count; i++)
  {
   if (Names[i].Length < shortestLength) shortestLength = Names[i].Length;
  }

  CommonCharacters = new char[shortestLength];
  for (int i = 0; i < shortestLength; i++)
  {
   j = 1;
   single = Names[0][i];
   CommonCharacters[i] = single;
   while (j < shortestLength)
   {
    if (single != Names[j][i])
    {
     CommonCharacters[i] = " "[0];
     break;
    }
    j++;
   }
  }

This would give you an array of characters that are the same across everything in the list.

theo
+1  A: 

What about something like this?

strings = %w(Tony Tiny Tory Tily)
positions = Hash.new { |h,k| h[k] = Hash.new { |h,k| h[k] = 0 } }
strings.each { |str| 
  0.upto(str.length-1) { |i| 
    positions[i][str[i,1]]+=1 
  }
}

At the end of execution, the result will be:

positions = {
  0=>{"T"=>4},
  1=>{"o"=>2, "i"=>2}, 
  2=>{"l"=>1, "n"=>2, "r"=>1},
  3=>{"y"=>4}
}
hoyhoy
And that's the reason I love the sound of ruby. Letting you get things done quick.
Josh Smeaton
+1  A: 

Here's an algorithm in 5 lines of ruby:

#!/usr/bin/env ruby
chars = STDIN.gets.chomp.split("")
STDIN.each do |string|
  chars = string.chomp.split("").zip(chars).map {|x,y| x == y ? x : nil }
end
chars.each_index {|i| puts "#{chars[i]}  #{i}" if chars[i] }

Put this in commonletters.rb. Sample usage:

$ commonletters.rb < input.txt
T  0
y  3

Assuming that input.txt contains:

Toby
Tiny
Tory
Tily

This should work with whatever inputs you throw at it. It will break if the input file is empty, but you can probably fix that yourself. This is O(n) (n is total number of chars in the input).

tgamblin
+1  A: 

And here's a trivial version in Python:

items = ['Toby', 'Tiny', 'Tory', 'Tily']
tuples = sorted(x for item in items for x in enumerate(item))
print [x[0] for x in itertools.groupby(tuples) if len(list(x[1])) == len(items)]

Which prints:

[(0, 'T'), (3, 'y')]

Edit: Here's a better version that doesn't require creating a (potentially) huge list of tuples:

items = ['Toby', 'Tiny', 'Tory', 'Tily']
minlen = min(len(x) for x in items)
print [(i, items[0][i]) for i in range(minlen) if all(x[i] == items[0][i] for x in items)]
Nick Johnson
A: 

In lisp:

CL-USER> (defun common-chars (&rest strings)
           (apply #'map 'list #'char= strings))
COMMON-CHARS

Just pass in the strings:

CL-USER> (common-chars "Toby" "Tiny" "Tory" "Tily")
(T NIL NIL T)

If you want the characters themselves:

CL-USER> (defun common-chars2 (&rest strings)
           (apply #'map
                  'list
                  #'(lambda (&rest chars)
                      (when (apply #'char= chars)
                        (first chars))) ; return the char instead of T
                  strings))
COMMON-CHARS2

CL-USER> (common-chars2 "Toby" "Tiny" "Tory" "Tily")
(#\T NIL NIL #\y)

If you don't care about posiitons, and just want a list of the common characters:

CL-USER> (format t "~{~@[~A ~]~}" (common-chars2 "Toby" "Tiny" "Tory" "Tily"))
T y 
NIL

I admit this wasn't an algorithm... just a way to do it in lisp using existing functionality

If you wanted to do it manually, as has been said, you loop comparing all the characters at a given index to each other. If they all match, save the matching character.

Morikal
+1  A: 
#include <iostream>

int main(void)
{
    char words[4][5] = 
    {
        "Toby",
        "Tiny",
        "Tory",
        "Tily"
    };

    int wordsCount = 4;
    int lettersPerWord = 4;

    int z;
    for (z = 1; z < wordsCount; z++)
    {
        int y;
        for (y = 0; y < lettersPerWord; y++)
        {
            if (words[0][y] != words[z][y])
            {
                words[0][y] = ' ';
            }
        }
    }

    std::cout << words[0] << std::endl;

    return 0;
}
EvilTeach