ansaurus

Question

Finding what makes strings unique in a list, can you improve on brute force?

Answer 1

+9 A:

You can generate a two dimensional array which will contain the number of times each character appears in each position (0-3). For example, arr[1,3] will contain the number of times the digit/character 1 appears in the last position.

Then for each string s, go over all characters in the string. The ones which appear only once in that position according to the array are the unique characters for that string. In other words, if arr[s[i], i] == 1, then string s is unique in position i.

This will give you the solution in linear time, while the algorithm you gave will take quadratic time.
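A minimal Python sketch of the counting idea above, not the answerer's own code. It assumes the strings are distinct, and it uses a `Counter` keyed by (position, character) instead of a literal two dimensional array; the function name `unique_positions` is made up for illustration.

```python
from collections import Counter

def unique_positions(strings):
    """For each string, list the (position, character) pairs no other string shares."""
    # counts[(i, ch)] = number of strings that have character ch at position i
    counts = Counter((i, ch) for s in strings for i, ch in enumerate(s))
    # Second pass: keep the pairs that were seen exactly once
    return {s: [(i, ch) for i, ch in enumerate(s) if counts[(i, ch)] == 1]
            for s in strings}

result = unique_positions(["abcd", "abcc", "bbcb"])
print(result)
```

Both passes touch every character once, so the work is proportional to the total number of characters in the list.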

interjay 2010-05-18 17:57:42

From quadratic to linear is always best, but I do wonder about the possibility of getting one more item that will simply invalidate all "unique" signatures. The set of strings we can deduce a signature for here is quite contrived (104 = 26*4), so I wonder if the algorithm should not provide for the necessity to use 2 positions / 3 positions etc... What's great about your solution is that it still works: `arr[(a,1)(b,3)]` could represent the number of times we've seen something matching `.a.b`... It would not really be linear though, as the number of combinations grows with the space of strings.
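One way to read the multi-position idea in this comment, sketched in Python under the assumption that all strings have the same length. The name `unique_signatures` and the parameter `k` (number of positions per signature) are illustrative, not from the thread.

```python
from collections import Counter
from itertools import combinations

def unique_signatures(strings, k):
    """Map each string to the k-position signatures that no other string matches."""
    def signatures(s):
        # A signature is a tuple of (position, character) pairs,
        # e.g. ((1, 'a'), (3, 'b')) stands for the pattern ".a.b"
        return [tuple((i, s[i]) for i in idx)
                for idx in combinations(range(len(s)), k)]

    counts = Counter(sig for s in strings for sig in signatures(s))
    return {s: [sig for sig in signatures(s) if counts[sig] == 1]
            for s in strings}

print(unique_signatures(["ab", "ac", "cb", "cc"], 2))
```

Each string contributes one signature per choice of k positions, which is why the cost is no longer linear once k grows.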

Matthieu M. 2010-05-19 06:23:16

Answer 2

+1 A:

If your goal is to identify images later, you could create a very fast hash of the image by picking predefined points to serve as identity pixels.

For example, you could have a structure (class, struct, doesn't matter what language) as follows:

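structure ImageHash {
    int x_pixels, y_pixels;
    u_long hash;
    void createHash(Image img) {
        x_pixels = img.x_pixels;
        y_pixels = img.y_pixels;
        hash = 0; // initialize before accumulating
        // sample a fixed 4x4 grid of points; dividing by i + 1 (not i)
        // keeps every coordinate strictly inside the image
        for(int i = 1; i < 5; i++) {
            int x = x_pixels / (i + 1);
            for(int j = 1; j < 5; j++) {
                int y = y_pixels / (j + 1);
                int r = img.getPixelRed(x,y);
                int g = img.getPixelGreen(x,y);
                int b = img.getPixelBlue(x,y);
                // fold this pixel's channels into the running hash
                hash = (hash * 31) ^ (r^g^b);
            }
        }
    }
}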

This sort of "incomplete hash" will allow you to identify possible matches cheaply, and then you can do the expensive, full comparison sparingly as required.

Expand the incomplete hash as necessary.
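A rough Python sketch of how such an incomplete hash might be used as a pre-filter before the full comparison. It assumes each image is a non-empty list of rows of (r, g, b) tuples; the sample points, the 32-bit mask, and the names `sparse_hash` and `group_identical` are all assumptions for illustration.

```python
from collections import defaultdict

def sparse_hash(img):
    """Cheap hash built from the image size and a fixed 4x4 grid of sample pixels."""
    height, width = len(img), len(img[0])
    h = 0
    for i in range(2, 6):
        for j in range(2, 6):
            r, g, b = img[height // j][width // i]
            h = ((h * 31) ^ (r ^ g ^ b)) & 0xFFFFFFFF  # keep it to 32 bits
    return (width, height, h)

def group_identical(images):
    """Group indices of identical images; full comparison only inside a hash bucket."""
    buckets = defaultdict(list)
    for idx, img in enumerate(images):
        buckets[sparse_hash(img)].append(idx)
    groups = []
    for candidates in buckets.values():
        remaining = candidates
        while remaining:
            first, rest = remaining[0], remaining[1:]
            # Expensive pixel-by-pixel check, limited to images sharing a hash
            groups.append([first] + [k for k in rest if images[k] == images[first]])
            remaining = [k for k in rest if images[k] != images[first]]
    return groups
```

Images that differ only at unsampled pixels land in the same bucket, so the full comparison is still what decides equality; the hash just limits how often it runs.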

glowcoder 2010-05-18 18:03:30

+1 creative, although isn't it a catch-22 that I could only choose good predefined points by first identifying those points that are most likely to be unique?

Ed Guiness 2010-05-18 18:10:01

I just picked random points. I was going to have them evenly spaced using mod and stuff like that, and then I said meh, these points are valid and "random enough". =)

glowcoder 2010-05-18 18:13:57

Answer 3

A:

This problem can be solved with a trie, or prefix tree.

See Trie - Wikipedia, the free encyclopedia

For the 3 strings in your example:

abcd
abcc
bbcb

these will be turned into a trie (where ^ denotes the root of the tree):

^--a-b-c-d
 \      \
  \      c
   \
    b-b-c-b

The path to the node where a string branches off is the common prefix. The node after the last branch point is what makes a particular string unique. In this case, those nodes are d, c, and b.

I assume the order of the strings is not important to you, and that you compare all strings to find the uniqueness, not just neighboring strings.

The complexity should be O(n x m), but this will probably be affected by the domain of the characters in your strings.
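A small Python sketch of the trie approach described above, offered as one possible reading rather than the answerer's implementation. Nodes are plain nested dicts, each carrying a count of the strings that pass through it; `build_trie` and `first_unique_char` are hypothetical names.

```python
def build_trie(strings):
    """Build a trie of nested dicts; each node counts the strings passing through it."""
    root = {"count": 0, "children": {}}
    for s in strings:
        node = root
        for ch in s:
            node = node["children"].setdefault(ch, {"count": 0, "children": {}})
            node["count"] += 1
    return root

def first_unique_char(root, s):
    """Return (position, character) of the first node only s passes through, or None."""
    node = root
    for i, ch in enumerate(s):
        node = node["children"][ch]
        if node["count"] == 1:
            return (i, ch)
    return None  # s is a duplicate or a prefix of another string

strings = ["abcd", "abcc", "bbcb"]
root = build_trie(strings)
print([first_unique_char(root, s) for s in strings])
```

Building the trie visits every character of every string once, and each lookup walks at most one string's length.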

Wai Yip Tung 2010-05-18 19:37:32

I think I might have misunderstood the question. It wants to find the difference of the first item from the last row, not from any row. In that case the trie algorithm does not apply.

Wai Yip Tung 2010-05-18 20:04:53

Could you expand this answer a little? I currently use Tries for symbol recognition elsewhere in this application but haven't considered how they might help identify images in general since I assumed it would be too slow to derive Tries for images in my future scenarios.

Ed Guiness 2010-05-18 21:39:39

I added an example to the answer because I cannot do formatted text in the comment.

Wai Yip Tung 2010-05-19 00:38:33

Thanks for expanding. I can see that this would help discover where the strings *diverge* (branch) but I still can't see how they would help identify all unique characters and their positions, since I would still need to compare the second character of the first branch to the second character of the second branch and so on. Look at my first example to see what I'm after. What am I missing?

Ed Guiness 2010-05-19 06:32:07
