views:

93

answers:

9

Let's say I have-

String x = "ab";
String y = "xypa";

If I want to see if any subset of x exists in y, what would be the fastest way? Looping is time consuming. In the example above a subset of x is "a" which is found in y.

+1  A: 

Well, I'm not sure it's better than looping, but you could use String#matches:

if (y.matches(".*[" + x + "]+.*")) ...

You'd need to escape characters that are special in a regex [] construct, though (like ], -, \, ...).

The above is just an example, if you're doing it more than once, you'll want to use Pattern, Matcher, and the other stuff from the java.util.regex package.

T.J. Crowder
Sorry, error in the original regex (`matches` looks for a complete match for the entire string). Fixed.
T.J. Crowder
Doesn't deal with any sybset
Eugene Kuleshov
@Eugene: It deals with the case he listed. But if `x` is "abc" and he wants to match "ab", indeed, this is not the answer. Might be part of it, though.
T.J. Crowder
A: 

Looping is time-consuming, but there's no way to do what you want other than going over the target string repeatedly.

What you can do is optimize by checking the smallest strings first, and work your way up. For example, if the target string doesn't contain abc, it can't possibly contain abcdef.

Other optimizations off the top of my head:

  • Don't continue to check for a match after a non-matching character is hit, though in Java you can let the computer worry about this.
  • Don't check to see if something is a match if there aren't enough characters left in the target string for a match to be possible.
  • If you need speed and have lots of space, you might be able to break the target string up into a fancy data structure like a trie for better results, though I don't have an exact algorithm in mind.
  • Another storage-is-not-a-problem solution: decompose the target into every possible substring and store the results in a HashSet.
Lord Torgamus
A: 

You have to use for loop or use regex which is just as expensive as a for loop, becasue you need to convert one of your strings into chars basically.

Boolean isSubset = false;
for(int i = 0; i < x.length(); i++) {
        if(y.contains(x.charAt(i))) {
                isSubset = true;
                break;
        }
}

using a for loop.

Jacob Nelson
*"...which is just as expensive as a for loop..."* Using a regex A) At least leaves open the possibility the library implementation in a given runtime may be able to do better (e.g., native calls, whatever), and B) Leverages code that's already written. It may well be that it's just as slow (or slower!) than looping yourself. Or not.
T.J. Crowder
... All I was saying was if you do 1million for loops like this or 1million regex that the cost for the application to run would be about the same.
Jacob Nelson
A: 

It looks like this could be a case of the longest common substring problem.

Nick
+1  A: 

The answer really depends on many things.

If you just want to find any subset and you're doing this only once, looping is just fine (and the best you can do without using additional storage) and you can stop when you find a single character that matches.

If you have a fixed x and want to use it for matching several strings y, you can do some pre-processing to store the characters in x in a table and use this table to check if each character of y occurs in x or not.

If you want to find the largest subset, then you're looking at a different problem: the longest common subsequence problem.

casablanca
A: 

You can generate all subsets of x (e.g. , in your example, ab, a, b) and then generate a regexp that would do the

  Pattern p = Pattern.compile("(ab|a|b)");
  Matcher m = p.matcher(y);
  if(m.find()) {
    System.err.println(m.group());
  }
Eugene Kuleshov
A: 

If both Strings will only contain [a-z]. Then fastest would be to make two bitmaps, 26 bits longs. Mark all the bits contained in the String. Take the AND of the bitmaps, the resulting bits are present in both Strings, the largest common subset. This would be a simple O(n) with n the length of the biggest String.

(If you want to cover the whole lot of UTF, bloom filters might be more appropriate. )

Ishtar
A: 

SO I think this is a place where the power of a unique set could be handy.

If you put all the value of both strings into two different HashSets, you can then run a containsAll to determine if one is a subset of the other.

highphilosopher
-1, `containsAll()` only returns true if _all_ of the elements in one `Collection` are present in the other ( [API](http://download.oracle.com/javase/6/docs/api/java/util/AbstractCollection.html#containsAll%28java.util.Collection%29) ). The OP is looking for _any_ subset of `x`
Lord Torgamus
A: 

What about this:?

package so3935620;

import static org.junit.Assert.*;

import java.util.BitSet;

import org.junit.Test;

public class Main {

  public static boolean overlap(String s1, String s2) {
    BitSet bs = new BitSet();
    for (int i = 0; i < s1.length(); i++) {
      bs.set(s1.charAt(i));
    }
    for (int i = 0; i < s2.length(); i++) {
      if (bs.get(s2.charAt(i))) {
        return true;
      }
    }
    return false;
  }

  @Test
  public void test() {
    assertFalse(overlap("", ""));
    assertTrue(overlap("a", "a"));
    assertFalse(overlap("abcdefg", "ABCDEFG"));
  }
}

And if that version is too slow, you can compute the BitSet depending on s1, save that in some variable and later only loop over s2.

Roland Illig