I have an array of bytes and i want to determine if the contents of this array of bytes exists within another larger array as a continuous sequence. What is the simplest way to go about doing this?
+3
A:
The naive approach is:
public static bool IsSubsetOf(byte[] set, byte[] subset) {
for(int i = 0; i < set.Length && i + subset.Length <= set.Length; ++i)
if (set.Skip(i).Take(subset.Length).SequenceEqual(subset))
return true;
return false;
}
For more efficient approaches, you might consider more advanced string matching algorithms like KMP.
Mehrdad Afshari
2009-06-19 23:25:40
It's quite awfully inefficient...
Michał Chaniewski
2009-06-19 23:35:02
@Michal: It's O(n*m) which might be considered inefficient, but if you're worrying about Skip and Take, don't. This is as efficient as a couple for loops. As I said, if performance is an issue, you should consider a more advanced algorithm.
Mehrdad Afshari
2009-06-19 23:37:33
@Mehrdad - yeah, but m*n can get quite big for binary data... I know that Skip() and Take() is done by deferred execution - so it's just n component of O(m*n) coming from SequenceEquals...
Michał Chaniewski
2009-06-19 23:50:10
Performance will be a problem but for now i just want to get my code working. this ought to do nicely...
RCIX
2009-06-19 23:59:06
+3
A:
Try to adapt some string search algorithm. One of the fastest is Boyer-Moore . It's quite easy as well. For binary data, Knuth-Morris-Pratt algorithm might work very efficiently as well.
Michał Chaniewski
2009-06-19 23:29:31
A:
This, which is a 1/1 port of this answer: http://stackoverflow.com/questions/1507780/searching-for-a-sequence-of-bytes-in-a-binary-file-with-java
Is a very efficient way of doing so:
public static class KmpSearch {
public static int IndexOf(byte[] data, byte[] pattern) {
int[] failure = ComputeFailure(pattern);
int j = 0;
if (data.Length == 0) return -1;
for (int i = 0; i < data.Length; i++) {
while (j > 0 && pattern[j] != data[i]) {
j = failure[j - 1];
}
if (pattern[j] == data[i]) { j++; }
if (j == pattern.Length) {
return i - pattern.Length + 1;
}
}
return -1;
}
private static int[] ComputeFailure(byte[] pattern) {
int[] failure = new int[pattern.Length];
int j = 0;
for (int i = 1; i < pattern.Length; i++) {
while (j > 0 && pattern[j] != pattern[i]) {
j = failure[j - 1];
}
if (pattern[j] == pattern[i]) {
j++;
}
failure[i] = j;
}
return failure;
}
}
Sam Saffron
2010-03-23 03:03:34