views:

77

answers:

4

I have two byte[] arrays in a method like this:

private static boolean containsBytes(byte[] body, byte[] checker){
  //Code you do not want to ever see here.
}

I want to, using the standard API as much as possible, determine if the series contained in the checker array exists anywhere in the body array.

Right now I'm looking at some nasty code that did a hand-crafted algorithm. The performance of the algorithm is OK, which is about all you can say for it. I'm wondering if there is a more standard api way to accomplish it. Otherwise, I know how to write a readable hand-crafted one.

To get a sense of scale here, the checker array would not be larger than 48 (probably less) and the body might be a few kb large at most.

+3  A: 

I don't know of anything in the standard API to help you here. There may be something in a third party library, although it would potentially need to be implemented repeatedly, once for each primitive type :(

EDIT: I was going to look for Boyer-Moore, but this answer was added on my phone, and I ran out of time :)

Depending on the data and your requirements, you may find that a brute force approach is absolutely fine - and a lot simpler to implement than any of the fancier algorithms available. The simple brute force approach is generally my first port of call - it often turns out to be perfectly adequate :)

Jon Skeet
Hand crafted it is :(.
Yishai
@Yishai: You may well want to check out Guava, as per ColinD's answer...
Jon Skeet
+3  A: 

You probably already know this, but what you're trying to (re-)implement is basically a string search:

http://en.wikipedia.org/wiki/String_searching_algorithm

The old code might in fact be an implementation of one of the string search algorithms; for better performance, it might be good to implement one of the other algorithms. You didn't mention how often this method is going to be called, which would help to decide whether it's worth doing that.

Richard Fearn
@Richard, thanks for the info. Unfortunately it was not an implementation of a standard algorithm. It was a poor implementation of brute force.
Yishai
+3  A: 

Not in the standard library (like Jon Skeet said, probably nothing there that does this) but Guava could help you here with its method Bytes.indexOf(byte[] array, byte[] target).

boolean contained = Bytes.indexOf(body, checker) != -1;

Plus, the same method exists in the classes for the other primitive types as well.

ColinD
@ColinD: I was wondering whether Guava would contain anything around this - but from a phone, I couldn't easily check. Good spot!
Jon Skeet
Thanks ColinD, it was very helpful.
Yishai
@Yishai: If you decide to use this, it would be appropriate to move the accepted answer tick to this answer :)
Jon Skeet
Sadly, Guava's implementation is still using the naive algorithm. We probably want to improve that, but I can't promise when.
Kevin Bourrillion
@Jon, The question was the standard API, and I'm not introducing Guava for this, a brute force loop (normally written) is fine for the job here. But I did upvote it ;).
Yishai
+1  A: 

The collections framework can both cheaply wrap an array in the List interface and search for a sublist. I think this would work reasonably well:

import java.util.Arrays;
import java.util.Collections;
boolean found = Collections.indexOfSubList(Arrays.asList(body), Arrays.asList(checker) >= 0;
David Winslow
I've just tried that with a sample and it didn't work. I'm not sure why off the top of my head...
Jon Skeet
Great idea, unfortunately Arrays.asList just produces an `List<byte[]>` rather than a `List<Byte>` in that case, making the method not work. Of course you could convert the byte[] to a `List<Byte>` via a loop, so it is an example of a standard API method. +1.
Yishai
because `Arrays.asList(byte[])` is a one-element list containing the byte array itself. You need something like Guava's `Bytes.asList()`. But `indexOfSubList` is just as brute-force as Guava's `Bytes.indexOf()` is anyway.
Kevin Bourrillion
varargs strikes again! I didn't realize there wasn't an overload for byte[], but thinking a little harder I guess it would have to box each byte and effectively copy the whole thing. If you weren't using primitives though...
David Winslow