views:

6219

answers:

12

I have an ArrayList of objects in Java. The objects have four fields, two of which I'd use to consider the object equal to another. I'm looking for the most efficient way, given those two fields, to see if the array contains that object.

The wrench is that these classes are generated based on XSD objects, so I can't modify the classes themselves to overwrite the .equals.

Is there any better way than just looping through and manually comparing the two fields for each object and then breaking when found? That just seems so messy, looking for a better way.

Edit: the ArrayList comes from a SOAP response that is unmarshalled into objects.

+2  A: 

If the list is sorted, you can use a binary search. If not, then there is no better way.

If you're doing this a lot, it would almost certainly be worth your while to sort the list the first time. Since you can't modify the classes, you would have to use a Comparator to do the sorting and searching.

Michael Myers
This isn't likely to be any quicker than a manual search as it doesn't sound as if his collection is sorted
oxbow_lakes
Tragically it's sorted by one of the two fields I don't care about. I could use a custom comparator to sort based on the one field that would help in the case of a binary search, but I have a feeling that wouldn't help much in terms of overall speed :|
Parrots
@Parrots: Is it possible to sort it once and then do all the searches? If so, and if you have a fair number of objects (say 50) in the list, a binary search is definitely going to be faster.
Michael Myers
On a totally unrelated subject, I wish the HTML sanitizer didn't use a greedy regex. That first link is supposed to be two different links, but the </a> and <a> in the middle got swallowed.
Michael Myers
Binary search will definitely be way faster than a normal linear search. This is assuming you get the entire list and only have to sort it once, otherwise you lose the speed advantage gained by using binary search.With 10,000 elements binary search = 14 comparisons, without = 10000 comparisons
MahlerFive
What if the underlaying List impleentation wasn't an arraylist but some sort of hashSet? That would be faster isn't?
OscarRyz
@Bombe re. edit: That's what I tried first, but it didn't show up correctly in the preview. The second link still doesn't work, but at least it's not ugly anymore.
Michael Myers
@Oscar: I'm not sure I understand your comment. How can a list be a HashSet?
Michael Myers
+2  A: 

Even if the equals method were comparing those two fields, then logically, it would be just the same code as you doing it manually. OK, it might be "messy", but it's still the correct answer

oxbow_lakes
+3  A: 

Given your constraints, you're stuck with brute force search (or creating an index if the search will be repeated). Can you elaborate any on how the ArrayList is generated--perhaps there is some wiggle room there.

If all you're looking for is prettier code, consider using the Apache Commons Collections classes, in particular CollectionUtils.find(), for ready-made syntactic sugar:

ArrayList haystack = // ...
final Object needleField1 = // ...
final Object needleField2 = // ...

Object found = CollectionUtils.find(haystack, new Predicate() {
   public boolean evaluate(Object input) {
      return needleField1.equals(input.field1) && 
             needleField2.equals(input.field2);
   }
});
Michael Brewer-Davis
+1  A: 

Building a HashMap of these objects based on the field value as a key could be worthwhile from the performance perspective, e.g. populate Maps once and find objects very efficiently

Rocket Surgeon
Only if searched multiple times.
cletus
+1  A: 

If you need to search many time in the same list, it may pay off to build an index.

Iterate once through, and build a HashMap with the equals value you are looking for as the key and the appropriate node as the value. If you need all instead of anyone of a given equals value, then let the map have a value type of list and build the whole list in the initial iteration.

Please note that you should measure before doing this as the overhead of building the index may overshadow just traversing until the expected node is found.

Thorbjørn Ravn Andersen
+13  A: 

It depends on how efficient you need things to be. Simply iterating over the list looking for the element which satisfies a certain condition is O(n), but so is ArrayList.Contains if you could implement the Equals method. If you're not doing this in loops or inner loops this approach is probably just fine.

If you really need very efficient look-up speeds at all cost, you'll need to do two things:

  1. Work around the fact that the class is generated: Write an adapter class which can wrap the generated class and which implement equals() based on those two fields (assuming they are public). Don't forget to also implement hashCode() (*)
  2. Wrap each object with that adapter and put it in a HashSet. HashSet.contains() has constant access time, i.e. O(1) instead of O(n).

Of course, building this HashSet still has a O(n) cost. You are only going to gain anything if the cost of building the HashSet is negligible compared to the total cost of all the contains() checks that you need to do. Trying to build a list without duplicates is such a case.


(*) Implementing hashCode() is best done by XOR'ing (^ operator) the hashCodes of the same fields you are using for the equals implementation (but multiply by 31 to reduce the chance of the XOR yielding 0)

Wim Coenen
+1. Excellent answer.
Jared
"HashSet.contains() has constant access time, i.e. O(1)" -- could you please point to a proof? Doesn't it depend *heavily* on the hash function? If not, why not just say "Fast in practice"? Otherwise, I think you're spreading misinformation (probably with the best intentions, though :))
Jonas Kölker
@Jonas Kölker: From the documentation: "This class offers constant time performance for the basic operations (add, remove, contains and size), assuming the hash function disperses the elements properly among the buckets."
Wim Coenen
@Jonas, while a poor hashCode() implementation will lead to slow access times, any algorithms text (notably the CLR(S) text which many of the Collections data structures are built off - http://www.amazon.com/Introduction-Algorithms-Third-Thomas-Cormen/dp/0262033844/) will tell you that hash based data structures are O(1) for lookup. It is important to realize that O(1) does not denote one-step lookup, but lookup unrelated to the size of the data structure. Therefore even with poor hashCode()s, the lookup time is O(1). Wim isn't spreading any misinformation, in fact he's spot on.
dimo414
+6  A: 

You could use a Comparator with Java's built-in methods for sorting and binary search. Suppose you have a class like this, where a and b are the fields you want to use for sorting:

class Thing { String a, b, c, d; }

You would define your Comparator:

Comparator<Thing> comparator = new Comparator<Thing>() {
  public int compare(Thing o1, Thing o2) {
    if (o1.a.equals(o2.a)) {
      return o1.b.compareTo(o2.b);
    }
    return o1.a.compareTo(o2.a);
  }
};

Then sort your list:

Collections.sort(list, comparator);

And finally do the binary search:

int i = Collections.binarySearch(list, thingToFind, comparator);
Fabian Steeg
This is the path of least resistance. A HashSet takes time that is hard to analyze. This solution is equiv to the STL set
Overflown
Why would a HashSet be harder to analyze? You know the asymptotic running time. You can profile it. What is less analyzable about it?
Wim Coenen
Another good answer. I would be inclined to do this before constructing a wrapper class. Especially if you're looking at very large data sets, I suspect this might be more efficient (it certainly is space-wise).
dimo414
A: 

There are three basic options:

1) If retrieval performance is paramount and it is practical to do so, use a form of hash table built once (and altered as/if the List changes).

2) If the List is conveniently sorted or it is practical to sort it and O(log n) retrieval is sufficient, sort and search.

3) If O(n) retrieval is fast enough or if it is impractical to manipulate/maintain the data structure or an alternate, iterate over the List.

Before writing code more complex than a simple iteration over the List, it is worth thinking through some questions.

  • Why is something different needed? (Time) performance? Elegance? Maintainability? Reuse? All of these are okay reasons, apart or together, but they influence the solution.

  • How much control do you have over the data structure in question? Can you influence how it is built? Managed later?

  • What is the life cycle of the data structure (and underlying objects)? Is it built up all at once and never changed, or highly dynamic? Can your code monitor (or even alter) its life cycle?

  • Are there other important constraints, such as memory footprint? Does information about duplicates matter? Etc.

Jeremy Rishel
+1  A: 

Is there any better way than just looping through and manually comparing the two fields for each object and then breaking when found? That just seems so messy, looking for a better way.

If your concern is maintainability you could do what Fabian Steeg suggest ( that's what I would do ) although it probably isn't the "most efficient" ( because you have to sort the array first and then perform the binary search ) but certainly the cleanest and better option.

If you're really concerned with efficiency, you can create a custom List implementation that uses the field in your object as the hash and use a HashMap as storage. But probably this would be too much.

Then you have to change the place where you fill the data from ArrayList to YourCustomList.

Like:

 List list = new ArrayList();

 fillFromSoap( list );

To:

 List list = new MyCustomSpecialList();

 fillFromSoap( list );

The implementation would be something like the following:

class MyCustomSpecialList extends AbstractList  { 
    private Map<Integer, YourObject> internalMap;

    public boolean add( YourObject o ) { 
         internalMap.put( o.getThatFieldYouKnow(), o );
    }

    public boolean contains( YourObject o ) { 
        return internalMap.containsKey( o.getThatFieldYouKnow() );
    }

}

Pretty much like a HashSet, the problem here is the HashSet relies on the good implementation of the hashCode method, which probably you don't have. Instead you use as the hash "that field you know" which is the one that makes one object equals to the other.

Of course implementing a List from the scratch lot more tricky than my snippet above, that's why I say the Fabian Steeg suggestion would be better and easier to implement ( although something like this would be more efficient )

Tell us what you did at the end.

OscarRyz
A: 

I would say the simplest solution would be to wrap the object and delegate the contains call to a collection of the wrapped class. This is similar to the comparator but doesn't force you to sort the resulting collection, you can simply use ArrayList.contains().

public class Widget {
        private String name;
        private String desc;

        public String getName() {
         return name;
        }

        public void setName(String name) {
         this.name = name;
        }

        public String getDesc() {
         return desc;
        }

        public void setDesc(String desc) {
         this.desc = desc;
        }
    }



    public abstract class EqualsHashcodeEnforcer<T> {

        protected T wrapped;

        public T getWrappedObject() {
         return wrapped;
        }

        @Override
        public boolean equals(Object obj) {
         return equalsDelegate(obj);
        }

        @Override
        public int hashCode() {
         return hashCodeDelegate();
        }

        protected abstract boolean equalsDelegate(Object obj);

        protected abstract int hashCodeDelegate();
    }


    public class WrappedWidget extends EqualsHashcodeEnforcer<Widget> {

        @Override
        protected boolean equalsDelegate(Object obj) {
         if (obj == null) {
          return false;
         }
         if (obj == getWrappedObject()) {
          return true;
         }
         if (obj.getClass() != getWrappedObject().getClass()) {
          return false;
         }
         Widget rhs = (Widget) obj;

         return new EqualsBuilder().append(getWrappedObject().getName(),
           rhs.getName()).append(getWrappedObject().getDesc(),
           rhs.getDesc()).isEquals();
        }

        @Override
        protected int hashCodeDelegate() {

         return new HashCodeBuilder(121, 991).append(
           getWrappedObject().getName()).append(
           getWrappedObject().getDesc()).toHashCode();
        }

    }
jonathan.cone
A: 

Maybe a List isn't what you need.

Maybe a TreeSet would be a better container. You get O(log N) insertion and retrieval, and ordered iteration (but won't allow duplicates).

LinkedHashMap might be even better for your use case, check that out too.

Ben Hardy
+1  A: 

If you are a user of my ForEach DSL, it can be done with a Detect query.

Foo foo = ...
Detect<Foo> query = Detect.from(list);
for (Detect<Foo> each: query) 
    each.yield = each.element.a == foo.a && each.element.b == foo.b;
return query.result();
Adrian