I'm thinking of asking my team, of mixed skill levels, to use Google Guava. Before Guava, I'd have used Apache Collections (or its generified version).

Guava, as opposed to Apache Collections, seems to be stronger in some ways, but perhaps less easy to use for less experienced programmers. Here's one area that I think exemplifies that.

The code I've inherited contains a lot of looping over lists of what are essentially maps of heterogeneous values, probing them for values, doing null checks, and then doing something trivial:

boolean foo( final List< MapLike > stuff, final String target ) {
  final String upperCaseTarget = target.toUpperCase();

  for( MapLike m : stuff ) {
     final MapLike n = (MapLike) m.get( "hard coded string" );
     if( n != null ) {
         final String s = (String) n.get( "another hard coded string" );
         if( s != null && s.toUpperCase().equals( upperCaseTarget ) ) {
            return true ;
         }
     }
  }
  return false ;
}

My initial thought was to use Apache Collections Transformers:

boolean foo( final List< MapLike > stuff, final String target ) {
   Collection< String> sa = (Collection< String >) CollectionUtils.collect( stuff, 
     TransformerUtils.chainedTransformer( new Transformer[] { 
        AppUtils.propertyTransformer("hard coded string"),
        AppUtils.propertyTransformer("another hard coded string"),
        AppUtils.upperCaseTransformer()
         } ) );

    return sa.contains( target.toUpperCase() ) ;        

}

Using Guava, I might go two ways:

boolean foo( final List< MapLike > stuff, final String target ) {
   Collection< String > sa = Collections2.transform( stuff,
       Functions.compose( AppUtils.upperCaseFunction(), 
       Functions.compose( AppUtils.propertyFunction("another hard coded string"), 
                          AppUtils.propertyFunction("hard coded string") ) ) );

    return sa.contains( target.toUpperCase() ) ;    
    // or
    // Iterables.contains( sa, target.toUpperCase() );
    // which actually doesn't buy me much

}
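For readers wondering what the AppUtils helpers might look like, here is a minimal sketch. It assumes MapLike behaves like a Map<String, Object>, and it uses java.util.function.Function in place of Guava's Function so that it compiles with the JDK alone. The names propertyFunction and upperCaseFunction are the hypothetical helpers from the question, written here to pass null through rather than throw.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical versions of the AppUtils helpers. Assumes MapLike behaves like
// Map<String, Object>; java.util.function.Function stands in for Guava's Function.
class AppUtils {
    // Looks up key in a map; a null input yields null instead of an exception.
    static Function<Object, Object> propertyFunction(final String key) {
        return input -> input == null ? null : ((Map<?, ?>) input).get(key);
    }

    // Upper-cases the input's string form; null passes through unchanged.
    static Function<Object, String> upperCaseFunction() {
        return input -> input == null ? null : input.toString().toUpperCase();
    }

    static boolean foo(final List<Map<String, Object>> stuff, final String target) {
        // g.compose(f) applies f first, matching the order of Functions.compose(g, f)
        final Function<Object, String> chain = upperCaseFunction().compose(
            propertyFunction("another hard coded string").compose(
                propertyFunction("hard coded string")));
        final String upperCaseTarget = target.toUpperCase();
        // map is lazy and anyMatch stops at the first hit, much like contains on a view
        return stuff.stream().map(chain).anyMatch(upperCaseTarget::equals);
    }

    public static void main(String[] args) {
        Map<String, Object> inner = new HashMap<>();
        inner.put("another hard coded string", "Hello");
        Map<String, Object> outer = new HashMap<>();
        outer.put("hard coded string", inner);
        Map<String, Object> empty = new HashMap<>();
        System.out.println(foo(Arrays.asList(empty, outer), "hello")); // true
    }
}
```

Making each step null-tolerant is one way to keep the null checks of the original loop without repeating them at every call site.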

Compared to Apache Collections, Functions.compose( g, f ) reverses the "intuitive" order: functions are applied right-to-left, rather than the "obvious" left-to-right of TransformerUtils.chainedTransformer.
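The ordering difference is easy to see with two String functions whose order matters. This sketch uses the JDK's java.util.function.Function rather than Guava: g.compose(f) has the same meaning as Guava's Functions.compose(g, f), while f.andThen(g) gives the left-to-right reading of TransformerUtils.chainedTransformer. The function names are illustrative only.

```java
import java.util.function.Function;

// Composition order, shown with java.util.function.Function (JDK 8+).
class CompositionOrder {
    static final Function<String, String> appendA = s -> s + "a";
    static final Function<String, String> upper = String::toUpperCase;

    public static void main(String[] args) {
        // g.compose(f) computes g(f(x)): appendA runs first, then upper
        System.out.println(upper.compose(appendA).apply("x")); // XA
        // Swapping the two functions swaps the order of application
        System.out.println(appendA.compose(upper).apply("x")); // Xa
        // f.andThen(g) reads left-to-right but computes the same g(f(x))
        System.out.println(appendA.andThen(upper).apply("x")); // XA
    }
}
```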

A more subtle issue is that, as Guava returns a live view, calling contains on the live view is likely to apply the (composed) function multiple times, so what I really ought to do is:

   return ImmutableSet.copyOf( sa ).contains( target.toUpperCase() ) ;

But I might have nulls in my transformed collection, and ImmutableSet rejects nulls, so I can't quite do that. I can dump it into a plain java.util collection (a HashSet, say), of course.
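The null problem can be reproduced with the JDK alone. In this sketch, Set.copyOf (Java 10+) is used as an analogue of Guava's ImmutableSet.copyOf, since both reject null elements with a NullPointerException, while HashSet accepts a null element. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class NullTolerantCopy {
    // HashSet permits one null element, so contains(null) is a legal query.
    static Set<String> copyAllowingNulls(final List<String> transformed) {
        return new HashSet<>(transformed);
    }

    // Returns true if the null-rejecting copy fails with NullPointerException.
    static boolean immutableCopyRejectsNulls(final List<String> transformed) {
        try {
            Set.copyOf(transformed); // Java 10+, rejects nulls like ImmutableSet.copyOf
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> transformed = Arrays.asList("FOO", null, "BAR");
        System.out.println(copyAllowingNulls(transformed).contains("BAR")); // true
        System.out.println(immutableCopyRejectsNulls(transformed));          // true
    }
}
```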

But that's not going to be obvious to my (less experienced) team, and will likely be missed in the heat of coding even after I explain it. I'd hoped that perhaps Iterables.contains() would "do the right thing" and know some instance-of magic to distinguish a live-view proxy from a plain old Collection, but it doesn't. That makes Guava perhaps harder to use.

Perhaps I write something like a static method in my utility class to handle this?

// List always uses linear search? So no value in copying?
// or perhaps I should copy it into a set?
boolean contains( final List list, final Object target ) {
  return list.contains( target ) ;
}

// Set doesn't use linear search, so copy?
boolean contains( final Set set, final Object target ) {
  //return ImmutableSet.copyOf( set ).contains( target ) ;
  // whoops, I might have nulls
  return Sets.newHashSet( set ).contains( target ) ;
}

Or perhaps only copy sets above a certain size?

// Set doesn't use linear search, so copy?
boolean contains( final Set set, final Object target ) {
  final Set search = set.size() > 16 ? Sets.newHashSet( set ) : set ;
  return search.contains( target ) ;
}

I suppose I'm asking, "why isn't there an 'easier' transform in Guava", and I suppose the answer is, "fine, just always dump what it returns into a new Collection, or write your own transform that does that".

But if I need to do that, might not other clients of the Guava libraries? Perhaps there's a better way that is in Guava, that I don't know about?

A:

I'd say that Guava is most definitely not harder to use than Apache Collections. I'd say it's a lot easier, actually.

One of Guava's big advantages is that it doesn't expose so many new object types... it likes to keep most of the actual implementation types it uses hidden neatly away behind static factory methods that only expose the interface. Take the various Predicates, for example. In Apache Collections, you have top-level public implementation classes like:

NullPredicate
NotNullPredicate
NotPredicate
AllPredicate
AndPredicate
AnyPredicate
OrPredicate

Plus a ton more.

In Guava, these are neatly packaged up in a single top level class, Predicates:

Predicates.isNull()
Predicates.notNull()
Predicates.not(...)
Predicates.and(...)
Predicates.or(...)

None of them expose their implementation class, because you don't need to know it! While Apache Collections does have an equivalent PredicateUtils, the fact that it exposes the types of its Predicates makes it harder to use. As I see it, Apache Collections is just a whole mess of unnecessary visible classes and not-very-useful parts that add clutter and make it harder to get at and use the useful parts. The difference is clear when you look at the number of classes and interfaces the two libraries expose:

  • Apache Collections exposes 309 types.
  • Guava, including all its packages (not just collections), exposes just 191 types.
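The "hide the implementation behind a static factory" style described above can be sketched in a few lines. MyPredicates and its nested class are hypothetical, and java.util.function.Predicate stands in for Guava's Predicate. The point is that callers only ever see the interface type, so the private class can change without breaking anyone.

```java
import java.util.function.Predicate;

final class MyPredicates {
    private MyPredicates() {} // static factories only, no instances

    // Private: callers never see or depend on this type.
    private static final class NotNullPredicate<T> implements Predicate<T> {
        @Override
        public boolean test(T t) {
            return t != null;
        }
    }

    // The public signature exposes only the Predicate interface.
    static <T> Predicate<T> notNull() {
        return new NotNullPredicate<>();
    }

    static <T> Predicate<T> not(final Predicate<T> predicate) {
        return t -> !predicate.test(t);
    }

    public static void main(String[] args) {
        Predicate<String> present = notNull();
        System.out.println(present.test("x"));       // true
        System.out.println(not(present).test(null)); // true
    }
}
```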

Add to that Guava's care to include only truly useful utilities and classes, its rigorous adherence to the contracts of the interfaces it implements, etc., and I think it's a much higher-quality, easier-to-use library.

To address some of your specific points:

I actually think that the order Guava chose for Functions.compose is more intuitive (though I think that's quite a subjective argument to begin with). Note that in your example of composition with Guava, the order in which the functions will be applied reads from the end of the declaration toward the place where the final result is assigned. Another problem with your example is that it isn't type-safe to begin with, since the original example involves casting the result of the get method to another type. An advantage of Guava's compose over the array of Transformers in the Apache Commons example is that compose can do a type-safe composition of functions, ensuring (at compile time) that the series of functions you're applying will work correctly. The Apache version is completely unsafe in this regard.
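Here is a small illustration of the type-safety point. It uses java.util.function.Function, whose compose method has the same shape as Guava's Functions.compose. Each step's output type must match the next step's input type, and a mismatched chain is rejected by the compiler instead of failing with a ClassCastException at run time, as an array of raw Transformers would. The function names are illustrative.

```java
import java.util.function.Function;

class TypedComposition {
    static final Function<String, String> trim = String::trim;
    static final Function<String, Integer> length = String::length;
    static final Function<Integer, Boolean> isEven = n -> n % 2 == 0;

    // String -> String -> Integer -> Boolean, checked at compile time
    static final Function<String, Boolean> trimmedLengthIsEven =
        isEven.compose(length.compose(trim));

    // Does not compile: length produces Integer, but trim expects String.
    // static final Function<String, String> broken = trim.compose(length);

    public static void main(String[] args) {
        System.out.println(trimmedLengthIsEven.apply("  ab ")); // true
    }
}
```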

Views are superior to copies:

Second, about the live view "issue" of Collections2.transform. To be blunt, you're completely wrong on that point. The use of a live view rather than copying all elements of the original Collection into a new Collection is actually far more efficient! Here's what's going to happen when you call Collections2.transform and then call contains on the Collection it returns:

  • A view Collection wrapping the original is created... the original and the Function are both simply assigned to fields in it.
  • The Collection's iterator is retrieved.
  • For each element in the Iterator, the Function will be applied, getting the transformed value of that element.
  • When the first element for which the transformed value equals the object you're checking for is found, contains will return. You only iterate (and apply the Function) until a match is found! The Function is applied at most once per element!
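The steps above can be sketched with a minimal transformed view. TransformedView is hypothetical and is not Guava's actual implementation; it only stores the backing collection and the function, applies the function as the iterator advances, and counts applications so the short-circuit is visible. AbstractCollection.contains walks the iterator and returns at the first match.

```java
import java.util.AbstractCollection;
import java.util.Arrays;
import java.util.Collection;
import java.util.Iterator;
import java.util.function.Function;

class TransformedView<F, T> extends AbstractCollection<T> {
    private final Collection<F> from;
    private final Function<? super F, ? extends T> function;
    int applications = 0; // number of function calls, for demonstration only

    TransformedView(Collection<F> from, Function<? super F, ? extends T> function) {
        this.from = from;         // no copying: just keep references
        this.function = function;
    }

    @Override
    public Iterator<T> iterator() {
        final Iterator<F> it = from.iterator();
        return new Iterator<T>() {
            @Override public boolean hasNext() { return it.hasNext(); }
            @Override public T next() {
                applications++;
                return function.apply(it.next()); // transform lazily, one element at a time
            }
        };
    }

    @Override
    public int size() {
        return from.size(); // needs no function calls at all
    }

    public static void main(String[] args) {
        TransformedView<String, String> view =
            new TransformedView<String, String>(Arrays.asList("a", "b", "c"), String::toUpperCase);
        System.out.println(view.contains("A") + " " + view.applications); // true 1
    }
}
```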

Here's what the Apache Collections version does:

  • Creates a new ArrayList to store the transformed values.
  • Gets the original Collection's iterator.
  • For each element in the original Collection's iterator, applies the function and adds the result to the new Collection. This is done for every element of the original Collection, even if the result of applying the Transformer to the very first element would have matched the object we're looking for!
  • Then, contains will iterate over each element in the new Collection looking for the result.

Here's the best and worst case scenarios for a Collection of size N using both libraries. The best case is when the transformed value of the first element equals the object you're looking for with contains and the worst case is when the value you're looking for with contains does not exist in the transformed collection.

  • Guava:
    • Best case: iterates 1 element, applies Function 1 time, stores 0 additional elements.
    • Worst case: iterates N elements, applies Function N times, stores 0 additional elements.
  • Apache:
    • Best case: iterates N + 1 elements, applies Transformer N times, stores N additional elements (the transformed collection).
    • Worst case: iterates 2N elements, applies Transformer N times, stores N additional elements (the transformed collection).
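The best and worst cases above can be checked by counting function applications. This sketch uses JDK streams as stand-ins for the two libraries: map followed by anyMatch is lazy and short-circuits, like contains on a view, while collecting into a list first transforms every element, like CollectionUtils.collect. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.stream.Collectors;

class EagerVsLazy {
    // View-like: stops applying the function once a match is found.
    static int lazyApplications(List<String> input, String target) {
        AtomicInteger calls = new AtomicInteger();
        Function<String, String> f = s -> { calls.incrementAndGet(); return s.toUpperCase(); };
        input.stream().map(f).anyMatch(target::equals);
        return calls.get();
    }

    // Copy-like: transforms every element into a new list, then searches it.
    static int eagerApplications(List<String> input, String target) {
        AtomicInteger calls = new AtomicInteger();
        Function<String, String> f = s -> { calls.incrementAndGet(); return s.toUpperCase(); };
        List<String> copy = input.stream().map(f).collect(Collectors.toList());
        copy.contains(target);
        return calls.get();
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("a", "b", "c", "d");
        // Best case (first element matches): 1 application vs N
        System.out.println(lazyApplications(input, "A") + " vs " + eagerApplications(input, "A")); // 1 vs 4
        // Worst case (no match): N applications either way
        System.out.println(lazyApplications(input, "Z") + " vs " + eagerApplications(input, "Z")); // 4 vs 4
    }
}
```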

I hope it's obvious from the above that, in general, a view is a very good thing! Plus, it's really easy to copy a view into a non-view collection any time that would be useful, and that will have the same performance as the Apache version does to begin with. However, it would decidedly not be useful in any of the examples you've given.

As a final minor note, Iterables.contains exists simply to allow you to check if an Iterable that you do not know to be a Collection contains a value. If the Iterable you give it actually is a Collection, it nicely just calls contains() on that Collection for you to allow for possible better performance (if it's a Set, say).
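The delegation described for Iterables.contains can be sketched as follows. This is an illustration of the behavior, not Guava's source, and IterableUtil is a hypothetical name: a real Collection gets to use its own contains (constant expected time for a HashSet), and anything else is searched linearly with null-safe equality.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Objects;

final class IterableUtil {
    private IterableUtil() {}

    static boolean contains(final Iterable<?> iterable, final Object element) {
        if (iterable instanceof Collection) {
            // Let the collection use its own, possibly faster, lookup
            return ((Collection<?>) iterable).contains(element);
        }
        for (Object o : iterable) {
            if (Objects.equals(o, element)) { // null-safe equality
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Iterable<String> notACollection = () -> Arrays.asList("x", "y").iterator();
        System.out.println(contains(new HashSet<>(Arrays.asList("x", "y")), "y")); // true
        System.out.println(contains(notACollection, "z"));                          // false
    }
}
```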

ColinD
For the composition order, I think there is a more objective answer: see http://en.wikipedia.org/wiki/Function_composition. It must be read as "apply g() to the result of f()".
Sylvain M
@Sylvain: Yes, that makes sense. The documentation describes the composition as "g(f(a))" as well.
ColinD
A:

As one of the developers of Guava, I'm obviously biased, but here are a few comments.

Ease of use was one of the primary goals behind Guava's design. There's always room for improvement, and we're eager to hear any suggestions or concerns. There's usually a rationale behind the design decisions, though everyone could probably find things that they personally disagree with.

In terms of the live views, ColinD described the performance advantages that exist for some use cases. Also, sometimes you want the changes to the view to alter the original collection and vice versa.
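The JDK has views with the same two-way behavior, which may help when explaining the idea to a team. In this sketch, List.subList returns a view backed by the original list: edits made through either one show up in the other, and clearing the view removes that range from the original. The helper name is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class WriteThroughView {
    // Removes original[from, to) by editing only the subList view.
    static List<String> clearRangeThroughView(List<String> original, int from, int to) {
        original.subList(from, to).clear(); // structural change made via the view
        return original;
    }

    public static void main(String[] args) {
        List<String> original = new ArrayList<>(Arrays.asList("a", "b", "c", "d"));
        List<String> middle = original.subList(1, 3); // view of "b", "c"

        middle.set(0, "B");           // write through the view...
        System.out.println(original); // [a, B, c, d]

        original.set(2, "C");         // ...or through the original (non-structural)
        System.out.println(middle);   // [B, C]

        middle.clear();               // removes the viewed range from the original
        System.out.println(original); // [a, d]
    }
}
```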

Now, there are cases in which copying the collection provides better performance, but it just takes a single line of code to do so. While Guava could include transformAndCopy() methods, we omit one-line methods except for extremely common cases like Maps.newHashMap(). The more methods that are present, the more difficult it is to find the method that you need.
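When many lookups will be made against the same transformed data, copying once pays off. transformAndCopy below is hypothetical (Guava deliberately omits such a method). With Guava, the one-line version would be something like Sets.newHashSet(Collections2.transform(from, f)). This JDK-only sketch returns a HashSet so that null results are tolerated and each later contains call is a constant-time expected lookup with no further function calls.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

final class CopyOnce {
    private CopyOnce() {}

    // Applies f exactly once per element and keeps the results.
    static <F, T> Set<T> transformAndCopy(Collection<F> from, Function<? super F, ? extends T> f) {
        Set<T> result = new HashSet<>(); // HashSet tolerates null results
        for (F element : from) {
            result.add(f.apply(element));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "b", "c");
        Set<String> upper = transformAndCopy(words, String::toUpperCase);
        // Repeated lookups reuse the copy instead of re-applying the function
        System.out.println(upper.contains("A") && upper.contains("C") && !upper.contains("Z")); // true
    }
}
```

For a single contains call, the view is still the cheaper option, as ColinD's comparison shows.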

Jared Levy