collect() is useful when you have possible alternative representations of your objects.
For example, recently I was dealing with a piece of code that needed to match lists of objects from two different sources. These objects were of different classes as they were used at different points in the code, but for my purposes had the same relevant concepts (i.e. they both had an ID, both had a property path, both had a "cascade" flag etc.).
I found that it was much easier to define a simple intermediate representation of these properties (as an inner class), define transformers for both concrete object classes (again very simple, as it's just using the relevant accessor methods to get the properties out), and then use collect() to convert my incoming objects into the intermediate representation. Once they're there, I can use standard Collections methods to compare and manipulate the two as sets.
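To make that a bit more tangible, here is a minimal sketch of what the intermediate representation and the two transformers might look like. The field names on the two source classes are made up for illustration, and the Transformer interface and collect() method are small local stand-ins shaped like the Commons Collections ones, so that the sketch compiles without the library on the classpath.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Objects;

class CollectSketch {

    // Local stand-in for the library's Transformer interface (assumption:
    // a single method that maps an input object to an output object).
    interface Transformer<I, O> {
        O transform(I input);
    }

    // Local stand-in for collect(): apply the transformer to every element
    // and gather the results in a new collection.
    static <I, O> Collection<O> collect(Iterable<? extends I> input,
                                        Transformer<? super I, ? extends O> transformer) {
        List<O> out = new ArrayList<>();
        for (I item : input) {
            out.add(transformer.transform(item));
        }
        return out;
    }

    // Hypothetical presentation-layer class.
    static final class PresSpecificObject {
        final String id;
        final String path;
        final boolean cascade;

        PresSpecificObject(String id, String path, boolean cascade) {
            this.id = id;
            this.path = path;
            this.cascade = cascade;
        }
    }

    // Hypothetical data-layer class: same concepts, different names.
    static final class CachedDataObject {
        final String key;
        final String propertyPath;
        final boolean cascading;

        CachedDataObject(String key, String propertyPath, boolean cascading) {
            this.key = key;
            this.propertyPath = propertyPath;
            this.cascading = cascading;
        }
    }

    // The shared representation. equals() and hashCode() are what make
    // contains(), containsAll() and friends compare by value.
    static final class IntermediateRepresentation {
        private final String id;
        private final String path;
        private final boolean cascade;

        IntermediateRepresentation(String id, String path, boolean cascade) {
            this.id = id;
            this.path = path;
            this.cascade = cascade;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof IntermediateRepresentation)) return false;
            IntermediateRepresentation other = (IntermediateRepresentation) o;
            return cascade == other.cascade && id.equals(other.id) && path.equals(other.path);
        }

        @Override
        public int hashCode() {
            return Objects.hash(id, path, cascade);
        }
    }

    // One transformer per concrete class; each just reads the relevant properties.
    static final Transformer<PresSpecificObject, IntermediateRepresentation> PRES_TRANSFORMER =
            p -> new IntermediateRepresentation(p.id, p.path, p.cascade);

    static final Transformer<CachedDataObject, IntermediateRepresentation> DATA_TRANSFORMER =
            d -> new IntermediateRepresentation(d.key, d.propertyPath, d.cascading);

    public static void main(String[] args) {
        List<PresSpecificObject> pres = Arrays.asList(new PresSpecificObject("col1", "a.b", true));
        List<CachedDataObject> cached = Arrays.asList(new CachedDataObject("col1", "a.b", true));

        Collection<IntermediateRepresentation> presObjects = collect(pres, PRES_TRANSFORMER);
        Collection<IntermediateRepresentation> dataObjects = collect(cached, DATA_TRANSFORMER);

        // Objects from different sources compare equal once transformed.
        System.out.println(presObjects.equals(dataObjects)); // true
    }
}
```

The part that matters is the equals()/hashCode() pair on the intermediate class: without it, the standard Collections methods would compare by identity and nothing from one source would ever match anything from the other.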
So as a (semi-)concrete example, let's say I need a method to check that the set of objects in the presentation layer is a subset of the objects cached in the data layer. With the approach outlined above, it would look something like this:
public boolean isColumnSubset(PresSpec pres, CachedDataSpec dataSpec)
{
    // collect() returns a Collection rather than a List
    final Collection<IntermediateRepresentation> presObjects = CollectionUtils.collect(pres.getObjects(), PRES_TRANSFORMER);
    final Collection<IntermediateRepresentation> dataObjects = CollectionUtils.collect(dataSpec.getCached(), DATA_TRANSFORMER);
    // containsAll() relies on IntermediateRepresentation implementing equals()
    return dataObjects.containsAll(presObjects);
}
To me this is much more readable, with the last line conveying a real sense of what the method is doing, than the equivalent with loops:
public boolean isColumnSubset(PresSpec pres, CachedDataSpec dataSpec)
{
    for (PresSpecificObject presObj : pres.getObjects())
    {
        boolean matched = false;
        for (CachedDataObject dataObj : dataSpec.getCached())
        {
            if (areObjectsEquivalent(presObj, dataObj)) // or do the tests inline but a method is cleaner
            {
                matched = true;
                break;
            }
        }
        if (!matched)
        {
            return false;
        }
    }
    // Every column must have matched
    return true;
}
The two are probably about as efficient, but in terms of readability I'd say that the first one is much easier to understand at a glance. Even though it comes out as more lines of code overall (due to defining an inner class and two transformers), the separation of the traversal implementation from the actual "true or false" logic makes the latter much clearer. Plus if you have any KLOC metrics it can't be bad either. ;-)
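On the efficiency point: with plain lists, containsAll() still has to scan the data side once per presentation object, the same as the nested loops. If that ever matters, one option is to collect the data side into a HashSet so that each lookup is a hash probe. The sketch below illustrates the idea with a local collectToSet() helper and plain strings standing in for the real objects (both are assumptions for illustration, not part of any library). As far as I recall, Commons Collections also offers a collect() overload that takes the output collection as an argument, which would do the same job.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.Function;

class SetSubsetSketch {

    // Hypothetical helper: transform every element and gather the results
    // in a HashSet, so later contains() checks are hash lookups.
    static <I, O> Set<O> collectToSet(Iterable<? extends I> input,
                                      Function<? super I, ? extends O> transformer) {
        Set<O> out = new HashSet<>();
        for (I item : input) {
            out.add(transformer.apply(item));
        }
        return out;
    }

    // Both sides are reduced to a common form (trimmed, upper-case IDs)
    // before the subset test, mirroring the two-transformer approach.
    static boolean isSubset(List<String> presIds, List<String> cachedIds) {
        Set<String> presObjects = collectToSet(presIds, s -> s.trim().toUpperCase(Locale.ROOT));
        Set<String> dataObjects = collectToSet(cachedIds, s -> s.toUpperCase(Locale.ROOT));
        return dataObjects.containsAll(presObjects);
    }

    public static void main(String[] args) {
        System.out.println(isSubset(Arrays.asList(" a1 "), Arrays.asList("A1", "B2"))); // true
        System.out.println(isSubset(Arrays.asList("c3"), Arrays.asList("A1", "B2")));   // false
    }
}
```

The "true or false" line stays exactly the same; only the type of collection being filled changes, which is another small benefit of keeping traversal and logic apart.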