views:

253

answers:

5

Hello Everyone.

I am trying to create a little functional programming library for Java (just to scratch my own itch). While defining the higher-order functions for Lists, Sets and Maps I have come across this problem: The functions that take a collection, and return a collection of same type have almost the same implementation, and yet have to be redefined for each of the data structure - Lists, Sets, and Maps.

For example, here is the implementation of map function for Lists, and Sets:

public static <A, B> List<B> map(
  List<? extends A> xs, 
  Func1<? super A, ? extends B> transformer
) {
  List<B> ys = new ArrayList<B>();
  for(A a : xs) {
    ys.add(transformer.apply(a));
  }
  return ys;
}

public static <A, B> Set<B> map(
  Set<? extends A> xs, 
  Func1<? super A, ? extends B> transformer
) {
  Set<B> ys = new HashSet<B>();
  for(A a : xs) {
    ys.add(transformer.apply(a));
  }
  return ys;
}

A filter function:

public static <A> List<A> filter(
  List<? extends A> xs, 
  Func1<? super A, Boolean> predicate
) {
  List<A> ys = new ArrayList<A>();
  for(A a : xs) {
    if(predicate.apply(a)) {
      ys.add(a);
    }
  }
  return ys;
}

public static <A> Set<A> filter(
  Set<? extends A> xs, 
  Func1<? super A, Boolean> predicate
) {
  Set<A> ys = new HashSet<A>();
  for(A a : xs) {
    if(predicate.apply(a)) {
      ys.add(a);
    }
  }
  return ys;
}

As can be seen from this example, the bodies of the implementations for Set and List are almost the same.

There are lot many functions like map and filter in my library, and each of those is defined thrice for each type of collections I am interested in (i.e. List, Set, and Map). This leads to a lot of code duplication, and code smell. I wanted to know whether there's some way in Java that would help me avoid all the code duplication.

Any help will be greatly appreciated. Thanks.

EDIT:

Func1 is an interface defined as:

interface Func1<A, B> {
  public B apply(A a);
}
A: 

Effectively a list is just a Monad for a type T, giving it the capability to store multiple instances of the type. That's why all the usual laws of monads apply here, so you can implement all operations using a bind and return member.

I'm sorry I don't have the time to explain further for now, but in the .NET space we have SelectMany and Enumerable.Repeat(1, element) for the same purposes. There's a lot of information available about that.

Any operator (such as filter in your example) can be implemented using SelectMay respectively bind.

Johannes Rudolph
Thanks for the response Johannes but I am not using any functional data structures here. `List` and `Set` in my examples are `java.util.List` and `java.util.Set` respectively.
one-zero-zero-one
sure, but these implement something like IEnumerable or ICollection (the collection monad in this case)
Johannes Rudolph
@Johannes: Can you please add some code to explain your point?
one-zero-zero-one
+6  A: 
public static <A, B> List<B> map(
  List<? extends A> xs, 
  Func1<? super A, ? extends B> transformer
) {
  List<B> ys = new ArrayList<B>();
  map(xy, transformer, ys);
  return ys;
}

public static <A, B> Set<B> map(
  Set<? extends A> xs, 
  Func1<? super A, ? extends B> transformer
) {
  Set<B> ys = new HashSet<B>();
  map(xy, transformer, ys);
  return ys;
}
private static <A, B> map(
  Collection<? extends A> xs, 
  Func1<? super A, ? extends B> transformer,
  Iterable<B> ys
) {
  for(A a : xs) {
    ys.add(transformer.apply(a));
  }
}

Job done.

Note, it's typical of Java APIs, to pass the mutable the collection in, rather than create a new one in the method. Personally, I'm not a fan of mutability at the collection level, but it's what we have to work with (in Java).

(I'm not keen on A and B as generic parameters with this sort of stuff.)

Or you could use a factory:

public static <A, B> List<B> map(
  List<? extends A> xs, 
  Func1<? super A, ? extends B> transformer
) {
  return map(xs, transformer, new CollectionFactory<B, List<B>>() {
      public List<B> create() { return new ArrayList<B>(); }
  });
}

public static <A, B> Set<B> map(
  Set<? extends A> xs, 
  Func1<? super A, ? extends B> transformer
) {
  return map(xs, transformer, new CollectionFactory<B, Set<B>>() {
      public Set<B> create() { return new HashSet<B>(); }
  });
}

private interface CollectionFactory<E, C extends Collection<E>> {
    C create();
}

private static <A, B, C extends Collection<B>> C map(
  Iterable<? extends A> xs, 
  Func1<? super A, ? extends B> transformer,
  CollectionFactory<B, C> factory
) {
  C ys = factory.create();
  for(A a : xs) {
    ys.add(transformer.apply(a));
  }
  return ys;
}

(If you can put up with the pointless verbosity of anonymous inner classes.)

If it wasn't for Collection then you'd need to put some (ugly) adapter in.

For completeness (though not tested, could do with a few tweaks), an unpleasant solution using inheritance:

Set<String> strs = hashSets().map(things, formatter);

...

public static <E> Functions<E, Set<E>> hashSets() {
    return new Functions<E, Set<E>>() {
        protected Set<E> createCollections() {
            return new HashSet<E>();
        }
    };
}

public abstract class Functions<E, C extends Collection<E>> {
    protected abstract C createCollection();

    public <S> C map(
      Set<? extends S> xs, 
      Func1<? super S, ? extends E> transformer
    ) {
      C ys = createCollection();
      for(S a : xs) {
        ys.add(transformer.apply(a));
      }
      return ys;
    }

    public <S> C filter(
      List<? extends S> xs, 
      Func1<? super S, Boolean> predicate // Predicate<? super S> might be nicer!!
    ) {
      C ys = createCollection();
      for(A a : xs) {
        if(predicate.apply(a)) {
          ys.add(a);
        }
      }
      return ys;
    }
}
Tom Hawtin - tackline
The API is the same, the new map method is private
Jon Freedman
It's still a lot of code duplication. For each new method I might want to add, I'd need to write the private implementation using `Collections` and then a convenience method for each of the data type. Come on, there has to be a better way to do this. :(
one-zero-zero-one
@one-zero-zero-one You would need a method with the common code and a method which decides which implementation to use, yes. You could just use the implementation method. You could use inheritance, but for these sorts of static methods, I'd call that unpleasant.
Tom Hawtin - tackline
+2  A: 

I don't believe Java's type system is sophisticated enough to address this, but Scala's is. With the 2.8 version of the collections library, they built a system to automatically create a collection of the appropriate type based on the collection you're working with. So, if you call filter on a List, it returns a new List. Call filter on a Set and you'll get a Set back. It does this while still only having a single implementation of filter.

To learn more, take a look at Traversable and the stuff that uses it. I believe CanBuildFrom is where a lot of the magic happens.

munificent
+4  A: 

I don't think you can do better than what Tom has suggested in his answer. Java doesn't support higher-kinded types - a feature that can help you abstract over the collection type and thus avoid duplicating the same code for each of the collection types.

Scala supports this feature, and it's used extensively in its standard library. This paper by Adriaan Moors discusses how Scala avoids this sort of code duplication with help of higher-kinded types.

Two screenshots from the above-mentioned paper:


alt text


alt text

missingfaktor
Agreed. Tom (above) is incorrect.
Tony Morris
+3  A: 

Java doesn't have higher-order polymorphism (aka higher-kinds) so this is not possible within the type system. Many Java programmers resort to XML and/or reflection (i.e. escape the type system) to address this deficiency.

Scala can deal with this and what you are describing is called a covariant functor. This rather fundamental data type (along with much more) has been implemented in the Scalaz library and includes implementations for java.util.*.

Further, there are many more covariant functors that are not collections and many more functors that are not covariant.

You may wish to google for "20 Intermediate Scala Exercises" if you wish to explore this particular concept further.

Tony Morris