Java 8 Collector Guide

Collector Overview:

A Collector represents a special kind of reduction: a mutable reduction. Elements are incorporated by updating the state of a mutable container rather than by replacing the intermediate result. This is the behavior we want when reducing a Stream into some sort of Collection. Creating a new Collection object at every step of the reduction (as a typical reduction operation would) is very inefficient, and a Collector lets us avoid that.
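To make the contrast concrete, here is a minimal sketch that builds the same list twice: once with a classic reduction that copies the list at every step, and once with the three-argument Stream.collect, which mutates a container in place. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

public class MutableReductionSketch {

    // Classic reduction: each step replaces the intermediate result with a new list.
    static List<String> withReduce(Stream<String> words) {
        List<String> empty = new ArrayList<>();
        return words.reduce(
                empty,
                (list, word) -> {
                    List<String> copy = new ArrayList<>(list);
                    copy.add(word);
                    return copy;
                },
                (left, right) -> {
                    List<String> copy = new ArrayList<>(left);
                    copy.addAll(right);
                    return copy;
                });
    }

    // Mutable reduction: a container is created once and updated in place.
    static List<String> withCollect(Stream<String> words) {
        return words.collect(ArrayList::new, ArrayList::add, ArrayList::addAll);
    }

    public static void main(String[] args) {
        System.out.println(withReduce(Stream.of("a", "b", "c")));  // [a, b, c]
        System.out.println(withCollect(Stream.of("a", "b", "c"))); // [a, b, c]
    }
}
```

Both versions produce the same list; the difference is only in how many intermediate lists get allocated along the way.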

We’ll dive deeper into the finer points of collection vs reduction in a separate tutorial. For now, let’s take a look at the pieces that make up a Collector.

Container Supplier

The container supplier is responsible for creating a new mutable container for the result. It has the following abstract method signature:

Supplier<A> supplier();

Accumulator

The accumulator incorporates data elements into the result container. It has the following abstract method signature:

BiConsumer<A, T> accumulator();

Combiner

The combiner is used (in the Stream framework) during parallel execution. Separate threads process separate sections of the Stream, each accumulating its partial result into its own mutable container. Those containers eventually need to be combined into one single result, hence the combiner. It has the following abstract method signature:

BinaryOperator<A> combiner();

Finisher

The finisher performs an optional final transformation from the intermediate container type A to the result type R. Collectors may set (and the majority do) the IDENTITY_FINISH characteristic, in which case the finishing transformation is an identity function with an unchecked cast from A to R. It has the following abstract method signature:

Function<A, R> finisher();
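As a rough sketch of how the four pieces fit together, the example below assembles a Collector from them with the Collector.of factory. It concatenates Strings using a StringBuilder as the mutable container. The class name FourPartsSketch and the method name concatenating are hypothetical.

```java
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collector;
import java.util.stream.Stream;

public class FourPartsSketch {

    // T = String (element), A = StringBuilder (mutable container), R = String (result)
    static Collector<String, StringBuilder, String> concatenating() {
        // Creates a new, empty container.
        Supplier<StringBuilder> supplier = StringBuilder::new;
        // Folds one element into a container.
        BiConsumer<StringBuilder, String> accumulator = (builder, word) -> builder.append(word);
        // Merges two partial containers (used during parallel execution).
        BinaryOperator<StringBuilder> combiner = (left, right) -> left.append(right);
        // Converts the container into the final result.
        Function<StringBuilder, String> finisher = StringBuilder::toString;
        return Collector.of(supplier, accumulator, combiner, finisher);
    }

    public static void main(String[] args) {
        System.out.println(Stream.of("a", "b", "c").collect(concatenating())); // abc
    }
}
```

Because the finisher here changes the type from StringBuilder to String, this collector does not declare IDENTITY_FINISH.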

An Example Visualized

Let’s try to visualize an example Stream collection process to help understand the different components. Also, be sure to note the differences between serial and parallel execution.

Serial Collection:

[Figure: serial collection]

Parallel Collection:

[Figure: parallel collection]
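In code, the two modes differ only by a call to parallel(). The following sketch (class and method names are illustrative) collects the same range both ways with Collectors.toList and compares the results.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class SerialVsParallelSketch {

    // Serial: one container, filled by the accumulator in encounter order.
    static List<Integer> collectSerial(int n) {
        return IntStream.rangeClosed(1, n).boxed().collect(Collectors.toList());
    }

    // Parallel: several containers, filled independently and merged by the combiner.
    static List<Integer> collectParallel(int n) {
        return IntStream.rangeClosed(1, n).parallel().boxed().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(collectSerial(1_000).equals(collectParallel(1_000))); // true
    }
}
```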


What Are The Rules?

To ensure that sequential and parallel executions produce equivalent results, the collector functions must satisfy an identity and an associativity constraints. – JavaDoc

Essentially, there are two things that must hold true in order for a Collector to perform equivalently during parallel and sequential execution.

The Identity Constraint

The identity constraint says that for any partially accumulated result, combining it with an empty result container must produce an equivalent result. That is, for a partially accumulated result a that is the result of any series of accumulator and combiner invocations, a must be equivalent to

combiner.apply(a, supplier.get())

JavaDoc

Essentially, combining a partial result with an empty result should be the same as passing the partial result through the identity function. When combining two result containers, only the specific contents of the containers should affect the result.
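One way to probe this constraint is a small helper that builds a partial result, combines it with a fresh container, and compares the outcome against the untouched version. This is only a sketch: holdsFor and seeded are hypothetical names, and comparing finished results with equals is a simplification of the "equivalent" wording in the JavaDoc.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.Collectors;

public class IdentityConstraintSketch {

    // Accumulates `elements` into two containers, combines one of them with a
    // fresh supplier.get(), and reports whether the finished results still agree.
    static <T, A, R> boolean holdsFor(Collector<T, A, R> collector, List<T> elements) {
        A plain = collector.supplier().get();
        A combined = collector.supplier().get();
        for (T element : elements) {
            collector.accumulator().accept(plain, element);
            collector.accumulator().accept(combined, element);
        }
        combined = collector.combiner().apply(combined, collector.supplier().get());
        return collector.finisher().apply(plain).equals(collector.finisher().apply(combined));
    }

    // Deliberately broken: the "empty" container already holds an element,
    // so combining with it changes the result.
    static Collector<Integer, List<Integer>, List<Integer>> seeded() {
        return Collector.of(
                () -> new ArrayList<>(Arrays.asList(0)),
                List::add,
                (left, right) -> { left.addAll(right); return left; });
    }

    public static void main(String[] args) {
        List<Integer> input = Arrays.asList(1, 2, 3);
        System.out.println(holdsFor(Collectors.toList(), input)); // true
        System.out.println(holdsFor(seeded(), input));            // false
    }
}
```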

The Associativity Constraint

The associativity constraint says that splitting the computation must produce an equivalent result. That is, for any input elements t1 and t2, the results r1 and r2 in the computation below must be equivalent:

A a1 = supplier.get();
accumulator.accept(a1, t1);
accumulator.accept(a1, t2);
R r1 = finisher.apply(a1);  // result without splitting

A a2 = supplier.get();
accumulator.accept(a2, t1);
A a3 = supplier.get();
accumulator.accept(a3, t2);
R r2 = finisher.apply(combiner.apply(a2, a3));  // result with splitting

JavaDoc

The associativity constraint is a little more straightforward. A Collector is considered associative if splitting the input (and thus grouping the elements into different partial results before combining them) produces the same result as processing it in one piece.
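The JavaDoc snippet translates almost line for line into a generic helper. The sketch below does that and runs it against Collectors.toList and against a deliberately broken collector. The names holdsFor and rightFirst are hypothetical, and equals stands in for "equivalent".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.Collectors;

public class AssociativitySketch {

    static <T, A, R> boolean holdsFor(Collector<T, A, R> collector, T t1, T t2) {
        // Result without splitting.
        A a1 = collector.supplier().get();
        collector.accumulator().accept(a1, t1);
        collector.accumulator().accept(a1, t2);
        R r1 = collector.finisher().apply(a1);

        // Result with splitting: one container per element, then combine.
        A a2 = collector.supplier().get();
        collector.accumulator().accept(a2, t1);
        A a3 = collector.supplier().get();
        collector.accumulator().accept(a3, t2);
        R r2 = collector.finisher().apply(collector.combiner().apply(a2, a3));

        return r1.equals(r2);
    }

    // Deliberately broken: the combiner puts the right-hand elements first.
    static Collector<String, List<String>, List<String>> rightFirst() {
        return Collector.of(
                ArrayList::new,
                List::add,
                (left, right) -> { right.addAll(left); return right; });
    }

    public static void main(String[] args) {
        System.out.println(holdsFor(Collectors.toList(), "t1", "t2")); // true
        System.out.println(holdsFor(rightFirst(), "t1", "t2"));        // false
    }
}
```

With rightFirst, the unsplit run yields [t1, t2] while the split run yields [t2, t1], which is exactly the kind of mismatch the constraint rules out.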

I think most are familiar with the concept of associativity in algebra:

[Figure: associativity in algebra]

If you stretch your mind a little bit you can look at associativity in collection the same way.

[Figure: associativity applied to collection]


A Collection of Collectors

Now that we have a feel for the different components of a Collector, let’s take a look at just a few of the JDK supplied Collectors.

JDK Convenience Collectors

[Figure: JDK convenience collectors, with notes]
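A few of the collectors from java.util.stream.Collectors in action, as a quick sketch. The sample words and the class and method names are invented for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class JdkCollectorsSketch {

    static Stream<String> words() {
        return Stream.of("apple", "avocado", "banana", "cherry", "apple");
    }

    // Keeps every element, duplicates included, in encounter order.
    static List<String> asList() {
        return words().collect(Collectors.toList());
    }

    // Drops duplicates; no ordering guarantee.
    static Set<String> asSet() {
        return words().collect(Collectors.toSet());
    }

    // Concatenates the elements with a separator.
    static String joined() {
        return words().collect(Collectors.joining(", "));
    }

    // Groups by first letter and counts the members of each group.
    static Map<Character, Long> countByInitial() {
        return words().collect(Collectors.groupingBy(word -> word.charAt(0), Collectors.counting()));
    }

    public static void main(String[] args) {
        System.out.println(asList().size());  // 5
        System.out.println(asSet().size());   // 4
        System.out.println(joined());         // apple, avocado, banana, cherry, apple
        System.out.println(countByInitial().get('a')); // 3
    }
}
```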

A Special Case: Collecting into Maps.

In order to collect into a Map, we need to upgrade our accumulator into a higher order function, composed of three other functions. These “sub” functions are:

Key Mapper

The key mapper transforms each element from the stream into a key for the Map being collected into.

Value Mapper

The value mapper transforms each element from the stream into a value for the Map being collected into.

Merger

Any collisions (when two elements produce the same key) are handled by the merger. The predefined Map collectors that take no merge function (for example, the two-argument Collectors.toMap) throw an IllegalStateException on a duplicate key, but you can easily supply your own merge function if the desired behavior is more complex.

All Composed Together

[Figure: the toMap collector, composed of key mapper, value mapper, and merger]
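Here is a small sketch of the three functions plugged into Collectors.toMap. The choice of first letter as key, word length as value, and summing as the merge strategy is arbitrary, and the class and method names are made up.

```java
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ToMapSketch {

    static Map<Character, Integer> lengthByInitial(Stream<String> words) {
        return words.collect(Collectors.toMap(
                word -> word.charAt(0), // key mapper: first letter
                String::length,         // value mapper: word length
                Integer::sum));         // merger: add lengths when keys collide
    }

    // Without a merger, a duplicate key makes the two-argument toMap throw.
    static boolean rejectsDuplicates(Stream<String> words) {
        try {
            Map<Character, Integer> ignored =
                    words.collect(Collectors.toMap(word -> word.charAt(0), String::length));
            return false;
        } catch (IllegalStateException duplicateKey) {
            return true;
        }
    }

    public static void main(String[] args) {
        // "apple" (5) and "avocado" (7) share the key 'a', so the merger sums them.
        System.out.println(lengthByInitial(Stream.of("apple", "avocado", "banana")).get('a')); // 12
        System.out.println(rejectsDuplicates(Stream.of("apple", "avocado")));                  // true
    }
}
```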

Build Your Own

What if none of the supplied Collectors meet our needs? In that case, implementing our own should be no problem! Let’s create a Collector similar to Collectors.toList, but that applies a finishing step of copying the mutable result container into an ImmutableList.

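import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collector;

// Assumption: ImmutableList is Guava's com.google.common.collect.ImmutableList.
import com.google.common.collect.ImmutableList;

public class ImmutableListCollector<T> implements Collector<T, List<T>, ImmutableList<T>> {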

    @Override
    public Supplier<List<T>> supplier() {
        return ArrayList::new;
    }

    @Override
    public BiConsumer<List<T>, T> accumulator() {
        return List::add;
    }

    @Override
    public BinaryOperator<List<T>> combiner() {
        return (l1, l2) -> {
            l1.addAll(l2);
            return l1;
        };
    }

    @Override
    public Function<List<T>, ImmutableList<T>> finisher() {
        return ImmutableList::copyOf;
    }

    @Override
    public Set<Characteristics> characteristics() {
        return Collections.emptySet();
    }

    public static <T> ImmutableListCollector<T> toImmutableList() {
        return new ImmutableListCollector<>();
    }
}

Try it Yourself!

Here is a sample Stream collection using a simple Collector implementation. Play around with both of them, and run it to see the results!
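The sketch below is a stand-in you can paste into a file and run without extra dependencies. It has the same four parts as a list-building collector with a finishing step, but the finisher wraps the list with Collections.unmodifiableList from the JDK instead of copying into an ImmutableList. The names TryItSketch and toReadOnlyList are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.Stream;

public class TryItSketch {

    static <T> Collector<T, List<T>, List<T>> toReadOnlyList() {
        return Collector.of(
                ArrayList::new,                                        // supplier
                List::add,                                             // accumulator
                (left, right) -> { left.addAll(right); return left; }, // combiner
                Collections::unmodifiableList);                        // finisher
    }

    public static void main(String[] args) {
        List<String> letters = Stream.of("a", "b", "c").collect(toReadOnlyList());
        System.out.println(letters); // [a, b, c]

        try {
            letters.add("d");
        } catch (UnsupportedOperationException readOnly) {
            System.out.println("result is read-only");
        }
    }
}
```

Try adding parallel() to the stream, or changing the combiner, and see how the result changes.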

