main work by @drewhk with contributions from @2m and @rkuhn
This work uncovered many well-hidden bugs in existing Stages, in
particular StatefulStage. These were hidden by the behavior of
OneBoundedInterpreter that normally behaves more orderly than it
guarantees in general, especially with respect to the timeliness of
delivery of upstream termination signals; the bugs were then that
internal state was not flushed when onComplete arrived “too early”.
* Gives Inlets and Outlets a `carbonCopy` method and switches to allocate them via `apply`
* Removes 4 Array allocations per FanIn and uses a bitmasked array instead
* Makes the FlattenStrategy.concat instance a singleton