+doc, str #16714: Add documentation explaining parallelism and pipelining
This commit is contained in:
parent
089760e4e5
commit
6736d91110
6 changed files with 315 additions and 0 deletions
|
|
@ -0,0 +1,106 @@
|
|||
package docs.stream
|
||||
|
||||
import akka.stream.scaladsl.{ FlowGraph, Merge, Balance, Source, Flow }
|
||||
import akka.stream.testkit.AkkaSpec
|
||||
|
||||
class FlowParallelismDocSpec extends AkkaSpec {
|
||||
|
||||
import FlowGraph.Implicits._
|
||||
|
||||
case class ScoopOfBatter()
|
||||
case class HalfCookedPancake()
|
||||
case class Pancake()
|
||||
|
||||
//format: OFF
|
||||
//#pipelining
|
||||
// Takes a scoop of batter and creates a pancake with one side cooked
|
||||
val fryingPan1: Flow[ScoopOfBatter, HalfCookedPancake, Unit] =
|
||||
Flow[ScoopOfBatter].map { batter => HalfCookedPancake() }
|
||||
|
||||
// Finishes a half-cooked pancake
|
||||
val fryingPan2: Flow[HalfCookedPancake, Pancake, Unit] =
|
||||
Flow[HalfCookedPancake].map { halfCooked => Pancake() }
|
||||
//#pipelining
|
||||
//format: ON
|
||||
|
||||
"Demonstrate pipelining" in {
|
||||
//#pipelining
|
||||
|
||||
// With the two frying pans we can fully cook pancakes
|
||||
val pancakeChef: Flow[ScoopOfBatter, Pancake, Unit] =
|
||||
Flow[ScoopOfBatter].via(fryingPan1).via(fryingPan2)
|
||||
//#pipelining
|
||||
}
|
||||
|
||||
"Demonstrate parallel processing" in {
|
||||
//#parallelism
|
||||
val fryingPan: Flow[ScoopOfBatter, Pancake, Unit] =
|
||||
Flow[ScoopOfBatter].map { batter => Pancake() }
|
||||
|
||||
val pancakeChef: Flow[ScoopOfBatter, Pancake, Unit] = Flow() { implicit builder =>
|
||||
val dispatchBatter = builder.add(Balance[ScoopOfBatter](2))
|
||||
val mergePancakes = builder.add(Merge[Pancake](2))
|
||||
|
||||
// Using two frying pans in parallel, both fully cooking a pancake from the batter.
|
||||
// We always put the next scoop of batter to the first frying pan that becomes available.
|
||||
dispatchBatter.out(0) ~> fryingPan ~> mergePancakes.in(0)
|
||||
// Notice that we used the "fryingPan" flow without importing it via builder.add().
|
||||
// Flows used this way are auto-imported, which in this case means that the two
|
||||
// uses of "fryingPan" mean actually different stages in the graph.
|
||||
dispatchBatter.out(1) ~> fryingPan ~> mergePancakes.in(1)
|
||||
|
||||
(dispatchBatter.in, mergePancakes.out)
|
||||
}
|
||||
|
||||
//#parallelism
|
||||
}
|
||||
|
||||
"Demonstrate parallelized pipelines" in {
|
||||
//#parallel-pipeline
|
||||
val pancakeChef: Flow[ScoopOfBatter, Pancake, Unit] = Flow() { implicit builder =>
|
||||
|
||||
val dispatchBatter = builder.add(Balance[ScoopOfBatter](2))
|
||||
val mergePancakes = builder.add(Merge[Pancake](2))
|
||||
|
||||
// Using two pipelines, having two frying pans each, in total using
|
||||
// four frying pans
|
||||
dispatchBatter.out(0) ~> fryingPan1 ~> fryingPan2 ~> mergePancakes.in(0)
|
||||
dispatchBatter.out(1) ~> fryingPan1 ~> fryingPan2 ~> mergePancakes.in(1)
|
||||
|
||||
(dispatchBatter.in, mergePancakes.out)
|
||||
}
|
||||
//#parallel-pipeline
|
||||
}
|
||||
|
||||
"Demonstrate pipelined parallel processing" in {
|
||||
//#pipelined-parallel
|
||||
val pancakeChefs1: Flow[ScoopOfBatter, HalfCookedPancake, Unit] = Flow() { implicit builder =>
|
||||
val dispatchBatter = builder.add(Balance[ScoopOfBatter](2))
|
||||
val mergeHalfPancakes = builder.add(Merge[HalfCookedPancake](2))
|
||||
|
||||
// Two chefs work with one frying pan for each, half-frying the pancakes then putting
|
||||
// them into a common pool
|
||||
dispatchBatter.out(0) ~> fryingPan1 ~> mergeHalfPancakes.in(0)
|
||||
dispatchBatter.out(1) ~> fryingPan1 ~> mergeHalfPancakes.in(1)
|
||||
|
||||
(dispatchBatter.in, mergeHalfPancakes.out)
|
||||
}
|
||||
|
||||
val pancakeChefs2: Flow[HalfCookedPancake, Pancake, Unit] = Flow() { implicit builder =>
|
||||
val dispatchHalfPancakes = builder.add(Balance[HalfCookedPancake](2))
|
||||
val mergePancakes = builder.add(Merge[Pancake](2))
|
||||
|
||||
// Two chefs work with one frying pan for each, finishing the pancakes then putting
|
||||
// them into a common pool
|
||||
dispatchHalfPancakes.out(0) ~> fryingPan2 ~> mergePancakes.in(0)
|
||||
dispatchHalfPancakes.out(1) ~> fryingPan2 ~> mergePancakes.in(1)
|
||||
|
||||
(dispatchHalfPancakes.in, mergePancakes.out)
|
||||
}
|
||||
|
||||
val kitchen: Flow[ScoopOfBatter, Pancake, Unit] = pancakeChefs1.via(pancakeChefs2)
|
||||
//#pipelined-parallel
|
||||
|
||||
}
|
||||
|
||||
}
|
||||
|
|
@ -192,6 +192,7 @@ element. If this function would return a pair of the two argument it would be ex
|
|||
|
||||
.. includecode:: code/docs/stream/cookbook/RecipeManualTrigger.scala#manually-triggered-stream-zipwith
|
||||
|
||||
.. _cookbook-balance-scala:
|
||||
|
||||
Balancing jobs to a fixed pool of workers
|
||||
-----------------------------------------
|
||||
|
|
|
|||
|
|
@ -16,6 +16,7 @@ Streams
|
|||
stream-integrations
|
||||
stream-error
|
||||
stream-io
|
||||
stream-parallelism
|
||||
stream-cookbook
|
||||
../stream-configuration
|
||||
|
||||
|
|
|
|||
103
akka-docs-dev/rst/scala/stream-parallelism.rst
Normal file
103
akka-docs-dev/rst/scala/stream-parallelism.rst
Normal file
|
|
@ -0,0 +1,103 @@
|
|||
.. _stream-parallelism-scala:
|
||||
|
||||
##########################
|
||||
Pipelining and Parallelism
|
||||
##########################
|
||||
|
||||
Akka Streams processing stages (be it simple operators on Flows and Sources or graph junctions) are executed
|
||||
concurrently by default. This is realized by mapping each of the processing stages to a dedicated actor internally.
|
||||
|
||||
We will illustrate through the example of pancake cooking how streams can be used for various processing patterns,
|
||||
exploiting the available parallelism on modern computers. The setting is the following: both Patrik and Roland
|
||||
like to make pancakes, but they need to produce sufficient amount in a cooking session to make all of the children
|
||||
happy. To increase their pancake production throughput they use two frying pans. How they organize their pancake
|
||||
processing is markedly different.
|
||||
|
||||
Pipelining
|
||||
----------
|
||||
|
||||
Roland uses the two frying pans in an asymmetric fashion. The first pan is only used to fry one side of the
|
||||
pancake then the half-finished pancake is flipped into the second pan for the finishing fry on the other side.
|
||||
Once the first frying pan becomes available it gets a new scoop of batter. As an effect, most of the time there
|
||||
are two pancakes being cooked at the same time, one being cooked on its first side and the second being cooked to
|
||||
completion.
|
||||
This is how this setup would look like implemented as a stream:
|
||||
|
||||
.. includecode:: code/docs/stream/FlowParallelismDocSpec.scala#pipelining
|
||||
|
||||
The two ``map`` stages in sequence (encapsulated in the "frying pan" flows) will be executed in a pipelined way,
|
||||
basically doing the same as Roland with his frying pans:
|
||||
|
||||
1. A ``ScoopOfBatter`` enters ``fryingPan1``
|
||||
2. ``fryingPan1`` emits a HalfCookedPancake once ``fryingPan2`` becomes available
|
||||
3. ``fryingPan2`` takes the HalfCookedPancake
|
||||
4. at this point fryingPan1 already takes the next scoop, without waiting for fryingPan2 to finish
|
||||
|
||||
The benefit of pipelining is that it can be applied to any sequence of processing steps that are otherwise not
|
||||
parallelisable (for example because the result of a processing step depends on all the information from the previous
|
||||
step). One drawback is that if the processing times of the stages are very different then some of the stages will not
|
||||
be able to operate at full throughput because they will wait on a previous or subsequent stage most of the time. In the
|
||||
pancake example frying the second half of the pancake is usually faster than frying the first half, ``fryingPan2`` will
|
||||
not be able to operate at full capacity [#]_.
|
||||
|
||||
Stream processing stages have internal buffers to make communication between them more efficient. For more details
|
||||
about the behavior of these and how to add additional buffers refer to :ref:`stream-rate-scala`.
|
||||
|
||||
Parallel processing
|
||||
-------------------
|
||||
|
||||
Patrik uses the two frying pans symmetrically. He uses both pans to fully fry a pancake on both sides, then puts
|
||||
the results on a shared plate. Whenever a pan becomes empty, he takes the next scoop from the shared bowl of batter.
|
||||
In essence he parallelizes the same process over multiple pans. This is how this setup will look like if implemented
|
||||
using streams:
|
||||
|
||||
.. includecode:: code/docs/stream/FlowParallelismDocSpec.scala#parallelism
|
||||
|
||||
The benefit of parallelizing is that it is easy to scale. In the pancake example
|
||||
it is easy to add a third frying pan with Patrik's method, but Roland cannot add a third frying pan,
|
||||
since that would require a third processing step, which is not practically possible in the case of frying pancakes.
|
||||
|
||||
One drawback of the example code above that it does not preserve the ordering of pancakes. This might be a problem
|
||||
if children like to track their "own" pancakes. In those cases the ``Balance`` and ``Merge`` stages should be replaced
|
||||
by strict-round robing balancing and merging stages that put in and take out pancakes in a strict order.
|
||||
|
||||
A more detailed example of creating a worker pool can be found in the cookbook: :ref:`cookbook-balance-scala`
|
||||
|
||||
Combining pipelining and parallel processing
|
||||
--------------------------------------------
|
||||
|
||||
The two concurrency patterns that we demonstrated as means to increase throughput are not exclusive.
|
||||
In fact, it is rather simple to combine the two approaches and streams provide
|
||||
a nice unifying language to express and compose them.
|
||||
|
||||
First, let's look at how we can parallelize pipelined processing stages. In the case of pancakes this means that we
|
||||
will employ two chefs, each working using Roland's pipelining method, but we use the two chefs in parallel, just like
|
||||
Patrik used the two frying pans. This is how it looks like if expressed as streams:
|
||||
|
||||
.. includecode:: code/docs/stream/FlowParallelismDocSpec.scala#parallel-pipeline
|
||||
|
||||
The above pattern works well if there are many independent jobs that do not depend on the results of each other, but
|
||||
the jobs themselves need multiple processing steps where each step builds on the result of
|
||||
the previous one. In our case individual pancakes do not depend on each other, they can be cooked in parallel, on the
|
||||
other hand it is not possible to fry both sides of the same pancake at the same time, so the two sides have to be fried
|
||||
in sequence.
|
||||
|
||||
It is also possible to organize parallelized stages into pipelines. This would mean employing four chefs:
|
||||
|
||||
- the first two chefs prepare half-cooked pancakes from batter, in parallel, then putting those on a large enough
|
||||
flat surface.
|
||||
- the second two chefs take these and fry their other side in their own pans, then they put the pancakes on a shared
|
||||
plate.
|
||||
|
||||
This is again straightforward to implement with the streams API:
|
||||
|
||||
.. includecode:: code/docs/stream/FlowParallelismDocSpec.scala#pipelined-parallel
|
||||
|
||||
This usage pattern is less common but might be usable if a certain step in the pipeline might take wildly different
|
||||
times to finish different jobs. The reason is that there are more balance-merge steps in this pattern
|
||||
compared to the parallel pipelines. This pattern rebalances after each step, while the previous pattern only balances
|
||||
at the entry point of the pipeline. This only matters however if the processing time distribution has a large
|
||||
deviation.
|
||||
|
||||
.. [#] Roland's reason for this seemingly suboptimal procedure is that he prefers the temperature of the second pan
|
||||
to be slightly lower than the first in order to achieve a more homogeneous result.
|
||||
Loading…
Add table
Add a link
Reference in a new issue