=doc minor fixes in stream quickstart

This commit is contained in:
2beaucoup 2016-01-06 13:27:41 +01:00
parent b4f367e46a
commit db353504ab
2 changed files with 34 additions and 37 deletions

View file

@ -10,7 +10,7 @@ We will also consider the problem inherent to all non-blocking streaming
solutions: *"What if the subscriber is too slow to consume the live stream of
data?"*. Traditionally the solution is often to buffer the elements, but this
can—and usually will—cause eventual buffer overflows and instability of such
systems. Instead Akka Streams depend on internal backpressure signals that
systems. Instead Akka Streams depend on internal backpressure signals that
allow to control what should happen in such scenarios.
Here's the data model we'll be working with throughout the quickstart examples:
@ -34,9 +34,9 @@ which will be responsible for materializing and running the streams we are about
The :class:`ActorMaterializer` can optionally take :class:`ActorMaterializerSettings` which can be used to define
materialization properties, such as default buffer sizes (see also :ref:`stream-buffers-scala`), the dispatcher to
be used by the pipeline etc. These can be overridden ``withAttributes`` on :class:`Flow`, :class:`Source`, :class:`Sink` and :class:`Graph`.
be used by the pipeline etc. These can be overridden with ``withAttributes`` on :class:`Flow`, :class:`Source`, :class:`Sink` and :class:`Graph`.
Let's assume we have a stream of tweets readily available, in Akka this is expressed as a :class:`Source[Out, M]`:
Let's assume we have a stream of tweets readily available. In Akka this is expressed as a :class:`Source[Out, M]`:
.. includecode:: code/docs/stream/TwitterStreamQuickstartDocSpec.scala#tweet-source
@ -52,14 +52,14 @@ only make sense in streaming and vice versa):
.. includecode:: code/docs/stream/TwitterStreamQuickstartDocSpec.scala#authors-filter-map
Finally in order to :ref:`materialize <stream-materialization-scala>` and run the stream computation we need to attach
the Flow to a :class:`Sink` that will get the flow running. The simplest way to do this is to call
the Flow to a :class:`Sink` that will get the Flow running. The simplest way to do this is to call
``runWith(sink)`` on a ``Source``. For convenience a number of common Sinks are predefined and collected as methods on
the :class:`Sink` `companion object <http://doc.akka.io/api/akka-stream-and-http-experimental/@version@/#akka.stream.scaladsl.Sink$>`_.
For now let's simply print each author:
.. includecode:: code/docs/stream/TwitterStreamQuickstartDocSpec.scala#authors-foreachsink-println
or by using the shorthand version (which are defined only for the most popular sinks such as ``Sink.fold`` and ``Sink.foreach``):
or by using the shorthand version (which are defined only for the most popular Sinks such as ``Sink.fold`` and ``Sink.foreach``):
.. includecode:: code/docs/stream/TwitterStreamQuickstartDocSpec.scala#authors-foreach-println
@ -85,15 +85,14 @@ combinator:
due to the risk of deadlock (with merge being the preferred strategy), and second, the monad laws would not hold for
our implementation of flatMap (due to the liveness issues).
Please note that the mapConcat requires the supplied function to return a strict collection (``f:Out=>immutable.Seq[T]``),
Please note that the ``mapConcat`` requires the supplied function to return a strict collection (``f:Out=>immutable.Seq[T]``),
whereas ``flatMap`` would have to operate on streams all the way through.
Broadcasting a stream
---------------------
Now let's say we want to persist all hashtags, as well as all author names from this one live stream.
For example we'd like to write all author handles into one file, and all hashtags into another file on disk.
This means we have to split the source stream into 2 streams which will handle the writing to these different files.
This means we have to split the source stream into two streams which will handle the writing to these different files.
Elements that can be used to form such "fan-out" (or "fan-in") structures are referred to as "junctions" in Akka Streams.
One of these that we'll be using in this example is called :class:`Broadcast`, and it simply emits elements from its
@ -120,14 +119,12 @@ Both :class:`Graph` and :class:`RunnableGraph` are *immutable, thread-safe, and
A graph can also have one of several other shapes, with one or more unconnected ports. Having unconnected ports
expresses a grapth that is a *partial graph*. Concepts around composing and nesting graphs in large structures are
explained explained in detail in :ref:`composition-scala`. It is also possible to wrap complex computation graphs
explained in detail in :ref:`composition-scala`. It is also possible to wrap complex computation graphs
as Flows, Sinks or Sources, which will be explained in detail in
:ref:`constructing-sources-sinks-flows-from-partial-graphs-scala`.
Back-pressure in action
-----------------------
One of the main advantages of Akka Streams is that they *always* propagate back-pressure information from stream Sinks
(Subscribers) to their Sources (Publishers). It is not an optional feature, and is enabled at all times. To learn more
about the back-pressure protocol used by Akka Streams and all other Reactive Streams compatible implementations read
@ -153,30 +150,30 @@ So far we've been only processing data using Flows and consuming it into some ki
values or storing them in some external system. However sometimes we may be interested in some value that can be
obtained from the materialized processing pipeline. For example, we want to know how many tweets we have processed.
While this question is not as obvious to give an answer to in case of an infinite stream of tweets (one way to answer
this question in a streaming setting would to create a stream of counts described as "*up until now*, we've processed N tweets"),
this question in a streaming setting would be to create a stream of counts described as "*up until now*, we've processed N tweets"),
but in general it is possible to deal with finite streams and come up with a nice result such as a total count of elements.
First, let's write such an element counter using ``Sink.fold`` and see how the types look like:
.. includecode:: code/docs/stream/TwitterStreamQuickstartDocSpec.scala#tweets-fold-count
First we prepare a reusable ``Flow`` that will change each incoming tweet into an integer of value ``1``.
We'll use this in order to combine those ones with a ``Sink.fold`` will sum all ``Int`` elements of the stream
and make its result available as a ``Future[Int]``. Next we connect the ``tweets`` stream though a ``map`` step which
converts each tweet into the number ``1``, finally we connect the flow using ``toMat`` the previously prepared Sink.
First we prepare a reusable ``Flow`` that will change each incoming tweet into an integer of value ``1``. We'll use this in
order to combine those with a ``Sink.fold`` that will sum all ``Int`` elements of the stream and make its result available as
a ``Future[Int]``. Next we connect the ``tweets`` stream to ``count`` with ``via``. Finally we connect the Flow to the previously
prepared Sink using ``toMat``.
Remember those mysterious ``Mat`` type parameters on ``Source[+Out, +Mat]``, ``Flow[-In, +Out, +Mat]`` and ``Sink[-In, +Mat]``?
They represent the type of values these processing parts return when materialized. When you chain these together,
you can explicitly combine their materialized values: in our example we used the ``Keep.right`` predefined function,
you can explicitly combine their materialized values. In our example we used the ``Keep.right`` predefined function,
which tells the implementation to only care about the materialized type of the stage currently appended to the right.
As you can notice, the materialized type of sumSink is ``Future[Int]`` and because of using ``Keep.right``,
the resulting :class:`RunnableGraph` has also a type parameter of ``Future[Int]``.
The materialized type of ``sumSink`` is ``Future[Int]`` and because of using ``Keep.right``, the resulting :class:`RunnableGraph`
has also a type parameter of ``Future[Int]``.
This step does *not* yet materialize the
processing pipeline, it merely prepares the description of the Flow, which is now connected to a Sink, and therefore can
be ``run()``, as indicated by its type: ``RunnableGraph[Future[Int]]``. Next we call ``run()`` which uses the implicit :class:`ActorMaterializer`
to materialize and run the flow. The value returned by calling ``run()`` on a ``RunnableGraph[T]`` is of type ``T``.
In our case this type is ``Future[Int]`` which, when completed, will contain the total length of our tweets stream.
to materialize and run the Flow. The value returned by calling ``run()`` on a ``RunnableGraph[T]`` is of type ``T``.
In our case this type is ``Future[Int]`` which, when completed, will contain the total length of our ``tweets`` stream.
In case of the stream failing, this future would complete with a Failure.
A :class:`RunnableGraph` may be reused