diff --git a/akka-docs-dev/rst/java.rst b/akka-docs-dev/rst/java.rst index 31249b63a6..4eb4370ca8 100644 --- a/akka-docs-dev/rst/java.rst +++ b/akka-docs-dev/rst/java.rst @@ -1,10 +1,10 @@ -.. _java-api: +.. _stream-java-api: Java Documentation -================== +=================== -.. note:: +.. toctree:: + :maxdepth: 2 - The documentation has not yet been ported to Java, you are welcome to read - the Scala version since all APIs are closely reflected between the two - language bindings. + java/stream-index + java/http/index diff --git a/akka-docs-dev/rst/java/stream-index.rst b/akka-docs-dev/rst/java/stream-index.rst new file mode 100644 index 0000000000..95e9d267fd --- /dev/null +++ b/akka-docs-dev/rst/java/stream-index.rst @@ -0,0 +1,20 @@ +.. _streams-java: + +Streams +======= + +.. toctree:: + :maxdepth: 2 + + stream-introduction + stream-quickstart + ../stream-design + stream-flows-and-basics + stream-graphs + stream-rate + stream-customize + stream-integrations + stream-io + stream-cookbook + ../stream-configuration + diff --git a/akka-docs-dev/rst/java/stream-introduction.rst b/akka-docs-dev/rst/java/stream-introduction.rst new file mode 100644 index 0000000000..6528ea8416 --- /dev/null +++ b/akka-docs-dev/rst/java/stream-introduction.rst @@ -0,0 +1,64 @@ +.. _stream-introduction-java: + +############ +Introduction +############ + +Motivation +========== + +The way we consume services from the internet today includes many instances of +streaming data, both downloading from a service as well as uploading to it or +peer-to-peer data transfers. Regarding data as a stream of elements instead of +in its entirety is very useful because it matches the way computers send and +receive them (for example via TCP), but it is often also a necessity because +data sets frequently become too large to be handled as a whole. 
We spread +computations or analyses over large clusters and call it “big data”, where the +whole principle of processing them is by feeding those data sequentially—as a +stream—through some CPUs. + +Actors can be seen as dealing with streams as well: they send and receive +series of messages in order to transfer knowledge (or data) from one place to +another. We have found it tedious and error-prone to implement all the proper +measures in order to achieve stable streaming between actors, since in addition +to sending and receiving we also need to take care not to overflow any buffers +or mailboxes in the process. Another pitfall is that Actor messages can be lost +and must be retransmitted in that case lest the stream have holes on the +receiving side. When dealing with streams of elements of a fixed given type, +Actors also do not currently offer good static guarantees that no wiring errors +are made: type-safety could be improved in this case. + +For these reasons we decided to bundle up a solution to these problems as an +Akka Streams API. The purpose is to offer an intuitive and safe way to +formulate stream processing setups such that we can then execute them +efficiently and with bounded resource usage—no more OutOfMemoryErrors. In order +to achieve this, our streams need to be able to limit the buffering that they +employ, and they need to be able to slow down producers if the consumers cannot +keep up. This feature is called back-pressure and is at the core of the +`Reactive Streams`_ initiative of which Akka is a +founding member. For you this means that the hard problem of propagating and +reacting to back-pressure has been incorporated into the design of Akka Streams +already, so you have one less thing to worry about; it also means that Akka +Streams interoperate seamlessly with all other Reactive Streams implementations +(where Reactive Streams interfaces define the interoperability SPI while +implementations like Akka Streams offer a nice user API). + +.. 
_Reactive Streams: http://reactive-streams.org/ + +How to read these docs +====================== + +Stream processing is a different paradigm from the Actor Model or from Future +composition; therefore it may take some careful study of this subject until you +feel familiar with the tools and techniques. The documentation is here to help, +and for best results we recommend the following approach: + +* Read the :ref:`quickstart-java` to get a feel for what streams + look like and what they can do. +* The top-down learners may want to peruse the :ref:`stream-design` at this + point. +* The bottom-up learners may feel more at home rummaging through the + :ref:`stream-cookbook-java`. +* The other sections can be read sequentially or as needed during the previous + steps, each digging deeper into specific topics. + diff --git a/akka-docs-dev/rst/java/stream-quickstart.rst b/akka-docs-dev/rst/java/stream-quickstart.rst new file mode 100644 index 0000000000..900e8324e2 --- /dev/null +++ b/akka-docs-dev/rst/java/stream-quickstart.rst @@ -0,0 +1,169 @@ +.. _quickstart-java: + +Quick Start Guide: Reactive Tweets +================================== + +A typical use case for stream processing is consuming a live stream of data from which we want to extract or aggregate some +other data. In this example we'll consider consuming a stream of tweets and extracting information concerning Akka from them. + +We will also consider the problem inherent to all non-blocking streaming +solutions: *"What if the subscriber is too slow to consume the live stream of +data?"*. Traditionally the solution is often to buffer the elements, but this +can—and usually will—cause eventual buffer overflows and instability of such +systems. Instead Akka Streams depend on internal backpressure signals that +allow us to control what should happen in such scenarios. + +Here's the data model we'll be working with throughout the quickstart examples: + +.. 
includecode:: code/docs/stream/TwitterStreamQuickstartDocSpec.scala#model + +Transforming and consuming simple streams +----------------------------------------- +First we need to prepare our environment by creating an :class:`ActorSystem` and :class:`FlowMaterializer`, +which will be responsible for materializing and running the streams we are about to create: + +.. includecode:: code/docs/stream/TwitterStreamQuickstartDocSpec.scala#materializer-setup + +The :class:`FlowMaterializer` can optionally take :class:`MaterializerSettings` which can be used to define +materialization properties, such as default buffer sizes (see also :ref:`stream-buffering-explained-scala`), the dispatcher to +be used by the pipeline etc. These can be overridden on an element-by-element basis or for an entire section, but this +will be discussed in depth in :ref:`stream-section-configuration`. + +Let's assume we have a stream of tweets readily available; in Akka this is expressed as a :class:`Source[Out]`: + +.. includecode:: code/docs/stream/TwitterStreamQuickstartDocSpec.scala#tweet-source + +Streams always start flowing from a :class:`Source[Out]`, then can continue through :class:`Flow[In,Out]` elements or +more advanced graph elements, to finally be consumed by a :class:`Sink[In]`. Both Sources and Flows provide stream operations +that can be used to transform the flowing data; a :class:`Sink`, however, does not, since it is the "end of stream" and its +behavior depends on the type of :class:`Sink` used. + +In our case let's say we want to find all twitter handles of users who tweet about ``#akka``. The operations should look +familiar to anyone who has used the Scala Collections library; however, they operate on streams and not collections of data: + +.. includecode:: code/docs/stream/TwitterStreamQuickstartDocSpec.scala#authors-filter-map + +Finally in order to :ref:`materialize ` and run the stream computation we need to attach +the Flow to a :class:`Sink[T]` that will get the flow running. 
The simplest way to do this is to call +``runWith(sink)`` on a ``Source[Out]``. For convenience a number of common Sinks are predefined and collected as methods on +the :class:`Sink` `companion object `_. +For now let's simply print each author: + +.. includecode:: code/docs/stream/TwitterStreamQuickstartDocSpec.scala#authors-foreachsink-println + +or by using the shorthand version (which is defined only for the most popular sinks such as :class:`FoldSink` and :class:`ForeachSink`): + +.. includecode:: code/docs/stream/TwitterStreamQuickstartDocSpec.scala#authors-foreach-println + +Materializing and running a stream always requires a :class:`FlowMaterializer` to be in implicit scope (or passed in explicitly, +like this: ``.run(mat)``). + +Flattening sequences in streams +------------------------------- +In the previous section we were working on 1:1 relationships of elements, which is the most common case, but sometimes +we might want to map from one element to a number of elements and receive a "flattened" stream, similar to how ``flatMap`` +works on Scala Collections. In order to get a flattened stream of hashtags from our stream of tweets we can use the ``mapConcat`` +combinator: + +.. includecode:: code/docs/stream/TwitterStreamQuickstartDocSpec.scala#hashtags-mapConcat + +.. note:: + The name ``flatMap`` was consciously avoided due to its proximity to for-comprehensions and monadic composition. + It is problematic for two reasons: firstly, flattening by concatenation is often undesirable in bounded stream processing + due to the risk of deadlock (with merge being the preferred strategy), and secondly, the monad laws would not hold for + our implementation of flatMap (due to the liveness issues). + + Please note that ``mapConcat`` requires the supplied function to return a strict collection (``f:Out=>immutable.Seq[T]``), + whereas ``flatMap`` would have to operate on streams all the way through. 
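The strict-collection requirement can be illustrated with plain Java collections (a sketch only, using ``java.util.stream`` as an analogy; this is not the Akka Streams API, and the class and method names below are made up for illustration):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Illustrative sketch only: plain java.util.stream, not the Akka Streams API.
// mapConcat expects a strict "one element in, many elements out" function,
// i.e. something of the shape f: Out => immutable.Seq[T]; hashtags() below
// has exactly that shape: it returns a finished List, never a stream.
public class FlattenExample {
    // Strict function: one tweet in, a completed List of hashtags out.
    static List<String> hashtags(String tweet) {
        return Arrays.stream(tweet.split(" "))
                .filter(word -> word.startsWith("#"))
                .collect(Collectors.toList());
    }

    // Flattening by concatenation: each tweet contributes all of its
    // hashtags, in order, to one flat result list.
    static List<String> flattenAll(List<String> tweets) {
        return tweets.stream()
                .flatMap(tweet -> hashtags(tweet).stream())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> tweets = Arrays.asList("#akka rocks", "#streams and #akka");
        System.out.println(flattenAll(tweets)); // [#akka, #streams, #akka]
    }
}
```

The same "each input element expands to zero or more output elements, concatenated in order" semantics is what ``mapConcat`` provides on a live stream.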
+ + +Broadcasting a stream +--------------------- +Now let's say we want to persist all hashtags, as well as all author names from this one live stream. +For example we'd like to write all author handles into one file, and all hashtags into another file on disk. +This means we have to split the source stream into two streams, which will handle the writing to these different files. + +Elements that can be used to form such "fan-out" (or "fan-in") structures are referred to as "junctions" in Akka Streams. +One of these that we'll be using in this example is called :class:`Broadcast`, and it simply emits elements from its +input port to all of its output ports. + +Akka Streams intentionally separate the linear stream structures (Flows) from the non-linear, branching ones (FlowGraphs) +in order to offer the most convenient API for both of these cases. Graphs can express arbitrarily complex stream setups +at the expense of not reading as familiarly as collection transformations. It is also possible to wrap complex computation +graphs as Flows, Sinks or Sources, which will be explained in detail in :ref:`constructing-sources-sinks-flows-from-partial-graphs-scala`. +FlowGraphs are constructed like this: + +.. includecode:: code/docs/stream/TwitterStreamQuickstartDocSpec.scala#flow-graph-broadcast + +.. note:: + The ``~>`` (read as "edge", "via" or "to") operator is only available if ``FlowGraphImplicits._`` are imported. + Without this import you can still construct graphs using the ``builder.addEdge(from,[through,]to)`` method. + +As you can see, inside the :class:`FlowGraph` we use an implicit graph builder to mutably construct the graph +using the ``~>`` "edge operator" (also read as "connect" or "via" or "to"). Once we have the FlowGraph in the value ``g`` +*it is immutable, thread-safe, and freely shareable*. A graph can be ``run()`` directly - assuming all +ports (sinks/sources) within a flow have been connected properly. 
It is possible to construct :class:`PartialFlowGraph` s +where this is not required, but this will be covered in detail in :ref:`partial-flow-graph-scala`. + +As with all Akka Streams elements, :class:`Broadcast` will properly propagate back-pressure to its upstream element. + +Back-pressure in action +----------------------- + +One of the main advantages of Akka Streams is that they *always* propagate back-pressure information from stream Sinks +(Subscribers) to their Sources (Publishers). It is not an optional feature, and is enabled at all times. To learn more +about the back-pressure protocol used by Akka Streams and all other Reactive Streams compatible implementations read +:ref:`back-pressure-explained-scala`. + +A typical problem that applications like this (ones not using Akka Streams) often face is that they are unable to process the incoming data fast enough, +either temporarily or by design, and will start buffering incoming data until there's no more space to buffer, resulting +in either ``OutOfMemoryError`` s or other severe degradations of service responsiveness. With Akka Streams buffering can +and must be handled explicitly. For example, if we are only interested in the "*most recent tweets, with a buffer of 10 +elements*" this can be expressed using the ``buffer`` element: + +.. includecode:: code/docs/stream/TwitterStreamQuickstartDocSpec.scala#tweets-slow-consumption-dropHead + +The ``buffer`` element takes an explicit and required ``OverflowStrategy``, which defines how the buffer should react +when it receives another element while it is full. Strategies provided include dropping the oldest element (``dropHead``), +dropping the entire buffer, signalling errors etc. Be sure to pick the strategy that fits your use case best. + +Materialized values +------------------- +So far we've only been processing data using Flows and consuming it into some kind of external Sink - be it by printing +values or storing them in some external system. 
However sometimes we may be interested in some value that can be +obtained from the materialized processing pipeline. For example, we want to know how many tweets we have processed. +While this question is not obvious to answer in the case of an infinite stream of tweets (one way to answer +it in a streaming setting would be to create a stream of counts described as "*up until now*, we've processed N tweets"), +in general it is possible to deal with finite streams and come up with a nice result such as a total count of elements. + +First, let's write such an element counter using :class:`FoldSink` and then we'll see how it is possible to obtain materialized +values from a :class:`MaterializedMap` which is returned by materializing an Akka stream. We'll split execution into multiple +lines for the sake of explaining the concepts of ``Materializable`` elements and ``MaterializedType``: + +.. includecode:: code/docs/stream/TwitterStreamQuickstartDocSpec.scala#tweets-fold-count + +First, we prepare the :class:`FoldSink` which will be used to sum all ``Int`` elements of the stream. +Next we connect the ``tweets`` stream through a ``map`` step which converts each tweet into the number ``1``; +finally we connect the flow ``to`` the previously prepared Sink. Notice that this step does *not* yet materialize the +processing pipeline; it merely prepares the description of the Flow, which is now connected to a Sink, and therefore can +be ``run()``, as indicated by its type: :class:`RunnableFlow`. Next we call ``run()`` which uses the implicit :class:`FlowMaterializer` +to materialize and run the flow. The value returned by calling ``run()`` on a ``RunnableFlow`` or ``FlowGraph`` is a ``MaterializedMap``, +which can be used to retrieve materialized values from the running stream. + +In order to extract a materialized value from a running stream it is possible to call ``get(Materializable)`` on a materialized map +obtained from materializing a flow or graph. 
Since ``FoldSink`` implements ``Materializable`` and defines its ``MaterializedType`` +as ``Future[Int]``, we can use it to obtain the :class:`Future` which, when completed, will contain the total length of our tweets stream. +In case of the stream failing, this future would complete with a Failure. + +The reason we have to ``get`` the value out of the materialized map is that a :class:`RunnableFlow` may be reused +and materialized multiple times, because it is just the "blueprint" of the stream. This means that if we materialize a stream twice, +for example one that consumes a live stream of tweets for a minute, the materialized values of those two materializations +will be different, as illustrated by this example: + +.. includecode:: code/docs/stream/TwitterStreamQuickstartDocSpec.scala#tweets-runnable-flow-materialized-twice + +Many elements in Akka Streams provide materialized values which can be used either for obtaining results of computation or +for steering these elements; this will be discussed in detail in :ref:`stream-materialization-scala`. Summing up this section, now we know +what happens behind the scenes when we run this one-liner, which is equivalent to the multi-line version above: + +.. includecode:: code/docs/stream/TwitterStreamQuickstartDocSpec.scala#tweets-fold-count-oneline diff --git a/akka-docs-dev/rst/scala.rst b/akka-docs-dev/rst/scala.rst index 599a18707d..15375ba129 100644 --- a/akka-docs-dev/rst/scala.rst +++ b/akka-docs-dev/rst/scala.rst @@ -1,4 +1,4 @@ -.. _scala-api: +.. _stream-scala-api: Scala Documentation ===================
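Aside: the counting logic used in the quickstart's "Materialized values" section (convert each tweet to ``1``, then fold with ``+``) can be sketched with plain Java collections. This is an analogy only, not the Akka ``FoldSink`` API, and the class and method names are made up for illustration:

```java
import java.util.Arrays;
import java.util.List;

// Sketch of the counting fold only, with plain Java collections; the real
// quickstart uses FoldSink and retrieves a materialized Future[Int] instead.
public class CountExample {
    // Fold: start from zero and add 1 for every element — exactly the
    // "map each tweet to 1, then sum" logic of the quickstart.
    static int count(List<String> tweets) {
        return tweets.stream().map(tweet -> 1).reduce(0, Integer::sum);
    }

    public static void main(String[] args) {
        List<String> tweets = Arrays.asList("t1", "t2", "t3");
        System.out.println(count(tweets)); // 3
    }
}
```

The stream version differs in that the fold runs asynchronously and the sum only becomes available through the materialized map once the stream completes.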