254 lines
11 KiB
HTML
<html>
|
|
<head>
|
|
<title>Actor Supervision Java with Lambda Support</title>
|
|
</head>
|
|
<body>
|
|
<div>
|
|
<h2>Quick Overview</h2>
|
|
|
|
<p>Congratulations! You have just created your first fault-resilient Akka
application. Nice job!</p>
|
|
|
|
<p>Let's start with an overview and discuss the problem we want
|
|
to solve. This tutorial application demonstrates the use of Akka
|
|
supervision hierarchies to implement reliable systems. This particular
|
|
example demonstrates a calculator service that calculates arithmetic
|
|
expressions. We will visit each of the components shortly, but you might
|
|
want to take a quick look at the components before we move on.</p>
|
|
|
|
<ul>
|
|
<li><a href="#code/src/main/java/supervision/Expression.java" class="shortcut">Expression.java</a>
|
|
contains our "domain model", a very simple representation of
|
|
arithmetic expressions
|
|
</li>
|
|
<li><a href="#code/src/main/java/supervision/ArithmeticService.java"
|
|
class="shortcut">ArithmeticService.java</a> is the entry point
|
|
for our calculation service
|
|
</li>
|
|
<li><a href="#code/src/main/java/supervision/FlakyExpressionCalculator.java"
|
|
class="shortcut">FlakyExpressionCalculator.java</a> is our
|
|
heavy-lifter, a worker actor that can evaluate an expression
|
|
concurrently
|
|
</li>
|
|
<li><a href="#code/src/main/java/supervision/Main.java" class="shortcut">Main.java</a>
contains example code that starts up the calculator service and sends a few
jobs to it
|
|
</li>
|
|
</ul>
|
|
</div>
|
|
<div>
|
|
<h2>The Expression Model</h2>
|
|
|
|
<p>Our service deals with arithmetic expressions on integers involving
|
|
addition, multiplication and (integer) division. In
|
|
<a href="#code/src/main/java/supervision/Expression.java" class="shortcut">Expression.java</a>
|
|
you can see a very simple model of this kind of expression.</p>
|
|
|
|
<p>Every arithmetic expression is a descendant of <code>Expression</code> and,
with <code>Const</code> as the only exception, has a left and a right side,
each of which is also an <code>Expression</code>.</p>
|
|
|
|
<p>For example, the expression (3 + 5) / (2 * (1 + 1)) could be constructed
|
|
as:</p>
|
|
|
|
|
|
<code><pre>
new Divide(
  new Add(
    new Const(3),
    new Const(5)
  ), // (3 + 5)
  new Multiply(
    new Const(2),
    new Add(
      new Const(1),
      new Const(1)
    ) // (1 + 1)
  ) // (2 * (1 + 1))
); // (3 + 5) / (2 * (1 + 1))
</pre></code>
|
|
|
|
<p>Apart from the encoding of an expression and some pretty printing, our
|
|
model does not provide other services, so let's move on and see how we
|
|
can calculate the result of such expressions.</p>
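<p>To make the shape of such a model concrete, here is a minimal sketch of how
it could look. It is not the content of <code>Expression.java</code>: the
<code>ExpressionSketch</code> wrapper, the <code>BinaryExpression</code> helper
class and the exact pretty-printing format are assumptions made for
illustration only.</p>

```java
// Hypothetical sketch of an expression model; the tutorial's Expression.java may differ.
class ExpressionSketch {

    // Marker type: every node of the expression tree is an Expression.
    interface Expression {}

    // A constant is the only expression without a left and a right side.
    static final class Const implements Expression {
        final int value;
        Const(int value) { this.value = value; }
        @Override public String toString() { return String.valueOf(value); }
    }

    // Assumed helper: common base for all two-sided expressions.
    static abstract class BinaryExpression implements Expression {
        final Expression left;
        final Expression right;
        private final String symbol;

        BinaryExpression(Expression left, Expression right, String symbol) {
            this.left = left;
            this.right = right;
            this.symbol = symbol;
        }

        // Pretty printing: fully parenthesized infix notation.
        @Override public String toString() {
            return "(" + left + " " + symbol + " " + right + ")";
        }
    }

    static final class Add extends BinaryExpression {
        Add(Expression left, Expression right) { super(left, right, "+"); }
    }

    static final class Multiply extends BinaryExpression {
        Multiply(Expression left, Expression right) { super(left, right, "*"); }
    }

    static final class Divide extends BinaryExpression {
        Divide(Expression left, Expression right) { super(left, right, "/"); }
    }

    public static void main(String[] args) {
        Expression e = new Divide(
            new Add(new Const(3), new Const(5)),
            new Multiply(new Const(2), new Add(new Const(1), new Const(1))));
        System.out.println(e); // prints ((3 + 5) / (2 * (1 + 1)))
    }
}
```

<p>Note that the sketch, like the model described above, only encodes and
prints expressions; it deliberately contains no evaluation logic.</p>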
|
|
</div>
|
|
<div>
|
|
<h2>Arithmetic Service</h2>
|
|
|
|
<p>Our entry point is the <a
|
|
href="#code/src/main/java/supervision/ArithmeticService.java"
|
|
class="shortcut">ArithmeticService</a> actor that accepts arithmetic
|
|
expressions, calculates them and returns the result to the original
|
|
sender of the <code>Expression</code>. This
|
|
logic is implemented in the <code>receive</code> block. The actor
|
|
handles <code>Expression</code> messages and starts a worker for them,
|
|
carefully recording which worker belongs to which requester in the
|
|
<code>pendingWorkers</code> map.
|
|
</p>
|
|
|
|
<p>Who calculates
|
|
the expression? As you see, on the reception of an
|
|
<code>Expression</code> message we create a <code>FlakyExpressionCalculator</code>
|
|
actor and pass the expression as a parameter to its <code>Props</code>.
|
|
What happens here is that we delegate the calculation work to a worker
|
|
actor because the work can be "dangerous". After the worker
|
|
finishes its job, it replies to its parent (in this
|
|
case <code>ArithmeticService</code>) with a <code>Result</code>
|
|
message. At this point the top level service actor looks up which
actor it needs to send the final result to, and forwards the value of
the computation to it.</p>
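<p>The bookkeeping described above can be sketched in plain Java, without any
Akka dependency. This is only an illustration of the idea behind the
<code>pendingWorkers</code> map: the class name
<code>PendingWorkersSketch</code>, the method names and the use of strings in
place of actor references are all assumptions of this sketch, not the code of
<code>ArithmeticService</code>.</p>

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, Akka-free model of the worker/requester bookkeeping.
class PendingWorkersSketch {

    // worker -> requester; in an actor these would be actor references.
    private final Map<String, String> pendingWorkers = new HashMap<>();

    // Records what was sent to whom, so the routing can be inspected.
    final List<String> delivered = new ArrayList<>();

    private int nextWorkerId = 0;

    // An expression arrived: start a worker and remember who asked for it.
    String onExpression(String requester) {
        String worker = "worker-" + nextWorkerId++;
        pendingWorkers.put(worker, requester);
        return worker;
    }

    // A worker finished: look up the original requester and pass the value on.
    void onResult(String worker, int value) {
        String requester = pendingWorkers.remove(worker);
        if (requester != null) {
            delivered.add(requester + " <- " + value);
        }
    }

    int pendingCount() {
        return pendingWorkers.size();
    }
}
```

<p>Removing the entry when the result arrives keeps the map from growing with
every finished job; whether the tutorial code cleans up at exactly this point
is not something this sketch claims.</p>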
|
|
</div>
|
|
|
|
<div>
|
|
<h2>The Dangers of Arithmetic</h2>
|
|
|
|
<p>At first, it might feel strange that we don't calculate the result
directly but delegate it to a new actor. The reason is that
we want to treat the calculation as a dangerous task and isolate its
execution in a different actor to keep the top level service safe.</p>
|
|
|
|
<p>In our example we will see two kinds of failures:</p>
|
|
<ul>
|
|
<li><code>FlakinessException</code> is a dummy exception that we throw
|
|
randomly to simulate transient failures. We will assume that
|
|
flakiness is temporary, and retrying the calculation is enough to
|
|
eventually get rid of the failure.
|
|
</li>
|
|
<li>Fatal failures, like <code>ArithmeticException</code>, that will not
go away no matter how many times we retry the task. Division by zero
|
|
is a good example, since it indicates that the expression is
|
|
invalid, and no amount of attempts to calculate it again will fix
|
|
it.
|
|
</li>
|
|
</ul>
|
|
|
|
<p>To handle these kinds of failure modes differently we customized the
supervisor strategy of <a
href="#code/src/main/java/supervision/ArithmeticService.java"
class="shortcut">ArithmeticService</a>. Our strategy here
is to restart the child when a recoverable error is detected (in our
case the dummy <code>FlakinessException</code>). When arithmetic
errors such as division by zero happen, we have no hope of recovering
and therefore we stop the worker. In addition,
we have to notify the original requester of the calculation job
about the failure.</p>
|
|
|
|
<p>We used <code>OneForOneStrategy</code>, since we only want to act on the
|
|
failing child, not on all of our children at the same time.</p>
|
|
|
|
<p>We set <code>loggingEnabled</code> to false, since we wanted to use our
|
|
custom logging instead of the built-in reporting.</p>
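<p>The decision logic of such a strategy can be sketched without Akka as a
function from a failure to a directive. The sketch below is a simplified
model, not the strategy defined in <code>ArithmeticService</code>: the
<code>Directive</code> enum, the <code>decide</code> and
<code>handleFailure</code> methods and the string-based child states are
hypothetical, and stopping on any failure other than
<code>FlakinessException</code> is an assumption.</p>

```java
import java.util.Map;

// Hypothetical, Akka-free model of a one-for-one supervision decision.
class DeciderSketch {

    enum Directive { RESTART, STOP }

    // Stand-in for the tutorial's dummy exception.
    static class FlakinessException extends RuntimeException {
        FlakinessException() { super("Flakiness"); }
    }

    // Maps a failure to a directive.
    static Directive decide(Throwable failure) {
        if (failure instanceof FlakinessException) {
            return Directive.RESTART; // transient: trying again may succeed
        }
        // ArithmeticException and, as an assumption of this sketch,
        // every other failure: retrying cannot help.
        return Directive.STOP;
    }

    // One-for-one: only the failing child is affected, siblings are untouched.
    static void handleFailure(Map<String, String> children, String failedChild, Throwable failure) {
        Directive directive = decide(failure);
        children.put(failedChild, directive == Directive.RESTART ? "restarted" : "stopped");
    }
}
```

<p>An all-for-one variant would instead apply the directive to every entry of
<code>children</code>, which is exactly what we want to avoid here.</p>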
|
|
|
|
|
|
|
|
</div>
|
|
|
|
<div>
|
|
|
|
<h2>The Joy of Calculation</h2>
|
|
|
|
<p>We have now seen our <code>Expression</code> model, our fault modes
|
|
and how we deal with them at the top level, delegating the dangerous
|
|
work to child workers to isolate the failure, and setting
|
|
<code>Stop</code> or <code>Restart</code> directives depending on the
|
|
nature of the failure (fatal or transient). Now it's time to
|
|
calculate and visit <a href="#code/src/main/java/supervision/FlakyExpressionCalculator.java"
|
|
class="shortcut">FlakyExpressionCalculator.java</a>!
|
|
</p>
|
|
|
|
<p>Let's first review our evaluation strategy. When we are facing an
|
|
expression like ((4 * 4) / (3 + 1)) we might be tempted to calculate (4
|
|
* 4) first, then (3 + 1), and then the final division. We can do better:
|
|
Let's calculate the two sides of the division in parallel!</p>
|
|
|
|
<p>To achieve this, our worker delegates the calculation of the left and
right side of the expression it has been given to two child workers of
the same type (except in the case of a constant, where it just sends its
value as a <code>Result</code> to its parent).
This logic is in <code>preStart()</code>
since this is the code that will be executed when an actor starts (and
during restarts if <code>postRestart()</code> is not
overridden).</p>
|
|
|
|
<p>Since either side of the original expression can finish before the
other, we have to indicate somehow which side has been calculated. That
is why we pass a <code>Position</code> as an argument to each worker. The
worker puts it in the <code>Result</code> that it sends after the
calculation has finished successfully.</p>
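<p>The same idea, evaluating both sides concurrently and tagging each result
with its position, can be sketched with plain Java futures instead of actors.
This is an analogy under stated assumptions, not the implementation of
<code>FlakyExpressionCalculator</code>: the <code>ParallelEvalSketch</code>,
<code>Expr</code> and <code>BinOp</code> names are hypothetical, and
<code>CompletableFuture</code> stands in for child actors.</p>

```java
import java.util.Arrays;
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.function.IntBinaryOperator;

// Hypothetical sketch: futures play the role of the two child workers.
class ParallelEvalSketch {

    enum Position { LEFT, RIGHT }

    // A result carries the position it belongs to, like the Result message.
    static final class Result {
        final Position position;
        final int value;
        Result(Position position, int value) { this.position = position; this.value = value; }
    }

    interface Expr {}

    static final class Const implements Expr {
        final int value;
        Const(int value) { this.value = value; }
    }

    static final class BinOp implements Expr {
        final Expr left;
        final Expr right;
        final IntBinaryOperator op;
        BinOp(Expr left, Expr right, IntBinaryOperator op) {
            this.left = left;
            this.right = right;
            this.op = op;
        }
    }

    static int evaluate(Expr e) {
        if (e instanceof Const) {
            return ((Const) e).value; // a constant needs no children
        }
        final BinOp b = (BinOp) e;
        // Start both sides concurrently; each result is tagged with its side.
        CompletableFuture<Result> left = CompletableFuture.supplyAsync(
            () -> new Result(Position.LEFT, evaluate(b.left)));
        CompletableFuture<Result> right = CompletableFuture.supplyAsync(
            () -> new Result(Position.RIGHT, evaluate(b.right)));

        // Collect in an arbitrary order; the tag, not the arrival order, decides the slot.
        Map<Position, Integer> results = new EnumMap<>(Position.class);
        for (CompletableFuture<Result> f : Arrays.asList(right, left)) {
            Result r = f.join();
            results.put(r.position, r.value);
        }
        return b.op.applyAsInt(results.get(Position.LEFT), results.get(Position.RIGHT));
    }

    public static void main(String[] args) {
        Expr e = new BinOp(
            new BinOp(new Const(4), new Const(4), (x, y) -> x * y),
            new BinOp(new Const(3), new Const(1), (x, y) -> x + y),
            (x, y) -> x / y);
        System.out.println(evaluate(e)); // prints 4
    }
}
```

<p>Because division is not commutative, mixing up the two sides would give a
wrong answer, which is why the position tag matters.</p>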
|
|
|
|
</div>
|
|
<div>
|
|
|
|
|
|
<h2>Failing Calculations</h2>
|
|
|
|
<p>As you might have observed, we added a method called
|
|
<code>flakiness()</code> that sometimes just misbehaves
|
|
(throws a <code>FlakinessException</code>).
|
|
This simulates a transient failure. Let's see how our
|
|
FlakyExpressionCalculator deals with failure situations.</p>
|
|
|
|
<p>A supervisor strategy is applied to the children of an actor. Since our
|
|
children are actually workers for calculating the left and right side of
|
|
our subexpression, we have to think about what different failures mean for
|
|
us.</p>
|
|
|
|
<p>If we encounter a <code>FlakinessException</code> it indicates that one
|
|
of our workers
|
|
just made a hiccup and failed to calculate the answer. Since we know
|
|
this failure is recoverable, we just restart the responsible worker.</p>
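<p>The effect of restarting on a transient failure can be illustrated with a
small, Akka-free sketch in which a restart is modelled as simply running the
work again. Everything here is hypothetical: the <code>Flakiness</code> class
fails a fixed number of times so that the behaviour is deterministic, whereas
the tutorial's <code>flakiness()</code> fails randomly.</p>

```java
// Hypothetical sketch: a restart is modelled as re-running the failed work.
class RestartSketch {

    // Stand-in for the tutorial's dummy exception.
    static class FlakinessException extends RuntimeException {
        FlakinessException() { super("Flakiness"); }
    }

    // Deterministic stand-in for flakiness(): fails on the first `failures` calls.
    static class Flakiness {
        private int remaining;

        Flakiness(int failures) { this.remaining = failures; }

        void maybeFail() {
            if (remaining > 0) {
                remaining--;
                throw new FlakinessException();
            }
        }
    }

    // Returns { sum, number of times the worker was started }.
    static int[] superviseAdd(Flakiness flakiness, int left, int right) {
        int starts = 0;
        while (true) {
            starts++; // the worker is started (or restarted)
            try {
                flakiness.maybeFail(); // may throw, simulating a hiccup
                return new int[] { left + right, starts };
            } catch (FlakinessException e) {
                // Transient failure: loop again, which models the restart.
            }
        }
    }
}
```

<p>With two simulated hiccups, <code>superviseAdd(new Flakiness(2), 3, 5)</code>
starts the worker three times before it yields the sum 8.</p>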
|
|
|
|
<p>In case of fatal failures we cannot really do anything ourselves. First
of all, a fatal failure indicates that the expression is invalid, so a restart
does not help. Second, we are not necessarily the top level worker for the
expression. When an unknown failure is encountered, it
is escalated to the parent. The parent of this actor is either another
<code>FlakyExpressionCalculator</code> or the
<code>ArithmeticService</code>. Since the calculators all escalate, no
matter how deep the failure happened, the <code>ArithmeticService</code>
will decide on the fate of the job (in our case, stop it).</p>
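<p>How a failure travels up the hierarchy can be sketched as a chain of
decisions. The sketch below is a simplified, Akka-free model with hypothetical
names (<code>calculatorDecider</code>, <code>serviceDecider</code>,
<code>resolve</code>). It assumes that every calculator level escalates
anything that is not a <code>FlakinessException</code> and that the top level
service stops the job in that case.</p>

```java
// Hypothetical sketch of escalation through nested calculators.
class EscalationSketch {

    enum Directive { RESTART, STOP, ESCALATE }

    // Stand-in for the tutorial's dummy exception.
    static class FlakinessException extends RuntimeException {
        FlakinessException() { super("Flakiness"); }
    }

    // Decision taken by a calculator supervising its left/right workers.
    static Directive calculatorDecider(Throwable failure) {
        return failure instanceof FlakinessException
            ? Directive.RESTART    // recoverable: restart the failing worker
            : Directive.ESCALATE;  // unknown: let the parent decide
    }

    // Decision taken by the top level service (assumed: stop anything non-transient).
    static Directive serviceDecider(Throwable failure) {
        return failure instanceof FlakinessException
            ? Directive.RESTART
            : Directive.STOP;
    }

    // Walks the failure up through the calculator levels above the failing
    // worker and returns the first directive that is not ESCALATE.
    static Directive resolve(Throwable failure, int calculatorLevels) {
        for (int level = 0; level < calculatorLevels; level++) {
            Directive directive = calculatorDecider(failure);
            if (directive != Directive.ESCALATE) {
                return directive; // handled locally, never reaches the service
            }
        }
        return serviceDecider(failure); // escalated all the way to the top
    }
}
```

<p>In this model a division by zero three levels deep resolves to
<code>STOP</code> at the service, while a <code>FlakinessException</code> is
handled by the nearest calculator with <code>RESTART</code>.</p>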
|
|
</div>
|
|
<div>
|
|
<h2>When to Split Work? A Small Detour.</h2>
|
|
|
|
<p>In our example we split expressions recursively and calculated the left
|
|
and right sides of each of the expressions. The question naturally
|
|
arises: do we gain anything here regarding performance?</p>
|
|
|
|
<p>In this example, most probably not. There is an additional overhead of
splitting up tasks and collecting results, and in this case the actual
subtasks consist of simple arithmetic operations which are very fast.
To really gain in performance in practice, the actual subtasks have to
be more heavyweight than this, but the pattern will be the
same.</p>
|
|
|
|
</div>
|
|
<div>
|
|
<h2>Where to go from here?</h2>
|
|
|
|
<p>After getting comfortable with the code, you can test your
|
|
understanding by trying to solve the following small exercises:</p>
|
|
<ul>
|
|
<li>Add <code>flakiness()</code> to various places in the calculator and
|
|
see what happens
|
|
</li>
|
|
<li>Try devising more calculation-intensive nested jobs instead of
|
|
arithmetic expressions (for example transformations of a text
|
|
document) where parallelism improves performance
|
|
</li>
|
|
</ul>
|
|
|
|
<p>You should also visit:</p>
|
|
<ul>
|
|
<li><a href="http://doc.akka.io/docs/akka/2.4-SNAPSHOT/java.html"
|
|
target="_blank">The Akka documentation</a></li>
|
|
<li><a
|
|
href="http://doc.akka.io/docs/akka/2.4-SNAPSHOT/java/lambda-fault-tolerance.html"
|
|
target="_blank">Documentation of supervision</a></li>
|
|
<li><a href="http://letitcrash.com" target="_blank">The Akka Team blog</a></li>
|
|
</ul>
|
|
</div>
|
|
|
|
|
|
</body>
|
|
</html>
|