.. _cluster_usage_java:

######################
Cluster Usage
######################

For an introduction to the Akka Cluster concepts please see :ref:`cluster`.

Preparing Your Project for Clustering
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The Akka cluster is a separate jar file. Make sure that you have the following dependency in your project::

  <dependency>
    <groupId>com.typesafe.akka</groupId>
    <artifactId>akka-cluster_@binVersion@</artifactId>
    <version>@version@</version>
  </dependency>

.. _cluster_simple_example_java:

A Simple Cluster Example
^^^^^^^^^^^^^^^^^^^^^^^^

The following configuration enables the ``Cluster`` extension to be used.
It joins the cluster and an actor subscribes to cluster membership events and logs them.

The ``application.conf`` configuration looks like this:

.. includecode:: ../../../akka-samples/akka-sample-cluster-java/src/main/resources/application.conf

To enable cluster capabilities in your Akka project you should, at a minimum, add the :ref:`remoting-java`
settings, but with ``akka.cluster.ClusterActorRefProvider``.
The ``akka.cluster.seed-nodes`` should normally also be added to your ``application.conf`` file.
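
A minimal sketch of such a configuration, using the ``ClusterSystem`` actor system name and the local
addresses from the sample above, could look like this (adapt the host names, ports and seed nodes to
your environment)::

  akka {
    actor {
      provider = "akka.cluster.ClusterActorRefProvider"
    }
    remote {
      netty.tcp {
        hostname = "127.0.0.1"
        port = 2551
      }
    }
    cluster {
      seed-nodes = [
        "akka.tcp://ClusterSystem@127.0.0.1:2551",
        "akka.tcp://ClusterSystem@127.0.0.1:2552"]
    }
  }
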
.. note::
  If you are using Docker, or the nodes for some other reason have separate internal and external IP addresses,
  you must configure remoting according to :ref:`remote-configuration-nat-java`.

The seed nodes are configured contact points for the initial, automatic join of the cluster.

Note that if you are going to start the nodes on different machines you need to specify the
IP addresses or host names of the machines in ``application.conf`` instead of ``127.0.0.1``.

An actor that uses the cluster extension may look like this:

.. literalinclude:: ../../../akka-samples/akka-sample-cluster-java/src/main/java/sample/cluster/simple/SimpleClusterListener.java
   :language: java

The actor registers itself as a subscriber of certain cluster events. It receives events corresponding to the current state
of the cluster when the subscription starts and then it receives events for changes that happen in the cluster.

The easiest way to run this example yourself is to download `Typesafe Activator <http://www.typesafe.com/platform/getstarted>`_
and open the tutorial named `Akka Cluster Samples with Java <http://www.typesafe.com/activator/template/akka-sample-cluster-java>`_.
It contains instructions on how to run the ``SimpleClusterApp``.

Joining to Seed Nodes
^^^^^^^^^^^^^^^^^^^^^

You may decide if joining to the cluster should be done manually or automatically
to configured initial contact points, so-called seed nodes. When a new node is started
it sends a message to all seed nodes and then sends a join command to the one that
answers first. If none of the seed nodes replied (they might not be started yet)
it retries this procedure until successful or shutdown.

You define the seed nodes in the :ref:`cluster_configuration_java` file (application.conf)::

  akka.cluster.seed-nodes = [
    "akka.tcp://ClusterSystem@host1:2552",
    "akka.tcp://ClusterSystem@host2:2552"]

This can also be defined as Java system properties when starting the JVM using the following syntax::

  -Dakka.cluster.seed-nodes.0=akka.tcp://ClusterSystem@host1:2552
  -Dakka.cluster.seed-nodes.1=akka.tcp://ClusterSystem@host2:2552

The seed nodes can be started in any order and it is not necessary to have all
seed nodes running, but the node configured as the first element in the ``seed-nodes``
configuration list must be started when initially starting a cluster, otherwise the
other seed-nodes will not become initialized and no other node can join the cluster.
The reason for the special first seed node is to avoid forming separated islands when
starting from an empty cluster.
It is quickest to start all configured seed nodes at the same time (order doesn't matter),
otherwise it can take up to the configured ``seed-node-timeout`` until the nodes
can join.

Once more than two seed nodes have been started it is no problem to shut down the first
seed node. If the first seed node is restarted, it will first try to join the other
seed nodes in the existing cluster.

If you don't configure seed nodes you need to join the cluster programmatically or manually.

Manual joining can be performed by using :ref:`cluster_jmx_java` or :ref:`cluster_command_line_java`.
Joining programmatically can be performed with ``Cluster.get(system).join``. Unsuccessful join attempts are
automatically retried after the time period defined in configuration property ``retry-unsuccessful-join-after``.
Retries can be disabled by setting the property to ``off``.
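
For illustration, a sketch of such a programmatic self-join, forming a single node cluster that other
nodes can subsequently join, could look like this (the ``ClusterSystem`` name is just an example)::

  import akka.actor.ActorSystem;
  import akka.cluster.Cluster;

  ActorSystem system = ActorSystem.create("ClusterSystem");
  Cluster cluster = Cluster.get(system);
  // joining the address of this node itself forms a single node cluster
  // that other nodes can then join
  cluster.join(cluster.selfAddress());
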
You can join to any node in the cluster. It does not have to be configured as a seed node.
Note that you can only join to an existing cluster member, which means that for bootstrapping some
node must join itself, and then the following nodes can join it to make up a cluster.

You may also use ``Cluster.get(system).joinSeedNodes`` to join programmatically,
which is attractive when dynamically discovering other nodes at startup by using some external tool or API.
When using ``joinSeedNodes`` you should not include the node itself except for the node that is
supposed to be the first seed node, and that should be placed first in the parameter to ``joinSeedNodes``.
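
As a sketch, assuming the addresses have been obtained from some external discovery mechanism
(the host names below are placeholders) and that ``system`` is your ``ActorSystem``, this could look like::

  import java.util.Arrays;
  import java.util.List;
  import akka.actor.Address;
  import akka.cluster.Cluster;

  List<Address> seedNodes = Arrays.asList(
    new Address("akka.tcp", "ClusterSystem", "host1", 2552),
    new Address("akka.tcp", "ClusterSystem", "host2", 2552));
  // this node joins the first seed node in the list that answers,
  // or joins itself if it is placed first in the list
  Cluster.get(system).joinSeedNodes(seedNodes);
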
Unsuccessful attempts to contact seed nodes are automatically retried after the time period defined in
configuration property ``seed-node-timeout``. An unsuccessful attempt to join a specific seed node is
automatically retried after the configured ``retry-unsuccessful-join-after``. Retrying means that it
tries to contact all seed nodes and then joins the node that answers first. The first node in the list
of seed nodes will join itself if it cannot contact any of the other seed nodes within the
configured ``seed-node-timeout``.

An actor system can only join a cluster once. Additional attempts will be ignored.
When it has successfully joined it must be restarted to be able to join another
cluster or to join the same cluster again. It can use the same host name and port
after the restart. When it comes up as a new incarnation of an existing member in the cluster
and tries to join, the existing one will be removed from the cluster and then the new
incarnation will be allowed to join.

.. _automatic-vs-manual-downing-java:

Automatic vs. Manual Downing
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

When a member is considered by the failure detector to be unreachable the
leader is not allowed to perform its duties, such as changing status of
new joining members to 'Up'. The node must first become reachable again, or the
status of the unreachable member must be changed to 'Down'. Changing status to 'Down'
can be performed automatically or manually. By default it must be done manually, using
:ref:`cluster_jmx_java` or :ref:`cluster_command_line_java`.

It can also be performed programmatically with ``Cluster.get(system).down(address)``.

You can enable automatic downing with configuration::

  akka.cluster.auto-down-unreachable-after = 120s

This means that the cluster leader member will change the ``unreachable`` node
status to ``down`` automatically after the configured time of unreachability.

Be aware that using auto-down implies that two separate clusters will
automatically be formed in case of network partition. That might be
desired by some applications but not by others.

.. note:: If you have *auto-down* enabled and the failure detector triggers, you
  can over time end up with a lot of single node clusters if you don't put
  measures in place to shut down nodes that have become ``unreachable``. This
  follows from the fact that the ``unreachable`` node will likely see the rest of
  the cluster as ``unreachable``, become its own leader and form its own cluster.

Leaving
^^^^^^^

There are two ways to remove a member from the cluster.

You can just stop the actor system (or the JVM process). It will be detected
as unreachable and removed after the automatic or manual downing as described
above.

A more graceful exit can be performed if you tell the cluster that a node shall leave.
This can be performed using :ref:`cluster_jmx_java` or :ref:`cluster_command_line_java`.
It can also be performed programmatically with:

.. includecode:: code/docs/cluster/ClusterDocTest.java#leave

Note that this command can be issued to any member in the cluster, not necessarily the
one that is leaving. The cluster extension, but not the actor system or JVM, of the
leaving member will be shut down after the leader has changed the status of the member to
``Exiting``. Thereafter the member will be removed from the cluster. Normally this is handled
automatically, but in case of network failures during this process it might still be necessary
to set the node's status to ``Down`` in order to complete the removal.

.. _weakly_up_java:

WeaklyUp Members
^^^^^^^^^^^^^^^^

If a node is ``unreachable`` then gossip convergence is not possible and therefore any
``leader`` actions are also not possible. However, we still might want new nodes to join
the cluster in this scenario.

.. warning::

  The WeaklyUp feature is marked as **“experimental”** as of its introduction in Akka 2.4.0. We will continue to
  improve this feature based on our users’ feedback, which implies that while we try to keep incompatible
  changes to a minimum, the binary compatibility guarantee for maintenance releases does not apply to this feature.

This feature is disabled by default. With a configuration option you can allow this behavior::

  akka.cluster.allow-weakly-up-members = on

When ``allow-weakly-up-members`` is enabled and there is no gossip convergence,
``Joining`` members will be promoted to ``WeaklyUp`` and they will become part of the
cluster. Once gossip convergence is reached, the leader will move ``WeaklyUp``
members to ``Up``.

You can subscribe to the ``WeaklyUp`` membership event to make use of the members that are
in this state, but you should be aware that members on the other side of a network partition
have no knowledge about the existence of the new members. You should for example not count
``WeaklyUp`` members in quorum decisions.
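
For illustration, a sketch of subscribing to and handling this event inside an actor could look like the
following; the surrounding ``UntypedActor`` and its ``log`` field are assumed::

  // e.g. in the actor's preStart: subscribe to member events, including WeaklyUp
  Cluster cluster = Cluster.get(getContext().system());
  cluster.subscribe(getSelf(), ClusterEvent.initialStateAsEvents(),
      ClusterEvent.MemberUp.class, ClusterEvent.MemberWeaklyUp.class);

  // e.g. in onReceive: the member may be given work, but should not be
  // counted in quorum decisions
  if (message instanceof ClusterEvent.MemberWeaklyUp) {
    ClusterEvent.MemberWeaklyUp weaklyUp = (ClusterEvent.MemberWeaklyUp) message;
    log.info("Member is WeaklyUp: {}", weaklyUp.member());
  }
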
.. warning::

  This feature is only available from Akka 2.4.0 and cannot be used if some of your
  cluster members are running an older version of Akka.

.. _cluster_subscriber_java:

Subscribe to Cluster Events
^^^^^^^^^^^^^^^^^^^^^^^^^^^

You can subscribe to change notifications of the cluster membership by using
``Cluster.get(system).subscribe``.

.. includecode:: ../../../akka-samples/akka-sample-cluster-java/src/main/java/sample/cluster/simple/SimpleClusterListener2.java#subscribe

A snapshot of the full state, ``akka.cluster.ClusterEvent.CurrentClusterState``, is sent to the subscriber
as the first message, followed by events for incremental updates.

Note that you may receive an empty ``CurrentClusterState``, containing no members,
if you start the subscription before the initial join procedure has completed.
This is expected behavior. When the node has been accepted in the cluster you will
receive ``MemberUp`` for that node, and other nodes.

If you find it inconvenient to handle the ``CurrentClusterState`` you can use
``ClusterEvent.initialStateAsEvents()`` as a parameter to ``subscribe``.
That means that instead of receiving ``CurrentClusterState`` as the first message you will receive
the events corresponding to the current state to mimic what you would have seen if you were
listening to the events when they occurred in the past. Note that those initial events only correspond
to the current state and it is not the full history of all changes that actually have occurred in the cluster.

.. includecode:: ../../../akka-samples/akka-sample-cluster-java/src/main/java/sample/cluster/simple/SimpleClusterListener.java#subscribe

The events to track the life-cycle of members are:

* ``ClusterEvent.MemberUp`` - A new member has joined the cluster and its status has been changed to ``Up``.
* ``ClusterEvent.MemberExited`` - A member is leaving the cluster and its status has been changed to ``Exiting``.
  Note that the node might already have been shut down when this event is published on another node.
* ``ClusterEvent.MemberRemoved`` - Member completely removed from the cluster.
* ``ClusterEvent.UnreachableMember`` - A member is considered as unreachable, detected by the failure detector
  of at least one other node.
* ``ClusterEvent.ReachableMember`` - A member is considered as reachable again, after having been unreachable.
  All nodes that previously detected it as unreachable have detected it as reachable again.

There are more types of change events; consult the API documentation
of classes that extend ``akka.cluster.ClusterEvent.ClusterDomainEvent``
for details about the events.

Instead of subscribing to cluster events it can sometimes be convenient to only get the full membership state with
``Cluster.get(system).state()``. Note that this state is not necessarily in sync with the events published to a
cluster subscription.
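
As a small sketch, reading the state and iterating over the current members could look like this,
given an ``ActorSystem`` called ``system``::

  import akka.cluster.Cluster;
  import akka.cluster.ClusterEvent.CurrentClusterState;
  import akka.cluster.Member;

  CurrentClusterState state = Cluster.get(system).state();
  for (Member member : state.getMembers()) {
    // inspect the membership information, e.g. address, status and roles
    System.out.println(member.address() + " is " + member.status());
  }
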
Worker Dial-in Example
----------------------

Let's take a look at an example that illustrates how workers, here named *backend*,
can detect and register to new master nodes, here named *frontend*.

The example application provides a service to transform text. When some text
is sent to one of the frontend services, it will be delegated to one of the
backend workers, which performs the transformation job, and sends the result back to
the original client. New backend nodes, as well as new frontend nodes, can be
added to or removed from the cluster dynamically.

Messages:

.. includecode:: ../../../akka-samples/akka-sample-cluster-java/src/main/java/sample/cluster/transformation/TransformationMessages.java#messages

The backend worker that performs the transformation job:

.. includecode:: ../../../akka-samples/akka-sample-cluster-java/src/main/java/sample/cluster/transformation/TransformationBackend.java#backend

Note that the ``TransformationBackend`` actor subscribes to cluster events to detect new,
potential, frontend nodes, and sends them a registration message so that they know
that they can use the backend worker.

The frontend that receives user jobs and delegates to one of the registered backend workers:

.. includecode:: ../../../akka-samples/akka-sample-cluster-java/src/main/java/sample/cluster/transformation/TransformationFrontend.java#frontend

Note that the ``TransformationFrontend`` actor watches the registered backend
to be able to remove it from its list of available backend workers.
Death watch uses the cluster failure detector for nodes in the cluster, i.e. it detects
network failures and JVM crashes, in addition to graceful termination of watched
actors. Death watch generates the ``Terminated`` message to the watching actor when the
unreachable cluster node has been downed and removed.

The `Typesafe Activator <http://www.typesafe.com/platform/getstarted>`_ tutorial named
`Akka Cluster Samples with Java <http://www.typesafe.com/activator/template/akka-sample-cluster-java>`_
contains the full source code and instructions on how to run the **Worker Dial-in Example**.

Node Roles
^^^^^^^^^^

Not all nodes of a cluster need to perform the same function: there might be one sub-set which runs the web front-end,
one which runs the data access layer and one for the number-crunching. Deployment of actors—for example by cluster-aware
routers—can take node roles into account to achieve this distribution of responsibilities.

The roles of a node are defined in the configuration property named ``akka.cluster.roles``
and they are typically defined in the start script as a system property or environment variable.

The roles of the nodes are part of the membership information in ``MemberEvent`` that you can subscribe to.
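
For example, a node could be tagged with a ``backend`` role (the role name is just an illustration)
either in ``application.conf`` or as a system property when starting the JVM::

  # in application.conf
  akka.cluster.roles = [backend]

  # or as a system property
  -Dakka.cluster.roles.0=backend
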
How To Startup when Cluster Size Reached
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

A common use case is to start actors after the cluster has been initialized,
members have joined, and the cluster has reached a certain size.

With a configuration option you can define the required number of members
before the leader changes member status of 'Joining' members to 'Up'.

.. includecode:: ../../../akka-samples/akka-sample-cluster-java/src/main/resources/factorial.conf#min-nr-of-members

In a similar way you can define the required number of members of a certain role
before the leader changes member status of 'Joining' members to 'Up'.

.. includecode:: ../../../akka-samples/akka-sample-cluster-java/src/main/resources/factorial.conf#role-min-nr-of-members

You can start the actors in a ``registerOnMemberUp`` callback, which will
be invoked when the current member status is changed to 'Up', i.e. the cluster
has at least the defined number of members.

.. includecode:: ../../../akka-samples/akka-sample-cluster-java/src/main/java/sample/cluster/factorial/FactorialFrontendMain.java#registerOnUp

This callback can be used for other things than starting actors.

How To Cleanup when Member is Removed
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

You can do some clean up in a ``registerOnMemberRemoved`` callback, which will
be invoked when the current member status is changed to 'Removed' or the cluster has been shut down.

For example, this is how to shut down the ``ActorSystem`` and thereafter exit the JVM:

.. includecode:: ../../../akka-samples/akka-sample-cluster-java/src/main/java/sample/cluster/factorial/FactorialFrontendMain.java#registerOnRemoved

.. note::
  If you register an ``OnMemberRemoved`` callback on a cluster that has already been shut down, the callback
  will be invoked immediately on the caller thread, otherwise it will be invoked later, when the current member
  status is changed to 'Removed'. You may want to install some cleanup handling after the cluster was started up,
  but the cluster might already be shutting down when you install it, and depending on that race is not healthy.

Cluster Singleton
^^^^^^^^^^^^^^^^^

For some use cases it is convenient and sometimes also mandatory to ensure that
you have exactly one actor of a certain type running somewhere in the cluster.

This can be implemented by subscribing to member events, but there are several corner
cases to consider. Therefore, this specific use case is made easily accessible by the
:ref:`cluster-singleton-java`.

Cluster Sharding
^^^^^^^^^^^^^^^^

Distributes actors across several nodes in the cluster and supports interaction
with the actors using their logical identifier, but without having to care about
their physical location in the cluster.

See :ref:`cluster_sharding_java`.

Distributed Publish Subscribe
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Publish-subscribe messaging between actors in the cluster, and point-to-point messaging
using the logical path of the actors, i.e. the sender does not have to know on which
node the destination actor is running.

See :ref:`distributed-pub-sub-java`.

Cluster Client
^^^^^^^^^^^^^^

Communication from an actor system that is not part of the cluster to actors running
somewhere in the cluster. The client does not have to know on which node the destination
actor is running.

See :ref:`cluster-client-java`.

Distributed Data
^^^^^^^^^^^^^^^^

*Akka Distributed Data* is useful when you need to share data between nodes in an
Akka Cluster. The data is accessed with an actor providing a key-value store like API.

See :ref:`distributed_data_java`.

Failure Detector
^^^^^^^^^^^^^^^^

In a cluster each node is monitored by a few (default maximum 5) other nodes, and when
any of these detects the node as ``unreachable`` that information will spread to
the rest of the cluster through the gossip. In other words, only one node needs to
mark a node ``unreachable`` to have the rest of the cluster mark that node ``unreachable``.

The failure detector will also detect if the node becomes ``reachable`` again. When
all nodes that monitored the ``unreachable`` node detect it as ``reachable`` again
the cluster, after gossip dissemination, will consider it as ``reachable``.

If system messages cannot be delivered to a node it will be quarantined and then it
cannot come back from ``unreachable``. This can happen if there are too many
unacknowledged system messages (e.g. watch, Terminated, remote actor deployment,
failures of actors supervised by remote parent). Then the node needs to be moved
to the ``down`` or ``removed`` states and the actor system must be restarted before
it can join the cluster again.

The nodes in the cluster monitor each other by sending heartbeats to detect if a node is
unreachable from the rest of the cluster. The heartbeat arrival times are interpreted
by an implementation of
`The Phi Accrual Failure Detector <http://ddg.jaist.ac.jp/pub/HDY+04.pdf>`_.

The suspicion level of failure is given by a value called *phi*.
The basic idea of the phi failure detector is to express the value of *phi* on a scale that
is dynamically adjusted to reflect current network conditions.

The value of *phi* is calculated as::

  phi = -log10(1 - F(timeSinceLastHeartbeat))

where F is the cumulative distribution function of a normal distribution with mean
and standard deviation estimated from historical heartbeat inter-arrival times.
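
To make the formula concrete, here is a small illustrative sketch of the calculation in Java. It is not
Akka's actual implementation; it approximates the normal cumulative distribution function with a
logistic function::

  // phi for a given time since the last heartbeat, with mean and standard
  // deviation of the historical inter-arrival times (all in milliseconds)
  static double phi(double timeSinceLastHeartbeat, double mean, double stdDeviation) {
    double y = (timeSinceLastHeartbeat - mean) / stdDeviation;
    // logistic approximation of the normal cumulative distribution function F
    double f = 1.0 / (1.0 + Math.exp(-y * (1.5976 + 0.070566 * y * y)));
    return -Math.log10(1.0 - f);
  }

  // e.g. mean 1000 ms, standard deviation 100 ms and 1300 ms since the last
  // heartbeat yields a phi of roughly 2.9, well below the default threshold of 8
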
In the :ref:`cluster_configuration_java` you can adjust the ``akka.cluster.failure-detector.threshold``
to define when a *phi* value is considered to be a failure.

A low ``threshold`` is prone to generate many false positives but ensures
a quick detection in the event of a real crash. Conversely, a high ``threshold``
generates fewer mistakes but needs more time to detect actual crashes. The
default ``threshold`` is 8 and is appropriate for most situations. However in
cloud environments, such as Amazon EC2, the value could be increased to 12 in
order to account for network issues that sometimes occur on such platforms.
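
For example, in such an environment the threshold could be raised in ``application.conf`` like this
(the value 12 follows the guidance above)::

  akka.cluster.failure-detector.threshold = 12.0
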
The following chart illustrates how *phi* increases with increasing time since the
previous heartbeat.

.. image:: ../images/phi1.png

Phi is calculated from the mean and standard deviation of historical
inter-arrival times. The previous chart is an example for a standard deviation
of 200 ms. If the heartbeats arrive with less deviation the curve becomes steeper,
i.e. it is possible to determine failure more quickly. The curve looks like this for
a standard deviation of 100 ms.

.. image:: ../images/phi2.png

To be able to survive sudden abnormalities, such as garbage collection pauses and
transient network failures, the failure detector is configured with a margin,
``akka.cluster.failure-detector.acceptable-heartbeat-pause``. You may want to
adjust the :ref:`cluster_configuration_java` of this depending on your environment.
This is what the curve looks like for ``acceptable-heartbeat-pause`` configured to
3 seconds.

.. image:: ../images/phi3.png

Death watch uses the cluster failure detector for nodes in the cluster, i.e. it detects
network failures and JVM crashes, in addition to graceful termination of watched
actors. Death watch generates the ``Terminated`` message to the watching actor when the
unreachable cluster node has been downed and removed.

If you encounter suspicious false positives when the system is under load you should
define a separate dispatcher for the cluster actors as described in :ref:`cluster_dispatcher_java`.

Cluster Aware Routers
^^^^^^^^^^^^^^^^^^^^^

All :ref:`routers <routing-java>` can be made aware of member nodes in the cluster, i.e.
deploying new routees or looking up routees on nodes in the cluster.
When a node becomes unreachable or leaves the cluster the routees of that node are
automatically unregistered from the router. When new nodes join the cluster additional
routees are added to the router, according to the configuration. Routees are also added
when a node becomes reachable again, after having been unreachable.

Cluster aware routers make use of members with status :ref:`WeaklyUp <weakly_up_java>` if that feature
is enabled.

There are two distinct types of routers.

* **Group - router that sends messages to the specified path using actor selection**
  The routees can be shared between routers running on different nodes in the cluster.
  One example of a use case for this type of router is a service running on some backend
  nodes in the cluster and used by routers running on front-end nodes in the cluster.

* **Pool - router that creates routees as child actors and deploys them on remote nodes.**
  Each router will have its own routee instances. For example, if you start a router
  on 3 nodes in a 10-node cluster you will have 30 routee actors in total if the router is
  configured to use one instance per node. The routees created by the different routers
  will not be shared between the routers. One example of a use case for this type of router
  is a single master that coordinates jobs and delegates the actual work to routees running
  on other nodes in the cluster.

Router with Group of Routees
----------------------------

When using a ``Group`` you must start the routee actors on the cluster member nodes.
That is not done by the router. The configuration for a group looks like this:

.. includecode:: ../../../akka-samples/akka-sample-cluster-java/src/multi-jvm/scala/sample/cluster/stats/StatsSampleSpec.scala#router-lookup-config

.. note::
  The routee actors should be started as early as possible when starting the actor system, because
  the router will try to use them as soon as the member status is changed to 'Up'.

The actor paths without address information that are defined in ``routees.paths`` are used for selecting the
actors to which the messages will be forwarded by the router.
Messages will be forwarded to the routees using :ref:`ActorSelection <actorSelection-java>`, so the same delivery semantics should be expected.
It is possible to limit the lookup of routees to member nodes tagged with a certain role by specifying ``use-role``.

``max-total-nr-of-instances`` defines the total number of routees in the cluster. By default ``max-total-nr-of-instances``
is set to a high value (10000) that will result in new routees being added to the router when nodes join the cluster.
Set it to a lower value if you want to limit the total number of routees.

The same type of router could also have been defined in code:

.. includecode:: ../../../akka-samples/akka-sample-cluster-java/src/main/java/sample/cluster/stats/Extra.java#router-lookup-in-code

See the :ref:`cluster_configuration_java` section for further descriptions of the settings.

Router Example with Group of Routees
------------------------------------

Let's take a look at how to use a cluster aware router with a group of routees,
i.e. a router sending to the paths of the routees.

The example application provides a service to calculate statistics for a text.
When some text is sent to the service it splits it into words, and delegates the task
of counting the number of characters in each word to a separate worker, a routee of a router.
The character count for each word is sent back to an aggregator that calculates
the average number of characters per word when all results have been collected.

Messages:

.. includecode:: ../../../akka-samples/akka-sample-cluster-java/src/main/java/sample/cluster/stats/StatsMessages.java#messages

The worker that counts the number of characters in each word:

.. includecode:: ../../../akka-samples/akka-sample-cluster-java/src/main/java/sample/cluster/stats/StatsWorker.java#worker

The service that receives text from users and splits it up into words, delegates to workers and aggregates:

.. includecode:: ../../../akka-samples/akka-sample-cluster-java/src/main/java/sample/cluster/stats/StatsService.java#service

.. includecode:: ../../../akka-samples/akka-sample-cluster-java/src/main/java/sample/cluster/stats/StatsAggregator.java#aggregator

Note, nothing cluster specific so far, just plain actors.

All nodes start ``StatsService`` and ``StatsWorker`` actors. Remember, routees are the workers in this case.
The router is configured with ``routees.paths``:

.. includecode:: ../../../akka-samples/akka-sample-cluster-java/src/main/resources/stats1.conf#config-router-lookup

This means that user requests can be sent to ``StatsService`` on any node and it will use
``StatsWorker`` on all nodes.

The `Typesafe Activator <http://www.typesafe.com/platform/getstarted>`_ tutorial named
`Akka Cluster Samples with Java <http://www.typesafe.com/activator/template/akka-sample-cluster-java>`_
contains the full source code and instructions on how to run the **Router Example with Group of Routees**.

Router with Pool of Remote Deployed Routees
-------------------------------------------

When using a ``Pool`` with routees created and deployed on the cluster member nodes
the configuration for a router looks like this:

.. includecode:: ../../../akka-samples/akka-sample-cluster-java/src/multi-jvm/scala/sample/cluster/stats/StatsSampleSingleMasterSpec.scala#router-deploy-config

It is possible to limit the deployment of routees to member nodes tagged with a certain role by
specifying ``use-role``.

``max-total-nr-of-instances`` defines the total number of routees in the cluster, but the number of routees
per node, ``max-nr-of-instances-per-node``, will not be exceeded. By default ``max-total-nr-of-instances``
is set to a high value (10000) that will result in new routees being added to the router when nodes join the cluster.
Set it to a lower value if you want to limit the total number of routees.

The same type of router could also have been defined in code:

.. includecode:: ../../../akka-samples/akka-sample-cluster-java/src/main/java/sample/cluster/stats/Extra.java#router-deploy-in-code

See the :ref:`cluster_configuration_java` section for further descriptions of the settings.

Router Example with Pool of Remote Deployed Routees
---------------------------------------------------

Let's take a look at how to use a cluster aware router on a single master node that creates
and deploys workers. To keep track of a single master we use the :ref:`cluster-singleton-java`
in the contrib module. The ``ClusterSingletonManager`` is started on each node.

.. includecode:: ../../../akka-samples/akka-sample-cluster-java/src/main/java/sample/cluster/stats/StatsSampleOneMasterMain.java#create-singleton-manager

We also need an actor on each node that keeps track of where the current single master exists and
delegates jobs to the ``StatsService``. That is provided by the ``ClusterSingletonProxy``.

.. includecode:: ../../../akka-samples/akka-sample-cluster-java/src/main/java/sample/cluster/stats/StatsSampleOneMasterMain.java#singleton-proxy

The ``ClusterSingletonProxy`` receives text from users and delegates to the current ``StatsService``, the single
master. It listens to cluster events to look up the ``StatsService`` on the oldest node.

All nodes start ``ClusterSingletonProxy`` and the ``ClusterSingletonManager``. The router is now configured like this:

.. includecode:: ../../../akka-samples/akka-sample-cluster-java/src/main/resources/stats2.conf#config-router-deploy

The `Typesafe Activator <http://www.typesafe.com/platform/getstarted>`_ tutorial named
`Akka Cluster Samples with Java <http://www.typesafe.com/activator/template/akka-sample-cluster-java>`_
contains the full source code and instructions on how to run the **Router Example with Pool of Remote Deployed Routees**.

Cluster Metrics
^^^^^^^^^^^^^^^

The member nodes of the cluster can collect system health metrics and publish them to other cluster nodes
and to the registered subscribers on the system event bus with the help of :doc:`cluster-metrics`.

.. _cluster_jmx_java:

JMX
^^^

Information and management of the cluster is available as JMX MBeans with the root name ``akka.Cluster``.
The JMX information can be displayed with an ordinary JMX console such as JConsole or JVisualVM.

From JMX you can:

* see which members are part of the cluster
* see the status of this node
* see the roles of each member
* join this node to another node in the cluster
* mark any node in the cluster as down
* tell any node in the cluster to leave

Member nodes are identified by their address, in the format ``akka.<protocol>://<actor-system-name>@<hostname>:<port>``.

.. _cluster_command_line_java:

Command Line Management
^^^^^^^^^^^^^^^^^^^^^^^

The cluster can be managed with the script ``bin/akka-cluster`` provided in the
Akka distribution.

Run it without parameters to see instructions about how to use the script::

  Usage: bin/akka-cluster <node-hostname> <jmx-port> <command> ...

  Supported commands are:
             join <node-url> - Sends request a JOIN node with the specified URL
            leave <node-url> - Sends a request for node with URL to LEAVE the cluster
             down <node-url> - Sends a request for marking node with URL as DOWN
               member-status - Asks the member node for its current status
                     members - Asks the cluster for addresses of current members
                 unreachable - Asks the cluster for addresses of unreachable members
              cluster-status - Asks the cluster for its current status (member ring,
                               unavailable nodes, meta data etc.)
                      leader - Asks the cluster who the current leader is
                is-singleton - Checks if the cluster is a singleton cluster (single
                               node cluster)
                is-available - Checks if the member node is available
  Where the <node-url> should be on the format of
    'akka.<protocol>://<actor-system-name>@<hostname>:<port>'

  Examples: bin/akka-cluster localhost 9999 is-available
            bin/akka-cluster localhost 9999 join akka.tcp://MySystem@darkstar:2552
            bin/akka-cluster localhost 9999 cluster-status

To be able to use the script you must enable remote monitoring and management when starting the JVMs of the cluster nodes,
as described in `Monitoring and Management Using JMX Technology <http://docs.oracle.com/javase/6/docs/technotes/guides/management/agent.html>`_.

Example of system properties to enable remote monitoring and management::

  java -Dcom.sun.management.jmxremote.port=9999 \
    -Dcom.sun.management.jmxremote.authenticate=false \
    -Dcom.sun.management.jmxremote.ssl=false

.. _cluster_configuration_java:

Configuration
^^^^^^^^^^^^^

There are several configuration properties for the cluster. We refer to the
:ref:`reference configuration <config-akka-cluster>` for more information.

Cluster Info Logging
--------------------

You can silence the logging of cluster events at info level with configuration property::

  akka.cluster.log-info = off

.. _cluster_dispatcher_java:

Cluster Dispatcher
------------------

Under the hood the cluster extension is implemented with actors and it can be necessary
to create a bulkhead for those actors to avoid disturbance from other actors. Especially
the heartbeating actors that are used for failure detection can generate false positives
if they are not given a chance to run at regular intervals.
For this purpose you can define a separate dispatcher to be used for the cluster actors::

  akka.cluster.use-dispatcher = cluster-dispatcher

  cluster-dispatcher {
    type = "Dispatcher"
    executor = "fork-join-executor"
    fork-join-executor {
      parallelism-min = 2
      parallelism-max = 4
    }
  }