Commit edad69b38c (parent b2b948ff0e)
25 changed files with 1095 additions and 994 deletions
@@ -33,3 +33,6 @@ RedirectMatch 301 ^(.*)/howto\.html$ https://doc.akka.io/docs/akka/2.5/howto.htm
 RedirectMatch 301 ^(.*)/common/duration\.html$ https://doc.akka.io/docs/akka/2.5/common/duration.html
 RedirectMatch 301 ^(.*)/futures\.html$ https://doc.akka.io/docs/akka/2.5/futures.html
 RedirectMatch 301 ^(.*)/java8-compat\.html$ https://doc.akka.io/docs/akka/2.5/java8-compat.html
+RedirectMatch 301 ^(.*)/common/cluster\.html$ $1/typed/cluster-specification.html
+RedirectMatch 301 ^(.*)/additional/observability\.html$ $1/additional/operations.html
@@ -5,11 +5,8 @@
 @@@ index
 
 * [Packaging](packaging.md)
+* [Operating, Managing, Observability](operations.md)
 * [Deploying](deploying.md)
 * [Rolling Updates](rolling-updates.md)
-* [Observability](observability.md)
 
 @@@
-
-Information on running and managing Akka applications can be found in
-the [Akka Management](https://doc.akka.io/docs/akka-management/current/) project documentation.
@@ -1,3 +0,0 @@
-# Monitoring
-
-Aside from log monitoring and the monitoring provided by your APM or platform provider, [Lightbend Telemetry](https://developer.lightbend.com/docs/telemetry/current/instrumentations/akka/akka.html), available through a [Lightbend Platform Subscription](https://www.lightbend.com/lightbend-platform-subscription), can provide additional insights in the run-time characteristics of your application, including metrics, events, and distributed tracing for Akka Actors, Cluster, HTTP, and more.
akka-docs/src/main/paradox/additional/operations.md (new file, 62 lines)
@@ -0,0 +1,62 @@
+# Operating a Cluster
+
+This documentation discusses how to operate a cluster. For related, more specific guides,
+see [Packaging](packaging.md), [Deploying](deploying.md), and [Rolling Updates](rolling-updates.md).
+
+## Starting
+
+### Cluster Bootstrap
+
+When starting clusters on cloud systems such as Kubernetes, AWS, Google Cloud, Azure, Mesos, or others,
+you may want to automate the discovery of nodes for the cluster joining process, using your cloud provider,
+cluster orchestrator, or another form of service discovery (such as managed DNS).
+
+The open source Akka Management library includes the [Cluster Bootstrap](https://doc.akka.io/docs/akka-management/current/bootstrap/index.html)
+module, which handles just that. Please refer to its documentation for more details.
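The bootstrap flow described above is driven entirely by configuration. As a rough illustration (not part of this commit; the discovery method and service name below are placeholder assumptions for a Kubernetes-based deployment), such a setup might look like:

```
# Hypothetical sketch: discover peer nodes via the Kubernetes API
# and form the cluster from the discovered contact points.
akka.management.cluster.bootstrap {
  contact-point-discovery {
    discovery-method = kubernetes-api   # assumption: Kubernetes deployment
    service-name = "my-service"         # placeholder service name
  }
}
```

See the Akka Management bootstrap documentation for the authoritative set of settings.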
+
+@@@ note
+
+If you are running Akka in a Docker container, or the nodes for some other reason have separate internal and
+external IP addresses, you must configure remoting according to @ref:[Akka behind NAT or in a Docker container](../remoting-artery.md#remote-configuration-nat-artery).
+
+@@@
+
+## Stopping
+
+See @ref:[Rolling Updates, Cluster Shutdown and Coordinated Shutdown](../additional/rolling-updates.md#cluster-shutdown).
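Graceful stopping of nodes is handled by Akka's Coordinated Shutdown, which is configurable. A hedged sketch of commonly used settings (illustrative, not part of this commit):

```
# Run Coordinated Shutdown from a JVM shutdown hook,
# e.g. when the process receives SIGTERM
akka.coordinated-shutdown.run-by-jvm-shutdown-hook = on
# Exit the JVM once the shutdown sequence has completed
akka.coordinated-shutdown.exit-jvm = on
```

With these enabled, an orchestrator stopping the process triggers a graceful leave of the cluster before the JVM exits.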
+
+## Cluster Management
+
+There are several management tools for the cluster.
+Complete information on running and managing Akka applications can be found in
+the [Akka Management](https://doc.akka.io/docs/akka-management/current/) project documentation.
+
+<a id="cluster-http"></a>
+### HTTP
+
+Information and management of the cluster is available via an HTTP API.
+See the documentation of [Akka Management](http://developer.lightbend.com/docs/akka-management/current/).
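For orientation (an assumption-laden sketch, not part of this commit): before the HTTP API is reachable, the management endpoint's bind address is typically configured, along these lines:

```
# Hostname and port where Akka Management's HTTP endpoint listens;
# the values here are placeholders for local development.
akka.management.http.hostname = "127.0.0.1"
akka.management.http.port = 8558
```

The Akka Management documentation covers the exact settings and how the endpoint is started.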
+
+<a id="cluster-jmx"></a>
+### JMX
+
+Information and management of the cluster is available as JMX MBeans with the root name `akka.Cluster`.
+The JMX information can be displayed with an ordinary JMX console such as JConsole or JVisualVM.
+
+From JMX you can:
+
+* see which members are part of the cluster
+* see the status of this node
+* see the roles of each member
+* join this node to another node in the cluster
+* mark any node in the cluster as down
+* tell any node in the cluster to leave
+
+Member nodes are identified by their address, in the format *`akka://actor-system-name@hostname:port`*.
+
+## Monitoring and Observability
+
+Aside from log monitoring and the monitoring provided by your APM or platform provider, [Lightbend Telemetry](https://developer.lightbend.com/docs/telemetry/current/instrumentations/akka/akka.html),
+available through a [Lightbend Platform Subscription](https://www.lightbend.com/lightbend-platform-subscription),
+can provide additional insight into the run-time characteristics of your application, including metrics, events,
+and distributed tracing for Akka Actors, Cluster, HTTP, and more.
@@ -86,7 +86,7 @@ in an application composed of multiple JARs to reside under a single package nam
 might scan all classes from `com.example.plugins` for specific service implementations with that package existing in
 several contributed JARs.
 While it is possible to support overlapping packages with complex manifest headers, it's much better to use non-overlapping
-package spaces and facilities such as @ref:[Akka Cluster](../common/cluster.md)
+package spaces and facilities such as @ref:[Akka Cluster](../typed/cluster-specification.md)
 for service discovery. Stylistically, many organizations opt to use the root package path as the name of the bundle
 distribution file.
@@ -63,7 +63,7 @@ Environments such as Kubernetes send a SIGTERM, however if the JVM is wrapped wi
 ### Ungraceful shutdown
 
 In case of network failures it may still be necessary to set the node's status to Down in order to complete the removal.
-@ref:[Cluster Downing](../cluster-usage.md#downing) details downing nodes and downing providers.
+@ref:[Cluster Downing](../typed/cluster.md#downing) details downing nodes and downing providers.
 [Split Brain Resolver](https://doc.akka.io/docs/akka-enhancements/current/split-brain-resolver.html) can be used to ensure
 the cluster continues to function during network partitions and node failures. For example
 if there is an unreachability problem Split Brain Resolver would make a decision based on the configured downing strategy.
@@ -79,7 +79,7 @@ and ensure all nodes are in this state
 * Deploy again and with the new nodes set to `akka.cluster.configuration-compatibility-check.enforce-on-join = on`.
 
 Full documentation about enforcing these checks on joining nodes and optionally adding custom checks can be found in
-@ref:[Akka Cluster configuration compatibility checks](../cluster-usage.md#configuration-compatibility-check).
+@ref:[Akka Cluster configuration compatibility checks](../typed/cluster.md#configuration-compatibility-check).
 
 ## Rolling Updates and Migrating Akka
@@ -76,8 +76,8 @@ up a large cluster into smaller groups of nodes for better scalability.
 
 ## Membership
 
-Some @ref[membership transitions](common/cluster.md#membership-lifecycle) are managed by
-one node called the @ref[leader](common/cluster.md#leader). There is one leader per data center
+Some @ref[membership transitions](typed/cluster-membership.md#membership-lifecycle) are managed by
+one node called the @ref[leader](typed/cluster-specification.md#leader). There is one leader per data center
 and it is responsible for these transitions for the members within the same data center. Members of
 other data centers are managed independently by the leader of the respective data center. These actions
 cannot be performed while there are any unreachability observations among the nodes in the data center,
@@ -91,7 +91,7 @@ User actions like joining, leaving, and downing can be sent to any node in the c
 not only to the nodes in the data center of the node. Seed nodes are also global.
 
 The data center membership is implemented by adding the data center name prefixed with `"dc-"` to the
-@ref[roles](cluster-usage.md#node-roles) of the member and thereby this information is known
+@ref[roles](typed/cluster.md#node-roles) of the member and thereby this information is known
 by all other members in the cluster. This is an implementation detail, but it can be good to know
 if you see this in log messages.
@@ -105,7 +105,7 @@ Java
 
 ## Failure Detection
 
-@ref[Failure detection](cluster-usage.md#failure-detector) is performed by sending heartbeat messages
+@ref[Failure detection](typed/cluster-specification.md#failure-detector) is performed by sending heartbeat messages
 to detect if a node is unreachable. This is done more frequently and with more certainty among
 the nodes in the same data center than across data centers. The failure detection across different data centers should
 be interpreted as an indication of problem with the network link between the data centers.
@@ -132,6 +132,8 @@ This influences how rolling upgrades should be performed. Don't stop all of the
 at the same time. Stop one or a few at a time so that new nodes can take over the responsibility.
 It's best to leave the oldest nodes until last.
 
+See the @ref:[failure detector](typed/cluster.md#failure-detector) for more details.
+
 ## Cluster Singleton
 
 The @ref[Cluster Singleton](cluster-singleton.md) is a singleton per data center. If you start the
@@ -27,7 +27,7 @@ Cluster metrics information is primarily used for load-balancing routers,
 and can also be used to implement advanced metrics-based node life cycles,
 such as "Node Let-it-crash" when CPU steal time becomes excessive.
 
-Cluster members with status @ref:[WeaklyUp](cluster-usage.md#weakly-up), if that feature is enabled,
+Cluster members with status @ref:[WeaklyUp](typed/cluster-membership.md#weaklyup-members), if that feature is enabled,
 will participate in Cluster Metrics collection and dissemination.
 
 ## Metrics Collector
@@ -41,7 +41,7 @@ the sender to know the location of the destination actor. This is achieved by se
 the messages via a `ShardRegion` actor provided by this extension, which knows how
 to route the message with the entity id to the final destination.
 
-Cluster sharding will not be active on members with status @ref:[WeaklyUp](cluster-usage.md#weakly-up)
+Cluster sharding will not be active on members with status @ref:[WeaklyUp](typed/cluster-membership.md#weaklyup-members)
 if that feature is enabled.
 
 @@@ warning
@@ -49,7 +49,7 @@ if that feature is enabled.
 **Don't use Cluster Sharding together with Automatic Downing**,
 since it allows the cluster to split up into two separate clusters, which in turn will result
 in *multiple shards and entities* being started, one in each separate cluster!
-See @ref:[Downing](cluster-usage.md#automatic-vs-manual-downing).
+See @ref:[Downing](typed/cluster.md#automatic-vs-manual-downing).
 
 @@@
@@ -492,7 +492,7 @@ and there was a network partition.
 **Don't use Cluster Sharding together with Automatic Downing**,
 since it allows the cluster to split up into two separate clusters, which in turn will result
 in *multiple shards and entities* being started, one in each separate cluster!
-See @ref:[Downing](cluster-usage.md#automatic-vs-manual-downing).
+See @ref:[Downing](typed/cluster.md#automatic-vs-manual-downing).
 
 @@@
@@ -44,7 +44,7 @@ The oldest member is determined by `akka.cluster.Member#isOlderThan`.
 This can change when removing that member from the cluster. Be aware that there is a short time
 period when there is no active singleton during the hand-over process.
 
-The cluster failure detector will notice when oldest node becomes unreachable due to
+The cluster @ref:[failure detector](typed/cluster.md#failure-detector) will notice when oldest node becomes unreachable due to
 things like JVM crash, hard shut down, or network failure. Then a new oldest node will
 take over and a new singleton actor is created. For these failure scenarios there will
 not be a graceful hand-over, but more than one active singletons is prevented by all
@@ -65,7 +65,7 @@ It's worth noting that messages can always be lost because of the distributed na
 As always, additional logic should be implemented in the singleton (acknowledgement) and in the
 client (retry) actors to ensure at-least-once message delivery.
 
-The singleton instance will not run on members with status @ref:[WeaklyUp](cluster-usage.md#weakly-up).
+The singleton instance will not run on members with status @ref:[WeaklyUp](typed/cluster-membership.md#weaklyup-members).
 
 ## Potential problems to be aware of
@@ -75,7 +75,7 @@ This pattern may seem to be very tempting to use at first, but it has several dr
 * you can not rely on the cluster singleton to be *non-stop* available — e.g. when the node on which the singleton has
 been running dies, it will take a few seconds for this to be noticed and the singleton be migrated to another node,
 * in the case of a *network partition* appearing in a Cluster that is using Automatic Downing (see docs for
-@ref:[Auto Downing](cluster-usage.md#automatic-vs-manual-downing)),
+@ref:[Auto Downing](typed/cluster.md#automatic-vs-manual-downing)),
 it may happen that the isolated clusters each decide to spin up their own singleton, meaning that there might be multiple
 singletons running in the system, yet the Clusters have no way of finding out about them (because of the partition).
@@ -1,18 +1,22 @@
-# Cluster Usage
+# Classic Cluster Usage
 
-For introduction to the Akka Cluster concepts please see @ref:[Cluster Specification](common/cluster.md).
-
-The core of Akka Cluster is the cluster membership, to keep track of what nodes are part of the cluster and
-their health. There are several @ref:[Higher level Cluster tools](cluster-usage.md#higher-level-cluster-tools) that are built
-on top of the cluster membership.
-
-You need to enable @ref:[serialization](serialization.md) for your actor messages.
-@ref:[Serialization with Jackson](serialization-jackson.md) is a good choice in many cases and our
-recommendation if you don't have other preference.
+This document describes how to use Akka Cluster and the Cluster APIs using code samples.
+For specific documentation topics see:
+
+* @ref:[Cluster Specification](typed/cluster-specification.md)
+* @ref:[Cluster Membership Service](typed/cluster-membership.md)
+* @ref:[When and where to use Akka Cluster](typed/choosing-cluster.md)
+* @ref:[Higher level Cluster tools](#higher-level-cluster-tools)
+* @ref:[Rolling Updates](additional/rolling-updates.md)
+* @ref:[Operating, Managing, Observability](additional/operations.md)
+
+Enable @ref:[serialization](serialization.md) to send events between ActorSystems and systems
+external to the Cluster. @ref:[Serialization with Jackson](serialization-jackson.md) is a good choice in many cases, and our
+recommendation if you don't have other preferences or constraints.
 
 ## Dependency
 
-To use Akka Cluster, you must add the following dependency in your project:
+To use Akka Cluster add the following dependency in your project:
 
 @@dependency[sbt,Maven,Gradle] {
   group="com.typesafe.akka"
@@ -20,135 +24,28 @@ To use Akka Cluster, you must add the following dependency in your project:
   version="$akka.version$"
 }
 
-## Sample project
+## Cluster samples
 
-You can look at the
+To see what using Akka Cluster looks like in practice, see the
 @java[@extref[Cluster example project](samples:akka-samples-cluster-java)]
-@scala[@extref[Cluster example project](samples:akka-samples-cluster-scala)]
-to see what this looks like in practice.
+@scala[@extref[Cluster example project](samples:akka-samples-cluster-scala)].
+This project contains samples illustrating different features, such as
+subscribing to cluster membership events, sending messages to actors running on nodes in the cluster,
+and using Cluster aware routers.
+
+The easiest way to run this example yourself is to try the
+@scala[@extref[Akka Cluster Sample with Scala](samples:akka-samples-cluster-scala)]@java[@extref[Akka Cluster Sample with Java](samples:akka-samples-cluster-java)].
+It contains instructions on how to run the `SimpleClusterApp`.
 
 ## When and where to use Akka Cluster
 
-An architectural choice you have to make is if you are going to use a microservices architecture or
-a traditional distributed application. This choice will influence how you should use Akka Cluster.
-
-### Microservices
-
-Microservices has many attractive properties, such as the independent nature of microservices allows for
-multiple smaller and more focused teams that can deliver new functionality more frequently and can
-respond quicker to business opportunities. Reactive Microservices should be isolated, autonomous, and have
-a single responsibility as identified by Jonas Bonér in the book
-[Reactive Microsystems: The Evolution of Microservices at Scale](https://info.lightbend.com/ebook-reactive-microservices-the-evolution-of-microservices-at-scale-register.html).
-
-In a microservices architecture, you should consider communication within a service and between services.
-
-In general we recommend against using Akka Cluster and actor messaging between _different_ services because that
-would result in a too tight code coupling between the services and difficulties deploying these independent of
-each other, which is one of the main reasons for using a microservices architecture.
-See the discussion on
-@scala[[Internal and External Communication](https://www.lagomframework.com/documentation/current/scala/InternalAndExternalCommunication.html)]
-@java[[Internal and External Communication](https://www.lagomframework.com/documentation/current/java/InternalAndExternalCommunication.html)]
-in the docs of the [Lagom Framework](https://www.lagomframework.com) (where each microservice is an Akka Cluster)
-for some background on this.
-
-Nodes of a single service (collectively called a cluster) require less decoupling. They share the same code and
-are deployed together, as a set, by a single team or individual. There might be two versions running concurrently
-during a rolling deployment, but deployment of the entire set has a single point of control. For this reason,
-intra-service communication can take advantage of Akka Cluster, failure management and actor messaging, which
-is convenient to use and has great performance.
-
-Between different services [Akka HTTP](https://doc.akka.io/docs/akka-http/current) or
-[Akka gRPC](https://doc.akka.io/docs/akka-grpc/current/) can be used for synchronous (yet non-blocking)
-communication and [Akka Streams Kafka](https://doc.akka.io/docs/akka-stream-kafka/current/home.html) or other
-[Alpakka](https://doc.akka.io/docs/alpakka/current/) connectors for integration asynchronous communication.
-All those communication mechanisms work well with streaming of messages with end-to-end back-pressure, and the
-synchronous communication tools can also be used for single request response interactions. It is also important
-to note that when using these tools both sides of the communication do not have to be implemented with Akka,
-nor does the programming language matter.
-
-### Traditional distributed application
-
-We acknowledge that microservices also introduce many new challenges and it's not the only way to
-build applications. A traditional distributed application may have less complexity and work well in many cases.
-For example for a small startup, with a single team, building an application where time to market is everything.
-Akka Cluster can efficiently be used for building such distributed application.
-
-In this case, you have a single deployment unit, built from a single code base (or using traditional binary
-dependency management to modularize) but deployed across many nodes using a single cluster.
-Tighter coupling is OK, because there is a central point of deployment and control. In some cases, nodes may
-have specialized runtime roles which means that the cluster is not totally homogenous (e.g., "front-end" and
-"back-end" nodes, or dedicated master/worker nodes) but if these are run from the same built artifacts this
-is just a runtime behavior and doesn't cause the same kind of problems you might get from tight coupling of
-totally separate artifacts.
-
-A tightly coupled distributed application has served the industry and many Akka users well for years and is
-still a valid choice.
-
-### Distributed monolith
-
-There is also an anti-pattern that is sometimes called "distributed monolith". You have multiple services
-that are built and deployed independently from each other, but they have a tight coupling that makes this
-very risky, such as a shared cluster, shared code and dependencies for service API calls, or a shared
-database schema. There is a false sense of autonomy because of the physical separation of the code and
-deployment units, but you are likely to encounter problems because of changes in the implementation of
-one service leaking into the behavior of others. See Ben Christensen's
-[Don’t Build a Distributed Monolith](https://www.microservices.com/talks/dont-build-a-distributed-monolith/).
-
-Organizations that find themselves in this situation often react by trying to centrally coordinate deployment
-of multiple services, at which point you have lost the principal benefit of microservices while taking on
-the costs. You are in a halfway state with things that aren't really separable being built and deployed
-in a separate way. Some people do this, and some manage to make it work, but it's not something we would
-recommend and it needs to be carefully managed.
-
-## A Simple Cluster Example
+See [Choosing Akka Cluster](typed/choosing-cluster.md#when-and-where-to-use-akka-cluster).
+
+## Cluster API Extension
 
 The following configuration enables the `Cluster` extension to be used.
 It joins the cluster and an actor subscribes to cluster membership events and logs them.
 
-The `application.conf` configuration looks like this:
-
-```
-akka {
-  actor {
-    provider = "cluster"
-  }
-  remote.artery {
-    canonical {
-      hostname = "127.0.0.1"
-      port = 2551
-    }
-  }
-
-  cluster {
-    seed-nodes = [
-      "akka://ClusterSystem@127.0.0.1:2551",
-      "akka://ClusterSystem@127.0.0.1:2552"]
-
-    # auto downing is NOT safe for production deployments.
-    # you may want to use it during development, read more about it in the docs.
-    #
-    # auto-down-unreachable-after = 10s
-  }
-}
-```
-
-To enable cluster capabilities in your Akka project you should, at a minimum, add the @ref:[Remoting](remoting-artery.md)
-settings, but with `cluster`.
-The `akka.cluster.seed-nodes` should normally also be added to your `application.conf` file.
-
-@@@ note
-
-If you are running Akka in a Docker container or the nodes for some other reason have separate internal and
-external ip addresses you must configure remoting according to @ref:[Akka behind NAT or in a Docker container](remoting-artery.md#remote-configuration-nat-artery)
-
-@@@
-
-The seed nodes are configured contact points for initial, automatic, join of the cluster.
-
-Note that if you are going to start the nodes on different machines you need to specify the
-ip-addresses or host names of the machines in `application.conf` instead of `127.0.0.1`
 
 An actor that uses the cluster extension may look like this:
 
 Scala
@@ -157,80 +54,44 @@ Scala
 Java
 : @@snip [SimpleClusterListener.java](/akka-docs/src/test/java/jdocs/cluster/SimpleClusterListener.java) { type=java }
 
+And the minimum configuration required is to set a host/port for remoting and the `akka.actor.provider = "cluster"`.
+
+@@snip [BasicClusterExampleSpec.scala](/akka-cluster-typed/src/test/scala/docs/akka/cluster/typed/BasicClusterExampleSpec.scala) { #config-seeds }
+
 The actor registers itself as subscriber of certain cluster events. It receives events corresponding to the current state
 of the cluster when the subscription starts and then it receives events for changes that happen in the cluster.
 
-The easiest way to run this example yourself is to try the
-@scala[@extref[Akka Cluster Sample with Scala](samples:akka-samples-cluster-scala)]@java[@extref[Akka Cluster Sample with Java](samples:akka-samples-cluster-java)].
-It contains instructions on how to run the `SimpleClusterApp`.
-
-## Joining to Seed Nodes
-
-@@@ note
-
-When starting clusters on cloud systems such as Kubernetes, AWS, Google Cloud, Azure, Mesos or others which maintain
-DNS or other ways of discovering nodes, you may want to use the automatic joining process implemented by the open source
-[Akka Cluster Bootstrap](https://doc.akka.io/docs/akka-management/current/bootstrap/index.html) module.
-
-@@@
-
-### Joining configured seed nodes
-
-You may decide if joining to the cluster should be done manually or automatically
-to configured initial contact points, so-called seed nodes. After the joining process
-the seed nodes are not special and they participate in the cluster in exactly the same
-way as other nodes.
-
-When a new node is started it sends a message to all seed nodes and then sends join command to the one that
-answers first. If no one of the seed nodes replied (might not be started yet)
-it retries this procedure until successful or shutdown.
+## Cluster Membership API
+
+This section shows the basic usage of the membership API. For the in-depth descriptions on joining, joining to seed nodes, downing and leaving of any node in the cluster please see the full
+@ref:[Cluster Membership API](typed/cluster.md#cluster-membership-api) documentation.
+
+If not using configuration to specify [seed nodes to join](#joining-to-seed-nodes), joining the cluster can be done programmatically:
+
+Scala
+: @@snip [SimpleClusterListener2.scala](/akka-docs/src/test/scala/docs/cluster/SimpleClusterListener2.scala) { #join }
+
+Java
+: @@snip [SimpleClusterListener2.java](/akka-docs/src/test/java/jdocs/cluster/SimpleClusterListener2.java) { #join }
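The join procedure described above (contact every seed node, join the one that answers first, and retry the whole round until successful) can be sketched outside Akka. The following is a minimal, hypothetical Java illustration of that first-responder-wins round; the class, method names, and the `probe` callback are assumptions for the sketch, not Akka's API:

```java
import java.util.List;
import java.util.Optional;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.function.Predicate;

public class SeedJoinSketch {

    // One round of the join procedure: probe every configured seed node in
    // parallel and return the first one that answers. An empty result means
    // no seed answered yet and the caller should retry the whole round
    // (in Akka the retry interval is governed by `seed-node-timeout`).
    public static Optional<String> tryJoinRound(List<String> seedNodes,
                                                ExecutorService pool,
                                                Predicate<String> probe) {
        CompletionService<String> completions = new ExecutorCompletionService<>(pool);
        for (String seed : seedNodes) {
            // each probe runs concurrently; a null result means "no answer"
            completions.submit(() -> probe.test(seed) ? seed : null);
        }
        for (int i = 0; i < seedNodes.size(); i++) {
            try {
                String answered = completions.take().get();
                if (answered != null) {
                    return Optional.of(answered); // join the first responder
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return Optional.empty();
            } catch (ExecutionException e) {
                // this seed was not reachable; keep waiting for the others
            }
        }
        return Optional.empty();
    }
}
```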
-
-You define the seed nodes in the [configuration](#cluster-configuration) file (application.conf):
-
-```
-akka.cluster.seed-nodes = [
-  "akka://ClusterSystem@host1:2552",
-  "akka://ClusterSystem@host2:2552"]
-```
-
-This can also be defined as Java system properties when starting the JVM using the following syntax:
-
-```
--Dakka.cluster.seed-nodes.0=akka://ClusterSystem@host1:2552
--Dakka.cluster.seed-nodes.1=akka://ClusterSystem@host2:2552
-```
-
-The seed nodes can be started in any order and it is not necessary to have all
-seed nodes running, but the node configured as the **first element** in the `seed-nodes`
-configuration list must be started when initially starting a cluster, otherwise the
-other seed-nodes will not become initialized and no other node can join the cluster.
-The reason for the special first seed node is to avoid forming separated islands when
-starting from an empty cluster.
-It is quickest to start all configured seed nodes at the same time (order doesn't matter),
-otherwise it can take up to the configured `seed-node-timeout` until the nodes
-can join.
-
-Once more than two seed nodes have been started it is no problem to shut down the first
-seed node. If the first seed node is restarted, it will first try to join the other
-seed nodes in the existing cluster. Note that if you stop all seed nodes at the same time
-and restart them with the same `seed-nodes` configuration they will join themselves and
-form a new cluster instead of joining remaining nodes of the existing cluster. That is
-likely not desired and should be avoided by listing several nodes as seed nodes for redundancy
-and don't stop all of them at the same time.
-
-### Automatically joining to seed nodes with Cluster Bootstrap
-
-Instead of manually configuring seed nodes, which is useful in development or statically assigned node IPs, you may want
-to automate the discovery of seed nodes using your cloud providers or cluster orchestrator, or some other form of service
-discovery (such as managed DNS). The open source Akka Management library includes the
-[Cluster Bootstrap](https://doc.akka.io/docs/akka-management/current/bootstrap/index.html) module which handles
-just that. Please refer to its documentation for more details.
-
-### Programatically joining to seed nodes with `joinSeedNodes`
-
-You may also use @scala[`Cluster(system).joinSeedNodes`]@java[`Cluster.get(system).joinSeedNodes`] to join programmatically,
-which is attractive when dynamically discovering other nodes at startup by using some external tool or API.
-When using `joinSeedNodes` you should not include the node itself except for the node that is
-supposed to be the first seed node, and that should be placed first in the parameter to
-`joinSeedNodes`.
+Leaving the cluster and downing a node are similar, for example:
+
+Scala
+: @@snip [ClusterDocSpec.scala](/akka-docs/src/test/scala/docs/cluster/ClusterDocSpec.scala) { #leave }
+
+Java
+: @@snip [ClusterDocTest.java](/akka-docs/src/test/java/jdocs/cluster/ClusterDocTest.java) { #leave }
+
+### Joining to Seed Nodes
+
+Joining to initial cluster contact points `akka.cluster.seed-nodes` can be done manually, with @ref:[configuration](typed/cluster.md#joining-configured-seed-nodes),
+[programatically](#programatically-joining-to-seed-nodes-with-joinseednodes), or @ref:[automatically with Cluster Bootstrap](typed/cluster.md#joining-automatically-to-seed-nodes-with-cluster-bootstrap).
+
+#### Programatically joining to seed nodes with `joinSeedNodes`
+
+@@include[cluster.md](includes/cluster.md) { #join-seeds-programmatic }
 
 Scala
 : @@snip [ClusterDocSpec.scala](/akka-docs/src/test/scala/docs/cluster/ClusterDocSpec.scala) { #join-seed-nodes }
@@ -238,167 +99,7 @@ Scala
 Java
 : @@snip [ClusterDocTest.java](/akka-docs/src/test/java/jdocs/cluster/ClusterDocTest.java) { #join-seed-nodes-imports #join-seed-nodes }
 
-Unsuccessful attempts to contact seed nodes are automatically retried after the time period defined in
-configuration property `seed-node-timeout`. Unsuccessful attempt to join a specific seed node is
-automatically retried after the configured `retry-unsuccessful-join-after`. Retrying means that it
-tries to contact all seed nodes and then joins the node that answers first. The first node in the list
-of seed nodes will join itself if it cannot contact any of the other seed nodes within the
-configured `seed-node-timeout`.
-
-The joining of given seed nodes will by default be retried indefinitely until
-a successful join. That process can be aborted if unsuccessful by configuring a
-timeout. When aborted it will run @ref:[Coordinated Shutdown](actors.md#coordinated-shutdown),
-which by default will terminate the ActorSystem. CoordinatedShutdown can also be configured to exit
-the JVM. It is useful to define this timeout if the `seed-nodes` are assembled
-dynamically and a restart with new seed-nodes should be tried after unsuccessful
-attempts.
-
-```
-akka.cluster.shutdown-after-unsuccessful-join-seed-nodes = 20s
-akka.coordinated-shutdown.terminate-actor-system = on
-```
-
-If you don't configure seed nodes or use `joinSeedNodes` you need to join the cluster manually, which can be performed by using [JMX](#cluster-jmx) or [HTTP](#cluster-http).
-
-You can join to any node in the cluster. It does not have to be configured as a seed node.
-Note that you can only join to an existing cluster member, which means that for bootstrapping some
-node must join itself,and then the following nodes could join them to make up a cluster.
-
-An actor system can only join a cluster once. Additional attempts will be ignored.
-When it has successfully joined it must be restarted to be able to join another
-cluster or to join the same cluster again. It can use the same host name and port
-after the restart, when it come up as new incarnation of existing member in the cluster,
-trying to join in, then the existing one will be removed from the cluster and then it will
-be allowed to join.
-
-@@@ note
-
-The name of the `ActorSystem` must be the same for all members of a cluster. The name is given when you start the `ActorSystem`.
-
-@@@
-
-<a id="automatic-vs-manual-downing"></a>
-## Downing
-
-When a member is considered by the failure detector to be unreachable the
-leader is not allowed to perform its duties, such as changing status of
-new joining members to 'Up'. The node must first become reachable again, or the
-status of the unreachable member must be changed to 'Down'. Changing status to 'Down'
-can be performed automatically or manually. By default it must be done manually, using
-[JMX](#cluster-jmx) or [HTTP](#cluster-http).
-
-It can also be performed programmatically with @scala[`Cluster(system).down(address)`]@java[`Cluster.get(system).down(address)`].
-
-If a node is still running and sees its self as Down it will shutdown. @ref:[Coordinated Shutdown](actors.md#coordinated-shutdown) will automatically
-run if `run-coordinated-shutdown-when-down` is set to `on` (the default) however the node will not try
-and leave the cluster gracefully so sharding and singleton migration will not occur.
-
-A pre-packaged solution for the downing problem is provided by
-[Split Brain Resolver](http://developer.lightbend.com/docs/akka-commercial-addons/current/split-brain-resolver.html),
-which is part of the [Lightbend Reactive Platform](http://www.lightbend.com/platform).
-If you don’t use RP, you should anyway carefully read the [documentation](http://developer.lightbend.com/docs/akka-commercial-addons/current/split-brain-resolver.html)
-of the Split Brain Resolver and make sure that the solution you are using handles the concerns
-described there.
-
-### Auto-downing (DO NOT USE)
-
-There is an automatic downing feature that you should not use in production. For testing purpose you can enable it with configuration:
-
-```
-akka.cluster.auto-down-unreachable-after = 120s
-```
-
-This means that the cluster leader member will change the `unreachable` node
-status to `down` automatically after the configured time of unreachability.
-
-This is a naïve approach to remove unreachable nodes from the cluster membership.
-It can be useful during development but in a production environment it will eventually breakdown the cluster. When a network partition occurs, both sides of the
-partition will see the other side as unreachable and remove it from the cluster.
-This results in the formation of two separate, disconnected, clusters
-(known as *Split Brain*).
-
-This behaviour is not limited to network partitions. It can also occur if a node
-in the cluster is overloaded, or experiences a long GC pause.
-
-@@@ warning
-
-We recommend against using the auto-down feature of Akka Cluster in production. It
-has multiple undesirable consequences for production systems.
-
-If you are using @ref:[Cluster Singleton](cluster-singleton.md) or
-@ref:[Cluster Sharding](cluster-sharding.md) it can break the contract provided by
-those features. Both provide a guarantee that an actor will be unique in a cluster.
-With the auto-down feature enabled, it is possible for multiple independent clusters
-to form (*Split Brain*). When this happens the guaranteed uniqueness will no
-longer be true resulting in undesirable behaviour in the system.
-
-This is even more severe when @ref:[Akka Persistence](persistence.md) is used in
-conjunction with Cluster Sharding. In this case, the lack of unique actors can
-cause multiple actors to write to the same journal. Akka Persistence operates on a
-single writer principle. Having multiple writers will corrupt the journal
-and make it unusable.
-
-Finally, even if you don't use features such as Persistence, Sharding, or Singletons,
-auto-downing can lead the system to form multiple small clusters. These small
-clusters will be independent from each other. They will be unable to communicate
-and as a result you may experience performance degradation. Once this condition
-occurs, it will require manual intervention in order to reform the cluster.
-
-Because of these issues, auto-downing should **never** be used in a production environment.
-
-@@@
-
-## Leaving
-
-There are two ways to remove a member from the cluster.
-
-You can stop the actor system (or the JVM process). It will be detected
-as unreachable and removed after the automatic or manual downing as described
-above.
-
-A more graceful exit can be performed if you tell the cluster that a node shall leave.
-This can be performed using [JMX](#cluster-jmx) or [HTTP](#cluster-http).
-It can also be performed programmatically with:
-
-Scala
-: @@snip [ClusterDocSpec.scala](/akka-docs/src/test/scala/docs/cluster/ClusterDocSpec.scala) { #leave }
-
-Java
-: @@snip [ClusterDocTest.java](/akka-docs/src/test/java/jdocs/cluster/ClusterDocTest.java) { #leave }
-
-Note that this command can be issued to any member in the cluster, not necessarily the
-one that is leaving.
-
-The @ref:[Coordinated Shutdown](actors.md#coordinated-shutdown) will automatically run when the cluster node sees itself as
-`Exiting`, i.e. leaving from another node will trigger the shutdown process on the leaving node.
-Tasks for graceful leaving of cluster including graceful shutdown of Cluster Singletons and
-Cluster Sharding are added automatically when Akka Cluster is used, i.e. running the shutdown
-process will also trigger the graceful leaving if it's not already in progress.
-
-Normally this is handled automatically, but in case of network failures during this process it might still
-be necessary to set the node’s status to `Down` in order to complete the removal.
-
-<a id="weakly-up"></a>
-## WeaklyUp Members
-
-If a node is `unreachable` then gossip convergence is not possible and therefore any
-`leader` actions are also not possible. However, we still might want new nodes to join
-the cluster in this scenario.
-
-`Joining` members will be promoted to `WeaklyUp` and become part of the cluster if
-convergence can't be reached. Once gossip convergence is reached, the leader will move `WeaklyUp`
-members to `Up`.
-
-This feature is enabled by default, but it can be disabled with configuration option:
-
-```
-akka.cluster.allow-weakly-up-members = off
-```
-
-You can subscribe to the `WeaklyUp` membership event to make use of the members that are
-in this state, but you should be aware of that members on the other side of a network partition
-have no knowledge about the existence of the new members. You should for example not count
-`WeaklyUp` members in quorum decisions.
+For more information see @ref[tuning joins](typed/cluster.md#tuning-joins)
 
 <a id="cluster-subscriber"></a>
 ## Subscribe to Cluster Events
 
@@ -451,26 +152,6 @@ Scala
 Java
 : @@snip [SimpleClusterListener.java](/akka-docs/src/test/java/jdocs/cluster/SimpleClusterListener.java) { #subscribe }
 
-The events to track the life-cycle of members are:
-
- * `ClusterEvent.MemberJoined` - A new member has joined the cluster and its status has been changed to `Joining`
- * `ClusterEvent.MemberUp` - A new member has joined the cluster and its status has been changed to `Up`
- * `ClusterEvent.MemberExited` - A member is leaving the cluster and its status has been changed to `Exiting`
-   Note that the node might already have been shutdown when this event is published on another node.
- * `ClusterEvent.MemberRemoved` - Member completely removed from the cluster.
- * `ClusterEvent.UnreachableMember` - A member is considered as unreachable, detected by the failure detector
-   of at least one other node.
- * `ClusterEvent.ReachableMember` - A member is considered as reachable again, after having been unreachable.
-   All nodes that previously detected it as unreachable has detected it as reachable again.
-
-There are more types of change events, consult the API documentation
-of classes that extends `akka.cluster.ClusterEvent.ClusterDomainEvent`
-for details about the events.
-
-Instead of subscribing to cluster events it can sometimes be convenient to only get the full membership state with
-@scala[`Cluster(system).state`]@java[`Cluster.get(system).state()`]. Note that this state is not necessarily in sync with the events published to a
-cluster subscription.
-
 ### Worker Dial-in Example
 
 Let's take a look at an example that illustrates how workers, here named *backend*,
@@ -520,17 +201,10 @@ unreachable cluster node has been downed and removed.
 The easiest way to run **Worker Dial-in Example** example yourself is to try the
 @scala[@extref[Akka Cluster Sample with Scala](samples:akka-samples-cluster-scala)]@java[@extref[Akka Cluster Sample with Java](samples:akka-samples-cluster-java)].
 It contains instructions on how to run the **Worker Dial-in Example** sample.
 
 ## Node Roles
 
-Not all nodes of a cluster need to perform the same function: there might be one sub-set which runs the web front-end,
-one which runs the data access layer and one for the number-crunching. Deployment of actors—for example by cluster-aware
-routers—can take node roles into account to achieve this distribution of responsibilities.
-
-The roles of a node is defined in the configuration property named `akka.cluster.roles`
-and it is typically defined in the start script as a system property or environment variable.
-
-The roles of the nodes is part of the membership information in `MemberEvent` that you can subscribe to.
+See @ref:[Cluster Node Roles](typed/cluster.md#node-roles)
 
 <a id="min-members"></a>
 ## How To Startup when Cluster Size Reached
 
@@ -555,9 +229,11 @@ akka.cluster.role {
 }
 ```
 
-You can start the actors in a `registerOnMemberUp` callback, which will
-be invoked when the current member status is changed to 'Up', i.e. the cluster
-has at least the defined number of members.
+## How To Startup when Member is Up
+
+You can start actors or trigger any functions using the `registerOnMemberUp` callback, which will
+be invoked when the current member status is changed to 'Up'. This can additionally be used with
+`akka.cluster.min-nr-of-members` optional configuration to defer an action until the cluster has reached a certain size.
 
 Scala
 : @@snip [FactorialFrontend.scala](/akka-docs/src/test/scala/docs/cluster/FactorialFrontend.scala) { #registerOnUp }
@@ -565,8 +241,6 @@ Scala
 Java
 : @@snip [FactorialFrontendMain.java](/akka-docs/src/test/java/jdocs/cluster/FactorialFrontendMain.java) { #registerOnUp }
 
-This callback can be used for other things than starting actors.
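The `registerOnMemberUp` contract discussed above (run the callback at once if the member is already 'Up', otherwise defer it until the status changes) can be sketched generically. The following is a minimal, hypothetical Java model of that contract, not Akka's implementation; Akka's real entry point is `Cluster.get(system).registerOnMemberUp`:

```java
import java.util.ArrayList;
import java.util.List;

public class MemberUpCallbacks {
    private final List<Runnable> deferred = new ArrayList<>();
    private boolean up = false;

    // Register a callback: run it immediately if the member is already Up,
    // otherwise keep it until the status changes to Up.
    public synchronized void registerOnMemberUp(Runnable callback) {
        if (up) {
            callback.run();
        } else {
            deferred.add(callback);
        }
    }

    // Invoked when this member's status changes to Up (for example once
    // akka.cluster.min-nr-of-members has been reached); runs every
    // deferred callback exactly once.
    public synchronized void memberUp() {
        up = true;
        deferred.forEach(Runnable::run);
        deferred.clear();
    }
}
```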
 
 ## How To Cleanup when Member is Removed
 
 You can do some clean up in a `registerOnMemberRemoved` callback, which will
@@ -585,139 +259,55 @@ down when you installing, and depending on the race is not healthy.
 
 ## Higher level Cluster tools
 
-### Cluster Singleton
-
-For some use cases it is convenient and sometimes also mandatory to ensure that
-you have exactly one actor of a certain type running somewhere in the cluster.
-
-This can be implemented by subscribing to member events, but there are several corner
-cases to consider. Therefore, this specific use case is covered by the
-@ref:[Cluster Singleton](cluster-singleton.md).
-
-### Cluster Sharding
-
-Distributes actors across several nodes in the cluster and supports interaction
-with the actors using their logical identifier, but without having to care about
-their physical location in the cluster.
-
+@@include[cluster.md](includes/cluster.md) { #cluster-singleton }
+See @ref:[Cluster Singleton](cluster-singleton.md).
+
+@@include[cluster.md](includes/cluster.md) { #cluster-sharding }
 See @ref:[Cluster Sharding](cluster-sharding.md).
 
-### Distributed Publish Subscribe
-
-Publish-subscribe messaging between actors in the cluster, and point-to-point messaging
-using the logical path of the actors, i.e. the sender does not have to know on which
-node the destination actor is running.
-
-See @ref:[Distributed Publish Subscribe in Cluster](distributed-pub-sub.md).
+@@include[cluster.md](includes/cluster.md) { #cluster-ddata }
+See @ref:[Distributed Data](distributed-data.md).
+
+@@include[cluster.md](includes/cluster.md) { #cluster-pubsub }
+See @ref:[Cluster Distributed Publish Subscribe](distributed-pub-sub.md).
+
+### Cluster Aware Routers
+
+All routers can be made aware of member nodes in the cluster, i.e.
+deploying new routees or looking up routees on nodes in the cluster.
+When a node becomes unreachable or leaves the cluster the routees of that node are
+automatically unregistered from the router. When new nodes join the cluster, additional
+routees are added to the router, according to the configuration.
+
+See @ref:[Cluster Aware Routers](cluster-routing.md) and @ref:[Routers](routing.md).
+
+@@include[cluster.md](includes/cluster.md) { #cluster-multidc }
+See @ref:[Cluster Multi-DC](cluster-dc.md).
 
 ### Cluster Client
 
 Communication from an actor system that is not part of the cluster to actors running
 somewhere in the cluster. The client does not have to know on which node the destination
 actor is running.
 
 See @ref:[Cluster Client](cluster-client.md).
 
-### Distributed Data
-
-*Akka Distributed Data* is useful when you need to share data between nodes in an
-Akka Cluster. The data is accessed with an actor providing a key-value store like API.
-
-See @ref:[Distributed Data](distributed-data.md).
-
-### Cluster Aware Routers
-
-All @ref:[routers](routing.md) can be made aware of member nodes in the cluster, i.e.
-deploying new routees or looking up routees on nodes in the cluster.
-When a node becomes unreachable or leaves the cluster the routees of that node are
-automatically unregistered from the router. When new nodes join the cluster, additional
-routees are added to the router, according to the configuration.
-
-See @ref:[Cluster Aware Routers](cluster-routing.md).
-
 ### Cluster Metrics
 
 The member nodes of the cluster can collect system health metrics and publish that to other cluster nodes
 and to the registered subscribers on the system event bus.
 
 See @ref:[Cluster Metrics](cluster-metrics.md).
 
 ## Failure Detector
 
-In a cluster each node is monitored by a few (default maximum 5) other nodes, and when
-any of these detects the node as `unreachable` that information will spread to
-the rest of the cluster through the gossip. In other words, only one node needs to
-mark a node `unreachable` to have the rest of the cluster mark that node `unreachable`.
-
-The failure detector will also detect if the node becomes `reachable` again. When
-all nodes that monitored the `unreachable` node detects it as `reachable` again
-the cluster, after gossip dissemination, will consider it as `reachable`.
-
-If system messages cannot be delivered to a node it will be quarantined and then it
-cannot come back from `unreachable`. This can happen if the there are too many
-unacknowledged system messages (e.g. watch, Terminated, remote actor deployment,
-failures of actors supervised by remote parent). Then the node needs to be moved
-to the `down` or `removed` states and the actor system of the quarantined node
-must be restarted before it can join the cluster again.
-
 The nodes in the cluster monitor each other by sending heartbeats to detect if a node is
-unreachable from the rest of the cluster. The heartbeat arrival times is interpreted
-by an implementation of
-[The Phi Accrual Failure Detector](http://www.jaist.ac.jp/~defago/files/pdf/IS_RR_2004_010.pdf).
-
-The suspicion level of failure is given by a value called *phi*.
-The basic idea of the phi failure detector is to express the value of *phi* on a scale that
-is dynamically adjusted to reflect current network conditions.
-
-The value of *phi* is calculated as:
-
-```
-phi = -log10(1 - F(timeSinceLastHeartbeat))
-```
-
-where F is the cumulative distribution function of a normal distribution with mean
-and standard deviation estimated from historical heartbeat inter-arrival times.
+unreachable from the rest of the cluster. Please see:
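The phi calculation described above can be reproduced standalone. Below is a minimal Java sketch, assuming the Abramowitz-Stegun approximation of the normal CDF; this is an illustrative assumption, and Akka's internal failure detector estimates the distribution differently:

```java
public class PhiSketch {

    // phi = -log10(1 - F(timeSinceLastHeartbeat)), where F is the CDF of a
    // normal distribution with mean and standard deviation estimated from
    // historical heartbeat inter-arrival times.
    public static double phi(double timeSinceLastHeartbeatMs,
                             double meanMs, double stdDevMs) {
        double f = normalCdf(timeSinceLastHeartbeatMs, meanMs, stdDevMs);
        return -Math.log10(1.0 - f);
    }

    // Normal CDF expressed through the error function.
    static double normalCdf(double x, double mean, double stdDev) {
        double z = (x - mean) / (stdDev * Math.sqrt(2.0));
        return 0.5 * (1.0 + erf(z));
    }

    // Abramowitz-Stegun formula 7.1.26, absolute error below ~1.5e-7.
    static double erf(double z) {
        double t = 1.0 / (1.0 + 0.3275911 * Math.abs(z));
        double poly = ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t
                - 0.284496736) * t + 0.254829592) * t;
        double y = 1.0 - poly * Math.exp(-z * z);
        return z >= 0 ? y : -y;
    }

    public static void main(String[] args) {
        // With mean 1000 ms and std dev 200 ms: at the mean F = 0.5, so
        // phi = -log10(0.5), roughly 0.3 (no suspicion); a pause five
        // standard deviations past the mean yields a much larger phi.
        System.out.println(phi(1000.0, 1000.0, 200.0));
        System.out.println(phi(2000.0, 1000.0, 200.0));
    }
}
```

A steeper distribution (smaller standard deviation) makes phi climb faster for the same pause, which is the behavior the charts in the deleted text illustrated.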
|
|
||||||
In the [configuration](#cluster-configuration) you can adjust the `akka.cluster.failure-detector.threshold`
|
|
||||||
to define when a *phi* value is considered to be a failure.

A low `threshold` is prone to generate many false positives but ensures
a quick detection in the event of a real crash. Conversely, a high `threshold`
generates fewer mistakes but needs more time to detect actual crashes. The
default `threshold` is 8 and is appropriate for most situations. However in
cloud environments, such as Amazon EC2, the value could be increased to 12 in
order to account for network issues that sometimes occur on such platforms.

The following chart illustrates how *phi* increases with increasing time since the
previous heartbeat.

![phi1.png](./images/phi1.png)

Phi is calculated from the mean and standard deviation of historical
inter-arrival times. The previous chart is an example for a standard deviation
of 200 ms. If the heartbeats arrive with less deviation the curve becomes steeper,
i.e. it is possible to determine failure more quickly. The curve looks like this for
a standard deviation of 100 ms.

![phi2.png](./images/phi2.png)

To be able to survive sudden abnormalities, such as garbage collection pauses and
transient network failures, the failure detector is configured with a margin,
`akka.cluster.failure-detector.acceptable-heartbeat-pause`. You may want to
adjust the [configuration](#cluster-configuration) of this depending on your environment.
This is how the curve looks for `acceptable-heartbeat-pause` configured to
3 seconds.

![phi3.png](./images/phi3.png)

Death watch uses the cluster failure detector for nodes in the cluster, i.e. it detects
network failures and JVM crashes, in addition to graceful termination of a watched
actor. Death watch generates the `Terminated` message to the watching actor when the
unreachable cluster node has been downed and removed.

If you encounter suspicious false positives when the system is under load you should
define a separate dispatcher for the cluster actors as described in [Cluster Dispatcher](#cluster-dispatcher).

For more information see:

* @ref:[Failure Detector specification](typed/cluster-specification.md#failure-detector)
* @ref:[Phi Accrual Failure Detector](typed/failure-detector.md) implementation
* [Using the Failure Detector](typed/cluster.md#using-the-failure-detector)
## How to Test

@@@ div { .group-scala }

Set up your project according to the instructions in @ref:[Multi Node Testing](multi-node-testing.md),
add the `sbt-multi-jvm` plugin and the dependency to `akka-multi-node-testkit`.

First, as described in @ref:[Multi Node Testing](multi-node-testing.md), we need some scaffolding to configure the `MultiNodeSpec`.
Define the participating @ref:[roles](typed/cluster.md#node-roles) and their [configuration](#configuration) in an object extending `MultiNodeConfig`:

@@snip [StatsSampleSpec.scala](/akka-cluster-metrics/src/multi-jvm/scala/akka/cluster/metrics/sample/StatsSampleSpec.scala) { #MultiNodeConfig }

Go to the corresponding Scala version of this page for details.

## Management

There are several management tools for the cluster. Please refer to
@ref:[Cluster Management](additional/operations.md#cluster-management) for more information.

<a id="cluster-http"></a>
### HTTP

Information and management of the cluster is available with an HTTP API.
See the documentation of [Akka Management](http://developer.lightbend.com/docs/akka-management/current/).

<a id="cluster-jmx"></a>
### JMX

Information and management of the cluster is available as JMX MBeans with the root name `akka.Cluster`.
The JMX information can be displayed with an ordinary JMX console such as JConsole or JVisualVM.

From JMX you can:

* see which members are part of the cluster
* see the status of this node
* see the roles of each member
* join this node to another node in the cluster
* mark any node in the cluster as down
* tell any node in the cluster to leave

Member nodes are identified by their address, in format *akka.**protocol**://**actor-system-name**@**hostname**:**port***.

<a id="cluster-command-line"></a>
### Command Line

Make sure you understand the security implications of enabling remote monitoring and management.

<a id="cluster-configuration"></a>

## Configuration

There are several @ref:[configuration](typed/cluster.md#configuration) properties for the cluster;
see the full @ref:[reference configuration](general/configuration.md#config-akka-cluster) for complete information.

### Cluster Info Logging

You can silence the logging of cluster events at info level with the configuration property:

```
akka.cluster.log-info = off
```

You can enable verbose logging of cluster events at info level, e.g. for temporary troubleshooting, with the configuration property:

```
akka.cluster.log-info-verbose = on
```

### Cluster Dispatcher

Under the hood the cluster extension is implemented with actors. To protect them against
disturbance from user actors they are by default run on the internal dispatcher configured
under `akka.actor.internal-dispatcher`. The cluster actors can potentially be isolated even
further onto their own dispatcher using the setting `akka.cluster.use-dispatcher`.
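As a hedged sketch, isolating the cluster actors onto a dedicated dispatcher could look like the following; the dispatcher name and sizing are placeholders chosen for illustration:

```
akka.cluster.use-dispatcher = cluster-dispatcher

cluster-dispatcher {
  type = "Dispatcher"
  executor = "fork-join-executor"
  fork-join-executor {
    parallelism-min = 2
    parallelism-max = 4
  }
}
```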

### Configuration Compatibility Check

Creating a cluster is about deploying two or more nodes and making them behave as if they were one single application. Therefore it's extremely important that all nodes in a cluster are configured with compatible settings.

The Configuration Compatibility Check feature ensures that all nodes in a cluster have a compatible configuration. Whenever a new node is joining an existing cluster, a subset of its configuration settings (only those that are required to be checked) is sent to the nodes in the cluster for verification. Once the configuration is checked on the cluster side, the cluster sends back its own set of required configuration settings. The joining node will then verify if it's compliant with the cluster configuration. The joining node will only proceed if all checks pass, on both sides.

New custom checkers can be added by extending `akka.cluster.JoinConfigCompatChecker` and including them in the configuration. Each checker must be associated with a unique key:

```
akka.cluster.configuration-compatibility-check.checkers {
  my-custom-config = "com.company.MyCustomJoinConfigCompatChecker"
}
```

@@@ note

Configuration Compatibility Check is enabled by default, but can be disabled by setting `akka.cluster.configuration-compatibility-check.enforce-on-join = off`. This is especially useful when performing rolling updates. Obviously this should only be done if a complete cluster shutdown isn't an option. A cluster with nodes with different configuration settings may lead to data loss or data corruption.

This setting should only be disabled on the joining nodes. The checks are always performed on both sides, and warnings are logged. In case of incompatibilities, it is the responsibility of the joining node to decide if the process should be interrupted or not.

If you are performing a rolling update on a cluster using Akka 2.5.9 or earlier (thus, not supporting this feature), the checks will not be performed because the running cluster has no means to verify the configuration sent by the joining node, nor to send back its own configuration.

@@@

# Cluster Specification

@@@ note

This document describes the design concepts of Akka Cluster.

@@@

## Intro

Akka Cluster provides a fault-tolerant decentralized peer-to-peer based cluster
[membership](#membership) service with no single point of failure or single point of bottleneck.
It does this using [gossip](#gossip) protocols and an automatic [failure detector](#failure-detector).

Akka Cluster allows for building distributed applications, where one application or service spans multiple nodes
(in practice multiple `ActorSystem`s). See also the discussion in
@ref:[When and where to use Akka Cluster](../cluster-usage.md#when-and-where-to-use-akka-cluster).
## Terms

**node**
: A logical member of a cluster. There could be multiple nodes on a physical
machine. Defined by a *hostname:port:uid* tuple.

**cluster**
: A set of nodes joined together through the [membership](#membership) service.

**leader**
: A single node in the cluster that acts as the leader, managing cluster convergence
and membership state transitions.

## Membership

A cluster is made up of a set of member nodes. The identifier for each node is a
`hostname:port:uid` tuple. An Akka application can be distributed over a cluster with
each node hosting some part of the application. Cluster membership and the actors running
on that node of the application are decoupled. A node could be a member of a
cluster without hosting any actors. Joining a cluster is initiated
by issuing a `Join` command to one of the nodes in the cluster to join.

The node identifier internally also contains a UID that uniquely identifies this
actor system instance at that `hostname:port`. Akka uses the UID to be able to
reliably trigger remote death watch. This means that the same actor system can never
join a cluster again once it's been removed from that cluster. To re-join an actor
system with the same `hostname:port` to a cluster you have to stop the actor system
and start a new one with the same `hostname:port` which will then receive a different
UID.

The cluster membership state is a specialized [CRDT](http://hal.upmc.fr/docs/00/55/55/88/PDF/techreport.pdf), which means that it has a monotonic
merge function. When concurrent changes occur on different nodes the updates can always be
merged and converge to the same end result.

### Gossip

The cluster membership used in Akka is based on Amazon's [Dynamo](http://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf) system and
particularly the approach taken in Basho's [Riak](http://basho.com/technology/architecture/) distributed database.
Cluster membership is communicated using a [Gossip Protocol](http://en.wikipedia.org/wiki/Gossip_protocol), where the current
state of the cluster is gossiped randomly through the cluster, with preference to
members that have not seen the latest version.

#### Vector Clocks

[Vector clocks](http://en.wikipedia.org/wiki/Vector_clock) are a type of data structure and algorithm for generating a partial
ordering of events in a distributed system and detecting causality violations.

We use vector clocks to reconcile and merge differences in cluster state
during gossiping. A vector clock is a set of (node, counter) pairs. Each update
to the cluster state has an accompanying update to the vector clock.
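The idea can be sketched as a minimal map-based vector clock; this is a hypothetical illustration, not Akka's internal `VectorClock`, which differs in representation and comparison details:

```java
import java.util.HashMap;
import java.util.Map;

// Minimal vector clock: a map from node name to a monotonically growing counter.
public class VClock {
    final Map<String, Long> versions = new HashMap<>();

    // record a local event on the given node
    VClock increment(String node) {
        VClock next = new VClock();
        next.versions.putAll(versions);
        next.versions.merge(node, 1L, Long::sum);
        return next;
    }

    // pairwise maximum of counters, used when reconciling gossiped states
    VClock merge(VClock that) {
        VClock result = new VClock();
        result.versions.putAll(versions);
        that.versions.forEach((n, c) -> result.versions.merge(n, c, Math::max));
        return result;
    }

    // true if every counter is <= the other clock's and the clocks differ
    boolean happenedBefore(VClock that) {
        return !versions.equals(that.versions)
            && versions.entrySet().stream()
                .allMatch(e -> e.getValue() <= that.versions.getOrDefault(e.getKey(), 0L));
    }
}
```

Two clocks where neither `happenedBefore` the other represent concurrent updates, which is exactly the case where the cluster states have to be merged.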

#### Gossip Convergence

Information about the cluster converges locally at a node at certain points in time.
This is when a node can prove that the cluster state it is observing has been observed
by all other nodes in the cluster. Convergence is implemented by passing a set of nodes
that have seen the current state version during gossip. This information is referred to as the
seen set in the gossip overview. When all nodes are included in the seen set there is
convergence.

Gossip convergence cannot occur while any nodes are `unreachable`. The nodes need
to become `reachable` again, or moved to the `down` and `removed` states
(see the [Membership Lifecycle](#membership-lifecycle) section below). This only blocks the leader
from performing its cluster membership management and does not influence the application
running on top of the cluster. For example this means that during a network partition
it is not possible to add more nodes to the cluster. The nodes can join, but they
will not be moved to the `up` state until the partition has healed or the unreachable
nodes have been downed.

#### Failure Detector

The failure detector is responsible for trying to detect if a node is
`unreachable` from the rest of the cluster. For this we are using an
implementation of [The Phi Accrual Failure Detector](https://pdfs.semanticscholar.org/11ae/4c0c0d0c36dc177c1fff5eb84fa49aa3e1a8.pdf) by Hayashibara et al.

An accrual failure detector decouples monitoring and interpretation. That makes
it applicable to a wider range of scenarios and more adequate for building generic
failure detection services. The idea is that it keeps a history of failure
statistics, calculated from heartbeats received from other nodes, and
tries to make educated guesses by taking multiple factors, and how they
accumulate over time, into account in order to come up with a better guess if a
specific node is up or down. Rather than only answering "yes" or "no" to the
question "is the node down?" it returns a `phi` value representing the
likelihood that the node is down.

The `threshold` that is the basis for the calculation is configurable by the
user. A low `threshold` is prone to generate many wrong suspicions but ensures
a quick detection in the event of a real crash. Conversely, a high `threshold`
generates fewer mistakes but needs more time to detect actual crashes. The
default `threshold` is 8 and is appropriate for most situations. However in
cloud environments, such as Amazon EC2, the value could be increased to 12 in
order to account for network issues that sometimes occur on such platforms.

In a cluster each node is monitored by a few (default maximum 5) other nodes, and when
any of these detects the node as `unreachable` that information will spread to
the rest of the cluster through the gossip. In other words, only one node needs to
mark a node `unreachable` to have the rest of the cluster mark that node `unreachable`.

The nodes to monitor are picked out of neighbors in a hashed ordered node ring.
This is to increase the likelihood to monitor across racks and data centers, but the order
is the same on all nodes, which ensures full coverage.
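The ring idea can be sketched like this; the ordering and hash here are simplified placeholders (Akka's real ordering differs), but the key property holds: every node computes the same ring, so coverage is deterministic.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: each node monitors its next few neighbours in a ring
// ordered by a hash of the node identifier.
public class MonitorRing {
    static List<String> monitored(List<String> nodes, String self, int maxMonitored) {
        List<String> ring = new ArrayList<>(nodes);
        ring.sort(Comparator.comparingInt(String::hashCode)); // same order on every node
        int i = ring.indexOf(self);
        List<String> result = new ArrayList<>();
        int n = Math.min(maxMonitored, ring.size() - 1);
        for (int k = 1; k <= n; k++) {
            result.add(ring.get((i + k) % ring.size())); // the next n neighbours
        }
        return result;
    }
}
```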

Heartbeats are sent out every second and every heartbeat is performed in a request/reply
handshake with the replies used as input to the failure detector.

The failure detector will also detect if the node becomes `reachable` again. When
all nodes that monitored the `unreachable` node detect it as `reachable` again
the cluster, after gossip dissemination, will consider it as `reachable`.

If system messages cannot be delivered to a node it will be quarantined and then it
cannot come back from `unreachable`. This can happen if there are too many
unacknowledged system messages (e.g. watch, Terminated, remote actor deployment,
failures of actors supervised by remote parent). Then the node needs to be moved
to the `down` or `removed` states (see the [Membership Lifecycle](#membership-lifecycle) section below)
and the actor system must be restarted before it can join the cluster again.

#### Leader

After gossip convergence a `leader` for the cluster can be determined. There is no
`leader` election process; the `leader` can always be recognised deterministically
by any node whenever there is gossip convergence. The leader is only a role; any node
can be the leader and it can change between convergence rounds.
The `leader` is the first node in sorted order that is able to take the leadership role,
where the preferred member states for a `leader` are `up` and `leaving`
(see the [Membership Lifecycle](#membership-lifecycle) section below for more information about member states).
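Because the rule is deterministic, every node that has the same converged state computes the same leader. A hypothetical sketch (the `Member` type and status strings are illustrative, not Akka's real types):

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Deterministic leader selection: first member in address order whose
// status can take the leadership role.
public class LeaderSelection {
    record Member(String address, String status) {}

    static Optional<Member> leader(List<Member> members) {
        return members.stream()
            .sorted(Comparator.comparing(Member::address))
            .filter(m -> m.status().equals("Up") || m.status().equals("Leaving"))
            .findFirst();
    }
}
```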

The role of the `leader` is to shift members in and out of the cluster, changing
`joining` members to the `up` state or `exiting` members to the `removed`
state. Currently `leader` actions are only triggered by receiving a new cluster
state with gossip convergence.

The `leader` also has the power, if configured so, to "auto-down" a node that
according to the [Failure Detector](#failure-detector) is considered `unreachable`. This means setting
the `unreachable` node status to `down` automatically after a configured time
of unreachability.

#### Seed Nodes

The seed nodes are configured contact points for new nodes joining the cluster.
When a new node is started it sends a message to all seed nodes and then sends
a join command to the seed node that answers first.

The seed nodes configuration value does not have any influence on the running
cluster itself, it is only relevant for new nodes joining the cluster as it
helps them to find contact points to send the join command to; a new member
can send this command to any current member of the cluster, not only to the seed nodes.
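For illustration, a seed-nodes setting might look like the following; the system name, hosts and ports are placeholders, and the exact address scheme depends on the Akka version and transport in use:

```
akka.cluster.seed-nodes = [
  "akka://ClusterSystem@host1:2552",
  "akka://ClusterSystem@host2:2552"
]
```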

#### Gossip Protocol

A variation of *push-pull gossip* is used to reduce the amount of gossip
information sent around the cluster. In push-pull gossip a digest is sent
representing current versions but not actual values; the recipient of the gossip
can then send back any values for which it has newer versions and also request
values for which it has outdated versions. Akka uses a single shared state with
a vector clock for versioning, so the variant of push-pull gossip used in Akka
makes use of this version to only push the actual state as needed.

Periodically, the default is every 1 second, each node chooses another random
node to initiate a round of gossip with. If less than ½ of the nodes reside in the
seen set (have seen the new state) then the cluster gossips 3 times instead of once
every second. This adjusted gossip interval is a way to speed up the convergence process
in the early dissemination phase after a state change.

The choice of node to gossip with is random but biased towards nodes that might not have seen
the current state version. During each round of gossip exchange, when convergence is not yet reached, a node
uses a very high probability (which is configurable) to gossip with another node which is not part of the seen set, i.e.
which is likely to have an older version of the state. Otherwise it gossips with any random live node.

This biased selection is a way to speed up the convergence process in the late dissemination
phase after a state change.

For clusters larger than 400 nodes (configurable, and suggested by empirical evidence)
the 0.8 probability is gradually reduced to avoid overwhelming single stragglers with
too many concurrent gossip requests. The gossip receiver also has a mechanism to
protect itself from too many simultaneous gossip messages by dropping messages that
have been enqueued in the mailbox for too long.

While the cluster is in a converged state the gossiper only sends a small gossip status message containing the gossip
version to the chosen node. As soon as there is a change to the cluster (meaning non-convergence)
then it goes back to biased gossip again.

The recipient of the gossip state or the gossip status can use the gossip version
(vector clock) to determine whether:

 1. it has a newer version of the gossip state, in which case it sends that back
    to the gossiper
 2. it has an outdated version of the state, in which case the recipient requests
    the current state from the gossiper by sending back its version of the gossip state
 3. it has conflicting gossip versions, in which case the different versions are merged
    and sent back

If the recipient and the gossip have the same version then the gossip state is
not sent or requested.
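The decision above can be sketched as a simple three-way branch; `localNewer`/`remoteNewer` stand in for the outcome of a vector-clock comparison, so this is an illustration of the protocol's cases, not Akka's actual code:

```java
// Outcome of comparing the local gossip version with the received one.
public class GossipStatusDecision {
    enum Action { SEND_BACK_NEWER_STATE, REQUEST_CURRENT_STATE, MERGE_AND_SEND_BACK, SAME_VERSION }

    static Action decide(boolean localNewer, boolean remoteNewer) {
        if (localNewer && remoteNewer) return Action.MERGE_AND_SEND_BACK; // concurrent versions
        if (localNewer) return Action.SEND_BACK_NEWER_STATE;              // case 1
        if (remoteNewer) return Action.REQUEST_CURRENT_STATE;             // case 2
        return Action.SAME_VERSION;                                       // nothing to exchange
    }
}
```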

The periodic nature of the gossip has a nice batching effect of state changes,
e.g. joining several nodes quickly after each other to one node will result in only
one state change to be spread to other members in the cluster.

The gossip messages are serialized with [protobuf](https://code.google.com/p/protobuf/) and also gzipped to reduce payload
size.

### Membership Lifecycle

A node begins in the `joining` state. Once all nodes have seen that the new
node is joining (through gossip convergence) the `leader` will set the member
state to `up`.

If a node is leaving the cluster in a safe, expected manner then it switches to
the `leaving` state. Once the leader sees the convergence on the node in the
`leaving` state, the leader will then move it to `exiting`. Once all nodes
have seen the exiting state (convergence) the `leader` will remove the node
from the cluster, marking it as `removed`.

If a node is `unreachable` then gossip convergence is not possible and therefore
any `leader` actions are also not possible (for instance, allowing a node to
become a part of the cluster). To be able to move forward the state of the
`unreachable` nodes must be changed. It must become `reachable` again or marked
as `down`. If the node is to join the cluster again the actor system must be
restarted and go through the joining process again. The cluster can, through the
leader, also *auto-down* a node after a configured time of unreachability. If a new
incarnation of an unreachable node tries to rejoin the cluster, the old incarnation will be
marked as `down` and the new incarnation can rejoin the cluster without manual intervention.

@@@ note

If you have *auto-down* enabled and the failure detector triggers, you
can over time end up with a lot of single node clusters if you don't put
measures in place to shut down nodes that have become `unreachable`. This
follows from the fact that the `unreachable` node will likely see the rest of
the cluster as `unreachable`, become its own leader and form its own cluster.

@@@

As mentioned before, if a node is `unreachable` then gossip convergence is not
possible and therefore any `leader` actions are also not possible. By enabling
`akka.cluster.allow-weakly-up-members` (enabled by default) it is possible to
let new joining nodes be promoted while convergence is not yet reached. These
`Joining` nodes will be promoted as `WeaklyUp`. Once gossip convergence is
reached, the leader will move `WeaklyUp` members to `Up`.

Note that members on the other side of a network partition have no knowledge about
the existence of the new members. You should for example not count `WeaklyUp`
members in quorum decisions.

#### State Diagram for the Member States (`akka.cluster.allow-weakly-up-members=off`)

![member-states.png](./images/member-states.png)

#### State Diagram for the Member States (`akka.cluster.allow-weakly-up-members=on`)

![member-states-weakly-up.png](./images/member-states-weakly-up.png)

#### Member States

* **joining** - transient state when joining a cluster

* **weakly up** - transient state while in a network split (only if `akka.cluster.allow-weakly-up-members=on`)

* **up** - normal operating state

* **leaving** / **exiting** - states during graceful removal

* **down** - marked as down (no longer part of cluster decisions)

* **removed** - tombstone state (no longer a member)

#### User Actions

* **join** - join a single node to a cluster - can be explicit or automatic on
startup if a node to join has been specified in the configuration

* **leave** - tell a node to leave the cluster gracefully

* **down** - mark a node as down

#### Leader Actions

The `leader` has the following duties:

* shifting members in and out of the cluster
    * joining -> up
    * weakly up -> up *(no convergence is required for this leader action to be performed)*
    * exiting -> removed

#### Failure Detection and Unreachability

* **fd*** - the failure detector of one of the monitoring nodes has triggered
causing the monitored node to be marked as unreachable

* **unreachable*** - unreachable is not a real member state but more of a flag in addition to the state, signaling that the cluster is unable to talk to this node; after being unreachable the failure detector may detect it as reachable again and thereby remove the flag

actor using the `Replicator.props`. If it is started as an ordinary actor it is important
that it is given the same name, started on the same path, on all nodes.

Cluster members with status @ref:[WeaklyUp](typed/cluster-membership.md#weakly-up)
will participate in Distributed Data. This means that the data will be replicated to the
@ref:[WeaklyUp](typed/cluster-membership.md#weakly-up) nodes with the background gossip protocol. Note that it
will not participate in any actions where the consistency mode is to read/write from all
nodes or the majority of nodes. The @ref:[WeaklyUp](typed/cluster-membership.md#weakly-up) node is not counted
as part of the cluster. So 3 nodes + 5 @ref:[WeaklyUp](typed/cluster-membership.md#weakly-up) is essentially a
3 node cluster as far as consistent actions are concerned.

Below is an example of an actor that schedules tick messages to itself and for each tick

changes are versioned. Deltas are disseminated in a scalable way to other nodes with
a gossip protocol.

Cluster members with status @ref:[WeaklyUp](typed/cluster-membership.md#weakly-up)
will participate in Distributed Publish Subscribe, i.e. subscribers on nodes with
`WeaklyUp` status will receive published messages if the publisher and subscriber are on
the same side of a network partition.

akka-docs/src/main/paradox/includes/cluster.md

<!--- #cluster-singleton --->
### Cluster Singleton

For some use cases it is convenient or necessary to ensure that only one
actor of a certain type is running somewhere in the cluster.
This can be implemented by subscribing to member events, but there are several corner
cases to consider. Therefore, this specific use case is covered by the Cluster Singleton.

<!--- #cluster-singleton --->

<!--- #cluster-sharding --->
### Cluster Sharding

Distributes actors across several nodes in the cluster and supports interaction
with the actors using their logical identifier, but without having to care about
their physical location in the cluster.

<!--- #cluster-sharding --->
<!--- #cluster-ddata --->
|
||||||
|
### Distributed Data
|
||||||
|
|
||||||
|
*Akka Distributed Data* is useful when you need to share data between nodes in an
|
||||||
|
Akka Cluster. The data is accessed with an actor providing a key-value store like API.
|
||||||
|
|
||||||
|
<!--- #cluster-ddata --->
|
||||||
|
|
||||||
|
<!--- #cluster-pubsub --->
|
||||||
|
### Distributed Publish Subscribe
|
||||||
|
|
||||||
|
Publish-subscribe messaging between actors in the cluster, and point-to-point messaging
|
||||||
|
using the logical path of the actors, i.e. the sender does not have to know on which
|
||||||
|
node the destination actor is running.
|
||||||
|
|
||||||
|
<!--- #cluster-pubsub --->
|
||||||
|
|
||||||
|
<!--- #cluster-multidc --->
|
||||||
|
### Cluster across multiple data centers
|
||||||
|
|
||||||
|
Akka Cluster can be used across multiple data centers, availability zones or regions,
|
||||||
|
so that one Cluster can span multiple data centers and still be tolerant to network partitions.
|
||||||
|
|
||||||
|
<!--- #cluster-multidc --->
|
||||||
|
|
||||||
|
<!--- #join-seeds-programmatic --->
|
||||||
|
You may also join programmatically, which is attractive when dynamically discovering other nodes
|
||||||
|
at startup by using some external tool or API. When joining to seed nodes you should not include
|
||||||
|
the node itself except for the node that is supposed to be the first seed node, which should be
|
||||||
|
placed first in the parameter to the programmatic join.
|
||||||
|
<!--- #join-seeds-programmatic --->
|
||||||
|
|
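The programmatic join described above can be sketched as follows. This is a minimal sketch assuming the classic Akka Cluster API and hypothetical seed addresses (`host1`, `host2`) discovered at startup; it requires the `akka-cluster` dependency plus matching configuration, so it is not runnable stand-alone:

```scala
import akka.actor.{ ActorSystem, Address }
import akka.cluster.Cluster

object ProgrammaticJoin {
  def main(args: Array[String]): Unit = {
    val system = ActorSystem("ClusterSystem")

    // Hypothetical addresses, e.g. obtained from an external discovery API.
    // The node intended to be the first seed node is placed first.
    val seedNodes = List(
      Address("akka.tcp", "ClusterSystem", "host1", 2552),
      Address("akka.tcp", "ClusterSystem", "host2", 2552))

    Cluster(system).joinSeedNodes(seedNodes)
  }
}
```

The `akka.tcp` protocol assumes classic remoting; with Artery the protocol would be `akka` instead.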
@@ -7,7 +7,7 @@ For the new API see @ref[Cluster](typed/index-cluster.md).

@@@ index

* [cluster-usage](cluster-usage.md)
* [cluster-routing](cluster-routing.md)
* [cluster-singleton](cluster-singleton.md)
* [distributed-pub-sub](distributed-pub-sub.md)

@@ -312,58 +312,26 @@ Watching a remote actor is not different than watching a local actor, as described

### Failure Detector

Please see:

* @ref:[Phi Accrual Failure Detector](typed/failure-detector.md) implementation for details
* [Using the Failure Detector](#using-the-failure-detector) below for usage

### Using the Failure Detector

Remoting uses the `akka.remote.PhiAccrualFailureDetector` failure detector by default, or you can provide your own by
implementing `akka.remote.FailureDetector` and configuring it:

```
akka.remote.watch-failure-detector.implementation-class = "com.example.CustomFailureDetector"
```

In the [Remote Configuration](#remote-configuration) you may want to adjust these settings
depending on your environment:

* `akka.remote.watch-failure-detector.threshold` - when a *phi* value is considered to be a failure
* `akka.remote.watch-failure-detector.acceptable-heartbeat-pause` - margin of error for sudden abnormalities
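For example, in a cloud environment where longer network hiccups and GC pauses are expected, both settings can be raised. The values below are illustrative assumptions, not recommendations:

```
akka.remote.watch-failure-detector {
  threshold = 12
  acceptable-heartbeat-pause = 5 s
}
```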

## Serialization

You need to enable @ref:[serialization](serialization.md) for your actor messages.

@@ -41,7 +41,7 @@ implement manually.

@@@

@@@ warning { title=IMPORTANT }
Use stream refs with Akka Cluster. The [failure detector can cause quarantining](../typed/cluster-specification.md#quarantined) if plain Akka remoting is used.
@@@

## Stream References

73 akka-docs/src/main/paradox/typed/choosing-cluster.md Normal file
@@ -0,0 +1,73 @@

<a id="when-and-where-to-use-akka-cluster"></a>
# Choosing Akka Cluster

An architectural choice you have to make is whether you are going to use a microservices architecture or
a traditional distributed application. This choice will influence how you should use Akka Cluster.

## Microservices

Microservices have many attractive properties; for example, the independent nature of microservices allows for
multiple smaller and more focused teams that can deliver new functionality more frequently and can
respond quicker to business opportunities. Reactive Microservices should be isolated, autonomous, and have
a single responsibility as identified by Jonas Bonér in the book
[Reactive Microsystems: The Evolution of Microservices at Scale](https://info.lightbend.com/ebook-reactive-microservices-the-evolution-of-microservices-at-scale-register.html).

In a microservices architecture, you should consider communication within a service and between services.

In general we recommend against using Akka Cluster and actor messaging between _different_ services because that
would result in too tight a code coupling between the services and difficulties deploying them independently of
each other, which is one of the main reasons for using a microservices architecture.
See the discussion on
@scala[[Internal and External Communication](https://www.lagomframework.com/documentation/current/scala/InternalAndExternalCommunication.html)]
@java[[Internal and External Communication](https://www.lagomframework.com/documentation/current/java/InternalAndExternalCommunication.html)]
in the docs of the [Lagom Framework](https://www.lagomframework.com) (where each microservice is an Akka Cluster)
for some background on this.

Nodes of a single service (collectively called a cluster) require less decoupling. They share the same code and
are deployed together, as a set, by a single team or individual. There might be two versions running concurrently
during a rolling deployment, but deployment of the entire set has a single point of control. For this reason,
intra-service communication can take advantage of Akka Cluster, failure management and actor messaging, which
is convenient to use and has great performance.

Between different services [Akka HTTP](https://doc.akka.io/docs/akka-http/current) or
[Akka gRPC](https://doc.akka.io/docs/akka-grpc/current/) can be used for synchronous (yet non-blocking)
communication and [Akka Streams Kafka](https://doc.akka.io/docs/akka-stream-kafka/current/home.html) or other
[Alpakka](https://doc.akka.io/docs/alpakka/current/) connectors for asynchronous integration.
All those communication mechanisms work well with streaming of messages with end-to-end back-pressure, and the
synchronous communication tools can also be used for single request-response interactions. It is also important
to note that when using these tools both sides of the communication do not have to be implemented with Akka,
nor does the programming language matter.

## Traditional distributed application

We acknowledge that microservices also introduce many new challenges and it's not the only way to
build applications. A traditional distributed application may have less complexity and work well in many cases,
for example for a small startup with a single team, building an application where time to market is everything.
Akka Cluster can efficiently be used for building such a distributed application.

In this case, you have a single deployment unit, built from a single code base (or using traditional binary
dependency management to modularize) but deployed across many nodes using a single cluster.
Tighter coupling is OK, because there is a central point of deployment and control. In some cases, nodes may
have specialized runtime roles, which means that the cluster is not totally homogeneous (e.g., "front-end" and
"back-end" nodes, or dedicated master/worker nodes), but if these are run from the same built artifacts this
is just a runtime behavior and doesn't cause the same kind of problems you might get from tight coupling of
totally separate artifacts.

A tightly coupled distributed application has served the industry and many Akka users well for years and is
still a valid choice.

## Distributed monolith

There is also an anti-pattern that is sometimes called "distributed monolith". You have multiple services
that are built and deployed independently from each other, but they have a tight coupling that makes this
very risky, such as a shared cluster, shared code and dependencies for service API calls, or a shared
database schema. There is a false sense of autonomy because of the physical separation of the code and
deployment units, but you are likely to encounter problems because of changes in the implementation of
one service leaking into the behavior of others. See Ben Christensen's
[Don't Build a Distributed Monolith](https://www.microservices.com/talks/dont-build-a-distributed-monolith/).

Organizations that find themselves in this situation often react by trying to centrally coordinate deployment
of multiple services, at which point you have lost the principal benefit of microservices while taking on
the costs. You are in a halfway state with things that aren't really separable being built and deployed
in a separate way. Some people do this, and some manage to make it work, but it's not something we would
recommend, and it needs to be carefully managed.

158 akka-docs/src/main/paradox/typed/cluster-membership.md Normal file
@@ -0,0 +1,158 @@

# Cluster Membership Service

The core of Akka Cluster is the cluster membership, keeping track of what nodes are part of the cluster and
their health. Cluster membership is communicated using @ref:[gossip](cluster-specification.md#gossip) and
@ref:[failure detection](cluster-specification.md#failure-detector).

There are several @ref:[Higher level Cluster tools](../typed/cluster.md#higher-level-cluster-tools) that are built
on top of the cluster membership service.

## Introduction

A cluster is made up of a set of member nodes. The identifier for each node is a
`hostname:port:uid` tuple. An Akka application can be distributed over a cluster with
each node hosting some part of the application. Cluster membership and the actors running
on that node of the application are decoupled. A node could be a member of a
cluster without hosting any actors. Joining a cluster is initiated
by issuing a `Join` command to one of the nodes in the cluster to join.

The node identifier internally also contains a UID that uniquely identifies this
actor system instance at that `hostname:port`. Akka uses the UID to be able to
reliably trigger remote death watch. This means that the same actor system can never
join a cluster again once it's been removed from that cluster. To re-join an actor
system with the same `hostname:port` to a cluster you have to stop the actor system
and start a new one with the same `hostname:port`, which will then receive a different
UID.

## Member States

The cluster membership state is a specialized [CRDT](http://hal.upmc.fr/docs/00/55/55/88/PDF/techreport.pdf), which means that it has a monotonic
merge function. When concurrent changes occur on different nodes the updates can always be
merged and converge to the same end result.

* **joining** - transient state when joining a cluster
* **weakly up** - transient state during a network split (only if `akka.cluster.allow-weakly-up-members=on`)
* **up** - normal operating state
* **leaving** / **exiting** - states during graceful removal
* **down** - marked as down (no longer part of cluster decisions)
* **removed** - tombstone state (no longer a member)

## Member Events

The events to track the life-cycle of members are:

* `ClusterEvent.MemberJoined` - A new member has joined the cluster and its status has been changed to `Joining`
* `ClusterEvent.MemberUp` - A new member has joined the cluster and its status has been changed to `Up`
* `ClusterEvent.MemberExited` - A member is leaving the cluster and its status has been changed to `Exiting`.
  Note that the node might already have been shut down when this event is published on another node.
* `ClusterEvent.MemberRemoved` - Member completely removed from the cluster.
* `ClusterEvent.UnreachableMember` - A member is considered unreachable, detected by the failure detector
  of at least one other node.
* `ClusterEvent.ReachableMember` - A member is considered reachable again, after having been unreachable.
  All nodes that previously detected it as unreachable have detected it as reachable again.
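A minimal sketch of subscribing to these events with the classic Cluster API (it assumes the `akka-cluster` dependency and is not runnable stand-alone):

```scala
import akka.actor.{ Actor, ActorLogging }
import akka.cluster.Cluster
import akka.cluster.ClusterEvent._

// Subscribes to member and reachability events on startup and logs them.
class MemberEventListener extends Actor with ActorLogging {
  val cluster = Cluster(context.system)

  override def preStart(): Unit =
    cluster.subscribe(self, initialStateMode = InitialStateAsEvents,
      classOf[MemberEvent], classOf[UnreachableMember])

  override def postStop(): Unit =
    cluster.unsubscribe(self)

  def receive = {
    case MemberUp(member) =>
      log.info("Member is Up: {}", member.address)
    case UnreachableMember(member) =>
      log.info("Member detected as unreachable: {}", member)
    case MemberRemoved(member, previousStatus) =>
      log.info("Member is Removed: {} after {}", member.address, previousStatus)
    case _: MemberEvent => // ignore other member events
  }
}
```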

## Membership Lifecycle

A node begins in the `joining` state. Once all nodes have seen that the new
node is joining (through gossip convergence) the `leader` will set the member
state to `up`.

If a node is leaving the cluster in a safe, expected manner then it switches to
the `leaving` state. Once the leader sees the convergence on the node in the
`leaving` state, the leader will then move it to `exiting`. Once all nodes
have seen the exiting state (convergence) the `leader` will remove the node
from the cluster, marking it as `removed`.

If a node is `unreachable` then gossip convergence is not possible and therefore
any `leader` actions are also not possible (for instance, allowing a node to
become a part of the cluster). To be able to move forward, the state of the
`unreachable` nodes must be changed. They must become `reachable` again or be marked
as `down`. If the node is to join the cluster again the actor system must be
restarted and go through the joining process again. The cluster can, through the
leader, also *auto-down* a node after a configured time of unreachability. If a new
incarnation of an unreachable node tries to rejoin the cluster, the old incarnation will be
marked as `down` and the new incarnation can rejoin the cluster without manual intervention.

@@@ note

If you have *auto-down* enabled and the failure detector triggers, you
can over time end up with a lot of single-node clusters if you don't put
measures in place to shut down nodes that have become `unreachable`. This
follows from the fact that the `unreachable` node will likely see the rest of
the cluster as `unreachable`, become its own leader and form its own cluster.

@@@

<a id="weakly-up"></a>
## WeaklyUp Members

If a node is `unreachable` then gossip convergence is not possible and therefore any
`leader` actions are also not possible. However, we still might want new nodes to join
the cluster in this scenario.

`Joining` members will be promoted to `WeaklyUp` and become part of the cluster if
convergence can't be reached. Once gossip convergence is reached, the leader will move `WeaklyUp`
members to `Up`.

This feature is enabled by default, but it can be disabled with the configuration option:

```
akka.cluster.allow-weakly-up-members = off
```

You can subscribe to the `WeaklyUp` membership event to make use of the members that are
in this state, but you should be aware that members on the other side of a network partition
have no knowledge about the existence of the new members. You should for example not count
`WeaklyUp` members in quorum decisions.

## State Diagrams

### State Diagram for the Member States (`akka.cluster.allow-weakly-up-members=off`)

![member-states.png]

### State Diagram for the Member States (`akka.cluster.allow-weakly-up-members=on`)

![member-states-weakly-up.png]

#### User Actions

* **join** - join a single node to a cluster - can be explicit or automatic on
  startup if a node to join has been specified in the configuration

* **leave** - tell a node to leave the cluster gracefully

* **down** - mark a node as down

#### Leader Actions

The `leader` has the following duties:

* shifting members in and out of the cluster
    * joining -> up
    * weakly up -> up *(no convergence is required for this leader action to be performed)*
    * exiting -> removed

#### Failure Detection and Unreachability

* **fd*** - the failure detector of one of the monitoring nodes has triggered,
  causing the monitored node to be marked as unreachable

* **unreachable*** - unreachable is not a real member state but more of a flag in addition to the state, signaling that the cluster is unable to talk to this node; after being unreachable the failure detector may detect it as reachable again and thereby remove the flag

184 akka-docs/src/main/paradox/typed/cluster-specification.md Normal file
@@ -0,0 +1,184 @@

# Cluster Specification

This document describes the design concepts of Akka Cluster. For the guide on using Akka Cluster please see either

* @ref:[Cluster Usage](../typed/cluster.md)
* @ref:[Cluster Usage with classic Akka APIs](../cluster-usage.md)
* @ref:[Cluster Membership Service](cluster-membership.md)

## Introduction

Akka Cluster provides a fault-tolerant, decentralized, peer-to-peer based
@ref:[Cluster Membership Service](cluster-membership.md#cluster-membership-service) with no single point of failure or
single point of bottleneck. It does this using [gossip](#gossip) protocols and an automatic [failure detector](#failure-detector).

Akka Cluster allows for building distributed applications, where one application or service spans multiple nodes
(in practice multiple `ActorSystem`s).

## Terms

**node**
: A logical member of a cluster. There could be multiple nodes on a physical
machine. Defined by a *hostname:port:uid* tuple.

**cluster**
: A set of nodes joined together through the @ref:[Cluster Membership Service](cluster-membership.md#cluster-membership-service).

**leader**
: A single node in the cluster that acts as the leader, managing cluster convergence
and membership state transitions.

### Gossip

The cluster membership used in Akka is based on Amazon's [Dynamo](http://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf) system and
particularly the approach taken in Basho's [Riak](http://basho.com/technology/architecture/) distributed database.
Cluster membership is communicated using a [Gossip Protocol](http://en.wikipedia.org/wiki/Gossip_protocol), where the current
state of the cluster is gossiped randomly through the cluster, with preference to
members that have not seen the latest version.

#### Vector Clocks

[Vector clocks](http://en.wikipedia.org/wiki/Vector_clock) are a type of data structure and algorithm for generating a partial
ordering of events in a distributed system and detecting causality violations.

We use vector clocks to reconcile and merge differences in cluster state
during gossiping. A vector clock is a set of (node, counter) pairs. Each update
to the cluster state has an accompanying update to the vector clock.
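The core of a vector clock can be sketched in a few lines. This is an assumed simplification for illustration, not Akka's internal implementation:

```scala
// A clock is a map from node name to counter.

// Merge takes the entry-wise maximum of the two clocks; this is the
// monotonic merge function that lets concurrent updates converge.
def merge(a: Map[String, Long], b: Map[String, Long]): Map[String, Long] =
  (a.keySet ++ b.keySet).map { node =>
    node -> math.max(a.getOrElse(node, 0L), b.getOrElse(node, 0L))
  }.toMap

// `a` happened after `b` if every counter in `b` is <= the matching one
// in `a` and the clocks differ; otherwise the updates are concurrent.
def isAfter(a: Map[String, Long], b: Map[String, Long]): Boolean =
  b.forall { case (node, n) => a.getOrElse(node, 0L) >= n } && a != b

val c1 = Map("nodeA" -> 2L, "nodeB" -> 1L)
val c2 = Map("nodeA" -> 1L, "nodeC" -> 3L)
println(merge(c1, c2)) // contains nodeA -> 2, nodeB -> 1, nodeC -> 3
```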

#### Gossip Convergence

Information about the cluster converges locally at a node at certain points in time.
This is when a node can prove that the cluster state it is observing has been observed
by all other nodes in the cluster. Convergence is implemented by passing a set of nodes
that have seen the current state version during gossip. This information is referred to as the
seen set in the gossip overview. When all nodes are included in the seen set there is
convergence.
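Expressed as a predicate over the seen set, the convergence check above is simply (a trivial sketch under assumed names):

```scala
// The state a node observes has converged when every current member
// is contained in the seen set for that state version.
def converged(members: Set[String], seenSet: Set[String]): Boolean =
  members.subsetOf(seenSet)

val members = Set("node1", "node2", "node3")
println(converged(members, Set("node1", "node2")))          // false
println(converged(members, Set("node1", "node2", "node3"))) // true
```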

Gossip convergence cannot occur while any nodes are `unreachable`. The nodes need
to become `reachable` again, or be moved to the `down` and `removed` states
(see the @ref:[Cluster Membership Lifecycle](cluster-membership.md#membership-lifecycle) section). This only blocks the leader
from performing its cluster membership management and does not influence the application
running on top of the cluster. For example, this means that during a network partition
it is not possible to add more nodes to the cluster. The nodes can join, but they
will not be moved to the `up` state until the partition has healed or the unreachable
nodes have been downed.

#### Failure Detector

The failure detector in Akka Cluster is responsible for trying to detect if a node is
`unreachable` from the rest of the cluster. For this we are using the
@ref:[Phi Accrual Failure Detector](failure-detector.md) implementation.
To be able to survive sudden abnormalities, such as garbage collection pauses and
transient network failures, the failure detector is easily @ref:[configurable](cluster.md#using-the-failure-detector)
for tuning to your environment and needs.
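The *phi* value at the heart of the accrual failure detector can be sketched as follows. This is an illustrative, self-contained approximation (using a logistic approximation of the normal CDF), not Akka's exact implementation:

```scala
// phi = -log10(1 - F(timeSinceLastHeartbeat)), where F is the cumulative
// distribution function of a normal distribution whose mean and standard
// deviation are estimated from historical heartbeat inter-arrival times.
def phi(timeSinceLastHeartbeatMs: Double, meanMs: Double, stdDevMs: Double): Double = {
  val y = (timeSinceLastHeartbeatMs - meanMs) / stdDevMs
  // Logistic approximation of the normal CDF
  val cdf = 1.0 / (1.0 + math.exp(-y * (1.5976 + 0.070566 * y * y)))
  -math.log10(1.0 - cdf)
}

// With heartbeats expected every 1000 ms (std dev 100 ms), phi is low at
// the expected arrival time and grows rapidly as the silence gets longer:
println(phi(1000, 1000, 100)) // ≈ 0.3
println(phi(1500, 1000, 100)) // ≈ 7.3
```

A smaller standard deviation makes the curve steeper, so failures are suspected more quickly; the configured threshold decides at which *phi* value a node is marked unreachable.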

In a cluster each node is monitored by a few (default maximum 5) other nodes.
The nodes to monitor are selected from neighbors in a hashed ordered node ring.
This is to increase the likelihood of monitoring across racks and data centers, but the order
is the same on all nodes, which ensures full coverage.

When any node is detected to be `unreachable` this data is spread to
the rest of the cluster through the @ref:[gossip](#gossip). In other words, only one node needs to
mark a node `unreachable` to have the rest of the cluster mark that node `unreachable`.

The failure detector will also detect if the node becomes `reachable` again. When
all nodes that monitored the `unreachable` node detect it as `reachable` again
the cluster, after gossip dissemination, will consider it as `reachable`.

<a id="quarantined"></a>
If system messages cannot be delivered to a node it will be quarantined and then it
cannot come back from `unreachable`. This can happen if there are too many
unacknowledged system messages (e.g. watch, Terminated, remote actor deployment,
failures of actors supervised by a remote parent). Then the node needs to be moved
to the `down` or `removed` states (see @ref:[Cluster Membership Lifecycle](cluster-membership.md#membership-lifecycle))
and the actor system of the quarantined node must be restarted before it can join the cluster again.

See the following for more details:

* @ref:[Phi Accrual Failure Detector](failure-detector.md) implementation
* @ref:[Using the Failure Detector](cluster.md#using-the-failure-detector)

#### Leader

After gossip convergence a `leader` for the cluster can be determined. There is no
`leader` election process; the `leader` can always be recognised deterministically
by any node whenever there is gossip convergence. The leader is only a role; any node
can be the leader and it can change between convergence rounds.
The `leader` is the first node in sorted order that is able to take the leadership role,
where the preferred member states for a `leader` are `up` and `leaving`
(see the @ref:[Cluster Membership Lifecycle](cluster-membership.md#membership-lifecycle) for more information about member states).

The role of the `leader` is to shift members in and out of the cluster, changing
`joining` members to the `up` state or `exiting` members to the `removed`
state. Currently `leader` actions are only triggered by receiving a new cluster
state with gossip convergence.

The `leader` also has the power, if configured so, to "auto-down" a node that
according to the @ref:[Failure Detector](#failure-detector) is considered `unreachable`. This means setting
the `unreachable` node status to `down` automatically after a configured time
of unreachability.

#### Seed Nodes

The seed nodes are contact points for new nodes joining the cluster.
When a new node is started it sends a message to all seed nodes and then sends
a join command to the seed node that answers first.

The seed nodes configuration value does not have any influence on the running
cluster itself; it is only relevant for new nodes joining the cluster as it
helps them to find contact points to send the join command to; a new member
can send this command to any current member of the cluster, not only to the seed nodes.

#### Gossip Protocol

A variation of *push-pull gossip* is used to reduce the amount of gossip
information sent around the cluster. In push-pull gossip a digest is sent
representing current versions but not actual values; the recipient of the gossip
can then send back any values for which it has newer versions and also request
values for which it has outdated versions. Akka uses a single shared state with
a vector clock for versioning, so the variant of push-pull gossip used in Akka
makes use of this version to only push the actual state as needed.

Periodically, by default every 1 second, each node chooses another random
node to initiate a round of gossip with. If less than ½ of the nodes reside in the
seen set (have seen the new state) then the cluster gossips 3 times instead of once
every second. This adjusted gossip interval is a way to speed up the convergence process
in the early dissemination phase after a state change.

The choice of node to gossip with is random but biased towards nodes that might not have seen
the current state version. During each round of gossip exchange, when convergence is not yet reached, a node
uses a very high probability (which is configurable) to gossip with another node which is not part of the seen set, i.e.
which is likely to have an older version of the state. Otherwise it gossips with any random live node.

This biased selection is a way to speed up the convergence process in the late dissemination
phase after a state change.
|
||||||
|
|
||||||
|
For clusters larger than 400 nodes (configurable, and suggested by empirical evidence)
|
||||||
|
the 0.8 probability is gradually reduced to avoid overwhelming single stragglers with
|
||||||
|
too many concurrent gossip requests. The gossip receiver also has a mechanism to
|
||||||
|
protect itself from too many simultaneous gossip messages by dropping messages that
|
||||||
|
have been enqueued in the mailbox for too long of a time.
|
||||||
|
|
||||||
|
While the cluster is in a converged state the gossiper only sends a small gossip status message containing the gossip
|
||||||
|
version to the chosen node. As soon as there is a change to the cluster (meaning non-convergence)
|
||||||
|
then it goes back to biased gossip again.
|
||||||
|
|
||||||
|
The recipient of the gossip state or the gossip status can use the gossip version
|
||||||
|
(vector clock) to determine whether:
|
||||||
|
|
||||||
|
1. it has a newer version of the gossip state, in which case it sends that back
|
||||||
|
to the gossiper
|
||||||
|
2. it has an outdated version of the state, in which case the recipient requests
|
||||||
|
the current state from the gossiper by sending back its version of the gossip state
|
||||||
|
3. it has conflicting gossip versions, in which case the different versions are merged
|
||||||
|
and sent back
|
||||||
|
|
||||||
|
If the recipient and the gossip have the same version then the gossip state is
|
||||||
|
not sent or requested.
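The three cases above hinge on a partial order over vector clocks. The following is a hypothetical, self-contained sketch of that comparison; Akka's actual implementation lives in `akka.cluster.VectorClock`, and the class and method names here are illustrative only:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative vector clock comparison: one counter per node that has
// updated the state, keyed by node identifier.
class VectorClockCompare {
    enum Ordering { SAME, NEWER, OLDER, CONCURRENT }

    // Compare "mine" against "theirs". NEWER: mine has strictly more history
    // (case 1, push state back). OLDER: theirs does (case 2, request state).
    // CONCURRENT: each side has updates the other lacks (case 3, merge).
    static Ordering compare(Map<String, Long> mine, Map<String, Long> theirs) {
        boolean mineAhead = false;
        boolean theirsAhead = false;
        for (Map.Entry<String, Long> e : mine.entrySet()) {
            long other = theirs.getOrDefault(e.getKey(), 0L);
            if (e.getValue() > other) mineAhead = true;
            else if (e.getValue() < other) theirsAhead = true;
        }
        for (Map.Entry<String, Long> e : theirs.entrySet()) {
            if (!mine.containsKey(e.getKey()) && e.getValue() > 0) theirsAhead = true;
        }
        if (mineAhead && theirsAhead) return Ordering.CONCURRENT;
        if (mineAhead) return Ordering.NEWER;
        if (theirsAhead) return Ordering.OLDER;
        return Ordering.SAME;
    }
}
```

In the `CONCURRENT` case the merge takes, for each node key, the maximum of the two counters, so merging is commutative and the cluster still converges.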

The periodic nature of the gossip has a nice batching effect of state changes,
e.g. joining several nodes quickly after each other to one node will result in only
one state change to be spread to other members in the cluster.

The gossip messages are serialized with [protobuf](https://code.google.com/p/protobuf/) and also gzipped to reduce payload
size.
# Cluster Usage

This document describes how to use Akka Cluster and the Cluster APIs.
For specific documentation topics see:

* @ref:[Cluster Specification](cluster-specification.md)
* @ref:[Cluster Membership Service](cluster-membership.md)
* @ref:[When and where to use Akka Cluster](choosing-cluster.md)
* @ref:[Higher level Cluster tools](#higher-level-cluster-tools)
* @ref:[Rolling Updates](../additional/rolling-updates.md)
* @ref:[Operating, Managing, Observability](../additional/operations.md)

## Dependency

To use Akka Cluster, add the following dependency in your project:

@@dependency[sbt,Maven,Gradle] {
  group=com.typesafe.akka
  version=$akka.version$
}
## Cluster API Extension

The Cluster extension gives you access to management tasks such as @ref:[Joining, Leaving and Downing](cluster-membership.md#user-actions)
and subscription of cluster membership events such as @ref:[MemberUp, MemberRemoved and UnreachableMember](cluster-membership.md#membership-lifecycle),
which are exposed as event APIs.

It does this through these references on the `Cluster` extension:

* manager: An @scala[`ActorRef[akka.cluster.typed.ClusterCommand]`]@java[`ActorRef<akka.cluster.typed.ClusterCommand>`] where a `ClusterCommand` is a command such as: `Join`, `Leave` and `Down`
* subscriptions: An @scala[`ActorRef[akka.cluster.typed.ClusterStateSubscription]`]@java[`ActorRef<akka.cluster.typed.ClusterStateSubscription>`] where a `ClusterStateSubscription` is one of `GetCurrentState` or `Subscribe` and `Unsubscribe` to cluster events like `MemberRemoved`
* state: The current `CurrentClusterState`

All of the examples below assume the following imports:

Scala
: @@snip [BasicClusterExampleSpec.scala](/akka-cluster-typed/src/test/scala/docs/akka/cluster/typed/BasicClusterExampleSpec.scala) { #cluster-imports }

Java
: @@snip [BasicClusterExampleTest.java](/akka-cluster-typed/src/test/java/jdocs/akka/cluster/typed/BasicClusterExampleTest.java) { #cluster-imports }

<a id="basic-cluster-configuration"></a>
And the minimum configuration required is to set a host/port for remoting and the `akka.actor.provider = "cluster"`.

@@snip [BasicClusterExampleSpec.scala](/akka-cluster-typed/src/test/scala/docs/akka/cluster/typed/BasicClusterExampleSpec.scala) { #config-seeds }

Starting the cluster on each node:

Scala
: @@snip [BasicClusterExampleSpec.scala](/akka-cluster-typed/src/test/scala/docs/akka/cluster/typed/BasicClusterExampleSpec.scala) { #cluster-create }

Java
: @@snip [BasicClusterExampleTest.java](/akka-cluster-typed/src/test/java/jdocs/akka/cluster/typed/BasicClusterExampleTest.java) { #cluster-create }
@@@ note

The name of the cluster's `ActorSystem` must be the same for all members, which is passed in when you start the `ActorSystem`.

@@@

### Joining and Leaving a Cluster
If not using configuration to specify [seed nodes to join](#joining-configured-seed-nodes), joining the cluster can be done programmatically via the `manager`.

Scala
: @@snip [BasicClusterExampleSpec.scala](/akka-cluster-typed/src/test/scala/docs/akka/cluster/typed/BasicClusterExampleSpec.scala) { #cluster-join }

Java
: @@snip [BasicClusterExampleTest.java](/akka-cluster-typed/src/test/java/jdocs/akka/cluster/typed/BasicClusterExampleTest.java) { #cluster-join }

[Leaving](#leaving) the cluster and [downing](#downing) a node are similar:

Scala
: @@snip [BasicClusterExampleSpec.scala](/akka-cluster-typed/src/test/scala/docs/akka/cluster/typed/BasicClusterExampleSpec.scala) { #cluster-leave }

Java
: @@snip [BasicClusterExampleTest.java](/akka-cluster-typed/src/test/java/jdocs/akka/cluster/typed/BasicClusterExampleTest.java) { #cluster-leave }
### Cluster Subscriptions

Cluster `subscriptions` can be used to receive messages when cluster state changes. For example, registering
for all `MemberEvent`s, then using the `manager` to have a node leave the cluster will result in events
for the node going through the @ref:[Membership Lifecycle](cluster-membership.md#membership-lifecycle).

This example subscribes to a @scala[`subscriber: ActorRef[MemberEvent]`]@java[`ActorRef<MemberEvent> subscriber`]:

Scala
: @@snip [BasicClusterExampleSpec.scala](/akka-cluster-typed/src/test/scala/docs/akka/cluster/typed/BasicClusterExampleSpec.scala) { #cluster-subscribe }

Java
: @@snip [BasicClusterExampleTest.java](/akka-cluster-typed/src/test/java/jdocs/akka/cluster/typed/BasicClusterExampleTest.java) { #cluster-leave-example }
### Cluster State

Instead of subscribing to cluster events it can sometimes be convenient to only get the full membership state with
@scala[`Cluster(system).state`]@java[`Cluster.get(system).state()`]. Note that this state is not necessarily in sync with the events published to a
cluster subscription.

See @ref:[Cluster Membership](cluster-membership.md#member-events) for more information on member events specifically.
There are more types of change events; consult the API documentation
of classes that extend `akka.cluster.ClusterEvent.ClusterDomainEvent` for details about the events.
## Cluster Membership API

The `akka.cluster.seed-nodes` are initial contact points for [automatically](#joining-automatically-to-seed-nodes-with-cluster-bootstrap)
or [manually](#joining-configured-seed-nodes) joining a cluster.

After the joining process the seed nodes are not special and they participate in the cluster in exactly the same
way as other nodes.
### Joining configured seed nodes

When a new node is started it sends a message to all seed nodes and then sends a join command to the one that
answers first. If none of the seed nodes reply (they might not be started yet)
it retries this procedure until successful or shutdown.

You define the seed nodes in the [configuration](#configuration) file (application.conf):
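A sketch of such a configuration, reusing the host names and port from the system-property example below (adjust the `ClusterSystem` actor system name, hosts and port for your own setup):

```
akka.cluster.seed-nodes = [
  "akka://ClusterSystem@host1:2552",
  "akka://ClusterSystem@host2:2552"]
```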
This can also be defined as Java system properties when starting the JVM using the following syntax:

```
-Dakka.cluster.seed-nodes.0=akka://ClusterSystem@host1:2552
-Dakka.cluster.seed-nodes.1=akka://ClusterSystem@host2:2552
```
The seed nodes can be started in any order and it is not necessary to have all
seed nodes running, but the node configured as the **first element** in the `seed-nodes`
configuration list must be started when initially starting a cluster, otherwise the
other seed-nodes will not become initialized and no other node can join the cluster.
The reason for the special first seed node is to avoid forming separated islands when
starting from an empty cluster.
It is quickest to start all configured seed nodes at the same time (order doesn't matter),
otherwise it can take up to the configured `seed-node-timeout` until the nodes
can join.

Once more than two seed nodes have been started it is no problem to shut down the first
seed node. If the first seed node is restarted, it will first try to join the other
seed nodes in the existing cluster. Note that if you stop all seed nodes at the same time
and restart them with the same `seed-nodes` configuration they will join themselves and
form a new cluster instead of joining remaining nodes of the existing cluster. That is
likely not desired and can be avoided by listing several nodes as seed nodes for redundancy
and not stopping all of them at the same time.

Note that if you are going to start the nodes on different machines you need to specify the
ip-addresses or host names of the machines in `application.conf` instead of `127.0.0.1`.
### Joining automatically to seed nodes with Cluster Bootstrap

Automatic discovery of nodes for the joining process is available
using the open source Akka Management project's module,
@ref:[Cluster Bootstrap](../additional/operations.md#cluster-bootstrap).
Please refer to its documentation for more details.

### Joining programmatically to seed nodes

@@include[cluster.md](../includes/cluster.md) { #join-seeds-programmatic }
### Tuning joins

Unsuccessful attempts to contact seed nodes are automatically retried after the time period defined in
configuration property `seed-node-timeout`. An unsuccessful attempt to join a specific seed node is
automatically retried after the configured `retry-unsuccessful-join-after`. Retrying means that it
tries to contact all seed nodes and then joins the node that answers first. The first node in the list
of seed nodes will join itself if it cannot contact any of the other seed nodes within the
configured `seed-node-timeout`.

The joining of given seed nodes will by default be retried indefinitely until
a successful join. That process can be aborted if unsuccessful by configuring a
timeout. When aborted it will run @ref:[Coordinated Shutdown](../actors.md#coordinated-shutdown),
which by default will terminate the ActorSystem. CoordinatedShutdown can also be configured to exit
the JVM. It is useful to define this timeout if the `seed-nodes` are assembled
dynamically and a restart with new seed-nodes should be tried after unsuccessful
attempts.

```
akka.cluster.shutdown-after-unsuccessful-join-seed-nodes = 20s
akka.coordinated-shutdown.terminate-actor-system = on
```
If you don't configure seed nodes or use one of the join seed node functions you need to join the cluster manually,
which can be performed by using @ref:[JMX](../additional/operations.md#jmx) or @ref:[HTTP](../additional/operations.md#http).

You can join to any node in the cluster. It does not have to be configured as a seed node.
Note that you can only join to an existing cluster member, which means that for bootstrapping some
node must join itself, and then the following nodes could join them to make up a cluster.

An actor system can only join a cluster once. Additional attempts will be ignored.
When it has successfully joined it must be restarted to be able to join another
cluster or to join the same cluster again. It can use the same host name and port
after the restart. When it comes up as a new incarnation of an existing member in the cluster
and tries to join, the existing one will be removed from the cluster and then the new
incarnation will be allowed to join.
<a id="automatic-vs-manual-downing"></a>
### Downing

When a member is considered by the failure detector to be `unreachable` the
leader is not allowed to perform its duties, such as changing status of
new joining members to `Up`. The node must first become `reachable` again, or the
status of the unreachable member must be changed to `Down`. Changing status to `Down`
can be performed automatically or manually. By default it must be done manually, using
@ref:[JMX](../additional/operations.md#jmx) or @ref:[HTTP](../additional/operations.md#http).

It can also be performed programmatically with @scala[`Cluster(system).down(address)`]@java[`Cluster.get(system).down(address)`].

If a node is still running and sees itself as `Down` it will shut down. @ref:[Coordinated Shutdown](../actors.md#coordinated-shutdown) will automatically
run if `run-coordinated-shutdown-when-down` is set to `on` (the default), however the node will not try
to leave the cluster gracefully, so sharding and singleton migration will not occur.
A production solution for the downing problem is provided by
[Split Brain Resolver](http://developer.lightbend.com/docs/akka-commercial-addons/current/split-brain-resolver.html),
which is part of the [Lightbend Reactive Platform](http://www.lightbend.com/platform).
Even if you don't use the Reactive Platform, you should carefully read the [documentation](http://developer.lightbend.com/docs/akka-commercial-addons/current/split-brain-resolver.html)
of the Split Brain Resolver and make sure that the solution you are using handles the concerns
described there.
### Auto-downing (DO NOT USE)

There is an automatic downing feature that you should not use in production. For testing you can enable it with configuration:

```
akka.cluster.auto-down-unreachable-after = 120s
```

This means that the cluster leader member will change the `unreachable` node
status to `down` automatically after the configured time of unreachability.

This is a naïve approach to remove unreachable nodes from the cluster membership.
It can be useful during development but in a production environment it will eventually break down the cluster.
When a network partition occurs, both sides of the partition will see the other side as unreachable and remove it from the cluster.
This results in the formation of two separate, disconnected clusters (known as *Split Brain*).

This behaviour is not limited to network partitions. It can also occur if a node
in the cluster is overloaded, or experiences a long GC pause.
@@@ warning

We recommend against using the auto-down feature of Akka Cluster in production. It
has multiple undesirable consequences for production systems.

If you are using @ref:[Cluster Singleton](cluster-singleton.md) or @ref:[Cluster Sharding](cluster-sharding.md) it can break the contract provided by
those features. Both provide a guarantee that an actor will be unique in a cluster.
With the auto-down feature enabled, it is possible for multiple independent clusters
to form (*Split Brain*). When this happens the guaranteed uniqueness will no
longer be true resulting in undesirable behaviour in the system.

This is even more severe when @ref:[Akka Persistence](persistence.md) is used in
conjunction with Cluster Sharding. In this case, the lack of unique actors can
cause multiple actors to write to the same journal. Akka Persistence operates on a
single writer principle. Having multiple writers will corrupt the journal
and make it unusable.

Finally, even if you don't use features such as Persistence, Sharding, or Singletons,
auto-downing can lead the system to form multiple small clusters. These small
clusters will be independent from each other. They will be unable to communicate
and as a result you may experience performance degradation. Once this condition
occurs, it will require manual intervention in order to reform the cluster.

Because of these issues, auto-downing should **never** be used in a production environment.

@@@
### Leaving

There are two ways to remove a member from the cluster.

1. The recommended way to leave a cluster is a graceful exit, informing the cluster that a node shall leave.
   This can be performed using @ref:[JMX](../additional/operations.md#jmx) or @ref:[HTTP](../additional/operations.md#http).
   This method will offer faster hand off to peer nodes during node shutdown.
1. When a graceful exit is not possible, you can stop the actor system (or the JVM process, for example a SIGTERM sent from the environment). It will be detected
   as unreachable and removed after the automatic or manual downing.

The @ref:[Coordinated Shutdown](../actors.md#coordinated-shutdown) will automatically run when the cluster node sees itself as
`Exiting`, i.e. leaving from another node will trigger the shutdown process on the leaving node.
Tasks for graceful leaving of cluster including graceful shutdown of Cluster Singletons and
Cluster Sharding are added automatically when Akka Cluster is used, i.e. running the shutdown
process will also trigger the graceful leaving if it's not already in progress.

Normally this is handled automatically, but in case of network failures during this process it might still
be necessary to set the node's status to `Down` in order to complete the removal. For handling network failures
see [Split Brain Resolver](http://developer.lightbend.com/docs/akka-commercial-addons/current/split-brain-resolver.html),
part of the [Lightbend Reactive Platform](http://www.lightbend.com/platform).
## Serialization

Enable @ref:[serialization](../serialization.md) to send events between ActorSystems.
@ref:[Serialization with Jackson](../serialization-jackson.md) is a good choice in many cases, and our
recommendation if you don't have other preferences or constraints.

Actor references are typically included in the messages, since there is no `sender`.
To serialize actor references to/from string representation you would use the `ActorRefResolver`.
For example here's how a serializer could look for `Ping` and `Pong` messages:

Scala
: @@snip [PingSerializer.scala](/akka-cluster-typed/src/test/scala/docs/akka/cluster/typed/PingSerializer.scala) { #serializer }

Java
: @@snip [PingSerializerExampleTest.java](/akka-cluster-typed/src/test/java/jdocs/akka/cluster/typed/PingSerializerExampleTest.java) { #serializer }

You can look at the
@java[@extref[Cluster example project](samples:akka-samples-cluster-java)]
@scala[@extref[Cluster example project](samples:akka-samples-cluster-scala)]
to see what this looks like in practice.
## Node Roles

Not all nodes of a cluster need to perform the same function: there might be one sub-set which runs the web front-end,
one which runs the data access layer and one for the number-crunching. Deployment of actors, for example by cluster-aware
routers, can take node roles into account to achieve this distribution of responsibilities.

The node roles are defined in the configuration property named `akka.cluster.roles`
and typically defined in the start script as a system property or environment variable.

The roles are part of the membership information in `MemberEvent` that you can subscribe to.
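For example, a node that should serve the web front-end could be started with a configuration like this (the role name `frontend` is only an illustration):

```
akka.cluster.roles = ["frontend"]
```

The same value can be passed in the start script as a system property, e.g. `-Dakka.cluster.roles.0=frontend`.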
## Failure Detector

The nodes in the cluster monitor each other by sending heartbeats to detect if a node is
unreachable from the rest of the cluster. Please see:

* @ref:[Failure Detector specification](cluster-specification.md#failure-detector)
* @ref:[Phi Accrual Failure Detector](failure-detector.md) implementation
* [Using the Failure Detector](#using-the-failure-detector)
### Using the Failure Detector

Cluster uses the `akka.remote.PhiAccrualFailureDetector` failure detector by default, or you can provide your own by
implementing `akka.remote.FailureDetector` and configuring it:

```
akka.cluster.failure-detector.implementation-class = "com.example.CustomFailureDetector"
```

In the [Cluster Configuration](#configuration) you may want to adjust these
depending on your environment:

* When a *phi* value is considered to be a failure: `akka.cluster.failure-detector.threshold`
* Margin of error for sudden abnormalities: `akka.cluster.failure-detector.acceptable-heartbeat-pause`
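The `threshold` above is compared against a *phi* value computed as `phi = -log10(1 - F(timeSinceLastHeartbeat))`, where `F` is the cumulative distribution of observed heartbeat inter-arrival times. The following self-contained sketch shows how phi grows as a heartbeat becomes overdue; the class name and sample numbers are illustrative, and the logistic approximation of the normal CDF is an assumption mirroring the open source accrual failure detector implementation:

```java
// Illustrative phi calculation for an accrual failure detector.
// meanMillis and stdDevMillis would come from a sliding window of
// observed heartbeat intervals in a real detector.
class PhiAccrualSketch {
    static double phi(double timeSinceLastMillis, double meanMillis, double stdDevMillis) {
        double y = (timeSinceLastMillis - meanMillis) / stdDevMillis;
        // Logistic approximation of the normal CDF.
        double e = Math.exp(-y * (1.5976 + 0.070566 * y * y));
        // Two branches keep the computation numerically stable on both tails.
        if (timeSinceLastMillis > meanMillis) {
            return -Math.log10(e / (1.0 + e));
        } else {
            return -Math.log10(1.0 - 1.0 / (1.0 + e));
        }
    }
}
```

At exactly the mean interval phi is about 0.3, and it climbs steeply as the pause stretches several standard deviations past the mean, which is why a `threshold` of 8 tolerates normal jitter but reacts to a genuinely silent node.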
## How to test

Akka comes with and uses several types of testing strategies:

* @ref:[Testing](testing.md)
* @ref:[Multi Node Testing](../multi-node-testing.md)
* @ref:[Multi JVM Testing](../multi-jvm-testing.md)
## Configuration

There are several configuration properties for the cluster. Refer to the
@ref:[reference configuration](../general/configuration.md#config-akka-cluster) for full
configuration descriptions, default values and options.

### Cluster Info Logging

You can silence the logging of cluster events at info level with configuration property:

```
akka.cluster.log-info = off
```

You can enable verbose logging of cluster events at info level, e.g. for temporary troubleshooting, with configuration property:

```
akka.cluster.log-info-verbose = on
```
### Cluster Dispatcher

The cluster extension is implemented with actors. To protect them against
disturbance from user actors they are by default run on the internal dispatcher configured
under `akka.actor.internal-dispatcher`. The cluster actors can potentially be isolated even
further, onto their own dispatcher, using the setting `akka.cluster.use-dispatcher`,
or be made to run on the same dispatcher to keep the number of threads down.
### Configuration Compatibility Check

Creating a cluster is about deploying two or more nodes and making them behave as if they were one single application. Therefore it's extremely important that all nodes in a cluster are configured with compatible settings.

The Configuration Compatibility Check feature ensures that all nodes in a cluster have a compatible configuration. Whenever a new node joins an existing cluster, a subset of its configuration settings (only those that are required to be checked) is sent to the nodes in the cluster for verification. Once the configuration is checked on the cluster side, the cluster sends back its own set of required configuration settings. The joining node will then verify that it's compliant with the cluster configuration. The joining node will only proceed if all checks pass, on both sides.

New custom checkers can be added by extending `akka.cluster.JoinConfigCompatChecker` and including them in the configuration. Each checker must be associated with a unique key:

```
akka.cluster.configuration-compatibility-check.checkers {
  my-custom-config = "com.company.MyCustomJoinConfigCompatChecker"
}
```

@@@ note

Configuration Compatibility Check is enabled by default, but can be disabled by setting `akka.cluster.configuration-compatibility-check.enforce-on-join = off`. This is especially useful when performing rolling updates. Obviously this should only be done if a complete cluster shutdown isn't an option. A cluster with nodes with different configuration settings may lead to data loss or data corruption.

This setting should only be disabled on the joining nodes. The checks are always performed on both sides, and warnings are logged. In case of incompatibilities, it is the responsibility of the joining node to decide if the process should be interrupted or not.

If you are performing a rolling update on a cluster using Akka 2.5.9 or earlier (thus, not supporting this feature), the checks will not be performed, because the running cluster has no means to verify the configuration sent by the joining node, nor to send back its own configuration.

@@@

## Higher level Cluster tools

@@include[cluster.md](../includes/cluster.md) { #cluster-singleton }
See @ref:[Cluster Singleton](cluster-singleton.md).

@@include[cluster.md](../includes/cluster.md) { #cluster-sharding }
See @ref:[Cluster Sharding](cluster-sharding.md).

@@include[cluster.md](../includes/cluster.md) { #cluster-ddata }
See @ref:[Distributed Data](distributed-data.md).

@@include[cluster.md](../includes/cluster.md) { #cluster-pubsub }
Classic Pub Sub can be used by leveraging the `.toClassic` adapters.
See @ref:[Distributed Publish Subscribe in Cluster](../distributed-pub-sub.md). A typed API is tracked in @github[#26338](#26338).

@@include[cluster.md](../includes/cluster.md) { #cluster-multidc }
See @ref:[Cluster Multi-DC](../cluster-dc.md). A typed API for multi-dc sharding is tracked in @github[#27705](#27705).

72  akka-docs/src/main/paradox/typed/failure-detector.md  Normal file

@@ -0,0 +1,72 @@
# Phi Accrual Failure Detector

## Introduction

Remote DeathWatch uses heartbeat messages and the failure detector to

* detect network failures and JVM crashes
* generate the `Terminated` message to the watching actor on failure
* gracefully terminate watched actors

The heartbeat arrival times are interpreted by an implementation of
[The Phi Accrual Failure Detector](https://pdfs.semanticscholar.org/11ae/4c0c0d0c36dc177c1fff5eb84fa49aa3e1a8.pdf) by Hayashibara et al.

## Failure Detector Heartbeats

Heartbeats are sent every second by default, which is configurable. They are performed in a request/reply handshake, and the replies are input to the failure detector.

The suspicion level of failure is represented by a value called *phi*.
The basic idea of the phi failure detector is to express the value of *phi* on a scale that
is dynamically adjusted to reflect current network conditions.

The value of *phi* is calculated as:

```
phi = -log10(1 - F(timeSinceLastHeartbeat))
```

where F is the cumulative distribution function of a normal distribution with mean
and standard deviation estimated from historical heartbeat inter-arrival times.

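The formula above can be sketched numerically. This is a hedged illustration, not Akka's actual implementation: the logistic approximation of the normal CDF and the `Phi` class name are assumptions of this sketch.

```java
public class Phi {

    // Approximate normal CDF for a standardized value y
    // (logistic approximation; an assumption of this sketch).
    static double cdf(double y) {
        return 1.0 / (1.0 + Math.exp(-y * (1.5976 + 0.070566 * y * y)));
    }

    // phi = -log10(1 - F(timeSinceLastHeartbeat)), with F estimated from
    // the mean and standard deviation of historical inter-arrival times.
    static double phi(double timeSinceLastHeartbeatMs, double meanMs, double stdDevMs) {
        double y = (timeSinceLastHeartbeatMs - meanMs) / stdDevMs;
        return -Math.log10(1.0 - cdf(y));
    }

    public static void main(String[] args) {
        // Historical heartbeats: mean 1000 ms apart, standard deviation 200 ms.
        System.out.println(phi(1000, 1000, 200)); // right at the mean: low suspicion
        System.out.println(phi(1600, 1000, 200)); // 3 std devs late: suspicion rising
        System.out.println(phi(2000, 1000, 200)); // 5 std devs late: very high suspicion
    }
}
```

Note how *phi* grows without bound as the silence lengthens, rather than flipping from "alive" to "dead" at a fixed timeout.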
An accrual failure detector decouples monitoring and interpretation. That makes
it applicable to a wider range of scenarios and more suitable for building generic
failure detection services. The idea is that it keeps a history of failure
statistics, calculated from heartbeats received from other nodes, and
tries to make an educated guess, by taking multiple factors, and how they
accumulate over time, into account, about whether a
specific node is up or down. Rather than only answering "yes" or "no" to the
question "is the node down?" it returns a `phi` value representing the
likelihood that the node is down.

The following chart illustrates how *phi* increases with increasing time since the
previous heartbeat.



Phi is calculated from the mean and standard deviation of historical
inter-arrival times. The previous chart is an example for a standard deviation
of 200 ms. If the heartbeats arrive with less deviation the curve becomes steeper,
i.e. it is possible to determine failure more quickly. The curve looks like this for
a standard deviation of 100 ms.



To be able to survive sudden abnormalities, such as garbage collection pauses and
transient network failures, the failure detector is configured with a margin, which
you may want to adjust depending on your environment.
This is how the curve looks for `failure-detector.acceptable-heartbeat-pause` configured to
3 seconds.



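For example, a 3 second margin for the cluster failure detector could be set as follows (path shown as an assumption about which detector is being tuned; the remote watch failure detector has a corresponding setting under `akka.remote.watch-failure-detector`):

```
akka.cluster.failure-detector.acceptable-heartbeat-pause = 3 s
```
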
## Failure Detector Threshold

The `threshold` that is the basis for the calculation is configurable by the
user.

* A low `threshold` is prone to generate many false positives but ensures
a quick detection in the event of a real crash.
* Conversely, a high `threshold` generates fewer mistakes but needs more time to detect actual crashes.
* The default `threshold` is 8 and is appropriate for most situations. However in
cloud environments, such as Amazon EC2, the value could be increased to 12 in
order to account for network issues that sometimes occur on such platforms.

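In a cloud environment the threshold could, for instance, be raised like this (cluster failure detector path shown as an assumption; the remote watch failure detector has an analogous `threshold` setting):

```
akka.cluster.failure-detector.threshold = 12.0
```
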
@@ -4,8 +4,9 @@

 @@@ index

-* [cluster-specification](../common/cluster.md)
 * [cluster](cluster.md)
+* [cluster-specification](cluster-specification.md)
+* [cluster-membership](cluster-membership.md)
 * [distributed-data](distributed-data.md)
 * [cluster-singleton](cluster-singleton.md)
 * [cluster-sharding](cluster-sharding.md)
@@ -17,5 +18,6 @@
 * [remoting-artery](../remoting-artery.md)
 * [remoting](../remoting.md)
 * [coordination](../coordination.md)
+* [choosing-cluster](choosing-cluster.md)

 @@@

@@ -47,6 +47,8 @@ object Paradox {
         "index.html",
         // Page that recommends Alpakka:
         "camel.html",
+        // Page linked to from many others, but not in a TOC
+        "typed/failure-detector.html",
         // TODO page not linked to
         "fault-tolerance-sample.html"))