* moved to cluster tests, in new package akka.cluster.testkit
* changed config in tests
* migration guide
* documentation clarifications for Downing and Leaving
* update warnings in Singleton and Sharding
parent 064f06f5a6
commit a217d5566e
61 changed files with 414 additions and 309 deletions

@@ -227,14 +227,6 @@ graceful leaving process of a cluster member.

See @ref:[removal of Internal Cluster Sharding Data](typed/cluster-sharding.md#removal-of-internal-cluster-sharding-data) in the documentation of the new APIs.

## Configuration

`ClusterShardingSettings` is a parameter to the `start` method of
the `ClusterSharding` extension, i.e. each entity type can be configured with different settings
if needed.

See @ref:[configuration](typed/cluster-sharding.md#configuration) for more information.
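
As a rough illustration of the classic API described above, the sketch below passes an explicitly constructed `ClusterShardingSettings` to `start`. It is a minimal sketch, not one of the documented samples; the `Counter` actor and `EntityEnvelope` message are hypothetical placeholders and the system is assumed to be configured with `akka.actor.provider = cluster`.

```scala
import akka.actor.{ Actor, ActorSystem, Props }
import akka.cluster.sharding.{ ClusterSharding, ClusterShardingSettings, ShardRegion }

// Hypothetical entity and message envelope, used only for this sketch.
final case class EntityEnvelope(id: Long, payload: Any)

class Counter extends Actor {
  private var count = 0
  def receive: Receive = { case _ => count += 1 }
}

object ShardingSettingsExample extends App {
  // Assumes akka.actor.provider = cluster in the configuration.
  val system = ActorSystem("ClusterSystem")

  val extractEntityId: ShardRegion.ExtractEntityId = {
    case msg @ EntityEnvelope(id, _) => (id.toString, msg)
  }
  val extractShardId: ShardRegion.ExtractShardId = {
    case EntityEnvelope(id, _) => (id % 10).toString
  }

  // Each entity type gets its own settings instance, so different types
  // can be tuned independently if needed.
  val settings = ClusterShardingSettings(system)

  val counterRegion = ClusterSharding(system).start(
    typeName = "Counter",
    entityProps = Props[Counter](),
    settings = settings,
    extractEntityId = extractEntityId,
    extractShardId = extractShardId)
}
```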

## Inspecting cluster sharding state

Two requests to inspect the cluster state are available:
@@ -256,20 +248,13 @@ directly sending messages to the individual entities.

## Lease

A @ref[lease](coordination.md) can be used as an additional safety measure to ensure a shard
does not run on two nodes.
A lease can be used as an additional safety measure to ensure a shard does not run on two nodes.
See @ref:[Lease](typed/cluster-sharding.md#lease) in the documentation of the new APIs.

Reasons for how this can happen:
## Configuration

* Network partitions without an appropriate downing provider
* Mistakes in the deployment process leading to two separate Akka Clusters
* Timing issues between removing members from the Cluster on one side of a network partition and shutting them down on the other side
`ClusterShardingSettings` is a parameter to the `start` method of
the `ClusterSharding` extension, i.e. each entity type can be configured with different settings
if needed.

A lease can be a final backup that means that each shard won't create child entity actors unless it has the lease.

To use a lease for sharding, set `akka.cluster.sharding.use-lease` to the configuration location
of the lease to use. Each shard will try to acquire a lease with the name `<actor system name>-shard-<type name>-<shard id>` and
the owner is set to the `Cluster(system).selfAddress.hostPort`.

If a shard can't acquire a lease it will remain uninitialized so messages for entities it owns will
be buffered in the `ShardRegion`. If the lease is lost after initialization the Shard will be terminated.
See @ref:[configuration](typed/cluster-sharding.md#configuration) for more information.
@@ -104,6 +104,14 @@ Scala

Java
: @@snip [SimpleClusterListener2.java](/akka-docs/src/test/java/jdocs/cluster/SimpleClusterListener2.java) { #join }

## Leaving

See @ref:[Leaving](typed/cluster.md#leaving) in the documentation of the new APIs.

## Downing

See @ref:[Downing](typed/cluster.md#downing) in the documentation of the new APIs.

<a id="cluster-subscriber"></a>
## Subscribe to Cluster Events
@@ -3,7 +3,7 @@

## Commercial Support

Commercial support is provided by [Lightbend](http://www.lightbend.com).
Akka is part of the [Lightbend Reactive Platform](http://www.lightbend.com/platform).
Akka is part of the [Lightbend Platform](http://www.lightbend.com/platform).

## Sponsors
@@ -11,6 +11,40 @@ is [no longer available as a static method](https://github.com/scala/bug/issues/

If you are still using Scala 2.11 then you must upgrade to 2.12 or 2.13.

## Auto-downing removed

Auto-downing of unreachable Cluster members has been removed after warnings and recommendations against using it
for many years. It was disabled by default, but could be enabled with the configuration
`akka.cluster.auto-down-unreachable-after`.

For alternatives see the @ref:[documentation about Downing](../typed/cluster.md#downing).

Auto-downing was a naïve approach to remove unreachable nodes from the cluster membership.
In a production environment it will eventually break down the cluster.
When a network partition occurs, both sides of the partition will see the other side as unreachable
and remove it from the cluster. This results in the formation of two separate, disconnected, clusters
(known as *Split Brain*).

This behavior is not limited to network partitions. It can also occur if a node in the cluster is
overloaded, or experiences a long GC pause.

When using @ref:[Cluster Singleton](../typed/cluster-singleton.md) or @ref:[Cluster Sharding](../typed/cluster-sharding.md)
it can break the contract provided by those features. Both provide a guarantee that an actor will be unique in a cluster.
With the auto-down feature enabled, it is possible for multiple independent clusters to form (*Split Brain*).
When this happens the guaranteed uniqueness will no longer be true resulting in undesirable behavior in the system.

This is even more severe when @ref:[Akka Persistence](../typed/persistence.md) is used in conjunction with
Cluster Sharding. In this case, the lack of unique actors can cause multiple actors to write to the same journal.
Akka Persistence operates on a single writer principle. Having multiple writers will corrupt the journal
and make it unusable.

Finally, even if you don't use features such as Persistence, Sharding, or Singletons, auto-downing can lead the
system to form multiple small clusters. These small clusters will be independent from each other. They will be
unable to communicate and as a result you may experience performance degradation. Once this condition occurs,
it will require manual intervention in order to reform the cluster.

Because of these issues, auto-downing should **never** be used in a production environment.

## Removed features that were deprecated

After being deprecated since 2.5.0, the following have been removed in Akka 2.6.
@@ -94,13 +128,25 @@ to make remote interactions look like local method calls.

Warnings about `TypedActor` have been [mentioned in documentation](https://doc.akka.io/docs/akka/2.5/typed-actors.html#when-to-use-typed-actors)
for many years.

### akka-protobuf

`akka-protobuf` was never intended to be used by end users but perhaps this was not well-documented.
Applications should use the standard Protobuf dependency instead of `akka-protobuf`. The artifact is still
published, but the transitive dependency to `akka-protobuf` has been removed.

Akka is now using Protobuf version 3.9.0 for serialization of messages defined by Akka.

### Cluster Client

Cluster client has been deprecated as of 2.6 in favor of [Akka gRPC](https://doc.akka.io/docs/akka-grpc/current/index.html).
It is not advised to build new applications with Cluster client, and existing users @ref[should migrate to Akka gRPC](../cluster-client.md#migration-to-akka-grpc).

### akka.Main

`akka.Main` is deprecated in favour of starting the `ActorSystem` from a custom main class instead. `akka.Main` was not
adding much value and typically a custom main class is needed anyway.

## Remoting

### Default remoting is now Artery TCP
@@ -184,20 +230,7 @@ For TCP:

Classic remoting is deprecated but can be used in `2.6`. Explicitly disable Artery by setting the property `akka.remote.artery.enabled` to `false`. Further, any configuration under `akka.remote` that is
specific to classic remoting needs to be moved to `akka.remote.classic`. To see which configuration options
are specific to classic, search for them in: [`akka-remote/reference.conf`](/akka-remote/src/main/resources/reference.conf)

### akka-protobuf

`akka-protobuf` was never intended to be used by end users but perhaps this was not well-documented.
Applications should use the standard Protobuf dependency instead of `akka-protobuf`. The artifact is still
published, but the transitive dependency to `akka-protobuf` has been removed.

Akka is now using Protobuf version 3.9.0 for serialization of messages defined by Akka.

### Cluster Client

Cluster client has been deprecated as of 2.6 in favor of [Akka gRPC](https://doc.akka.io/docs/akka-grpc/current/index.html).
It is not advised to build new applications with Cluster client, and existing users @ref[should migrate to Akka gRPC](../cluster-client.md#migration-to-akka-grpc).
are specific to classic, search for them in: @ref:[`akka-remote/reference.conf`](../general/configuration.md#config-akka-remote).
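
To make that concrete, here is a minimal sketch of such a configuration; the provider setting, hostname, and port are placeholder values, and the now-optional Netty dependency still needs to be on the classpath for classic remoting.

```scala
import akka.actor.ActorSystem
import com.typesafe.config.ConfigFactory

// A minimal sketch for staying on classic remoting in 2.6: Artery is disabled
// explicitly and the classic settings live under `akka.remote.classic`.
object ClassicRemotingExample extends App {
  val config = ConfigFactory.parseString("""
    akka.actor.provider = remote
    akka.remote.artery.enabled = false
    akka.remote.classic.netty.tcp {
      hostname = "127.0.0.1"  // placeholder values
      port = 2552
    }
    """).withFallback(ConfigFactory.load())

  val system = ActorSystem("ClassicRemotingExample", config)
}
```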

## Java Serialization
@@ -235,14 +268,12 @@ handling that type and it was previously "accidentally" serialized with Java ser

The following documents configuration changes and behavior changes where no action is required. In some cases the old
behavior can be restored via configuration.

### Remoting

#### Remoting dependencies have been made optional
### Remoting dependencies have been made optional

Classic remoting depends on Netty and Artery UDP depends on Aeron. These are now both optional dependencies that need
to be explicitly added. See @ref[classic remoting](../remoting.md) or @ref[artery remoting](../remoting-artery.md) for instructions.

#### Remote watch and deployment have been disabled without Cluster use
### Remote watch and deployment have been disabled without Cluster use

By default, these remoting features are disabled when not using Akka Cluster:
@@ -43,10 +43,10 @@ if that feature is enabled.

@@@ warning

**Don't use Cluster Sharding together with Automatic Downing**,
since it allows the cluster to split up into two separate clusters, which in turn will result
in *multiple shards and entities* being started, one in each separate cluster!
See @ref:[Downing](cluster.md#automatic-vs-manual-downing).
Make sure to not use a Cluster downing strategy that may split the cluster into several separate clusters in
case of network problems or system overload (long GC pauses), since that will result in *multiple shards and entities*
being started, one in each separate cluster!
See @ref:[Downing](cluster.md#downing).

@@@
@@ -304,6 +304,26 @@ rebalanced to other nodes.

See @ref:[How To Startup when Cluster Size Reached](cluster.md#how-to-startup-when-a-cluster-size-is-reached)
for more information about `min-nr-of-members`.

## Lease

A @ref[lease](../coordination.md) can be used as an additional safety measure to ensure a shard
does not run on two nodes.

Reasons for how this can happen:

* Network partitions without an appropriate downing provider
* Mistakes in the deployment process leading to two separate Akka Clusters
* Timing issues between removing members from the Cluster on one side of a network partition and shutting them down on the other side

A lease can be a final backup that means that each shard won't create child entity actors unless it has the lease.

To use a lease for sharding, set `akka.cluster.sharding.use-lease` to the configuration location
of the lease to use. Each shard will try to acquire a lease with the name `<actor system name>-shard-<type name>-<shard id>` and
the owner is set to the `Cluster(system).selfAddress.hostPort`.

If a shard can't acquire a lease it will remain uninitialized so messages for entities it owns will
be buffered in the `ShardRegion`. If the lease is lost after initialization the Shard will be terminated.
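
As a sketch of how that might be wired up (the `docs.example.test-lease` path below is a placeholder for the config location of whatever lease implementation is actually used, for example one from Akka Coordination):

```scala
import akka.actor.ActorSystem
import com.typesafe.config.ConfigFactory

// A minimal sketch: point `use-lease` at the config location of a lease
// implementation. "docs.example.test-lease" is a placeholder, not a real
// lease shipped with Akka.
object ShardingLeaseExample extends App {
  val config = ConfigFactory.parseString("""
    akka.cluster.sharding.use-lease = "docs.example.test-lease"
    """).withFallback(ConfigFactory.load())

  val system = ActorSystem("ClusterSystem", config)
  // Each shard then tries to acquire a lease named
  // "<actor system name>-shard-<type name>-<shard id>" before starting entities.
}
```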

## Removal of internal Cluster Sharding data

Removal of internal Cluster Sharding data is only relevant for "Persistent Mode".
@@ -326,15 +346,6 @@ cannot startup because of corrupt data, which may happen if accidentally

two clusters were running at the same time, e.g. caused by using auto-down
and there was a network partition.

@@@ warning

**Don't use Cluster Sharding together with Automatic Downing**,
since it allows the cluster to split up into two separate clusters, which in turn will result
in *multiple shards and entities* being started, one in each separate cluster!
See @ref:[Downing](cluster.md#automatic-vs-manual-downing).

@@@

Use this program as a standalone Java main program:

```

@@ -347,7 +358,7 @@ The program is included in the `akka-cluster-sharding` jar file. It

is easiest to run it with the same classpath and configuration as your ordinary
application. It can be run from sbt or Maven in a similar way.

Specify the entity type names (same as you use in the `start` method
Specify the entity type names (same as you use in the `init` method
of `ClusterSharding`) as program arguments.

If you specify `-2.3` as the first program argument it will also try
@@ -32,6 +32,15 @@ such as single-point of bottleneck. Single-point of failure is also a relevant c

but for some cases this feature takes care of that by making sure that another singleton
instance will eventually be started.

@@@ warning

Make sure to not use a Cluster downing strategy that may split the cluster into several separate clusters in
case of network problems or system overload (long GC pauses), since that will result in *multiple Singletons*
being started, one in each separate cluster!
See @ref:[Downing](cluster.md#downing).

@@@

### Singleton manager

The cluster singleton pattern manages one singleton actor instance among all cluster nodes or a group of nodes tagged with
@@ -80,23 +89,20 @@ The singleton instance will not run on members with status @ref:[WeaklyUp](clust

This pattern may seem to be very tempting to use at first, but it has several drawbacks, some of which are listed below:

* the cluster singleton may quickly become a *performance bottleneck*,
* you can not rely on the cluster singleton to be *non-stop* available — e.g. when the node on which the singleton has
been running dies, it will take a few seconds for this to be noticed and the singleton be migrated to another node,
* in the case of a *network partition* appearing in a Cluster that is using Automatic Downing (see docs for
@ref:[Auto Downing](cluster.md#auto-downing-do-not-use)),
it may happen that the isolated clusters each decide to spin up their own singleton, meaning that there might be multiple
singletons running in the system, yet the Clusters have no way of finding out about them (because of the partition).

Especially the last point is something you should be aware of — in general when using the Cluster Singleton pattern
you should take care of downing nodes yourself and not rely on the timing based auto-down feature.
* The cluster singleton may quickly become a *performance bottleneck*.
* You can not rely on the cluster singleton to be *non-stop* available — e.g. when the node on which the singleton
has been running dies, it will take a few seconds for this to be noticed and the singleton be migrated to another node.
* If many singletons are used, be aware that all of them will run on the oldest node (or the oldest with the configured role).
@ref:[Cluster Sharding](cluster-sharding.md) combined with keeping the "singleton" entities alive can be a better
alternative.

@@@ warning

**Don't use Cluster Singleton together with Automatic Downing**,
since it allows the cluster to split up into two separate clusters, which in turn will result
in *multiple Singletons* being started, one in each separate cluster!

Make sure to not use a Cluster downing strategy that may split the cluster into several separate clusters in
case of network problems or system overload (long GC pauses), since that will result in *multiple Singletons*
being started, one in each separate cluster!
See @ref:[Downing](cluster.md#downing).

@@@

## Example
@@ -255,95 +255,69 @@ after the restart, when it come up as new incarnation of existing member in the

trying to join in, then the existing one will be removed from the cluster and then it will
be allowed to join.

<a id="automatic-vs-manual-downing"></a>
### Downing

When a member is considered by the failure detector to be `unreachable` the
leader is not allowed to perform its duties, such as changing status of
new joining members to 'Up'. The node must first become `reachable` again, or the
status of the unreachable member must be changed to 'Down'. Changing status to 'Down'
can be performed automatically or manually. By default it must be done manually, using
@ref:[JMX](../additional/operations.md#jmx) or @ref:[HTTP](../additional/operations.md#http).

It can also be performed programmatically with @scala[`Cluster(system).down(address)`]@java[`Cluster.get(system).down(address)`].

If a node is still running and sees itself as Down it will shut down. @ref:[Coordinated Shutdown](../actors.md#coordinated-shutdown) will automatically
run if `run-coordinated-shutdown-when-down` is set to `on` (the default) however the node will not try
to leave the cluster gracefully so sharding and singleton migration will not occur.

A production solution for the downing problem is provided by
[Split Brain Resolver](http://developer.lightbend.com/docs/akka-commercial-addons/current/split-brain-resolver.html),
which is part of the [Lightbend Reactive Platform](http://www.lightbend.com/platform).
If you don’t use RP, you should anyway carefully read the [documentation](http://developer.lightbend.com/docs/akka-commercial-addons/current/split-brain-resolver.html)
of the Split Brain Resolver and make sure that the solution you are using handles the concerns
described there.

### Auto-downing - DO NOT USE

There is an automatic downing feature that you should not use in production. For testing you can enable it with configuration:

```
akka.cluster.auto-down-unreachable-after = 120s
```

This means that the cluster leader member will change the `unreachable` node
status to `down` automatically after the configured time of unreachability.

This is a naïve approach to remove unreachable nodes from the cluster membership.
It can be useful during development but in a production environment it will eventually break down the cluster.
When a network partition occurs, both sides of the partition will see the other side as unreachable and remove it from the cluster.
This results in the formation of two separate, disconnected, clusters (known as *Split Brain*).

This behaviour is not limited to network partitions. It can also occur if a node
in the cluster is overloaded, or experiences a long GC pause.

@@@ warning

We recommend against using the auto-down feature of Akka Cluster in production. It
has multiple undesirable consequences for production systems.

If you are using @ref:[Cluster Singleton](cluster-singleton.md) or @ref:[Cluster Sharding](cluster-sharding.md) it can break the contract provided by
those features. Both provide a guarantee that an actor will be unique in a cluster.
With the auto-down feature enabled, it is possible for multiple independent clusters
to form (*Split Brain*). When this happens the guaranteed uniqueness will no
longer be true resulting in undesirable behaviour in the system.

This is even more severe when @ref:[Akka Persistence](persistence.md) is used in
conjunction with Cluster Sharding. In this case, the lack of unique actors can
cause multiple actors to write to the same journal. Akka Persistence operates on a
single writer principle. Having multiple writers will corrupt the journal
and make it unusable.

Finally, even if you don't use features such as Persistence, Sharding, or Singletons,
auto-downing can lead the system to form multiple small clusters. These small
clusters will be independent from each other. They will be unable to communicate
and as a result you may experience performance degradation. Once this condition
occurs, it will require manual intervention in order to reform the cluster.

Because of these issues, auto-downing should **never** be used in a production environment.

@@@

### Leaving

There are two ways to remove a member from the cluster.
There are a few ways to remove a member from the cluster.

1. The recommended way to leave a cluster is a graceful exit, informing the cluster that a node shall leave.
This can be performed using @ref:[JMX](../additional/operations.md#jmx) or @ref:[HTTP](../additional/operations.md#http).
This method will offer faster hand off to peer nodes during node shutdown.
1. When a graceful exit is not possible, you can stop the actor system (or the JVM process, for example a SIGTERM sent from the environment). It will be detected
as unreachable and removed after the automatic or manual downing.
1. The recommended way to leave a cluster is a graceful exit, informing the cluster that a node shall leave.
This is performed by @ref:[Coordinated Shutdown](../actors.md#coordinated-shutdown) when the `ActorSystem`
is terminated and also when a SIGTERM is sent from the environment to stop the JVM process.
1. Graceful exit can also be performed using @ref:[HTTP](../additional/operations.md#http) or @ref:[JMX](../additional/operations.md#jmx).
1. When a graceful exit is not possible, for example in case of abrupt termination of the JVM process, the node
will be detected as unreachable by other nodes and removed after @ref:[Downing](#downing).

The @ref:[Coordinated Shutdown](../actors.md#coordinated-shutdown) will automatically run when the cluster node sees itself as
Graceful leaving will offer faster hand off to peer nodes during node shutdown than abrupt termination and downing.

The @ref:[Coordinated Shutdown](../actors.md#coordinated-shutdown) will also run when the cluster node sees itself as
`Exiting`, i.e. leaving from another node will trigger the shutdown process on the leaving node.
Tasks for graceful leaving of the cluster including graceful shutdown of Cluster Singletons and
Cluster Sharding are added automatically when Akka Cluster is used, i.e. running the shutdown
process will also trigger the graceful leaving if it's not already in progress.
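
As a small illustration, leaving can also be triggered programmatically with the typed Cluster API. This is a minimal sketch, assuming a node already configured with `akka.actor.provider = cluster`; terminating the `ActorSystem` or sending SIGTERM achieves the same graceful process via Coordinated Shutdown.

```scala
import akka.actor.typed.ActorSystem
import akka.actor.typed.scaladsl.Behaviors
import akka.cluster.typed.{ Cluster, Leave }

// A minimal sketch: ask the local member to leave the cluster gracefully.
// Coordinated Shutdown then runs the leaving tasks, including hand-over of
// Cluster Singletons and Cluster Sharding, before the node is removed.
object LeavingExample extends App {
  val system = ActorSystem(Behaviors.empty[Any], "ClusterSystem")
  val cluster = Cluster(system)

  cluster.manager ! Leave(cluster.selfMember.address)
}
```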

Normally this is handled automatically, but in case of network failures during this process it might still
be necessary to set the node’s status to `Down` in order to complete the removal. For handling network failures
see [Split Brain Resolver](http://developer.lightbend.com/docs/akka-commercial-addons/current/split-brain-resolver.html),
part of the [Lightbend Reactive Platform](http://www.lightbend.com/platform).
be necessary to set the node’s status to `Down` in order to complete the removal, see @ref:[Downing](#downing).

### Downing

In many cases a member can gracefully exit from the cluster as described in @ref:[Leaving](#leaving), but
there are scenarios when an explicit downing decision is needed before it can be removed. For example in case
of abrupt termination of the JVM process, system overload that doesn't recover, or network partitions
that don't heal. In such cases the node(s) will be detected as unreachable by other nodes, but they must also
be marked as `Down` before they are removed.

When a member is considered by the failure detector to be `unreachable` the
leader is not allowed to perform its duties, such as changing status of
new joining members to 'Up'. The node must first become `reachable` again, or the
status of the unreachable member must be changed to `Down`. Changing status to `Down`
can be performed automatically or manually.

By default, downing must be performed manually using @ref:[HTTP](../additional/operations.md#http) or @ref:[JMX](../additional/operations.md#jmx).

Note that @ref:[Cluster Singleton](cluster-singleton.md) or @ref:[Cluster Sharding entities](cluster-sharding.md) that
are running on a crashed (unreachable) node will not be started on another node until the previous node has
been removed from the Cluster. Removal of crashed (unreachable) nodes is performed after a downing decision.

A production solution for downing is provided by
[Split Brain Resolver](https://doc.akka.io/docs/akka-enhancements/current/split-brain-resolver.html),
which is part of the [Lightbend Platform](http://www.lightbend.com/platform).
If you don’t have a Lightbend Platform Subscription, you should still carefully read the
[documentation](https://doc.akka.io/docs/akka-enhancements/current/split-brain-resolver.html)
of the Split Brain Resolver and make sure that the solution you are using handles the concerns and scenarios
described there.

A custom downing strategy can be implemented with a @apidoc[akka.cluster.DowningProvider] and enabled with
configuration `akka.cluster.downing-provider-class`.
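
A rough sketch of that hook is shown below. It only observes unreachable members and deliberately leaves the actual downing decision as a comment, since a real strategy must handle the split-brain scenarios described above; the class and actor names are placeholders.

```scala
import akka.actor.{ Actor, ActorLogging, ActorSystem, Props }
import akka.cluster.{ Cluster, DowningProvider }
import akka.cluster.ClusterEvent.{ CurrentClusterState, UnreachableMember }
import scala.concurrent.duration._

// Placeholder actor: it only logs unreachability. A real implementation would
// apply a safe strategy (for example quorum based) before calling
// Cluster(context.system).down(address).
class LoggingDowningActor extends Actor with ActorLogging {
  private val cluster = Cluster(context.system)
  override def preStart(): Unit = cluster.subscribe(self, classOf[UnreachableMember])
  override def postStop(): Unit = cluster.unsubscribe(self)
  def receive: Receive = {
    case _: CurrentClusterState => // initial state snapshot, ignored here
    case UnreachableMember(member) =>
      log.warning("Member {} is unreachable, a downing decision is needed", member.address)
  }
}

// Enabled with: akka.cluster.downing-provider-class = "<fully qualified class name>"
class ExampleDowningProvider(system: ActorSystem) extends DowningProvider {
  override def downRemovalMargin: FiniteDuration = 10.seconds
  override def downingActorProps: Option[Props] = Some(Props[LoggingDowningActor]())
}
```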

Downing can also be performed programmatically with @scala[`Cluster(system).manager ! Down(address)`]@java[`Cluster.get(system).manager().tell(Down(address))`],
but that is mostly useful from tests and when implementing a `DowningProvider`.

If a crashed node is restarted with the same hostname and port and joins the cluster again, the previous incarnation
of that member will be downed and removed. The new join attempt with the same hostname and port is used as evidence
that the previous one is not alive anymore.

If a node is still running and sees itself as `Down` it will shut down. @ref:[Coordinated Shutdown](../actors.md#coordinated-shutdown) will automatically
run if `run-coordinated-shutdown-when-down` is set to `on` (the default) however the node will not try
to leave the cluster gracefully.
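
For example (a minimal sketch, mainly relevant for tests, assuming the setting lives under `akka.cluster`), the automatic shutdown of a node that sees itself as `Down` can be switched off:

```scala
import akka.actor.ActorSystem
import com.typesafe.config.ConfigFactory

// By default a node that sees itself as Down runs Coordinated Shutdown and
// terminates. Turning the flag off (useful mainly in tests) keeps the
// ActorSystem running even after the member has been downed.
object DownedNodeShutdownExample extends App {
  val config = ConfigFactory
    .parseString("akka.cluster.run-coordinated-shutdown-when-down = off")
    .withFallback(ConfigFactory.load())

  val system = ActorSystem("ClusterSystem", config)
}
```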

## Node Roles