doc: small improvements of core cluster pages (#27939)

* proofreading of core cluster pages
* some more info in failure detector
Patrik Nordwall 2019-10-09 11:17:22 +02:00 committed by Christopher Batey
parent 86965d0a05
commit 40ce73ad4e
10 changed files with 110 additions and 41 deletions


@ -7,6 +7,7 @@ package jdocs.akka.cluster.typed;
// #join-seed-nodes
import akka.actor.Address;
import akka.actor.AddressFromURIString;
import akka.cluster.Member;
import akka.cluster.typed.JoinSeedNodes;
// #join-seed-nodes
@ -120,4 +121,29 @@ public class BasicClusterExampleTest { // extends JUnitSuite {
Cluster.get(system).manager().tell(new JoinSeedNodes(seedNodes));
// #join-seed-nodes
}
static class Backend {
static Behavior<Void> create() {
return Behaviors.empty();
}
}
static class Frontend {
static Behavior<Void> create() {
return Behaviors.empty();
}
}
void illustrateRoles() {
ActorContext<Void> context = null;
// #hasRole
Member selfMember = Cluster.get(context.getSystem()).selfMember();
if (selfMember.hasRole("backend")) {
context.spawn(Backend.create(), "back");
} else if (selfMember.hasRole("frontend")) {
context.spawn(Frontend.create(), "front");
}
// #hasRole
}
}


@ -62,6 +62,27 @@ akka {
Cluster(system).manager ! JoinSeedNodes(seedNodes)
//#join-seed-nodes
}
object Backend {
def apply(): Behavior[_] = Behaviors.empty
}
object Frontend {
def apply(): Behavior[_] = Behaviors.empty
}
def illustrateRoles(): Unit = {
val context: ActorContext[_] = ???
//#hasRole
val selfMember = Cluster(context.system).selfMember
if (selfMember.hasRole("backend")) {
context.spawn(Backend(), "back")
} else if (selfMember.hasRole("frontend")) {
context.spawn(Frontend(), "front")
}
//#hasRole
}
}
class BasicClusterConfigSpec extends WordSpec with ScalaFutures with Eventually with Matchers with LogCapturing {


@ -20,7 +20,7 @@ their physical location in the cluster.
<!--- #cluster-ddata --->
### Distributed Data
*Akka Distributed Data* is useful when you need to share data between nodes in an
Distributed Data is useful when you need to share data between nodes in an
Akka Cluster. The data is accessed with an actor providing a key-value store like API.
<!--- #cluster-ddata --->


@ -113,11 +113,6 @@ The role of the `leader` is to shift members in and out of the cluster, changing
state. Currently `leader` actions are only triggered by receiving a new cluster
state with gossip convergence.
The `leader` also has the power, if configured so, to "auto-down" a node that
according to the @ref:[Failure Detector](#failure-detector) is considered `unreachable`. This means setting
the `unreachable` node status to `down` automatically after a configured time
of unreachability.
#### Seed Nodes
The seed nodes are contact points for new nodes joining the cluster.


@ -76,20 +76,9 @@ any `leader` actions are also not possible (for instance, allowing a node to
become a part of the cluster). To be able to move forward the state of the
`unreachable` nodes must be changed. It must become `reachable` again or marked
as `down`. If the node is to join the cluster again the actor system must be
restarted and go through the joining process again. The cluster can, through the
leader, also *auto-down* a node after a configured time of unreachability. If new
incarnation of unreachable node tries to rejoin the cluster old incarnation will be
marked as `down` and new incarnation can rejoin the cluster without manual intervention.
@@@ note
If you have *auto-down* enabled and the failure detector triggers, you
can over time end up with a lot of single node clusters if you don't put
measures in place to shut down nodes that have become `unreachable`. This
follows from the fact that the `unreachable` node will likely see the rest of
the cluster as `unreachable`, become its own leader and form its own cluster.
@@@
restarted and go through the joining process again. If a new incarnation of the unreachable
node tries to rejoin the cluster, the old incarnation will be marked as `down` and the new
incarnation can rejoin the cluster without manual intervention.
<a id="weakly-up"></a>
## WeaklyUp Members
@ -158,4 +147,3 @@ The `leader` has the following duties:
causing the monitored node to be marked as unreachable
* **unreachable*** - unreachable is not a real member state but rather a flag, in addition to the state, signaling that the cluster is unable to talk to this node. After being unreachable, the failure detector may detect the node as reachable again and thereby remove the flag


@ -377,8 +377,8 @@ using Cluster Sharding. Stop all Cluster nodes before using this program.
It may be necessary to remove the data if the Cluster Sharding coordinator
cannot start up because of corrupt data, which may happen if accidentally
two clusters were running at the same time, e.g. caused by using auto-down
and there was a network partition.
two clusters were running at the same time, e.g. caused by an invalid downing
provider when there was a network partition.
Use this program as a standalone Java main program:


@ -6,9 +6,9 @@ project.description: Build distributed applications that scale across the networ
This document describes how to use Akka Cluster and the Cluster APIs.
For specific documentation topics see:
* @ref:[When and where to use Akka Cluster](choosing-cluster.md)
* @ref:[Cluster Specification](cluster-concepts.md)
* @ref:[Cluster Membership Service](cluster-membership.md)
* @ref:[When and where to use Akka Cluster](choosing-cluster.md)
* @ref:[Higher level Cluster tools](#higher-level-cluster-tools)
* @ref:[Rolling Updates](../additional/rolling-updates.md)
* @ref:[Operating, Managing, Observability](../additional/operations.md)
@ -17,7 +17,7 @@ For specific documentation topics see:
For the Akka Classic documentation of this feature see @ref:[Classic Cluster](../cluster-usage.md).
@@@
You have to enable @ref:[serialization](../serialization.md) to send messages between ActorSystems in the Cluster.
You have to enable @ref:[serialization](../serialization.md) to send messages between ActorSystems (nodes) in the Cluster.
@ref:[Serialization with Jackson](../serialization-jackson.md) is a good choice in many cases, and our
recommendation if you don't have other preferences or constraints.
@ -52,11 +52,11 @@ The Cluster extension gives you access to management tasks such as @ref:[Joining
and subscription of cluster membership events such as @ref:[MemberUp, MemberRemoved and UnreachableMember](cluster-membership.md#membership-lifecycle),
which are exposed as event APIs.
It does this through these references are on the `Cluster` extension:
It does this through these references on the `Cluster` extension:
* manager: An @scala[`ActorRef[akka.cluster.typed.ClusterCommand]`]@java[`ActorRef<akka.cluster.typed.ClusterCommand>`] where a `ClusterCommand` is a command such as: `Join`, `Leave` and `Down`
* subscriptions: An @scala[`ActorRef[akka.cluster.typed.ClusterStateSubscription]`]@java[`ActorRef<akka.cluster.typed.ClusterStateSubscription>`] where a `ClusterStateSubscription` is one of `GetCurrentState` or `Subscribe` and `Unsubscribe` to cluster events like `MemberRemoved`
* state: The current `CurrentClusterState`
* `manager`: An @scala[`ActorRef[akka.cluster.typed.ClusterCommand]`]@java[`ActorRef<akka.cluster.typed.ClusterCommand>`] where a `ClusterCommand` is a command such as: `Join`, `Leave` and `Down`
* `subscriptions`: An @scala[`ActorRef[akka.cluster.typed.ClusterStateSubscription]`]@java[`ActorRef<akka.cluster.typed.ClusterStateSubscription>`] where a `ClusterStateSubscription` is one of `GetCurrentState` or `Subscribe` and `Unsubscribe` to cluster events like `MemberRemoved`
* `state`: The current `CurrentClusterState`
All of the examples below assume the following imports:
@ -324,13 +324,21 @@ and leave the cluster gracefully.
## Node Roles
Not all nodes of a cluster need to perform the same function: there might be one sub-set which runs the web front-end,
one which runs the data access layer and one for the number-crunching. Deployment of actors, for example by cluster-aware
routers, can take node roles into account to achieve this distribution of responsibilities.
one which runs the data access layer and one for the number-crunching. Choosing which actors to start on each node,
for example cluster-aware routers, can take node roles into account to achieve this distribution of responsibilities.
The node roles are defined in the configuration property named `akka.cluster.roles`
and typically defined in the start script as a system property or environment variable.
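As a minimal sketch of the configuration property named above, a node could declare its roles in `application.conf` like this. The role name `backend` is only an illustration and is not prescribed by Akka:

```
akka.cluster.roles = ["backend"]
```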
The roles are part of the membership information in `MemberEvent` that you can subscribe to.
The roles are part of the membership information in `MemberEvent` that you can subscribe to. The roles
of the own node are available from the `selfMember` and can be used to conditionally start certain
actors:
Scala
: @@snip [BasicClusterExampleSpec.scala](/akka-cluster-typed/src/test/scala/docs/akka/cluster/typed/BasicClusterExampleSpec.scala) { #hasRole }
Java
: @@snip [BasicClusterExampleTest.java](/akka-cluster-typed/src/test/java/jdocs/akka/cluster/typed/BasicClusterExampleTest.java) { #hasRole }
## Failure Detector
@ -416,7 +424,7 @@ or made run on the same dispatcher to keep the number of threads down.
### Configuration Compatibility Check
Creating a cluster is about deploying two or more nodes and make then behave as if they were one single application. Therefore it's extremely important that all nodes in a cluster are configured with compatible settings.
Creating a cluster is about deploying two or more nodes and making them behave as if they were one single application. Therefore it's extremely important that all nodes in a cluster are configured with compatible settings.
The Configuration Compatibility Check feature ensures that all nodes in a cluster have a compatible configuration. Whenever a new node is joining an existing cluster, a subset of its configuration settings (only those that are required to be checked) is sent to the nodes in the cluster for verification. Once the configuration is checked on the cluster side, the cluster sends back its own set of required configuration settings. The joining node will then verify if it's compliant with the cluster configuration. The joining node will only proceed if all checks pass, on both sides.


@ -2,11 +2,7 @@
## Introduction
Remote DeathWatch uses heartbeat messages and the failure detector to
* detect network failures and JVM crashes
* generate the `Terminated` message to the watching actor on failure
* gracefully terminate watched actors
Remote DeathWatch uses heartbeat messages and the failure detector to detect network failures and JVM crashes.
The heartbeat arrival times are interpreted by an implementation of
[The Phi Accrual Failure Detector](https://pdfs.semanticscholar.org/11ae/4c0c0d0c36dc177c1fff5eb84fa49aa3e1a8.pdf) by Hayashibara et al.
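To make the idea more concrete, here is a minimal sketch of how a phi value can be derived from heartbeat history. It assumes `phi = -log10(1 - F(t))`, where `F` is the cumulative distribution function of a normal distribution fitted to past heartbeat intervals, and it approximates `F` with a logistic function. The class and method names are hypothetical, and the sketch is not presented as Akka's actual implementation.

```java
final class PhiSketch {

    /**
     * Suspicion level for a node, given the time since its last heartbeat and
     * the mean and standard deviation of earlier heartbeat intervals (all in ms).
     */
    static double phi(double millisSinceLastHeartbeat, double meanMillis, double stdDevMillis) {
        double y = (millisSinceLastHeartbeat - meanMillis) / stdDevMillis;
        // Logistic approximation of the normal CDF: F(t) is roughly 1 / (1 + e).
        double e = Math.exp(-y * (1.5976 + 0.070566 * y * y));
        if (millisSinceLastHeartbeat > meanMillis) {
            // 1 - F(t) written as e / (1 + e), which stays accurate when e is tiny.
            return -Math.log10(e / (1.0 + e));
        } else {
            // Same quantity, written so that a very large e does not produce NaN.
            return -Math.log10(1.0 - 1.0 / (1.0 + e));
        }
    }

    /** A node is suspected once phi reaches the configured threshold. */
    static boolean isSuspected(double phi, double threshold) {
        return phi >= threshold;
    }
}
```

Phi grows as the silence since the last heartbeat gets longer relative to the observed history, so a higher threshold means the detector waits longer before it suspects a node.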
@ -59,10 +55,46 @@ This is how the curve looks like for `failure-detector.acceptable-heartbeat-paus
![phi3.png](../images/phi3.png)
## Logging
When the Cluster failure detector observes another node as unreachable it will log:
```
Marking node(s) as UNREACHABLE
```
and if it becomes reachable again:
```
Marking node(s) as REACHABLE
```
There is also a warning when the heartbeat arrival interval exceeds 2/3 of the `acceptable-heartbeat-pause`:
```
heartbeat interval is growing too large
```
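The 2/3 rule behind this warning can be illustrated with a small sketch. The class and method names are hypothetical and only show the arithmetic, not how Akka implements the check:

```java
import java.time.Duration;

final class HeartbeatIntervalCheck {

    /** Two thirds of the acceptable heartbeat pause. */
    static Duration warningLimit(Duration acceptableHeartbeatPause) {
        return acceptableHeartbeatPause.multipliedBy(2).dividedBy(3);
    }

    /** True when the observed heartbeat interval exceeds the 2/3 limit. */
    static boolean isGrowingTooLarge(Duration observedInterval, Duration acceptableHeartbeatPause) {
        return observedInterval.compareTo(warningLimit(acceptableHeartbeatPause)) > 0;
    }
}
```

For example, with a pause of 3 seconds the limit is 2 seconds, so a heartbeat interval of 2.5 seconds would exceed it.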
If you see false positives, as indicated by frequent `UNREACHABLE` followed by `REACHABLE` logging, you can
increase the `acceptable-heartbeat-pause` if you suspect that your environment is more unstable than what
is tolerated by the default value. However, it is worth investigating the root cause first, since false
positives may be caused by long (unexpected) garbage collection pauses, an overloaded system, overly
restrictive CPU quota settings, and similar.
```
akka.cluster.failure-detector.acceptable-heartbeat-pause = 7s
```
Another log message to watch out for that typically requires investigation of the root cause:
```
Scheduled sending of heartbeat was delayed
```
## Failure Detector Threshold
The `threshold` that is the basis for the calculation is configurable by the
user.
user, but typically it's enough to configure the `acceptable-heartbeat-pause` as described above.
* A low `threshold` is prone to generate many false positives but ensures
a quick detection in the event of a real crash.


@ -10,6 +10,7 @@ project.description: Akka Cluster concepts, node membership service, CRDT Distri
* [cluster](cluster.md)
* [cluster-specification](cluster-concepts.md)
* [cluster-membership](cluster-membership.md)
* [failure-detector](failure-detector.md)
* [distributed-data](distributed-data.md)
* [cluster-singleton](cluster-singleton.md)
* [cluster-sharding](cluster-sharding.md)


@ -48,8 +48,6 @@ object Paradox {
"index.html",
// Page that recommends Alpakka:
"camel.html",
// Page linked to from many others, but not in a TOC
"typed/failure-detector.html",
// TODO page not linked to
"fault-tolerance-sample.html"))