doc: small improvements of core cluster pages (#27939)

* proofreading of core cluster pages * some more info in failure detector
2019-10-09 11:17:22 +02:00 · 2019-10-09 11:17:22 +02:00 · 40ce73ad4e
commit 40ce73ad4e
parent 86965d0a05
10 changed files with 110 additions and 41 deletions
--- a/akka-cluster-typed/src/test/java/jdocs/akka/cluster/typed/BasicClusterExampleTest.java
+++ b/akka-cluster-typed/src/test/java/jdocs/akka/cluster/typed/BasicClusterExampleTest.java
@ -7,6 +7,7 @@ package jdocs.akka.cluster.typed;
 // #join-seed-nodes
 import akka.actor.Address;
 import akka.actor.AddressFromURIString;
+import akka.cluster.Member;
 import akka.cluster.typed.JoinSeedNodes;

 // #join-seed-nodes
@ -120,4 +121,29 @@ public class BasicClusterExampleTest { // extends JUnitSuite {
    Cluster.get(system).manager().tell(new JoinSeedNodes(seedNodes));
    // #join-seed-nodes
  }
+
+  static class Backend {
+    static Behavior<Void> create() {
+      return Behaviors.empty();
+    }
+  }
+
+  static class Frontend {
+    static Behavior<Void> create() {
+      return Behaviors.empty();
+    }
+  }
+
+  void illustrateRoles() {
+    ActorContext<Void> context = null;
+
+    // #hasRole
+    Member selfMember = Cluster.get(context.getSystem()).selfMember();
+    if (selfMember.hasRole("backend")) {
+      context.spawn(Backend.create(), "back");
+    } else if (selfMember.hasRole("frontend")) {
+      context.spawn(Frontend.create(), "front");
+    }
+    // #hasRole
+  }
 }
--- a/akka-cluster-typed/src/test/scala/docs/akka/cluster/typed/BasicClusterExampleSpec.scala
+++ b/akka-cluster-typed/src/test/scala/docs/akka/cluster/typed/BasicClusterExampleSpec.scala
@ -62,6 +62,27 @@ akka {
    Cluster(system).manager ! JoinSeedNodes(seedNodes)
    //#join-seed-nodes
  }
+
+  object Backend {
+    def apply(): Behavior[_] = Behaviors.empty
+  }
+
+  object Frontend {
+    def apply(): Behavior[_] = Behaviors.empty
+  }
+
+  def illustrateRoles(): Unit = {
+    val context: ActorContext[_] = ???
+
+    //#hasRole
+    val selfMember = Cluster(context.system).selfMember
+    if (selfMember.hasRole("backend")) {
+      context.spawn(Backend(), "back")
+    } else if (selfMember.hasRole("frontend")) {
+      context.spawn(Frontend(), "front")
+    }
+    //#hasRole
+  }
 }

 class BasicClusterConfigSpec extends WordSpec with ScalaFutures with Eventually with Matchers with LogCapturing {
--- a/akka-docs/src/main/paradox/includes/cluster.md
+++ b/akka-docs/src/main/paradox/includes/cluster.md
@ -20,7 +20,7 @@ their physical location in the cluster.
 <!--- #cluster-ddata --->
 ### Distributed Data

-*Akka Distributed Data* is useful when you need to share data between nodes in an
+Distributed Data is useful when you need to share data between nodes in an
 Akka Cluster. The data is accessed with an actor providing a key-value store like API.

 <!--- #cluster-ddata --->
--- a/akka-docs/src/main/paradox/typed/cluster-concepts.md
+++ b/akka-docs/src/main/paradox/typed/cluster-concepts.md
@ -113,11 +113,6 @@ The role of the `leader` is to shift members in and out of the cluster, changing
 state. Currently `leader` actions are only triggered by receiving a new cluster
 state with gossip convergence.

-The `leader` also has the power, if configured so, to "auto-down" a node that
-according to the @ref:[Failure Detector](#failure-detector) is considered `unreachable`. This means setting
-the `unreachable` node status to `down` automatically after a configured time
-of unreachability.
-
 #### Seed Nodes

 The seed nodes are contact points for new nodes joining the cluster.
--- a/akka-docs/src/main/paradox/typed/cluster-membership.md
+++ b/akka-docs/src/main/paradox/typed/cluster-membership.md
@ -76,20 +76,9 @@ any `leader` actions are also not possible (for instance, allowing a node to
 become a part of the cluster). To move forward, the state of the
 `unreachable` nodes must be changed: they must become `reachable` again or be marked
 as `down`. If the node is to join the cluster again the actor system must be
-restarted and go through the joining process again. The cluster can, through the
-leader, also *auto-down* a node after a configured time of unreachability. If new
-incarnation of unreachable node tries to rejoin the cluster old incarnation will be 
-marked as `down` and new incarnation can rejoin the cluster without manual intervention. 
-
-@@@ note
-
-If you have *auto-down* enabled and the failure detector triggers, you
-can over time end up with a lot of single node clusters if you don't put
-measures in place to shut down nodes that have become `unreachable`. This
-follows from the fact that the `unreachable` node will likely see the rest of
-the cluster as `unreachable`, become its own leader and form its own cluster.
-
-@@@
+restarted and go through the joining process again. If a new incarnation of the unreachable
+node tries to rejoin the cluster, the old incarnation will be marked as `down` and the new
+incarnation can rejoin the cluster without manual intervention.

 <a id="weakly-up"></a>
 ## WeaklyUp Members
@ -158,4 +147,3 @@ The `leader` has the following duties:
 causing the monitored node to be marked as unreachable
   
 * **unreachable** - unreachable is not a real member state but more of a flag in addition to the state, signaling that the cluster is unable to talk to this node. After a node has been unreachable, the failure detector may detect it as reachable again and thereby remove the flag.
-   
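Two membership rules described above, that `unreachable` is a flag carried alongside the member status and that a new incarnation of an unreachable node causes the old incarnation to be marked `down`, can be sketched with a small model. This is a hypothetical illustration, not Akka's `Member` or `ClusterCoreDaemon`: the `MembershipSketch` class, its simplified `Status` enum, the `"host:port"` string address and the numeric `uid` used to tell incarnations apart are all assumptions, and the real join handshake involves more steps.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of two membership rules; not Akka's Member class. */
final class MembershipSketch {

    /** Simplified, hypothetical status enum. */
    enum Status { JOINING, WEAKLY_UP, UP, LEAVING, EXITING, DOWN, REMOVED }

    static final class Member {
        final String address; // hypothetical "host:port" string
        final long uid;       // distinguishes incarnations of the same address
        Status status;
        boolean unreachable;  // a flag kept alongside the status, not a status itself

        Member(String address, long uid) {
            this.address = address;
            this.uid = uid;
            this.status = Status.JOINING;
        }
    }

    private final List<Member> members = new ArrayList<>();

    /**
     * Registers a joining node. An unreachable older incarnation with the same
     * address (different uid) is marked DOWN, so no manual downing is needed.
     */
    Member join(String address, long uid) {
        for (Member existing : members) {
            if (existing.address.equals(address) && existing.uid != uid && existing.unreachable) {
                existing.status = Status.DOWN;
            }
        }
        Member joining = new Member(address, uid);
        members.add(joining);
        return joining;
    }

    /** The failure detector only toggles the flag; the status is left unchanged. */
    void setReachability(Member member, boolean reachable) {
        member.unreachable = !reachable;
    }
}
```

Keeping `unreachable` as a separate boolean rather than another `Status` value mirrors the statement that a node can be, for example, `UP` and unreachable at the same time, and become reachable again without any status change.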
--- a/akka-docs/src/main/paradox/typed/cluster-sharding.md
+++ b/akka-docs/src/main/paradox/typed/cluster-sharding.md
@ -377,8 +377,8 @@ using Cluster Sharding. Stop all Cluster nodes before using this program.

 It may be necessary to remove the data if the Cluster Sharding coordinator
 cannot start up because of corrupt data, which may happen if accidentally
-two clusters were running at the same time, e.g. caused by using auto-down
-and there was a network partition.
+two clusters were running at the same time, e.g. caused by an invalid downing
+provider when there was a network partition.

 Use this program as a standalone Java main program:

--- a/akka-docs/src/main/paradox/typed/cluster.md
+++ b/akka-docs/src/main/paradox/typed/cluster.md
@ -6,9 +6,9 @@ project.description: Build distributed applications that scale across the networ
 This document describes how to use Akka Cluster and the Cluster APIs. 
 For specific documentation topics see: 

+* @ref:[When and where to use Akka Cluster](choosing-cluster.md)
 * @ref:[Cluster Specification](cluster-concepts.md)
 * @ref:[Cluster Membership Service](cluster-membership.md)
-* @ref:[When and where to use Akka Cluster](choosing-cluster.md)
 * @ref:[Higher level Cluster tools](#higher-level-cluster-tools)
 * @ref:[Rolling Updates](../additional/rolling-updates.md)
 * @ref:[Operating, Managing, Observability](../additional/operations.md)
@ -17,7 +17,7 @@ For specific documentation topics see:
 For the Akka Classic documentation of this feature see @ref:[Classic Cluster](../cluster-usage.md).
@@@

-You have to enable @ref:[serialization](../serialization.md)  to send messages between ActorSystems in the Cluster.
+You have to enable @ref:[serialization](../serialization.md) to send messages between ActorSystems (nodes) in the Cluster.
@ref:[Serialization with Jackson](../serialization-jackson.md) is a good choice in many cases, and our
 recommendation if you don't have other preferences or constraints.

@ -52,11 +52,11 @@ The Cluster extension gives you access to management tasks such as @ref:[Joining
 and subscription of cluster membership events such as @ref:[MemberUp, MemberRemoved and UnreachableMember](cluster-membership.md#membership-lifecycle),
 which are exposed as event APIs.  

-It does this through these references are on the `Cluster` extension:
+It does this through these references on the `Cluster` extension:

-* manager: An @scala[`ActorRef[akka.cluster.typed.ClusterCommand]`]@java[`ActorRef<akka.cluster.typed.ClusterCommand>`] where a `ClusterCommand` is a command such as: `Join`, `Leave` and `Down`
-* subscriptions: An @scala[`ActorRef[akka.cluster.typed.ClusterStateSubscription]`]@java[`ActorRef<akka.cluster.typed.ClusterStateSubscription>`] where a `ClusterStateSubscription` is one of `GetCurrentState` or `Subscribe` and `Unsubscribe` to cluster events like `MemberRemoved`
-* state: The current `CurrentClusterState`
+* `manager`: An @scala[`ActorRef[akka.cluster.typed.ClusterCommand]`]@java[`ActorRef<akka.cluster.typed.ClusterCommand>`] where a `ClusterCommand` is a command such as `Join`, `Leave` or `Down`
+* `subscriptions`: An @scala[`ActorRef[akka.cluster.typed.ClusterStateSubscription]`]@java[`ActorRef<akka.cluster.typed.ClusterStateSubscription>`] where a `ClusterStateSubscription` is `GetCurrentState`, or `Subscribe` and `Unsubscribe` for cluster events such as `MemberRemoved`
+* `state`: The current `CurrentClusterState`

 All of the examples below assume the following imports:

@ -324,13 +324,21 @@ and leave the cluster gracefully.
 ## Node Roles

 Not all nodes of a cluster need to perform the same function: there might be one sub-set which runs the web front-end,
-one which runs the data access layer and one for the number-crunching. Deployment of actors, for example by cluster-aware
-routers, can take node roles into account to achieve this distribution of responsibilities.
+one which runs the data access layer and one for the number-crunching. Choosing which actors to start on each node,
+for example cluster-aware routers, can take node roles into account to achieve this distribution of responsibilities.

 The node roles are defined in the configuration property named `akka.cluster.roles`
 and typically defined in the start script as a system property or environment variable.

-The roles are part of the membership information in `MemberEvent` that you can subscribe to.
+The roles are part of the membership information in `MemberEvent` that you can subscribe to. The roles
+of the local node are available from `selfMember`, which can be used to conditionally start certain
+actors:
+
+Scala
+:  @@snip [BasicClusterExampleSpec.scala](/akka-cluster-typed/src/test/scala/docs/akka/cluster/typed/BasicClusterExampleSpec.scala) { #hasRole }
+
+Java
+:  @@snip [BasicClusterExampleTest.java](/akka-cluster-typed/src/test/java/jdocs/akka/cluster/typed/BasicClusterExampleTest.java) { #hasRole }

 ## Failure Detector

@ -416,7 +424,7 @@ or made run on the same dispatcher to keep the number of threads down.

 ### Configuration Compatibility Check

-Creating a cluster is about deploying two or more nodes and make then behave as if they were one single application. Therefore it's extremely important that all nodes in a cluster are configured with compatible settings. 
+Creating a cluster is about deploying two or more nodes and making them behave as if they were one single application. Therefore it's extremely important that all nodes in a cluster are configured with compatible settings.

 The Configuration Compatibility Check feature ensures that all nodes in a cluster have a compatible configuration. Whenever a new node is joining an existing cluster, a subset of its configuration settings (only those that are required to be checked) is sent to the nodes in the cluster for verification. Once the configuration is checked on the cluster side, the cluster sends back its own set of required configuration settings. The joining node will then verify if it's compliant with the cluster configuration. The joining node will only proceed if all checks pass, on both sides.   

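The round trip described in the Configuration Compatibility Check paragraph above (the joining node sends only the settings that need checking, the cluster verifies them and replies with its own, and the join proceeds only if both sides pass) can be sketched in plain Java. This is a hypothetical illustration, not Akka's `JoinConfigCompatChecker`: configuration is modelled as a flat `Map<String, String>`, both sides are assumed to check the same set of keys, and "compatible" is assumed to mean "present on both sides with equal values".

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

/** Hypothetical sketch of a two-sided configuration check; not Akka's API. */
final class ConfigCompatSketch {

    private ConfigCompatSketch() {}

    /** Only the settings that must be checked are sent to the other side. */
    static Map<String, String> subset(Map<String, String> config, Set<String> keysToCheck) {
        Map<String, String> result = new HashMap<>();
        for (String key : keysToCheck) {
            if (config.containsKey(key)) {
                result.put(key, config.get(key));
            }
        }
        return result;
    }

    /** Accepts the peer's subset if every checked key is present on both sides with equal values. */
    static boolean accepts(Map<String, String> ownConfig, Map<String, String> peerSubset,
                           Set<String> keysToCheck) {
        for (String key : keysToCheck) {
            if (!ownConfig.containsKey(key) || !peerSubset.containsKey(key)
                    || !Objects.equals(ownConfig.get(key), peerSubset.get(key))) {
                return false;
            }
        }
        return true;
    }

    /** The joining node proceeds only if the checks pass on both sides. */
    static boolean joinAllowed(Map<String, String> joiningConfig, Map<String, String> clusterConfig,
                               Set<String> keysToCheck) {
        // 1. The joining node sends its subset; the cluster verifies it.
        if (!accepts(clusterConfig, subset(joiningConfig, keysToCheck), keysToCheck)) {
            return false;
        }
        // 2. The cluster replies with its own subset; the joining node verifies it.
        return accepts(joiningConfig, subset(clusterConfig, keysToCheck), keysToCheck);
    }
}
```

Settings outside `keysToCheck` are never sent, which reflects the "only those that are required to be checked" part of the description; the key names used in any test of this sketch are placeholders, not real Akka settings.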
--- a/akka-docs/src/main/paradox/typed/failure-detector.md
+++ b/akka-docs/src/main/paradox/typed/failure-detector.md
@ -2,11 +2,7 @@

 ## Introduction

-Remote DeathWatch uses heartbeat messages and the failure detector to 
-
-* detect network failures and JVM crashes
-* generate the `Terminated` message to the watching actor on failure 
-* gracefully terminate watched actors  
+Remote DeathWatch uses heartbeat messages and the failure detector to detect network failures and JVM crashes. 

 The heartbeat arrival times are interpreted by an implementation of
 [The Phi Accrual Failure Detector](https://pdfs.semanticscholar.org/11ae/4c0c0d0c36dc177c1fff5eb84fa49aa3e1a8.pdf) by Hayashibara et al.
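To make the Phi Accrual idea concrete, here is a minimal, self-contained Java sketch of how a phi value could be computed from a history of heartbeat inter-arrival times. It is not Akka's `PhiAccrualFailureDetector`: it assumes normally distributed intervals, approximates the normal CDF with a logistic function, floors the standard deviation with a hypothetical `minStdDevMillis`, and models the acceptable heartbeat pause by adding it to the expected mean (an assumption). All class and method names are illustrative.

```java
import java.util.List;

/** Hypothetical sketch of a phi accrual calculation; not Akka's implementation. */
final class PhiSketch {

    private PhiSketch() {}

    /** Mean of the observed heartbeat inter-arrival intervals, in milliseconds. */
    static double mean(List<Long> intervalsMillis) {
        double sum = 0.0;
        for (long interval : intervalsMillis) {
            sum += interval;
        }
        return sum / intervalsMillis.size();
    }

    /** Population standard deviation, never lower than minStdDevMillis (avoids division by zero). */
    static double stdDev(List<Long> intervalsMillis, double minStdDevMillis) {
        double m = mean(intervalsMillis);
        double squares = 0.0;
        for (long interval : intervalsMillis) {
            squares += (interval - m) * (interval - m);
        }
        return Math.max(Math.sqrt(squares / intervalsMillis.size()), minStdDevMillis);
    }

    /**
     * phi = -log10(1 - F(t)), where F is the normal CDF with the given mean and
     * standard deviation, approximated here by a logistic function.
     */
    static double phi(double millisSinceLastHeartbeat, double meanMillis, double stdDevMillis) {
        double y = (millisSinceLastHeartbeat - meanMillis) / stdDevMillis;
        double e = Math.exp(-y * (1.5976 + 0.070566 * y * y));
        // Both branches compute 1 - F = e / (1 + e). The first avoids cancellation when
        // e is tiny; the second avoids infinity / infinity when e overflows.
        if (millisSinceLastHeartbeat > meanMillis) {
            return -Math.log10(e / (1.0 + e));
        }
        return -Math.log10(1.0 - 1.0 / (1.0 + e));
    }

    public static void main(String[] args) {
        List<Long> history = List.of(1000L, 1010L, 990L, 1005L, 995L);
        long acceptablePauseMillis = 3000; // hypothetical value
        double threshold = 8.0;            // hypothetical value
        // Assumption: the acceptable pause is modelled by shifting the expected mean.
        double meanMillis = mean(history) + acceptablePauseMillis;
        double stdDevMillis = stdDev(history, 100.0);
        for (long t : new long[] {4000, 4200, 4500, 5000}) {
            double p = phi(t, meanMillis, stdDevMillis);
            System.out.printf("t=%d ms  phi=%.2f  suspected=%b%n", t, p, p >= threshold);
        }
    }
}
```

Phi grows continuously as the silence since the last heartbeat gets longer, so the `threshold` only decides at which point that growing suspicion is turned into an `unreachable` verdict.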
@ -59,10 +55,46 @@ This is how the curve looks like for `failure-detector.acceptable-heartbeat-paus

 ![phi3.png](../images/phi3.png)
 
+## Logging
+
+When the Cluster failure detector observes another node as unreachable it will log:
+
+```
+Marking node(s) as UNREACHABLE
+```
+
+and if it becomes reachable again:
+
+```
+Marking node(s) as REACHABLE
+```
+
+There is also a warning when the heartbeat arrival interval exceeds 2/3 of the `acceptable-heartbeat-pause`:
+
+```
+heartbeat interval is growing too large
+```
+
+If you see false positives, as indicated by frequent `UNREACHABLE` followed by `REACHABLE` logging, you can
+increase the `acceptable-heartbeat-pause` if you suspect that your environment is more unstable than what
+is tolerated by the default value. However, it is worth investigating the cause first to rule out long
+(unexpected) garbage collection pauses, an overloaded system, overly restrictive CPU quota settings,
+and similar problems.
+
+```
+akka.cluster.failure-detector.acceptable-heartbeat-pause = 7s
+```
+
+Another log message to watch out for that typically requires investigation of the root cause:
+
+```
+Scheduled sending of heartbeat was delayed
+```
+
 ## Failure Detector Threshold

 The `threshold` that is the basis for the calculation is configurable by the
-user. 
+user, but typically it's enough to configure the `acceptable-heartbeat-pause` as described above.

 * A low `threshold` is prone to generate many false positives but ensures
 a quick detection in the event of a real crash. 
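The "heartbeat interval is growing too large" warning and the advice to raise `acceptable-heartbeat-pause` described in the logging section above can be illustrated with a small arithmetic sketch. This is a hypothetical helper, not Akka code: it assumes the condition is "interval strictly greater than 2/3 of the pause" and compares `3 * interval` with `2 * pause` so that integer division cannot round the boundary. Akka's actual comparison and rounding may differ.

```java
/** Hypothetical sketch of the 2/3 warning condition; not Akka's implementation. */
final class HeartbeatPauseCheck {

    private HeartbeatPauseCheck() {}

    /** True when intervalMillis > (2/3) * acceptablePauseMillis, using exact integer math. */
    static boolean exceedsTwoThirds(long intervalMillis, long acceptablePauseMillis) {
        // 3 * interval > 2 * pause  is equivalent to  interval > (2/3) * pause
        return intervalMillis * 3 > acceptablePauseMillis * 2;
    }

    public static void main(String[] args) {
        long interval = 2_500;           // an observed gap between heartbeats
        long[] pauses = {3_000, 7_000};  // e.g. a smaller value and the 7s from the example above
        for (long pause : pauses) {
            System.out.printf("pause=%d ms, interval=%d ms, warn=%b%n",
                    pause, interval, exceedsTwoThirds(interval, pause));
        }
    }
}
```

With a 3s pause, a 2.5s gap already crosses the 2/3 mark (2s), while with 7s the mark moves to about 4.67s. That is the tolerance gained by raising the setting, at the cost of slower detection of real crashes.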
--- a/akka-docs/src/main/paradox/typed/index-cluster.md
+++ b/akka-docs/src/main/paradox/typed/index-cluster.md
@ -10,6 +10,7 @@ project.description: Akka Cluster concepts, node membership service, CRDT Distri
 * [cluster](cluster.md)
 * [cluster-specification](cluster-concepts.md)
 * [cluster-membership](cluster-membership.md)
+* [failure-detector](failure-detector.md)
 * [distributed-data](distributed-data.md)
 * [cluster-singleton](cluster-singleton.md)
 * [cluster-sharding](cluster-sharding.md)
--- a/project/Paradox.scala
+++ b/project/Paradox.scala
@ -48,8 +48,6 @@ object Paradox {
        "index.html",
        // Page that recommends Alpakka:
        "camel.html",
-        // Page linked to from many others, but not in a TOC
-        "typed/failure-detector.html",
        // TODO page not linked to
        "fault-tolerance-sample.html"))