Commit edad69b38c (parent b2b948ff0e)
25 changed files with 1095 additions and 994 deletions
@@ -33,3 +33,6 @@ RedirectMatch 301 ^(.*)/howto\.html$ https://doc.akka.io/docs/akka/2.5/howto.htm
 RedirectMatch 301 ^(.*)/common/duration\.html$ https://doc.akka.io/docs/akka/2.5/common/duration.html
 RedirectMatch 301 ^(.*)/futures\.html$ https://doc.akka.io/docs/akka/2.5/futures.html
 RedirectMatch 301 ^(.*)/java8-compat\.html$ https://doc.akka.io/docs/akka/2.5/java8-compat.html
+RedirectMatch 301 ^(.*)/common/cluster\.html$ $1/typed/cluster-specification.html
+RedirectMatch 301 ^(.*)/additional/observability\.html$ $1/additional/operations.html
@@ -5,11 +5,8 @@
 @@@ index
 
 * [Packaging](packaging.md)
+* [Operating, Managing, Observability](operations.md)
 * [Deploying](deploying.md)
 * [Rolling Updates](rolling-updates.md)
-* [Observability](observability.md)
 
 @@@
-
-Information on running and managing Akka applications can be found in
-the [Akka Management](https://doc.akka.io/docs/akka-management/current/) project documentation.
@@ -1,3 +0,0 @@
-# Monitoring
-
-Aside from log monitoring and the monitoring provided by your APM or platform provider, [Lightbend Telemetry](https://developer.lightbend.com/docs/telemetry/current/instrumentations/akka/akka.html), available through a [Lightbend Platform Subscription](https://www.lightbend.com/lightbend-platform-subscription), can provide additional insights in the run-time characteristics of your application, including metrics, events, and distributed tracing for Akka Actors, Cluster, HTTP, and more.
akka-docs/src/main/paradox/additional/operations.md (new file, 62 lines)
@@ -0,0 +1,62 @@
+# Operating a Cluster
+
+This documentation discusses how to operate a cluster. For related, more specific guides,
+see [Packaging](packaging.md), [Deploying](deploying.md), and [Rolling Updates](rolling-updates.md).
+
+## Starting
+
+### Cluster Bootstrap
+
+When starting clusters on cloud systems such as Kubernetes, AWS, Google Cloud, Azure, Mesos, or others,
+you may want to automate the discovery of nodes for the cluster joining process, using your cloud provider,
+cluster orchestrator, or another form of service discovery (such as managed DNS).
+
+The open source Akka Management library includes the [Cluster Bootstrap](https://doc.akka.io/docs/akka-management/current/bootstrap/index.html)
+module, which handles just that. Please refer to its documentation for more details.
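The bootstrap flow described above is driven entirely by configuration. As a rough illustration (not part of this commit; the discovery method and service name below are placeholder assumptions for a Kubernetes-based deployment), such a setup might look like:

```
# Hypothetical sketch: discover peer nodes via the Kubernetes API
# and form the cluster from the discovered contact points.
akka.management.cluster.bootstrap {
  contact-point-discovery {
    discovery-method = kubernetes-api   # assumption: Kubernetes deployment
    service-name = "my-service"         # placeholder service name
  }
}
```

See the Akka Management bootstrap documentation for the authoritative set of settings.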
+
+@@@ note
+
+If you are running Akka in a Docker container, or the nodes for some other reason have separate internal and
+external IP addresses, you must configure remoting according to @ref:[Akka behind NAT or in a Docker container](../remoting-artery.md#remote-configuration-nat-artery).
+
+@@@
+
+## Stopping
+
+See @ref:[Rolling Updates, Cluster Shutdown and Coordinated Shutdown](../additional/rolling-updates.md#cluster-shutdown).
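Graceful stopping of nodes is handled by Akka's Coordinated Shutdown, which is configurable. A hedged sketch of commonly used settings (illustrative, not part of this commit):

```
# Run Coordinated Shutdown from a JVM shutdown hook,
# e.g. when the process receives SIGTERM
akka.coordinated-shutdown.run-by-jvm-shutdown-hook = on
# Exit the JVM once the shutdown sequence has completed
akka.coordinated-shutdown.exit-jvm = on
```

With these enabled, an orchestrator stopping the process triggers a graceful leave of the cluster before the JVM exits.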
+
+## Cluster Management
+
+There are several management tools for the cluster.
+Complete information on running and managing Akka applications can be found in
+the [Akka Management](https://doc.akka.io/docs/akka-management/current/) project documentation.
+
+<a id="cluster-http"></a>
+### HTTP
+
+Information and management of the cluster is available via an HTTP API.
+See the documentation of [Akka Management](http://developer.lightbend.com/docs/akka-management/current/).
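For orientation (an assumption-laden sketch, not part of this commit): before the HTTP API is reachable, the management endpoint's bind address is typically configured, along these lines:

```
# Hostname and port where Akka Management's HTTP endpoint listens;
# the values here are placeholders for local development.
akka.management.http.hostname = "127.0.0.1"
akka.management.http.port = 8558
```

The Akka Management documentation covers the exact settings and how the endpoint is started.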
+
+<a id="cluster-jmx"></a>
+### JMX
+
+Information and management of the cluster is available as JMX MBeans with the root name `akka.Cluster`.
+The JMX information can be displayed with an ordinary JMX console such as JConsole or JVisualVM.
+
+From JMX you can:
+
+* see which members are part of the cluster
+* see the status of this node
+* see the roles of each member
+* join this node to another node in the cluster
+* mark any node in the cluster as down
+* tell any node in the cluster to leave
+
+Member nodes are identified by their address, in the format *`akka://actor-system-name@hostname:port`*.
+
+## Monitoring and Observability
+
+Aside from log monitoring and the monitoring provided by your APM or platform provider, [Lightbend Telemetry](https://developer.lightbend.com/docs/telemetry/current/instrumentations/akka/akka.html),
+available through a [Lightbend Platform Subscription](https://www.lightbend.com/lightbend-platform-subscription),
+can provide additional insight into the run-time characteristics of your application, including metrics, events,
+and distributed tracing for Akka Actors, Cluster, HTTP, and more.
@@ -86,7 +86,7 @@ in an application composed of multiple JARs to reside under a single package nam
 might scan all classes from `com.example.plugins` for specific service implementations with that package existing in
 several contributed JARs.
 While it is possible to support overlapping packages with complex manifest headers, it's much better to use non-overlapping
-package spaces and facilities such as @ref:[Akka Cluster](../common/cluster.md)
+package spaces and facilities such as @ref:[Akka Cluster](../typed/cluster-specification.md)
 for service discovery. Stylistically, many organizations opt to use the root package path as the name of the bundle
 distribution file.
@@ -63,7 +63,7 @@ Environments such as Kubernetes send a SIGTERM, however if the JVM is wrapped wi
 ### Ungraceful shutdown
 
 In case of network failures it may still be necessary to set the node's status to Down in order to complete the removal.
-@ref:[Cluster Downing](../cluster-usage.md#downing) details downing nodes and downing providers.
+@ref:[Cluster Downing](../typed/cluster.md#downing) details downing nodes and downing providers.
 [Split Brain Resolver](https://doc.akka.io/docs/akka-enhancements/current/split-brain-resolver.html) can be used to ensure
 the cluster continues to function during network partitions and node failures. For example
 if there is an unreachability problem Split Brain Resolver would make a decision based on the configured downing strategy.
@@ -79,7 +79,7 @@ and ensure all nodes are in this state
 * Deploy again and with the new nodes set to `akka.cluster.configuration-compatibility-check.enforce-on-join = on`.
 
 Full documentation about enforcing these checks on joining nodes and optionally adding custom checks can be found in
-@ref:[Akka Cluster configuration compatibility checks](../cluster-usage.md#configuration-compatibility-check).
+@ref:[Akka Cluster configuration compatibility checks](../typed/cluster.md#configuration-compatibility-check).
 
 ## Rolling Updates and Migrating Akka
@@ -76,8 +76,8 @@ up a large cluster into smaller groups of nodes for better scalability.
 
 ## Membership
 
-Some @ref[membership transitions](common/cluster.md#membership-lifecycle) are managed by
-one node called the @ref[leader](common/cluster.md#leader). There is one leader per data center
+Some @ref[membership transitions](typed/cluster-membership.md#membership-lifecycle) are managed by
+one node called the @ref[leader](typed/cluster-specification.md#leader). There is one leader per data center
 and it is responsible for these transitions for the members within the same data center. Members of
 other data centers are managed independently by the leader of the respective data center. These actions
 cannot be performed while there are any unreachability observations among the nodes in the data center,
@@ -91,7 +91,7 @@ User actions like joining, leaving, and downing can be sent to any node in the c
 not only to the nodes in the data center of the node. Seed nodes are also global.
 
 The data center membership is implemented by adding the data center name prefixed with `"dc-"` to the
-@ref[roles](cluster-usage.md#node-roles) of the member and thereby this information is known
+@ref[roles](typed/cluster.md#node-roles) of the member and thereby this information is known
 by all other members in the cluster. This is an implementation detail, but it can be good to know
 if you see this in log messages.
@@ -105,7 +105,7 @@ Java
 
 ## Failure Detection
 
-@ref[Failure detection](cluster-usage.md#failure-detector) is performed by sending heartbeat messages
+@ref[Failure detection](typed/cluster-specification.md#failure-detector) is performed by sending heartbeat messages
 to detect if a node is unreachable. This is done more frequently and with more certainty among
 the nodes in the same data center than across data centers. The failure detection across different data centers should
 be interpreted as an indication of problem with the network link between the data centers.
@@ -132,6 +132,8 @@ This influences how rolling upgrades should be performed. Don't stop all of the
 at the same time. Stop one or a few at a time so that new nodes can take over the responsibility.
 It's best to leave the oldest nodes until last.
 
+See the @ref:[failure detector](typed/cluster.md#failure-detector) for more details.
+
 ## Cluster Singleton
 
 The @ref[Cluster Singleton](cluster-singleton.md) is a singleton per data center. If you start the
@@ -27,7 +27,7 @@ Cluster metrics information is primarily used for load-balancing routers,
 and can also be used to implement advanced metrics-based node life cycles,
 such as "Node Let-it-crash" when CPU steal time becomes excessive.
 
-Cluster members with status @ref:[WeaklyUp](cluster-usage.md#weakly-up), if that feature is enabled,
+Cluster members with status @ref:[WeaklyUp](typed/cluster-membership.md#weaklyup-members), if that feature is enabled,
 will participate in Cluster Metrics collection and dissemination.
 
 ## Metrics Collector
@@ -41,7 +41,7 @@ the sender to know the location of the destination actor. This is achieved by se
 the messages via a `ShardRegion` actor provided by this extension, which knows how
 to route the message with the entity id to the final destination.
 
-Cluster sharding will not be active on members with status @ref:[WeaklyUp](cluster-usage.md#weakly-up)
+Cluster sharding will not be active on members with status @ref:[WeaklyUp](typed/cluster-membership.md#weaklyup-members)
 if that feature is enabled.
 
 @@@ warning
@@ -49,7 +49,7 @@ if that feature is enabled.
 **Don't use Cluster Sharding together with Automatic Downing**,
 since it allows the cluster to split up into two separate clusters, which in turn will result
 in *multiple shards and entities* being started, one in each separate cluster!
-See @ref:[Downing](cluster-usage.md#automatic-vs-manual-downing).
+See @ref:[Downing](typed/cluster.md#automatic-vs-manual-downing).
 
 @@@
@@ -492,7 +492,7 @@ and there was a network partition.
 **Don't use Cluster Sharding together with Automatic Downing**,
 since it allows the cluster to split up into two separate clusters, which in turn will result
 in *multiple shards and entities* being started, one in each separate cluster!
-See @ref:[Downing](cluster-usage.md#automatic-vs-manual-downing).
+See @ref:[Downing](typed/cluster.md#automatic-vs-manual-downing).
 
 @@@
@@ -44,7 +44,7 @@ The oldest member is determined by `akka.cluster.Member#isOlderThan`.
 This can change when removing that member from the cluster. Be aware that there is a short time
 period when there is no active singleton during the hand-over process.
 
-The cluster failure detector will notice when oldest node becomes unreachable due to
+The cluster @ref:[failure detector](typed/cluster.md#failure-detector) will notice when oldest node becomes unreachable due to
 things like JVM crash, hard shut down, or network failure. Then a new oldest node will
 take over and a new singleton actor is created. For these failure scenarios there will
 not be a graceful hand-over, but more than one active singletons is prevented by all
@@ -65,7 +65,7 @@ It's worth noting that messages can always be lost because of the distributed na
 As always, additional logic should be implemented in the singleton (acknowledgement) and in the
 client (retry) actors to ensure at-least-once message delivery.
 
-The singleton instance will not run on members with status @ref:[WeaklyUp](cluster-usage.md#weakly-up).
+The singleton instance will not run on members with status @ref:[WeaklyUp](typed/cluster-membership.md#weaklyup-members).
 
 ## Potential problems to be aware of
@@ -75,7 +75,7 @@ This pattern may seem to be very tempting to use at first, but it has several dr
 * you can not rely on the cluster singleton to be *non-stop* available — e.g. when the node on which the singleton has
 been running dies, it will take a few seconds for this to be noticed and the singleton be migrated to another node,
 * in the case of a *network partition* appearing in a Cluster that is using Automatic Downing (see docs for
-@ref:[Auto Downing](cluster-usage.md#automatic-vs-manual-downing)),
+@ref:[Auto Downing](typed/cluster.md#automatic-vs-manual-downing)),
 it may happen that the isolated clusters each decide to spin up their own singleton, meaning that there might be multiple
 singletons running in the system, yet the Clusters have no way of finding out about them (because of the partition).
@@ -1,18 +1,22 @@
-# Cluster Usage
+# Classic Cluster Usage
 
-For introduction to the Akka Cluster concepts please see @ref:[Cluster Specification](common/cluster.md).
-
-The core of Akka Cluster is the cluster membership, to keep track of what nodes are part of the cluster and
-their health. There are several @ref:[Higher level Cluster tools](cluster-usage.md#higher-level-cluster-tools) that are built
-on top of the cluster membership.
-
-You need to enable @ref:[serialization](serialization.md) for your actor messages.
-@ref:[Serialization with Jackson](serialization-jackson.md) is a good choice in many cases and our
-recommendation if you don't have other preference.
+This document describes how to use Akka Cluster and the Cluster APIs using code samples.
+For specific documentation topics see:
+
+* @ref:[Cluster Specification](typed/cluster-specification.md)
+* @ref:[Cluster Membership Service](typed/cluster-membership.md)
+* @ref:[When and where to use Akka Cluster](typed/choosing-cluster.md)
+* @ref:[Higher level Cluster tools](#higher-level-cluster-tools)
+* @ref:[Rolling Updates](additional/rolling-updates.md)
+* @ref:[Operating, Managing, Observability](additional/operations.md)
+
+Enable @ref:[serialization](serialization.md) to send events between ActorSystems and systems
+external to the Cluster. @ref:[Serialization with Jackson](serialization-jackson.md) is a good choice in many cases, and our
+recommendation if you don't have other preferences or constraints.
 
 ## Dependency
 
-To use Akka Cluster, you must add the following dependency in your project:
+To use Akka Cluster add the following dependency in your project:
 
 @@dependency[sbt,Maven,Gradle] {
   group="com.typesafe.akka"
@@ -20,135 +24,28 @@ To use Akka Cluster, you must add the following dependency in your project:
   version="$akka.version$"
 }
 
-## Sample project
+## Cluster samples
 
-You can look at the
+To see what using Akka Cluster looks like in practice, see the
 @java[@extref[Cluster example project](samples:akka-samples-cluster-java)]
-@scala[@extref[Cluster example project](samples:akka-samples-cluster-scala)]
-to see what this looks like in practice.
+@scala[@extref[Cluster example project](samples:akka-samples-cluster-scala)].
+This project contains samples illustrating different features, such as
+subscribing to cluster membership events, sending messages to actors running on nodes in the cluster,
+and using Cluster aware routers.
+
+The easiest way to run this example yourself is to try the
+@scala[@extref[Akka Cluster Sample with Scala](samples:akka-samples-cluster-scala)]@java[@extref[Akka Cluster Sample with Java](samples:akka-samples-cluster-java)].
+It contains instructions on how to run the `SimpleClusterApp`.
 
 ## When and where to use Akka Cluster
 
-An architectural choice you have to make is if you are going to use a microservices architecture or
-a traditional distributed application. This choice will influence how you should use Akka Cluster.
-
-### Microservices
-
-Microservices has many attractive properties, such as the independent nature of microservices allows for
-multiple smaller and more focused teams that can deliver new functionality more frequently and can
-respond quicker to business opportunities. Reactive Microservices should be isolated, autonomous, and have
-a single responsibility as identified by Jonas Bonér in the book
-[Reactive Microsystems: The Evolution of Microservices at Scale](https://info.lightbend.com/ebook-reactive-microservices-the-evolution-of-microservices-at-scale-register.html).
-
-In a microservices architecture, you should consider communication within a service and between services.
-
-In general we recommend against using Akka Cluster and actor messaging between _different_ services because that
-would result in a too tight code coupling between the services and difficulties deploying these independent of
-each other, which is one of the main reasons for using a microservices architecture.
-See the discussion on
-@scala[[Internal and External Communication](https://www.lagomframework.com/documentation/current/scala/InternalAndExternalCommunication.html)]
-@java[[Internal and External Communication](https://www.lagomframework.com/documentation/current/java/InternalAndExternalCommunication.html)]
-in the docs of the [Lagom Framework](https://www.lagomframework.com) (where each microservice is an Akka Cluster)
-for some background on this.
-
-Nodes of a single service (collectively called a cluster) require less decoupling. They share the same code and
-are deployed together, as a set, by a single team or individual. There might be two versions running concurrently
-during a rolling deployment, but deployment of the entire set has a single point of control. For this reason,
-intra-service communication can take advantage of Akka Cluster, failure management and actor messaging, which
-is convenient to use and has great performance.
-
-Between different services [Akka HTTP](https://doc.akka.io/docs/akka-http/current) or
-[Akka gRPC](https://doc.akka.io/docs/akka-grpc/current/) can be used for synchronous (yet non-blocking)
-communication and [Akka Streams Kafka](https://doc.akka.io/docs/akka-stream-kafka/current/home.html) or other
-[Alpakka](https://doc.akka.io/docs/alpakka/current/) connectors for integration asynchronous communication.
-All those communication mechanisms work well with streaming of messages with end-to-end back-pressure, and the
-synchronous communication tools can also be used for single request response interactions. It is also important
-to note that when using these tools both sides of the communication do not have to be implemented with Akka,
-nor does the programming language matter.
-
-### Traditional distributed application
-
-We acknowledge that microservices also introduce many new challenges and it's not the only way to
-build applications. A traditional distributed application may have less complexity and work well in many cases.
-For example for a small startup, with a single team, building an application where time to market is everything.
-Akka Cluster can efficiently be used for building such distributed application.
-
-In this case, you have a single deployment unit, built from a single code base (or using traditional binary
-dependency management to modularize) but deployed across many nodes using a single cluster.
-Tighter coupling is OK, because there is a central point of deployment and control. In some cases, nodes may
-have specialized runtime roles which means that the cluster is not totally homogenous (e.g., "front-end" and
-"back-end" nodes, or dedicated master/worker nodes) but if these are run from the same built artifacts this
-is just a runtime behavior and doesn't cause the same kind of problems you might get from tight coupling of
-totally separate artifacts.
-
-A tightly coupled distributed application has served the industry and many Akka users well for years and is
-still a valid choice.
-
-### Distributed monolith
-
-There is also an anti-pattern that is sometimes called "distributed monolith". You have multiple services
-that are built and deployed independently from each other, but they have a tight coupling that makes this
-very risky, such as a shared cluster, shared code and dependencies for service API calls, or a shared
-database schema. There is a false sense of autonomy because of the physical separation of the code and
-deployment units, but you are likely to encounter problems because of changes in the implementation of
-one service leaking into the behavior of others. See Ben Christensen's
-[Don’t Build a Distributed Monolith](https://www.microservices.com/talks/dont-build-a-distributed-monolith/).
-
-Organizations that find themselves in this situation often react by trying to centrally coordinate deployment
-of multiple services, at which point you have lost the principal benefit of microservices while taking on
-the costs. You are in a halfway state with things that aren't really separable being built and deployed
-in a separate way. Some people do this, and some manage to make it work, but it's not something we would
-recommend and it needs to be carefully managed.
-
-## A Simple Cluster Example
+See [Choosing Akka Cluster](typed/choosing-cluster.md#when-and-where-to-use-akka-cluster).
+
+## Cluster API Extension
 
 The following configuration enables the `Cluster` extension to be used.
 It joins the cluster and an actor subscribes to cluster membership events and logs them.
 
-The `application.conf` configuration looks like this:
-
-```
-akka {
-  actor {
-    provider = "cluster"
-  }
-  remote.artery {
-    canonical {
-      hostname = "127.0.0.1"
-      port = 2551
-    }
-  }
-
-  cluster {
-    seed-nodes = [
-      "akka://ClusterSystem@127.0.0.1:2551",
-      "akka://ClusterSystem@127.0.0.1:2552"]
-
-    # auto downing is NOT safe for production deployments.
-    # you may want to use it during development, read more about it in the docs.
-    #
-    # auto-down-unreachable-after = 10s
-  }
-}
-```
-
-To enable cluster capabilities in your Akka project you should, at a minimum, add the @ref:[Remoting](remoting-artery.md)
-settings, but with `cluster`.
-The `akka.cluster.seed-nodes` should normally also be added to your `application.conf` file.
-
-@@@ note
-
-If you are running Akka in a Docker container or the nodes for some other reason have separate internal and
-external ip addresses you must configure remoting according to @ref:[Akka behind NAT or in a Docker container](remoting-artery.md#remote-configuration-nat-artery)
-
-@@@
-
-The seed nodes are configured contact points for initial, automatic, join of the cluster.
-
-Note that if you are going to start the nodes on different machines you need to specify the
-ip-addresses or host names of the machines in `application.conf` instead of `127.0.0.1`
 
 An actor that uses the cluster extension may look like this:
 
 Scala
@@ -157,80 +54,44 @@ Scala
 Java
 : @@snip [SimpleClusterListener.java](/akka-docs/src/test/java/jdocs/cluster/SimpleClusterListener.java) { type=java }
 
+And the minimum configuration required is to set a host/port for remoting and the `akka.actor.provider = "cluster"`.
+
+@@snip [BasicClusterExampleSpec.scala](/akka-cluster-typed/src/test/scala/docs/akka/cluster/typed/BasicClusterExampleSpec.scala) { #config-seeds }
+
 The actor registers itself as subscriber of certain cluster events. It receives events corresponding to the current state
 of the cluster when the subscription starts and then it receives events for changes that happen in the cluster.
 
-The easiest way to run this example yourself is to try the
-@scala[@extref[Akka Cluster Sample with Scala](samples:akka-samples-cluster-scala)]@java[@extref[Akka Cluster Sample with Java](samples:akka-samples-cluster-java)].
-It contains instructions on how to run the `SimpleClusterApp`.
-
-## Joining to Seed Nodes
-
-@@@ note
-
-When starting clusters on cloud systems such as Kubernetes, AWS, Google Cloud, Azure, Mesos or others which maintain
-DNS or other ways of discovering nodes, you may want to use the automatic joining process implemented by the open source
-[Akka Cluster Bootstrap](https://doc.akka.io/docs/akka-management/current/bootstrap/index.html) module.
-
-@@@
-
-### Joining configured seed nodes
-
-You may decide if joining to the cluster should be done manually or automatically
-to configured initial contact points, so-called seed nodes. After the joining process
-the seed nodes are not special and they participate in the cluster in exactly the same
-way as other nodes.
-
-When a new node is started it sends a message to all seed nodes and then sends join command to the one that
-answers first. If no one of the seed nodes replied (might not be started yet)
-it retries this procedure until successful or shutdown.
+## Cluster Membership API
+
+This section shows the basic usage of the membership API. For the in-depth descriptions on joining, joining to seed nodes, downing and leaving of any node in the cluster please see the full
+@ref:[Cluster Membership API](typed/cluster.md#cluster-membership-api) documentation.
+
+If not using configuration to specify [seed nodes to join](#joining-to-seed-nodes), joining the cluster can be done programmatically:
+
+Scala
+: @@snip [SimpleClusterListener2.scala](/akka-docs/src/test/scala/docs/cluster/SimpleClusterListener2.scala) { #join }
+
+Java
+: @@snip [SimpleClusterListener2.java](/akka-docs/src/test/java/jdocs/cluster/SimpleClusterListener2.java) { #join }
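The join procedure described above (contact every seed node, join the one that answers first, and retry the whole round until successful) can be sketched outside Akka. The following is a minimal, hypothetical Java illustration of that first-responder-wins round; the class, method names, and the `probe` callback are assumptions for the sketch, not Akka's API:

```java
import java.util.List;
import java.util.Optional;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.function.Predicate;

public class SeedJoinSketch {

    // One round of the join procedure: probe every configured seed node in
    // parallel and return the first one that answers. An empty result means
    // no seed answered yet and the caller should retry the whole round
    // (in Akka the retry interval is governed by `seed-node-timeout`).
    public static Optional<String> tryJoinRound(List<String> seedNodes,
                                                ExecutorService pool,
                                                Predicate<String> probe) {
        CompletionService<String> completions = new ExecutorCompletionService<>(pool);
        for (String seed : seedNodes) {
            // each probe runs concurrently; a null result means "no answer"
            completions.submit(() -> probe.test(seed) ? seed : null);
        }
        for (int i = 0; i < seedNodes.size(); i++) {
            try {
                String answered = completions.take().get();
                if (answered != null) {
                    return Optional.of(answered); // join the first responder
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return Optional.empty();
            } catch (ExecutionException e) {
                // this seed was not reachable; keep waiting for the others
            }
        }
        return Optional.empty();
    }
}
```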
-
-You define the seed nodes in the [configuration](#cluster-configuration) file (application.conf):
-
-```
-akka.cluster.seed-nodes = [
-  "akka://ClusterSystem@host1:2552",
-  "akka://ClusterSystem@host2:2552"]
-```
-
-This can also be defined as Java system properties when starting the JVM using the following syntax:
-
-```
--Dakka.cluster.seed-nodes.0=akka://ClusterSystem@host1:2552
--Dakka.cluster.seed-nodes.1=akka://ClusterSystem@host2:2552
-```
-
-The seed nodes can be started in any order and it is not necessary to have all
-seed nodes running, but the node configured as the **first element** in the `seed-nodes`
-configuration list must be started when initially starting a cluster, otherwise the
-other seed-nodes will not become initialized and no other node can join the cluster.
-The reason for the special first seed node is to avoid forming separated islands when
-starting from an empty cluster.
-It is quickest to start all configured seed nodes at the same time (order doesn't matter),
-otherwise it can take up to the configured `seed-node-timeout` until the nodes
-can join.
-
-Once more than two seed nodes have been started it is no problem to shut down the first
-seed node. If the first seed node is restarted, it will first try to join the other
-seed nodes in the existing cluster. Note that if you stop all seed nodes at the same time
-and restart them with the same `seed-nodes` configuration they will join themselves and
-form a new cluster instead of joining remaining nodes of the existing cluster. That is
-likely not desired and should be avoided by listing several nodes as seed nodes for redundancy
-and don't stop all of them at the same time.
-
-### Automatically joining to seed nodes with Cluster Bootstrap
-
-Instead of manually configuring seed nodes, which is useful in development or statically assigned node IPs, you may want
-to automate the discovery of seed nodes using your cloud providers or cluster orchestrator, or some other form of service
-discovery (such as managed DNS). The open source Akka Management library includes the
-[Cluster Bootstrap](https://doc.akka.io/docs/akka-management/current/bootstrap/index.html) module which handles
-just that. Please refer to its documentation for more details.
-
-### Programatically joining to seed nodes with `joinSeedNodes`
-
-You may also use @scala[`Cluster(system).joinSeedNodes`]@java[`Cluster.get(system).joinSeedNodes`] to join programmatically,
-which is attractive when dynamically discovering other nodes at startup by using some external tool or API.
-When using `joinSeedNodes` you should not include the node itself except for the node that is
-supposed to be the first seed node, and that should be placed first in the parameter to
-`joinSeedNodes`.
+Leaving the cluster and downing a node are similar, for example:
+
+Scala
+: @@snip [ClusterDocSpec.scala](/akka-docs/src/test/scala/docs/cluster/ClusterDocSpec.scala) { #leave }
+
+Java
+: @@snip [ClusterDocTest.java](/akka-docs/src/test/java/jdocs/cluster/ClusterDocTest.java) { #leave }
+
+### Joining to Seed Nodes
+
+Joining to initial cluster contact points `akka.cluster.seed-nodes` can be done manually, with @ref:[configuration](typed/cluster.md#joining-configured-seed-nodes),
+[programatically](#programatically-joining-to-seed-nodes-with-joinseednodes), or @ref:[automatically with Cluster Bootstrap](typed/cluster.md#joining-automatically-to-seed-nodes-with-cluster-bootstrap).
+
+#### Programatically joining to seed nodes with `joinSeedNodes`
+
+@@include[cluster.md](includes/cluster.md) { #join-seeds-programmatic }
 
 Scala
 : @@snip [ClusterDocSpec.scala](/akka-docs/src/test/scala/docs/cluster/ClusterDocSpec.scala) { #join-seed-nodes }
@@ -238,167 +99,7 @@ Scala
 Java
 : @@snip [ClusterDocTest.java](/akka-docs/src/test/java/jdocs/cluster/ClusterDocTest.java) { #join-seed-nodes-imports #join-seed-nodes }
 
-Unsuccessful attempts to contact seed nodes are automatically retried after the time period defined in
-configuration property `seed-node-timeout`. Unsuccessful attempt to join a specific seed node is
-automatically retried after the configured `retry-unsuccessful-join-after`. Retrying means that it
-tries to contact all seed nodes and then joins the node that answers first. The first node in the list
-of seed nodes will join itself if it cannot contact any of the other seed nodes within the
-configured `seed-node-timeout`.
-
-The joining of given seed nodes will by default be retried indefinitely until
-a successful join. That process can be aborted if unsuccessful by configuring a
-timeout. When aborted it will run @ref:[Coordinated Shutdown](actors.md#coordinated-shutdown),
-which by default will terminate the ActorSystem. CoordinatedShutdown can also be configured to exit
-the JVM. It is useful to define this timeout if the `seed-nodes` are assembled
-dynamically and a restart with new seed-nodes should be tried after unsuccessful
-attempts.
-
-```
-akka.cluster.shutdown-after-unsuccessful-join-seed-nodes = 20s
-akka.coordinated-shutdown.terminate-actor-system = on
-```
-
-If you don't configure seed nodes or use `joinSeedNodes` you need to join the cluster manually, which can be performed by using [JMX](#cluster-jmx) or [HTTP](#cluster-http).
-
-You can join to any node in the cluster. It does not have to be configured as a seed node.
-Note that you can only join to an existing cluster member, which means that for bootstrapping some
-node must join itself,and then the following nodes could join them to make up a cluster.
-
-An actor system can only join a cluster once. Additional attempts will be ignored.
-When it has successfully joined it must be restarted to be able to join another
-cluster or to join the same cluster again. It can use the same host name and port
-after the restart, when it come up as new incarnation of existing member in the cluster,
-trying to join in, then the existing one will be removed from the cluster and then it will
-be allowed to join.
-
-@@@ note
-
-The name of the `ActorSystem` must be the same for all members of a cluster. The name is given when you start the `ActorSystem`.
-
-@@@
-
-<a id="automatic-vs-manual-downing"></a>
-## Downing
-
-When a member is considered by the failure detector to be unreachable the
-leader is not allowed to perform its duties, such as changing status of
-new joining members to 'Up'. The node must first become reachable again, or the
-status of the unreachable member must be changed to 'Down'. Changing status to 'Down'
-can be performed automatically or manually. By default it must be done manually, using
-[JMX](#cluster-jmx) or [HTTP](#cluster-http).
-
-It can also be performed programmatically with @scala[`Cluster(system).down(address)`]@java[`Cluster.get(system).down(address)`].
-
-If a node is still running and sees its self as Down it will shutdown. @ref:[Coordinated Shutdown](actors.md#coordinated-shutdown) will automatically
-run if `run-coordinated-shutdown-when-down` is set to `on` (the default) however the node will not try
-and leave the cluster gracefully so sharding and singleton migration will not occur.
-
-A pre-packaged solution for the downing problem is provided by
-[Split Brain Resolver](http://developer.lightbend.com/docs/akka-commercial-addons/current/split-brain-resolver.html),
-which is part of the [Lightbend Reactive Platform](http://www.lightbend.com/platform).
-If you don’t use RP, you should anyway carefully read the [documentation](http://developer.lightbend.com/docs/akka-commercial-addons/current/split-brain-resolver.html)
-of the Split Brain Resolver and make sure that the solution you are using handles the concerns
-described there.
-
-### Auto-downing (DO NOT USE)
-
-There is an automatic downing feature that you should not use in production. For testing purpose you can enable it with configuration:
-
-```
-akka.cluster.auto-down-unreachable-after = 120s
-```
-
-This means that the cluster leader member will change the `unreachable` node
-status to `down` automatically after the configured time of unreachability.
-
-This is a naïve approach to remove unreachable nodes from the cluster membership.
-It can be useful during development but in a production environment it will eventually breakdown the cluster. When a network partition occurs, both sides of the
-partition will see the other side as unreachable and remove it from the cluster.
-This results in the formation of two separate, disconnected, clusters
-(known as *Split Brain*).
-
-This behaviour is not limited to network partitions. It can also occur if a node
-in the cluster is overloaded, or experiences a long GC pause.
-
-@@@ warning
-
-We recommend against using the auto-down feature of Akka Cluster in production. It
-has multiple undesirable consequences for production systems.
-
-If you are using @ref:[Cluster Singleton](cluster-singleton.md) or
-@ref:[Cluster Sharding](cluster-sharding.md) it can break the contract provided by
-those features. Both provide a guarantee that an actor will be unique in a cluster.
-With the auto-down feature enabled, it is possible for multiple independent clusters
-to form (*Split Brain*). When this happens the guaranteed uniqueness will no
-longer be true resulting in undesirable behaviour in the system.
-
-This is even more severe when @ref:[Akka Persistence](persistence.md) is used in
-conjunction with Cluster Sharding. In this case, the lack of unique actors can
-cause multiple actors to write to the same journal. Akka Persistence operates on a
-single writer principle. Having multiple writers will corrupt the journal
-and make it unusable.
-
-Finally, even if you don't use features such as Persistence, Sharding, or Singletons,
-auto-downing can lead the system to form multiple small clusters. These small
-clusters will be independent from each other. They will be unable to communicate
-and as a result you may experience performance degradation. Once this condition
-occurs, it will require manual intervention in order to reform the cluster.
-
-Because of these issues, auto-downing should **never** be used in a production environment.
-
-@@@
-
-## Leaving
-
-There are two ways to remove a member from the cluster.
-
-You can stop the actor system (or the JVM process). It will be detected
-as unreachable and removed after the automatic or manual downing as described
-above.
-
-A more graceful exit can be performed if you tell the cluster that a node shall leave.
-This can be performed using [JMX](#cluster-jmx) or [HTTP](#cluster-http).
-It can also be performed programmatically with:
-
-Scala
-: @@snip [ClusterDocSpec.scala](/akka-docs/src/test/scala/docs/cluster/ClusterDocSpec.scala) { #leave }
-
-Java
-: @@snip [ClusterDocTest.java](/akka-docs/src/test/java/jdocs/cluster/ClusterDocTest.java) { #leave }
-
-Note that this command can be issued to any member in the cluster, not necessarily the
-one that is leaving.
-
-The @ref:[Coordinated Shutdown](actors.md#coordinated-shutdown) will automatically run when the cluster node sees itself as
-`Exiting`, i.e. leaving from another node will trigger the shutdown process on the leaving node.
-Tasks for graceful leaving of cluster including graceful shutdown of Cluster Singletons and
-Cluster Sharding are added automatically when Akka Cluster is used, i.e. running the shutdown
-process will also trigger the graceful leaving if it's not already in progress.
-
-Normally this is handled automatically, but in case of network failures during this process it might still
-be necessary to set the node’s status to `Down` in order to complete the removal.
-
-<a id="weakly-up"></a>
-## WeaklyUp Members
-
-If a node is `unreachable` then gossip convergence is not possible and therefore any
-`leader` actions are also not possible. However, we still might want new nodes to join
-the cluster in this scenario.
-
-`Joining` members will be promoted to `WeaklyUp` and become part of the cluster if
-convergence can't be reached. Once gossip convergence is reached, the leader will move `WeaklyUp`
-members to `Up`.
-
-This feature is enabled by default, but it can be disabled with configuration option:
-
-```
-akka.cluster.allow-weakly-up-members = off
-```
-
-You can subscribe to the `WeaklyUp` membership event to make use of the members that are
-in this state, but you should be aware of that members on the other side of a network partition
-have no knowledge about the existence of the new members. You should for example not count
-`WeaklyUp` members in quorum decisions.
+For more information see @ref[tuning joins](typed/cluster.md#tuning-joins)
 
 <a id="cluster-subscriber"></a>
 ## Subscribe to Cluster Events
 
@@ -451,26 +152,6 @@ Scala
 Java
 : @@snip [SimpleClusterListener.java](/akka-docs/src/test/java/jdocs/cluster/SimpleClusterListener.java) { #subscribe }
 
-The events to track the life-cycle of members are:
-
- * `ClusterEvent.MemberJoined` - A new member has joined the cluster and its status has been changed to `Joining`
- * `ClusterEvent.MemberUp` - A new member has joined the cluster and its status has been changed to `Up`
- * `ClusterEvent.MemberExited` - A member is leaving the cluster and its status has been changed to `Exiting`
-   Note that the node might already have been shutdown when this event is published on another node.
- * `ClusterEvent.MemberRemoved` - Member completely removed from the cluster.
- * `ClusterEvent.UnreachableMember` - A member is considered as unreachable, detected by the failure detector
-   of at least one other node.
- * `ClusterEvent.ReachableMember` - A member is considered as reachable again, after having been unreachable.
-   All nodes that previously detected it as unreachable has detected it as reachable again.
-
-There are more types of change events, consult the API documentation
-of classes that extends `akka.cluster.ClusterEvent.ClusterDomainEvent`
-for details about the events.
-
-Instead of subscribing to cluster events it can sometimes be convenient to only get the full membership state with
-@scala[`Cluster(system).state`]@java[`Cluster.get(system).state()`]. Note that this state is not necessarily in sync with the events published to a
-cluster subscription.
-
 ### Worker Dial-in Example
 
 Let's take a look at an example that illustrates how workers, here named *backend*,
@@ -520,17 +201,10 @@ unreachable cluster node has been downed and removed.
 The easiest way to run **Worker Dial-in Example** example yourself is to try the
 @scala[@extref[Akka Cluster Sample with Scala](samples:akka-samples-cluster-scala)]@java[@extref[Akka Cluster Sample with Java](samples:akka-samples-cluster-java)].
 It contains instructions on how to run the **Worker Dial-in Example** sample.
 
 ## Node Roles
 
-Not all nodes of a cluster need to perform the same function: there might be one sub-set which runs the web front-end,
-one which runs the data access layer and one for the number-crunching. Deployment of actors—for example by cluster-aware
-routers—can take node roles into account to achieve this distribution of responsibilities.
-
-The roles of a node is defined in the configuration property named `akka.cluster.roles`
-and it is typically defined in the start script as a system property or environment variable.
-
-The roles of the nodes is part of the membership information in `MemberEvent` that you can subscribe to.
+See @ref:[Cluster Node Roles](typed/cluster.md#node-roles)
 
 <a id="min-members"></a>
 ## How To Startup when Cluster Size Reached
 
@@ -555,9 +229,11 @@ akka.cluster.role {
 }
 ```
 
-You can start the actors in a `registerOnMemberUp` callback, which will
-be invoked when the current member status is changed to 'Up', i.e. the cluster
-has at least the defined number of members.
+## How To Startup when Member is Up
+
+You can start actors or trigger any functions using the `registerOnMemberUp` callback, which will
+be invoked when the current member status is changed to 'Up'. This can additionally be used with
+`akka.cluster.min-nr-of-members` optional configuration to defer an action until the cluster has reached a certain size.
 
 Scala
 : @@snip [FactorialFrontend.scala](/akka-docs/src/test/scala/docs/cluster/FactorialFrontend.scala) { #registerOnUp }
@@ -565,8 +241,6 @@ Scala
 Java
 : @@snip [FactorialFrontendMain.java](/akka-docs/src/test/java/jdocs/cluster/FactorialFrontendMain.java) { #registerOnUp }
 
-This callback can be used for other things than starting actors.
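The `registerOnMemberUp` contract discussed above (run the callback at once if the member is already 'Up', otherwise defer it until the status changes) can be sketched generically. The following is a minimal, hypothetical Java model of that contract, not Akka's implementation; Akka's real entry point is `Cluster.get(system).registerOnMemberUp`:

```java
import java.util.ArrayList;
import java.util.List;

public class MemberUpCallbacks {
    private final List<Runnable> deferred = new ArrayList<>();
    private boolean up = false;

    // Register a callback: run it immediately if the member is already Up,
    // otherwise keep it until the status changes to Up.
    public synchronized void registerOnMemberUp(Runnable callback) {
        if (up) {
            callback.run();
        } else {
            deferred.add(callback);
        }
    }

    // Invoked when this member's status changes to Up (for example once
    // akka.cluster.min-nr-of-members has been reached); runs every
    // deferred callback exactly once.
    public synchronized void memberUp() {
        up = true;
        deferred.forEach(Runnable::run);
        deferred.clear();
    }
}
```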
 
 ## How To Cleanup when Member is Removed
 
 You can do some clean up in a `registerOnMemberRemoved` callback, which will
@@ -585,139 +259,55 @@ down when you installing, and depending on the race is not healthy.
 
 ## Higher level Cluster tools
 
-### Cluster Singleton
-
-For some use cases it is convenient and sometimes also mandatory to ensure that
-you have exactly one actor of a certain type running somewhere in the cluster.
-
-This can be implemented by subscribing to member events, but there are several corner
-cases to consider. Therefore, this specific use case is covered by the
-@ref:[Cluster Singleton](cluster-singleton.md).
-
-### Cluster Sharding
-
-Distributes actors across several nodes in the cluster and supports interaction
-with the actors using their logical identifier, but without having to care about
-their physical location in the cluster.
-
+@@include[cluster.md](includes/cluster.md) { #cluster-singleton }
+See @ref:[Cluster Singleton](cluster-singleton.md).
+
+@@include[cluster.md](includes/cluster.md) { #cluster-sharding }
 See @ref:[Cluster Sharding](cluster-sharding.md).
 
-### Distributed Publish Subscribe
-
-Publish-subscribe messaging between actors in the cluster, and point-to-point messaging
-using the logical path of the actors, i.e. the sender does not have to know on which
-node the destination actor is running.
-
-See @ref:[Distributed Publish Subscribe in Cluster](distributed-pub-sub.md).
+@@include[cluster.md](includes/cluster.md) { #cluster-ddata }
+See @ref:[Distributed Data](distributed-data.md).
+
+@@include[cluster.md](includes/cluster.md) { #cluster-pubsub }
+See @ref:[Cluster Distributed Publish Subscribe](distributed-pub-sub.md).
+
+### Cluster Aware Routers
+
+All routers can be made aware of member nodes in the cluster, i.e.
+deploying new routees or looking up routees on nodes in the cluster.
+When a node becomes unreachable or leaves the cluster the routees of that node are
+automatically unregistered from the router. When new nodes join the cluster, additional
+routees are added to the router, according to the configuration.
+
+See @ref:[Cluster Aware Routers](cluster-routing.md) and @ref:[Routers](routing.md).
+
+@@include[cluster.md](includes/cluster.md) { #cluster-multidc }
+See @ref:[Cluster Multi-DC](cluster-dc.md).
 
 ### Cluster Client
 
 Communication from an actor system that is not part of the cluster to actors running
 somewhere in the cluster. The client does not have to know on which node the destination
 actor is running.
 
 See @ref:[Cluster Client](cluster-client.md).
 
-### Distributed Data
-
-*Akka Distributed Data* is useful when you need to share data between nodes in an
-Akka Cluster. The data is accessed with an actor providing a key-value store like API.
-
-See @ref:[Distributed Data](distributed-data.md).
-
-### Cluster Aware Routers
-
-All @ref:[routers](routing.md) can be made aware of member nodes in the cluster, i.e.
-deploying new routees or looking up routees on nodes in the cluster.
-When a node becomes unreachable or leaves the cluster the routees of that node are
-automatically unregistered from the router. When new nodes join the cluster, additional
-routees are added to the router, according to the configuration.
-
-See @ref:[Cluster Aware Routers](cluster-routing.md).
-
 ### Cluster Metrics
 
 The member nodes of the cluster can collect system health metrics and publish that to other cluster nodes
 and to the registered subscribers on the system event bus.
 
 See @ref:[Cluster Metrics](cluster-metrics.md).
 
 ## Failure Detector
 
-In a cluster each node is monitored by a few (default maximum 5) other nodes, and when
-any of these detects the node as `unreachable` that information will spread to
-the rest of the cluster through the gossip. In other words, only one node needs to
-mark a node `unreachable` to have the rest of the cluster mark that node `unreachable`.
-
-The failure detector will also detect if the node becomes `reachable` again. When
-all nodes that monitored the `unreachable` node detects it as `reachable` again
-the cluster, after gossip dissemination, will consider it as `reachable`.
-
-If system messages cannot be delivered to a node it will be quarantined and then it
-cannot come back from `unreachable`. This can happen if the there are too many
-unacknowledged system messages (e.g. watch, Terminated, remote actor deployment,
-failures of actors supervised by remote parent). Then the node needs to be moved
-to the `down` or `removed` states and the actor system of the quarantined node
-must be restarted before it can join the cluster again.
-
 The nodes in the cluster monitor each other by sending heartbeats to detect if a node is
-unreachable from the rest of the cluster. The heartbeat arrival times is interpreted
-by an implementation of
-[The Phi Accrual Failure Detector](http://www.jaist.ac.jp/~defago/files/pdf/IS_RR_2004_010.pdf).
-
-The suspicion level of failure is given by a value called *phi*.
-The basic idea of the phi failure detector is to express the value of *phi* on a scale that
-is dynamically adjusted to reflect current network conditions.
-
-The value of *phi* is calculated as:
-
-```
-phi = -log10(1 - F(timeSinceLastHeartbeat))
-```
-
-where F is the cumulative distribution function of a normal distribution with mean
-and standard deviation estimated from historical heartbeat inter-arrival times.
+unreachable from the rest of the cluster. Please see:
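The phi calculation described above can be reproduced standalone. Below is a minimal Java sketch, assuming the Abramowitz-Stegun approximation of the normal CDF; this is an illustrative assumption, and Akka's internal failure detector estimates the distribution differently:

```java
public class PhiSketch {

    // phi = -log10(1 - F(timeSinceLastHeartbeat)), where F is the CDF of a
    // normal distribution with mean and standard deviation estimated from
    // historical heartbeat inter-arrival times.
    public static double phi(double timeSinceLastHeartbeatMs,
                             double meanMs, double stdDevMs) {
        double f = normalCdf(timeSinceLastHeartbeatMs, meanMs, stdDevMs);
        return -Math.log10(1.0 - f);
    }

    // Normal CDF expressed through the error function.
    static double normalCdf(double x, double mean, double stdDev) {
        double z = (x - mean) / (stdDev * Math.sqrt(2.0));
        return 0.5 * (1.0 + erf(z));
    }

    // Abramowitz-Stegun formula 7.1.26, absolute error below ~1.5e-7.
    static double erf(double z) {
        double t = 1.0 / (1.0 + 0.3275911 * Math.abs(z));
        double poly = ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t
                - 0.284496736) * t + 0.254829592) * t;
        double y = 1.0 - poly * Math.exp(-z * z);
        return z >= 0 ? y : -y;
    }

    public static void main(String[] args) {
        // With mean 1000 ms and std dev 200 ms: at the mean F = 0.5, so
        // phi = -log10(0.5), roughly 0.3 (no suspicion); a pause five
        // standard deviations past the mean yields a much larger phi.
        System.out.println(phi(1000.0, 1000.0, 200.0));
        System.out.println(phi(2000.0, 1000.0, 200.0));
    }
}
```

A steeper distribution (smaller standard deviation) makes phi climb faster for the same pause, which is the behavior the charts in the deleted text illustrated.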
|
|
||||||
In the [configuration](#cluster-configuration) you can adjust the `akka.cluster.failure-detector.threshold`
|
|
||||||
to define when a *phi* value is considered to be a failure.

A low `threshold` is prone to generate many false positives but ensures
a quick detection in the event of a real crash. Conversely, a high `threshold`
generates fewer mistakes but needs more time to detect actual crashes. The
default `threshold` is 8 and is appropriate for most situations. However in
cloud environments, such as Amazon EC2, the value could be increased to 12 in
order to account for network issues that sometimes occur on such platforms.

The following chart illustrates how *phi* increases with increasing time since the
previous heartbeat.

![phi1.png](./images/phi1.png)

Phi is calculated from the mean and standard deviation of historical
inter-arrival times. The previous chart is an example for a standard deviation
of 200 ms. If the heartbeats arrive with less deviation the curve becomes steeper,
i.e. it is possible to determine failure more quickly. The curve looks like this for
a standard deviation of 100 ms.

![phi2.png](./images/phi2.png)

To be able to survive sudden abnormalities, such as garbage collection pauses and
transient network failures, the failure detector is configured with a margin,
`akka.cluster.failure-detector.acceptable-heartbeat-pause`. You may want to
adjust the [configuration](#cluster-configuration) of this depending on your environment.
This is how the curve looks for `acceptable-heartbeat-pause` configured to
3 seconds.

![phi3.png](./images/phi3.png)

Death watch uses the cluster failure detector for nodes in the cluster, i.e. it detects
network failures and JVM crashes, in addition to graceful termination of a watched
actor. Death watch generates the `Terminated` message to the watching actor when the
unreachable cluster node has been downed and removed.

If you encounter suspicious false positives when the system is under load you should
define a separate dispatcher for the cluster actors as described in [Cluster Dispatcher](#cluster-dispatcher).

For more information see:

* @ref:[Failure Detector specification](typed/cluster-specification.md#failure-detector)
* @ref:[Phi Accrual Failure Detector](typed/failure-detector.md) implementation
* [Using the Failure Detector](typed/cluster.md#using-the-failure-detector)
## How to Test

@@@ div { .group-scala }

Set up your project according to the instructions in @ref:[Multi Node Testing](multi-node-testing.md),
add the `sbt-multi-jvm` plugin and the dependency to `akka-multi-node-testkit`.

First, as described in @ref:[Multi Node Testing](multi-node-testing.md), we need some scaffolding to configure the `MultiNodeSpec`.
Define the participating @ref:[roles](typed/cluster.md#node-roles) and their [configuration](#configuration) in an object extending `MultiNodeConfig`:

@@snip [StatsSampleSpec.scala](/akka-cluster-metrics/src/multi-jvm/scala/akka/cluster/metrics/sample/StatsSampleSpec.scala) { #MultiNodeConfig }

Go to the corresponding Scala version of this page for details.

## Management

There are several management tools for the cluster. Please refer to
@ref:[Cluster Management](additional/operations.md#cluster-management) for more information.

<a id="cluster-http"></a>
### HTTP

Information and management of the cluster is available with an HTTP API.
See the documentation of [Akka Management](http://developer.lightbend.com/docs/akka-management/current/).

<a id="cluster-jmx"></a>
### JMX

Information and management of the cluster is available as JMX MBeans with the root name `akka.Cluster`.
The JMX information can be displayed with an ordinary JMX console such as JConsole or JVisualVM.

From JMX you can:

* see which members are part of the cluster
* see the status of this node
* see the roles of each member
* join this node to another node in the cluster
* mark any node in the cluster as down
* tell any node in the cluster to leave

Member nodes are identified by their address, in format *akka.**protocol**://**actor-system-name**@**hostname**:**port***.

<a id="cluster-command-line"></a>
### Command Line

Make sure you understand the security implications of enabling remote monitoring and management.

<a id="cluster-configuration"></a>

## Configuration

There are several @ref:[configuration](typed/cluster.md#configuration) properties for the cluster;
see the full @ref:[reference configuration](general/configuration.md#config-akka-cluster) for complete information.

### Cluster Info Logging

You can silence the logging of cluster events at info level with the configuration property:

```
akka.cluster.log-info = off
```

You can enable verbose logging of cluster events at info level, e.g. for temporary troubleshooting, with the configuration property:

```
akka.cluster.log-info-verbose = on
```

### Cluster Dispatcher

Under the hood the cluster extension is implemented with actors. To protect them against
disturbance from user actors they are by default run on the internal dispatcher configured
under `akka.actor.internal-dispatcher`. The cluster actors can potentially be isolated even
further onto their own dispatcher using the setting `akka.cluster.use-dispatcher`.
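As a hedged sketch, isolating the cluster actors onto a dedicated dispatcher could look like the following; the dispatcher name and sizing are placeholders chosen for illustration:

```
akka.cluster.use-dispatcher = cluster-dispatcher

cluster-dispatcher {
  type = "Dispatcher"
  executor = "fork-join-executor"
  fork-join-executor {
    parallelism-min = 2
    parallelism-max = 4
  }
}
```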

### Configuration Compatibility Check

Creating a cluster is about deploying two or more nodes and making them behave as if they were one single application. Therefore it's extremely important that all nodes in a cluster are configured with compatible settings.

The Configuration Compatibility Check feature ensures that all nodes in a cluster have a compatible configuration. Whenever a new node is joining an existing cluster, a subset of its configuration settings (only those that are required to be checked) is sent to the nodes in the cluster for verification. Once the configuration is checked on the cluster side, the cluster sends back its own set of required configuration settings. The joining node will then verify if it's compliant with the cluster configuration. The joining node will only proceed if all checks pass, on both sides.

New custom checkers can be added by extending `akka.cluster.JoinConfigCompatChecker` and including them in the configuration. Each checker must be associated with a unique key:

```
akka.cluster.configuration-compatibility-check.checkers {
  my-custom-config = "com.company.MyCustomJoinConfigCompatChecker"
}
```

@@@ note

Configuration Compatibility Check is enabled by default, but can be disabled by setting `akka.cluster.configuration-compatibility-check.enforce-on-join = off`. This is especially useful when performing rolling updates. Obviously this should only be done if a complete cluster shutdown isn't an option. A cluster with nodes with different configuration settings may lead to data loss or data corruption.

This setting should only be disabled on the joining nodes. The checks are always performed on both sides, and warnings are logged. In case of incompatibilities, it is the responsibility of the joining node to decide if the process should be interrupted or not.

If you are performing a rolling update on a cluster using Akka 2.5.9 or earlier (thus, not supporting this feature), the checks will not be performed because the running cluster has no means to verify the configuration sent by the joining node, nor to send back its own configuration.

@@@

# Cluster Specification

@@@ note

This document describes the design concepts of Akka Cluster.

@@@

## Intro

Akka Cluster provides a fault-tolerant decentralized peer-to-peer based cluster
[membership](#membership) service with no single point of failure or single point of bottleneck.
It does this using [gossip](#gossip) protocols and an automatic [failure detector](#failure-detector).

Akka Cluster allows for building distributed applications, where one application or service spans multiple nodes
(in practice multiple `ActorSystem`s). See also the discussion in
@ref:[When and where to use Akka Cluster](../cluster-usage.md#when-and-where-to-use-akka-cluster).
## Terms

**node**
: A logical member of a cluster. There could be multiple nodes on a physical
machine. Defined by a *hostname:port:uid* tuple.

**cluster**
: A set of nodes joined together through the [membership](#membership) service.

**leader**
: A single node in the cluster that acts as the leader, managing cluster convergence
and membership state transitions.

## Membership

A cluster is made up of a set of member nodes. The identifier for each node is a
`hostname:port:uid` tuple. An Akka application can be distributed over a cluster with
each node hosting some part of the application. Cluster membership and the actors running
on that node of the application are decoupled. A node could be a member of a
cluster without hosting any actors. Joining a cluster is initiated
by issuing a `Join` command to one of the nodes in the cluster to join.

The node identifier internally also contains a UID that uniquely identifies this
actor system instance at that `hostname:port`. Akka uses the UID to be able to
reliably trigger remote death watch. This means that the same actor system can never
join a cluster again once it's been removed from that cluster. To re-join an actor
system with the same `hostname:port` to a cluster you have to stop the actor system
and start a new one with the same `hostname:port` which will then receive a different
UID.

The cluster membership state is a specialized [CRDT](http://hal.upmc.fr/docs/00/55/55/88/PDF/techreport.pdf), which means that it has a monotonic
merge function. When concurrent changes occur on different nodes the updates can always be
merged and converge to the same end result.

### Gossip

The cluster membership used in Akka is based on Amazon's [Dynamo](http://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf) system and
particularly the approach taken in Basho's [Riak](http://basho.com/technology/architecture/) distributed database.
Cluster membership is communicated using a [Gossip Protocol](http://en.wikipedia.org/wiki/Gossip_protocol), where the current
state of the cluster is gossiped randomly through the cluster, with preference to
members that have not seen the latest version.

#### Vector Clocks

[Vector clocks](http://en.wikipedia.org/wiki/Vector_clock) are a type of data structure and algorithm for generating a partial
ordering of events in a distributed system and detecting causality violations.

We use vector clocks to reconcile and merge differences in cluster state
during gossiping. A vector clock is a set of (node, counter) pairs. Each update
to the cluster state has an accompanying update to the vector clock.
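The idea can be sketched as a minimal map-based vector clock; this is a hypothetical illustration, not Akka's internal `VectorClock`, which differs in representation and comparison details:

```java
import java.util.HashMap;
import java.util.Map;

// Minimal vector clock: a map from node name to a monotonically growing counter.
public class VClock {
    final Map<String, Long> versions = new HashMap<>();

    // record a local event on the given node
    VClock increment(String node) {
        VClock next = new VClock();
        next.versions.putAll(versions);
        next.versions.merge(node, 1L, Long::sum);
        return next;
    }

    // pairwise maximum of counters, used when reconciling gossiped states
    VClock merge(VClock that) {
        VClock result = new VClock();
        result.versions.putAll(versions);
        that.versions.forEach((n, c) -> result.versions.merge(n, c, Math::max));
        return result;
    }

    // true if every counter is <= the other clock's and the clocks differ
    boolean happenedBefore(VClock that) {
        return !versions.equals(that.versions)
            && versions.entrySet().stream()
                .allMatch(e -> e.getValue() <= that.versions.getOrDefault(e.getKey(), 0L));
    }
}
```

Two clocks where neither `happenedBefore` the other represent concurrent updates, which is exactly the case where the cluster states have to be merged.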

#### Gossip Convergence

Information about the cluster converges locally at a node at certain points in time.
This is when a node can prove that the cluster state it is observing has been observed
by all other nodes in the cluster. Convergence is implemented by passing a set of nodes
that have seen the current state version during gossip. This information is referred to as the
seen set in the gossip overview. When all nodes are included in the seen set there is
convergence.

Gossip convergence cannot occur while any nodes are `unreachable`. The nodes need
to become `reachable` again, or moved to the `down` and `removed` states
(see the [Membership Lifecycle](#membership-lifecycle) section below). This only blocks the leader
from performing its cluster membership management and does not influence the application
running on top of the cluster. For example this means that during a network partition
it is not possible to add more nodes to the cluster. The nodes can join, but they
will not be moved to the `up` state until the partition has healed or the unreachable
nodes have been downed.

#### Failure Detector

The failure detector is responsible for trying to detect if a node is
`unreachable` from the rest of the cluster. For this we are using an
implementation of [The Phi Accrual Failure Detector](https://pdfs.semanticscholar.org/11ae/4c0c0d0c36dc177c1fff5eb84fa49aa3e1a8.pdf) by Hayashibara et al.

An accrual failure detector decouples monitoring and interpretation. That makes
it applicable to a wider range of scenarios and more adequate for building generic
failure detection services. The idea is that it keeps a history of failure
statistics, calculated from heartbeats received from other nodes, and
tries to make educated guesses by taking multiple factors, and how they
accumulate over time, into account in order to come up with a better guess if a
specific node is up or down. Rather than only answering "yes" or "no" to the
question "is the node down?" it returns a `phi` value representing the
likelihood that the node is down.

The `threshold` that is the basis for the calculation is configurable by the
user. A low `threshold` is prone to generate many wrong suspicions but ensures
a quick detection in the event of a real crash. Conversely, a high `threshold`
generates fewer mistakes but needs more time to detect actual crashes. The
default `threshold` is 8 and is appropriate for most situations. However in
cloud environments, such as Amazon EC2, the value could be increased to 12 in
order to account for network issues that sometimes occur on such platforms.

In a cluster each node is monitored by a few (default maximum 5) other nodes, and when
any of these detects the node as `unreachable` that information will spread to
the rest of the cluster through the gossip. In other words, only one node needs to
mark a node `unreachable` to have the rest of the cluster mark that node `unreachable`.

The nodes to monitor are picked out of neighbors in a hashed ordered node ring.
This is to increase the likelihood to monitor across racks and data centers, but the order
is the same on all nodes, which ensures full coverage.
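The ring idea can be sketched like this; the ordering and hash here are simplified placeholders (Akka's real ordering differs), but the key property holds: every node computes the same ring, so coverage is deterministic.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: each node monitors its next few neighbours in a ring
// ordered by a hash of the node identifier.
public class MonitorRing {
    static List<String> monitored(List<String> nodes, String self, int maxMonitored) {
        List<String> ring = new ArrayList<>(nodes);
        ring.sort(Comparator.comparingInt(String::hashCode)); // same order on every node
        int i = ring.indexOf(self);
        List<String> result = new ArrayList<>();
        int n = Math.min(maxMonitored, ring.size() - 1);
        for (int k = 1; k <= n; k++) {
            result.add(ring.get((i + k) % ring.size())); // the next n neighbours
        }
        return result;
    }
}
```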

Heartbeats are sent out every second and every heartbeat is performed in a request/reply
handshake with the replies used as input to the failure detector.

The failure detector will also detect if the node becomes `reachable` again. When
all nodes that monitored the `unreachable` node detect it as `reachable` again
the cluster, after gossip dissemination, will consider it as `reachable`.

If system messages cannot be delivered to a node it will be quarantined and then it
cannot come back from `unreachable`. This can happen if there are too many
unacknowledged system messages (e.g. watch, Terminated, remote actor deployment,
failures of actors supervised by remote parent). Then the node needs to be moved
to the `down` or `removed` states (see the [Membership Lifecycle](#membership-lifecycle) section below)
and the actor system must be restarted before it can join the cluster again.

#### Leader

After gossip convergence a `leader` for the cluster can be determined. There is no
`leader` election process; the `leader` can always be recognised deterministically
by any node whenever there is gossip convergence. The leader is only a role; any node
can be the leader and it can change between convergence rounds.
The `leader` is the first node in sorted order that is able to take the leadership role,
where the preferred member states for a `leader` are `up` and `leaving`
(see the [Membership Lifecycle](#membership-lifecycle) section below for more information about member states).
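Because the rule is deterministic, every node that has the same converged state computes the same leader. A hypothetical sketch (the `Member` type and status strings are illustrative, not Akka's real types):

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Deterministic leader selection: first member in address order whose
// status can take the leadership role.
public class LeaderSelection {
    record Member(String address, String status) {}

    static Optional<Member> leader(List<Member> members) {
        return members.stream()
            .sorted(Comparator.comparing(Member::address))
            .filter(m -> m.status().equals("Up") || m.status().equals("Leaving"))
            .findFirst();
    }
}
```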

The role of the `leader` is to shift members in and out of the cluster, changing
`joining` members to the `up` state or `exiting` members to the `removed`
state. Currently `leader` actions are only triggered by receiving a new cluster
state with gossip convergence.

The `leader` also has the power, if configured so, to "auto-down" a node that
according to the [Failure Detector](#failure-detector) is considered `unreachable`. This means setting
the `unreachable` node status to `down` automatically after a configured time
of unreachability.

#### Seed Nodes

The seed nodes are configured contact points for new nodes joining the cluster.
When a new node is started it sends a message to all seed nodes and then sends
a join command to the seed node that answers first.

The seed nodes configuration value does not have any influence on the running
cluster itself, it is only relevant for new nodes joining the cluster as it
helps them to find contact points to send the join command to; a new member
can send this command to any current member of the cluster, not only to the seed nodes.
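For illustration, a seed-nodes setting might look like the following; the system name, hosts and ports are placeholders, and the exact address scheme depends on the Akka version and transport in use:

```
akka.cluster.seed-nodes = [
  "akka://ClusterSystem@host1:2552",
  "akka://ClusterSystem@host2:2552"
]
```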

#### Gossip Protocol

A variation of *push-pull gossip* is used to reduce the amount of gossip
information sent around the cluster. In push-pull gossip a digest is sent
representing current versions but not actual values; the recipient of the gossip
can then send back any values for which it has newer versions and also request
values for which it has outdated versions. Akka uses a single shared state with
a vector clock for versioning, so the variant of push-pull gossip used in Akka
makes use of this version to only push the actual state as needed.

Periodically, the default is every 1 second, each node chooses another random
node to initiate a round of gossip with. If less than ½ of the nodes reside in the
seen set (have seen the new state) then the cluster gossips 3 times instead of once
every second. This adjusted gossip interval is a way to speed up the convergence process
in the early dissemination phase after a state change.

The choice of node to gossip with is random but biased towards nodes that might not have seen
the current state version. During each round of gossip exchange, when convergence is not yet reached, a node
uses a very high probability (which is configurable) to gossip with another node which is not part of the seen set, i.e.
which is likely to have an older version of the state. Otherwise it gossips with any random live node.

This biased selection is a way to speed up the convergence process in the late dissemination
phase after a state change.

For clusters larger than 400 nodes (configurable, and suggested by empirical evidence)
the 0.8 probability is gradually reduced to avoid overwhelming single stragglers with
too many concurrent gossip requests. The gossip receiver also has a mechanism to
protect itself from too many simultaneous gossip messages by dropping messages that
have been enqueued in the mailbox for too long.

While the cluster is in a converged state the gossiper only sends a small gossip status message containing the gossip
version to the chosen node. As soon as there is a change to the cluster (meaning non-convergence)
then it goes back to biased gossip again.

The recipient of the gossip state or the gossip status can use the gossip version
(vector clock) to determine whether:

 1. it has a newer version of the gossip state, in which case it sends that back
    to the gossiper
 2. it has an outdated version of the state, in which case the recipient requests
    the current state from the gossiper by sending back its version of the gossip state
 3. it has conflicting gossip versions, in which case the different versions are merged
    and sent back

If the recipient and the gossip have the same version then the gossip state is
not sent or requested.
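The decision above can be sketched as a simple three-way branch; `localNewer`/`remoteNewer` stand in for the outcome of a vector-clock comparison, so this is an illustration of the protocol's cases, not Akka's actual code:

```java
// Outcome of comparing the local gossip version with the received one.
public class GossipStatusDecision {
    enum Action { SEND_BACK_NEWER_STATE, REQUEST_CURRENT_STATE, MERGE_AND_SEND_BACK, SAME_VERSION }

    static Action decide(boolean localNewer, boolean remoteNewer) {
        if (localNewer && remoteNewer) return Action.MERGE_AND_SEND_BACK; // concurrent versions
        if (localNewer) return Action.SEND_BACK_NEWER_STATE;              // case 1
        if (remoteNewer) return Action.REQUEST_CURRENT_STATE;             // case 2
        return Action.SAME_VERSION;                                       // nothing to exchange
    }
}
```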

The periodic nature of the gossip has a nice batching effect of state changes,
e.g. joining several nodes quickly after each other to one node will result in only
one state change to be spread to other members in the cluster.

The gossip messages are serialized with [protobuf](https://code.google.com/p/protobuf/) and also gzipped to reduce payload
size.

### Membership Lifecycle

A node begins in the `joining` state. Once all nodes have seen that the new
node is joining (through gossip convergence) the `leader` will set the member
state to `up`.

If a node is leaving the cluster in a safe, expected manner then it switches to
the `leaving` state. Once the leader sees the convergence on the node in the
`leaving` state, the leader will then move it to `exiting`. Once all nodes
have seen the exiting state (convergence) the `leader` will remove the node
from the cluster, marking it as `removed`.

If a node is `unreachable` then gossip convergence is not possible and therefore
any `leader` actions are also not possible (for instance, allowing a node to
become a part of the cluster). To be able to move forward the state of the
`unreachable` nodes must be changed. It must become `reachable` again or marked
as `down`. If the node is to join the cluster again the actor system must be
restarted and go through the joining process again. The cluster can, through the
leader, also *auto-down* a node after a configured time of unreachability. If a new
incarnation of an unreachable node tries to rejoin the cluster, the old incarnation will be
marked as `down` and the new incarnation can rejoin the cluster without manual intervention.

@@@ note

If you have *auto-down* enabled and the failure detector triggers, you
can over time end up with a lot of single node clusters if you don't put
measures in place to shut down nodes that have become `unreachable`. This
follows from the fact that the `unreachable` node will likely see the rest of
the cluster as `unreachable`, become its own leader and form its own cluster.

@@@

As mentioned before, if a node is `unreachable` then gossip convergence is not
possible and therefore any `leader` actions are also not possible. By enabling
`akka.cluster.allow-weakly-up-members` (enabled by default) it is possible to
let new joining nodes be promoted while convergence is not yet reached. These
`Joining` nodes will be promoted as `WeaklyUp`. Once gossip convergence is
reached, the leader will move `WeaklyUp` members to `Up`.

Note that members on the other side of a network partition have no knowledge about
the existence of the new members. You should for example not count `WeaklyUp`
members in quorum decisions.

#### State Diagram for the Member States (`akka.cluster.allow-weakly-up-members=off`)

![member-states.png](./images/member-states.png)

#### State Diagram for the Member States (`akka.cluster.allow-weakly-up-members=on`)

![member-states-weakly-up.png](./images/member-states-weakly-up.png)

#### Member States

* **joining** - transient state when joining a cluster

* **weakly up** - transient state while in a network split (only if `akka.cluster.allow-weakly-up-members=on`)

* **up** - normal operating state

* **leaving** / **exiting** - states during graceful removal

* **down** - marked as down (no longer part of cluster decisions)

* **removed** - tombstone state (no longer a member)

#### User Actions

* **join** - join a single node to a cluster - can be explicit or automatic on
startup if a node to join has been specified in the configuration

* **leave** - tell a node to leave the cluster gracefully

* **down** - mark a node as down

#### Leader Actions

The `leader` has the following duties:

* shifting members in and out of the cluster
    * joining -> up
    * weakly up -> up *(no convergence is required for this leader action to be performed)*
    * exiting -> removed

#### Failure Detection and Unreachability

* **fd*** - the failure detector of one of the monitoring nodes has triggered
causing the monitored node to be marked as unreachable

* **unreachable*** - unreachable is not a real member state but more of a flag in addition to the state, signaling that the cluster is unable to talk to this node; after being unreachable the failure detector may detect it as reachable again and thereby remove the flag

actor using the `Replicator.props`. If it is started as an ordinary actor it is important
that it is given the same name, started on the same path, on all nodes.

Cluster members with status @ref:[WeaklyUp](typed/cluster-membership.md#weakly-up)
will participate in Distributed Data. This means that the data will be replicated to the
@ref:[WeaklyUp](typed/cluster-membership.md#weakly-up) nodes with the background gossip protocol. Note that it
will not participate in any actions where the consistency mode is to read/write from all
nodes or the majority of nodes. The @ref:[WeaklyUp](typed/cluster-membership.md#weakly-up) node is not counted
as part of the cluster. So 3 nodes + 5 @ref:[WeaklyUp](typed/cluster-membership.md#weakly-up) is essentially a
3 node cluster as far as consistent actions are concerned.

Below is an example of an actor that schedules tick messages to itself and for each tick

changes are versioned. Deltas are disseminated in a scalable way to other nodes with
a gossip protocol.

Cluster members with status @ref:[WeaklyUp](typed/cluster-membership.md#weakly-up)
will participate in Distributed Publish Subscribe, i.e. subscribers on nodes with
`WeaklyUp` status will receive published messages if the publisher and subscriber are on
the same side of a network partition.

akka-docs/src/main/paradox/includes/cluster.md

<!--- #cluster-singleton --->
### Cluster Singleton

For some use cases it is convenient or necessary to ensure that only one
actor of a certain type is running somewhere in the cluster.
This can be implemented by subscribing to member events, but there are several corner
cases to consider. Therefore, this specific use case is covered by the Cluster Singleton.

<!--- #cluster-singleton --->

<!--- #cluster-sharding --->
### Cluster Sharding

Distributes actors across several nodes in the cluster and supports interaction
with the actors using their logical identifier, but without having to care about
their physical location in the cluster.

<!--- #cluster-sharding --->
<!--- #cluster-ddata --->
|
||||||
|
### Distributed Data
|
||||||
|
|
||||||
|
*Akka Distributed Data* is useful when you need to share data between nodes in an
|
||||||
|
Akka Cluster. The data is accessed with an actor providing a key-value store like API.
|
||||||
|
|
||||||
|
<!--- #cluster-ddata --->
|
||||||
|
|
||||||
|
<!--- #cluster-pubsub --->
|
||||||
|
### Distributed Publish Subscribe
|
||||||
|
|
||||||
|
Publish-subscribe messaging between actors in the cluster, and point-to-point messaging
|
||||||
|
using the logical path of the actors, i.e. the sender does not have to know on which
|
||||||
|
node the destination actor is running.
|
||||||
|
|
||||||
|
<!--- #cluster-pubsub --->
|
||||||
|
|
||||||
|
<!--- #cluster-multidc --->
|
||||||
|
### Cluster across multiple data centers
|
||||||
|
|
||||||
|
Akka Cluster can be used across multiple data centers, availability zones or regions,
|
||||||
|
so that one Cluster can span multiple data centers and still be tolerant to network partitions.
|
||||||
|
|
||||||
|
<!--- #cluster-multidc --->
|
||||||
|
|
||||||
|
<!--- #join-seeds-programmatic --->
|
||||||
|
You may also join programmatically, which is attractive when dynamically discovering other nodes
|
||||||
|
at startup by using some external tool or API. When joining to seed nodes you should not include
|
||||||
|
the node itself except for the node that is supposed to be the first seed node, which should be
|
||||||
|
placed first in the parameter to the programmatic join.
|
||||||
|
<!--- #join-seeds-programmatic --->
|
||||||
|
|
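The programmatic join described above can be sketched as follows. This is a minimal sketch assuming the classic Akka Cluster API and hypothetical seed addresses (`host1`, `host2`) discovered at startup; it requires the `akka-cluster` dependency plus matching configuration, so it is not runnable stand-alone:

```scala
import akka.actor.{ ActorSystem, Address }
import akka.cluster.Cluster

object ProgrammaticJoin {
  def main(args: Array[String]): Unit = {
    val system = ActorSystem("ClusterSystem")

    // Hypothetical addresses, e.g. obtained from an external discovery API.
    // The node intended to be the first seed node is placed first.
    val seedNodes = List(
      Address("akka.tcp", "ClusterSystem", "host1", 2552),
      Address("akka.tcp", "ClusterSystem", "host2", 2552))

    Cluster(system).joinSeedNodes(seedNodes)
  }
}
```

The `akka.tcp` protocol assumes classic remoting; with Artery the protocol would be `akka` instead.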
@@ -7,7 +7,7 @@ For the new API see @ref[Cluster](typed/index-cluster.md).

@@@ index

* [cluster-usage](cluster-usage.md)
* [cluster-routing](cluster-routing.md)
* [cluster-singleton](cluster-singleton.md)
* [distributed-pub-sub](distributed-pub-sub.md)

@@ -312,58 +312,26 @@ Watching a remote actor is not different than watching a local actor, as described

### Failure Detector

Please see:

* @ref:[Phi Accrual Failure Detector](typed/failure-detector.md) implementation for details
* [Using the Failure Detector](#using-the-failure-detector) below for usage

### Using the Failure Detector

Remoting uses the `akka.remote.PhiAccrualFailureDetector` failure detector by default, or you can provide your own by
implementing `akka.remote.FailureDetector` and configuring it:

```
akka.remote.watch-failure-detector.implementation-class = "com.example.CustomFailureDetector"
```

In the [Remote Configuration](#remote-configuration) you may want to adjust these settings
depending on your environment:

* `akka.remote.watch-failure-detector.threshold` - when a *phi* value is considered to be a failure
* `akka.remote.watch-failure-detector.acceptable-heartbeat-pause` - margin of error for sudden abnormalities
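For example, in a cloud environment where longer network hiccups and GC pauses are expected, both settings can be raised. The values below are illustrative assumptions, not recommendations:

```
akka.remote.watch-failure-detector {
  threshold = 12
  acceptable-heartbeat-pause = 5 s
}
```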

## Serialization

You need to enable @ref:[serialization](serialization.md) for your actor messages.

@@ -41,7 +41,7 @@ implement manually.

@@@

@@@ warning { title=IMPORTANT }
Use stream refs with Akka Cluster. The [failure detector can cause quarantining](../typed/cluster-specification.md#quarantined) if plain Akka remoting is used.
@@@

## Stream References

73 akka-docs/src/main/paradox/typed/choosing-cluster.md Normal file
@@ -0,0 +1,73 @@

<a id="when-and-where-to-use-akka-cluster"></a>
# Choosing Akka Cluster

An architectural choice you have to make is whether you are going to use a microservices architecture or
a traditional distributed application. This choice will influence how you should use Akka Cluster.

## Microservices

Microservices have many attractive properties; for example, the independent nature of microservices allows for
multiple smaller and more focused teams that can deliver new functionality more frequently and can
respond quicker to business opportunities. Reactive Microservices should be isolated, autonomous, and have
a single responsibility as identified by Jonas Bonér in the book
[Reactive Microsystems: The Evolution of Microservices at Scale](https://info.lightbend.com/ebook-reactive-microservices-the-evolution-of-microservices-at-scale-register.html).

In a microservices architecture, you should consider communication within a service and between services.

In general we recommend against using Akka Cluster and actor messaging between _different_ services because that
would result in too tight a code coupling between the services and difficulties deploying them independently of
each other, which is one of the main reasons for using a microservices architecture.
See the discussion on
@scala[[Internal and External Communication](https://www.lagomframework.com/documentation/current/scala/InternalAndExternalCommunication.html)]
@java[[Internal and External Communication](https://www.lagomframework.com/documentation/current/java/InternalAndExternalCommunication.html)]
in the docs of the [Lagom Framework](https://www.lagomframework.com) (where each microservice is an Akka Cluster)
for some background on this.

Nodes of a single service (collectively called a cluster) require less decoupling. They share the same code and
are deployed together, as a set, by a single team or individual. There might be two versions running concurrently
during a rolling deployment, but deployment of the entire set has a single point of control. For this reason,
intra-service communication can take advantage of Akka Cluster, failure management and actor messaging, which
is convenient to use and has great performance.

Between different services [Akka HTTP](https://doc.akka.io/docs/akka-http/current) or
[Akka gRPC](https://doc.akka.io/docs/akka-grpc/current/) can be used for synchronous (yet non-blocking)
communication and [Akka Streams Kafka](https://doc.akka.io/docs/akka-stream-kafka/current/home.html) or other
[Alpakka](https://doc.akka.io/docs/alpakka/current/) connectors for asynchronous integration.
All those communication mechanisms work well with streaming of messages with end-to-end back-pressure, and the
synchronous communication tools can also be used for single request-response interactions. It is also important
to note that when using these tools both sides of the communication do not have to be implemented with Akka,
nor does the programming language matter.

## Traditional distributed application

We acknowledge that microservices also introduce many new challenges and it's not the only way to
build applications. A traditional distributed application may have less complexity and work well in many cases,
for example for a small startup with a single team, building an application where time to market is everything.
Akka Cluster can efficiently be used for building such a distributed application.

In this case, you have a single deployment unit, built from a single code base (or using traditional binary
dependency management to modularize) but deployed across many nodes using a single cluster.
Tighter coupling is OK, because there is a central point of deployment and control. In some cases, nodes may
have specialized runtime roles, which means that the cluster is not totally homogeneous (e.g., "front-end" and
"back-end" nodes, or dedicated master/worker nodes), but if these are run from the same built artifacts this
is just a runtime behavior and doesn't cause the same kind of problems you might get from tight coupling of
totally separate artifacts.

A tightly coupled distributed application has served the industry and many Akka users well for years and is
still a valid choice.

## Distributed monolith

There is also an anti-pattern that is sometimes called "distributed monolith". You have multiple services
that are built and deployed independently from each other, but they have a tight coupling that makes this
very risky, such as a shared cluster, shared code and dependencies for service API calls, or a shared
database schema. There is a false sense of autonomy because of the physical separation of the code and
deployment units, but you are likely to encounter problems because of changes in the implementation of
one service leaking into the behavior of others. See Ben Christensen's
[Don't Build a Distributed Monolith](https://www.microservices.com/talks/dont-build-a-distributed-monolith/).

Organizations that find themselves in this situation often react by trying to centrally coordinate deployment
of multiple services, at which point you have lost the principal benefit of microservices while taking on
the costs. You are in a halfway state with things that aren't really separable being built and deployed
in a separate way. Some people do this, and some manage to make it work, but it's not something we would
recommend, and it needs to be carefully managed.

158 akka-docs/src/main/paradox/typed/cluster-membership.md Normal file
@@ -0,0 +1,158 @@

# Cluster Membership Service

The core of Akka Cluster is the cluster membership, keeping track of what nodes are part of the cluster and
their health. Cluster membership is communicated using @ref:[gossip](cluster-specification.md#gossip) and
@ref:[failure detection](cluster-specification.md#failure-detector).

There are several @ref:[Higher level Cluster tools](../typed/cluster.md#higher-level-cluster-tools) that are built
on top of the cluster membership service.

## Introduction

A cluster is made up of a set of member nodes. The identifier for each node is a
`hostname:port:uid` tuple. An Akka application can be distributed over a cluster with
each node hosting some part of the application. Cluster membership and the actors running
on that node of the application are decoupled. A node could be a member of a
cluster without hosting any actors. Joining a cluster is initiated
by issuing a `Join` command to one of the nodes in the cluster to join.

The node identifier internally also contains a UID that uniquely identifies this
actor system instance at that `hostname:port`. Akka uses the UID to be able to
reliably trigger remote death watch. This means that the same actor system can never
join a cluster again once it's been removed from that cluster. To re-join an actor
system with the same `hostname:port` to a cluster you have to stop the actor system
and start a new one with the same `hostname:port`, which will then receive a different
UID.

## Member States

The cluster membership state is a specialized [CRDT](http://hal.upmc.fr/docs/00/55/55/88/PDF/techreport.pdf), which means that it has a monotonic
merge function. When concurrent changes occur on different nodes the updates can always be
merged and converge to the same end result.

* **joining** - transient state when joining a cluster
* **weakly up** - transient state during a network split (only if `akka.cluster.allow-weakly-up-members=on`)
* **up** - normal operating state
* **leaving** / **exiting** - states during graceful removal
* **down** - marked as down (no longer part of cluster decisions)
* **removed** - tombstone state (no longer a member)

## Member Events

The events to track the life-cycle of members are:

* `ClusterEvent.MemberJoined` - A new member has joined the cluster and its status has been changed to `Joining`
* `ClusterEvent.MemberUp` - A new member has joined the cluster and its status has been changed to `Up`
* `ClusterEvent.MemberExited` - A member is leaving the cluster and its status has been changed to `Exiting`.
  Note that the node might already have been shut down when this event is published on another node.
* `ClusterEvent.MemberRemoved` - Member completely removed from the cluster.
* `ClusterEvent.UnreachableMember` - A member is considered unreachable, detected by the failure detector
  of at least one other node.
* `ClusterEvent.ReachableMember` - A member is considered reachable again, after having been unreachable.
  All nodes that previously detected it as unreachable have detected it as reachable again.
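A minimal sketch of subscribing to these events with the classic Cluster API (it assumes the `akka-cluster` dependency and is not runnable stand-alone):

```scala
import akka.actor.{ Actor, ActorLogging }
import akka.cluster.Cluster
import akka.cluster.ClusterEvent._

// Subscribes to member and reachability events on startup and logs them.
class MemberEventListener extends Actor with ActorLogging {
  val cluster = Cluster(context.system)

  override def preStart(): Unit =
    cluster.subscribe(self, initialStateMode = InitialStateAsEvents,
      classOf[MemberEvent], classOf[UnreachableMember])

  override def postStop(): Unit =
    cluster.unsubscribe(self)

  def receive = {
    case MemberUp(member) =>
      log.info("Member is Up: {}", member.address)
    case UnreachableMember(member) =>
      log.info("Member detected as unreachable: {}", member)
    case MemberRemoved(member, previousStatus) =>
      log.info("Member is Removed: {} after {}", member.address, previousStatus)
    case _: MemberEvent => // ignore other member events
  }
}
```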

## Membership Lifecycle

A node begins in the `joining` state. Once all nodes have seen that the new
node is joining (through gossip convergence) the `leader` will set the member
state to `up`.

If a node is leaving the cluster in a safe, expected manner then it switches to
the `leaving` state. Once the leader sees the convergence on the node in the
`leaving` state, the leader will then move it to `exiting`. Once all nodes
have seen the exiting state (convergence) the `leader` will remove the node
from the cluster, marking it as `removed`.

If a node is `unreachable` then gossip convergence is not possible and therefore
any `leader` actions are also not possible (for instance, allowing a node to
become a part of the cluster). To be able to move forward, the state of the
`unreachable` nodes must be changed. They must become `reachable` again or be marked
as `down`. If the node is to join the cluster again the actor system must be
restarted and go through the joining process again. The cluster can, through the
leader, also *auto-down* a node after a configured time of unreachability. If a new
incarnation of an unreachable node tries to rejoin the cluster, the old incarnation will be
marked as `down` and the new incarnation can rejoin the cluster without manual intervention.

@@@ note

If you have *auto-down* enabled and the failure detector triggers, you
can over time end up with a lot of single-node clusters if you don't put
measures in place to shut down nodes that have become `unreachable`. This
follows from the fact that the `unreachable` node will likely see the rest of
the cluster as `unreachable`, become its own leader and form its own cluster.

@@@

<a id="weakly-up"></a>
## WeaklyUp Members

If a node is `unreachable` then gossip convergence is not possible and therefore any
`leader` actions are also not possible. However, we still might want new nodes to join
the cluster in this scenario.

`Joining` members will be promoted to `WeaklyUp` and become part of the cluster if
convergence can't be reached. Once gossip convergence is reached, the leader will move `WeaklyUp`
members to `Up`.

This feature is enabled by default, but it can be disabled with the configuration option:

```
akka.cluster.allow-weakly-up-members = off
```

You can subscribe to the `WeaklyUp` membership event to make use of the members that are
in this state, but you should be aware that members on the other side of a network partition
have no knowledge about the existence of the new members. You should for example not count
`WeaklyUp` members in quorum decisions.

## State Diagrams

### State Diagram for the Member States (`akka.cluster.allow-weakly-up-members=off`)

![member-states.png]

### State Diagram for the Member States (`akka.cluster.allow-weakly-up-members=on`)

![member-states-weakly-up.png]

#### User Actions

* **join** - join a single node to a cluster - can be explicit or automatic on
  startup if a node to join has been specified in the configuration

* **leave** - tell a node to leave the cluster gracefully

* **down** - mark a node as down

#### Leader Actions

The `leader` has the following duties:

* shifting members in and out of the cluster
    * joining -> up
    * weakly up -> up *(no convergence is required for this leader action to be performed)*
    * exiting -> removed

#### Failure Detection and Unreachability

* **fd*** - the failure detector of one of the monitoring nodes has triggered,
  causing the monitored node to be marked as unreachable

* **unreachable*** - unreachable is not a real member state but more of a flag in addition to the state, signaling that the cluster is unable to talk to this node; after being unreachable the failure detector may detect it as reachable again and thereby remove the flag

184 akka-docs/src/main/paradox/typed/cluster-specification.md Normal file
@@ -0,0 +1,184 @@

# Cluster Specification

This document describes the design concepts of Akka Cluster. For the guide on using Akka Cluster please see either

* @ref:[Cluster Usage](../typed/cluster.md)
* @ref:[Cluster Usage with classic Akka APIs](../cluster-usage.md)
* @ref:[Cluster Membership Service](cluster-membership.md)

## Introduction

Akka Cluster provides a fault-tolerant, decentralized, peer-to-peer based
@ref:[Cluster Membership Service](cluster-membership.md#cluster-membership-service) with no single point of failure or
single point of bottleneck. It does this using [gossip](#gossip) protocols and an automatic [failure detector](#failure-detector).

Akka Cluster allows for building distributed applications, where one application or service spans multiple nodes
(in practice multiple `ActorSystem`s).

## Terms

**node**
: A logical member of a cluster. There could be multiple nodes on a physical
machine. Defined by a *hostname:port:uid* tuple.

**cluster**
: A set of nodes joined together through the @ref:[Cluster Membership Service](cluster-membership.md#cluster-membership-service).

**leader**
: A single node in the cluster that acts as the leader, managing cluster convergence
and membership state transitions.

### Gossip

The cluster membership used in Akka is based on Amazon's [Dynamo](http://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf) system and
particularly the approach taken in Basho's [Riak](http://basho.com/technology/architecture/) distributed database.
Cluster membership is communicated using a [Gossip Protocol](http://en.wikipedia.org/wiki/Gossip_protocol), where the current
state of the cluster is gossiped randomly through the cluster, with preference to
members that have not seen the latest version.

#### Vector Clocks

[Vector clocks](http://en.wikipedia.org/wiki/Vector_clock) are a type of data structure and algorithm for generating a partial
ordering of events in a distributed system and detecting causality violations.

We use vector clocks to reconcile and merge differences in cluster state
during gossiping. A vector clock is a set of (node, counter) pairs. Each update
to the cluster state has an accompanying update to the vector clock.
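The core of a vector clock can be sketched in a few lines. This is an assumed simplification for illustration, not Akka's internal implementation:

```scala
// A clock is a map from node name to counter.

// Merge takes the entry-wise maximum of the two clocks; this is the
// monotonic merge function that lets concurrent updates converge.
def merge(a: Map[String, Long], b: Map[String, Long]): Map[String, Long] =
  (a.keySet ++ b.keySet).map { node =>
    node -> math.max(a.getOrElse(node, 0L), b.getOrElse(node, 0L))
  }.toMap

// `a` happened after `b` if every counter in `b` is <= the matching one
// in `a` and the clocks differ; otherwise the updates are concurrent.
def isAfter(a: Map[String, Long], b: Map[String, Long]): Boolean =
  b.forall { case (node, n) => a.getOrElse(node, 0L) >= n } && a != b

val c1 = Map("nodeA" -> 2L, "nodeB" -> 1L)
val c2 = Map("nodeA" -> 1L, "nodeC" -> 3L)
println(merge(c1, c2)) // contains nodeA -> 2, nodeB -> 1, nodeC -> 3
```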

#### Gossip Convergence

Information about the cluster converges locally at a node at certain points in time.
This is when a node can prove that the cluster state it is observing has been observed
by all other nodes in the cluster. Convergence is implemented by passing a set of nodes
that have seen the current state version during gossip. This information is referred to as the
seen set in the gossip overview. When all nodes are included in the seen set there is
convergence.
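Expressed as a predicate over the seen set, the convergence check above is simply (a trivial sketch under assumed names):

```scala
// The state a node observes has converged when every current member
// is contained in the seen set for that state version.
def converged(members: Set[String], seenSet: Set[String]): Boolean =
  members.subsetOf(seenSet)

val members = Set("node1", "node2", "node3")
println(converged(members, Set("node1", "node2")))          // false
println(converged(members, Set("node1", "node2", "node3"))) // true
```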

Gossip convergence cannot occur while any nodes are `unreachable`. The nodes need
to become `reachable` again, or be moved to the `down` and `removed` states
(see the @ref:[Cluster Membership Lifecycle](cluster-membership.md#membership-lifecycle) section). This only blocks the leader
from performing its cluster membership management and does not influence the application
running on top of the cluster. For example, this means that during a network partition
it is not possible to add more nodes to the cluster. The nodes can join, but they
will not be moved to the `up` state until the partition has healed or the unreachable
nodes have been downed.

#### Failure Detector

The failure detector in Akka Cluster is responsible for trying to detect if a node is
`unreachable` from the rest of the cluster. For this we are using the
@ref:[Phi Accrual Failure Detector](failure-detector.md) implementation.
To be able to survive sudden abnormalities, such as garbage collection pauses and
transient network failures, the failure detector is easily @ref:[configurable](cluster.md#using-the-failure-detector)
for tuning to your environment and needs.
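The *phi* value at the heart of the accrual failure detector can be sketched as follows. This is an illustrative, self-contained approximation (using a logistic approximation of the normal CDF), not Akka's exact implementation:

```scala
// phi = -log10(1 - F(timeSinceLastHeartbeat)), where F is the cumulative
// distribution function of a normal distribution whose mean and standard
// deviation are estimated from historical heartbeat inter-arrival times.
def phi(timeSinceLastHeartbeatMs: Double, meanMs: Double, stdDevMs: Double): Double = {
  val y = (timeSinceLastHeartbeatMs - meanMs) / stdDevMs
  // Logistic approximation of the normal CDF
  val cdf = 1.0 / (1.0 + math.exp(-y * (1.5976 + 0.070566 * y * y)))
  -math.log10(1.0 - cdf)
}

// With heartbeats expected every 1000 ms (std dev 100 ms), phi is low at
// the expected arrival time and grows rapidly as the silence gets longer:
println(phi(1000, 1000, 100)) // ≈ 0.3
println(phi(1500, 1000, 100)) // ≈ 7.3
```

A smaller standard deviation makes the curve steeper, so failures are suspected more quickly; the configured threshold decides at which *phi* value a node is marked unreachable.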

In a cluster each node is monitored by a few (default maximum 5) other nodes.
The nodes to monitor are selected from neighbors in a hashed ordered node ring.
This is to increase the likelihood of monitoring across racks and data centers, but the order
is the same on all nodes, which ensures full coverage.

When any node is detected to be `unreachable` this data is spread to
the rest of the cluster through the @ref:[gossip](#gossip). In other words, only one node needs to
mark a node `unreachable` to have the rest of the cluster mark that node `unreachable`.

The failure detector will also detect if the node becomes `reachable` again. When
all nodes that monitored the `unreachable` node detect it as `reachable` again
the cluster, after gossip dissemination, will consider it as `reachable`.

<a id="quarantined"></a>
If system messages cannot be delivered to a node it will be quarantined and then it
cannot come back from `unreachable`. This can happen if there are too many
unacknowledged system messages (e.g. watch, Terminated, remote actor deployment,
failures of actors supervised by a remote parent). Then the node needs to be moved
to the `down` or `removed` states (see @ref:[Cluster Membership Lifecycle](cluster-membership.md#membership-lifecycle))
and the actor system of the quarantined node must be restarted before it can join the cluster again.

See the following for more details:

* @ref:[Phi Accrual Failure Detector](failure-detector.md) implementation
* @ref:[Using the Failure Detector](cluster.md#using-the-failure-detector)

#### Leader

After gossip convergence a `leader` for the cluster can be determined. There is no
`leader` election process; the `leader` can always be recognised deterministically
by any node whenever there is gossip convergence. The leader is only a role; any node
can be the leader and it can change between convergence rounds.
The `leader` is the first node in sorted order that is able to take the leadership role,
where the preferred member states for a `leader` are `up` and `leaving`
(see the @ref:[Cluster Membership Lifecycle](cluster-membership.md#membership-lifecycle) for more information about member states).

The role of the `leader` is to shift members in and out of the cluster, changing
`joining` members to the `up` state or `exiting` members to the `removed`
state. Currently `leader` actions are only triggered by receiving a new cluster
state with gossip convergence.

The `leader` also has the power, if configured so, to "auto-down" a node that
according to the @ref:[Failure Detector](#failure-detector) is considered `unreachable`. This means setting
the `unreachable` node status to `down` automatically after a configured time
of unreachability.

#### Seed Nodes

The seed nodes are contact points for new nodes joining the cluster.
When a new node is started it sends a message to all seed nodes and then sends
a join command to the seed node that answers first.

The seed nodes configuration value does not have any influence on the running
cluster itself; it is only relevant for new nodes joining the cluster as it
helps them to find contact points to send the join command to; a new member
can send this command to any current member of the cluster, not only to the seed nodes.

#### Gossip Protocol

A variation of *push-pull gossip* is used to reduce the amount of gossip
information sent around the cluster. In push-pull gossip a digest is sent
representing current versions but not actual values; the recipient of the gossip
can then send back any values for which it has newer versions and also request
values for which it has outdated versions. Akka uses a single shared state with
a vector clock for versioning, so the variant of push-pull gossip used in Akka
makes use of this version to only push the actual state as needed.

Periodically, by default every 1 second, each node chooses another random
node to initiate a round of gossip with. If less than ½ of the nodes reside in the
seen set (have seen the new state) then the cluster gossips 3 times instead of once
every second. This adjusted gossip interval is a way to speed up the convergence process
in the early dissemination phase after a state change.

The choice of node to gossip with is random but biased towards nodes that might not have seen
the current state version. During each round of gossip exchange, when convergence is not yet reached, a node
uses a very high probability (which is configurable) to gossip with another node which is not part of the seen set, i.e.
which is likely to have an older version of the state. Otherwise it gossips with any random live node.

This biased selection is a way to speed up the convergence process in the late dissemination
phase after a state change.
|
||||||
|
|
||||||
|
For clusters larger than 400 nodes (configurable, and suggested by empirical evidence)
|
||||||
|
the 0.8 probability is gradually reduced to avoid overwhelming single stragglers with
|
||||||
|
too many concurrent gossip requests. The gossip receiver also has a mechanism to
|
||||||
|
protect itself from too many simultaneous gossip messages by dropping messages that
|
||||||
|
have been enqueued in the mailbox for too long of a time.
|
||||||
|
|
||||||
|
While the cluster is in a converged state the gossiper only sends a small gossip status message containing the gossip
|
||||||
|
version to the chosen node. As soon as there is a change to the cluster (meaning non-convergence)
|
||||||
|
then it goes back to biased gossip again.
|
||||||
|
|
||||||
|
The recipient of the gossip state or the gossip status can use the gossip version
|
||||||
|
(vector clock) to determine whether:
|
||||||
|
|
||||||
|
1. it has a newer version of the gossip state, in which case it sends that back
|
||||||
|
to the gossiper
|
||||||
|
2. it has an outdated version of the state, in which case the recipient requests
|
||||||
|
the current state from the gossiper by sending back its version of the gossip state
|
||||||
|
3. it has conflicting gossip versions, in which case the different versions are merged
|
||||||
|
and sent back
|
||||||
|
|
||||||
|
If the recipient and the gossip have the same version then the gossip state is
|
||||||
|
not sent or requested.
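The three cases above hinge on a partial order over vector clocks. The following is a hypothetical, self-contained sketch of that comparison; Akka's actual implementation lives in `akka.cluster.VectorClock`, and the class and method names here are illustrative only:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative vector clock comparison: one counter per node that has
// updated the state, keyed by node identifier.
class VectorClockCompare {
    enum Ordering { SAME, NEWER, OLDER, CONCURRENT }

    // Compare "mine" against "theirs". NEWER: mine has strictly more history
    // (case 1, push state back). OLDER: theirs does (case 2, request state).
    // CONCURRENT: each side has updates the other lacks (case 3, merge).
    static Ordering compare(Map<String, Long> mine, Map<String, Long> theirs) {
        boolean mineAhead = false;
        boolean theirsAhead = false;
        for (Map.Entry<String, Long> e : mine.entrySet()) {
            long other = theirs.getOrDefault(e.getKey(), 0L);
            if (e.getValue() > other) mineAhead = true;
            else if (e.getValue() < other) theirsAhead = true;
        }
        for (Map.Entry<String, Long> e : theirs.entrySet()) {
            if (!mine.containsKey(e.getKey()) && e.getValue() > 0) theirsAhead = true;
        }
        if (mineAhead && theirsAhead) return Ordering.CONCURRENT;
        if (mineAhead) return Ordering.NEWER;
        if (theirsAhead) return Ordering.OLDER;
        return Ordering.SAME;
    }
}
```

In the `CONCURRENT` case the merge takes, for each node key, the maximum of the two counters, so merging is commutative and the cluster still converges.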

The periodic nature of the gossip has a nice batching effect of state changes,
e.g. joining several nodes quickly after each other to one node will result in only
one state change to be spread to other members in the cluster.

The gossip messages are serialized with [protobuf](https://code.google.com/p/protobuf/) and also gzipped to reduce payload
size.
# Cluster Usage

This document describes how to use Akka Cluster and the Cluster APIs.
For specific documentation topics see:

* @ref:[Cluster Specification](cluster-specification.md)
* @ref:[Cluster Membership Service](cluster-membership.md)
* @ref:[When and where to use Akka Cluster](choosing-cluster.md)
* @ref:[Higher level Cluster tools](#higher-level-cluster-tools)
* @ref:[Rolling Updates](../additional/rolling-updates.md)
* @ref:[Operating, Managing, Observability](../additional/operations.md)

## Dependency

To use Akka Cluster, add the following dependency in your project:

@@dependency[sbt,Maven,Gradle] {
  group=com.typesafe.akka
  version=$akka.version$
}
## Cluster API Extension

The Cluster extension gives you access to management tasks such as @ref:[Joining, Leaving and Downing](cluster-membership.md#user-actions)
and subscription of cluster membership events such as @ref:[MemberUp, MemberRemoved and UnreachableMember](cluster-membership.md#membership-lifecycle),
which are exposed as event APIs.

It does this through these references on the `Cluster` extension:

* manager: An @scala[`ActorRef[akka.cluster.typed.ClusterCommand]`]@java[`ActorRef<akka.cluster.typed.ClusterCommand>`] where a `ClusterCommand` is a command such as: `Join`, `Leave` and `Down`
* subscriptions: An @scala[`ActorRef[akka.cluster.typed.ClusterStateSubscription]`]@java[`ActorRef<akka.cluster.typed.ClusterStateSubscription>`] where a `ClusterStateSubscription` is one of `GetCurrentState` or `Subscribe` and `Unsubscribe` to cluster events like `MemberRemoved`
* state: The current `CurrentClusterState`

All of the examples below assume the following imports:

Scala
: @@snip [BasicClusterExampleSpec.scala](/akka-cluster-typed/src/test/scala/docs/akka/cluster/typed/BasicClusterExampleSpec.scala) { #cluster-imports }

Java
: @@snip [BasicClusterExampleTest.java](/akka-cluster-typed/src/test/java/jdocs/akka/cluster/typed/BasicClusterExampleTest.java) { #cluster-imports }

<a id="basic-cluster-configuration"></a>
And the minimum configuration required is to set a host/port for remoting and the `akka.actor.provider = "cluster"`.

@@snip [BasicClusterExampleSpec.scala](/akka-cluster-typed/src/test/scala/docs/akka/cluster/typed/BasicClusterExampleSpec.scala) { #config-seeds }

Starting the cluster on each node:

Scala
: @@snip [BasicClusterExampleSpec.scala](/akka-cluster-typed/src/test/scala/docs/akka/cluster/typed/BasicClusterExampleSpec.scala) { #cluster-create }

Java
: @@snip [BasicClusterExampleTest.java](/akka-cluster-typed/src/test/java/jdocs/akka/cluster/typed/BasicClusterExampleTest.java) { #cluster-create }
@@@ note

The name of the cluster's `ActorSystem` must be the same for all members, which is passed in when you start the `ActorSystem`.

@@@

### Joining and Leaving a Cluster
If not using configuration to specify [seed nodes to join](#joining-configured-seed-nodes), joining the cluster can be done programmatically via the `manager`.

Scala
: @@snip [BasicClusterExampleSpec.scala](/akka-cluster-typed/src/test/scala/docs/akka/cluster/typed/BasicClusterExampleSpec.scala) { #cluster-join }

Java
: @@snip [BasicClusterExampleTest.java](/akka-cluster-typed/src/test/java/jdocs/akka/cluster/typed/BasicClusterExampleTest.java) { #cluster-join }

[Leaving](#leaving) the cluster and [downing](#downing) a node are similar:

Scala
: @@snip [BasicClusterExampleSpec.scala](/akka-cluster-typed/src/test/scala/docs/akka/cluster/typed/BasicClusterExampleSpec.scala) { #cluster-leave }

Java
: @@snip [BasicClusterExampleTest.java](/akka-cluster-typed/src/test/java/jdocs/akka/cluster/typed/BasicClusterExampleTest.java) { #cluster-leave }
### Cluster Subscriptions

Cluster `subscriptions` can be used to receive messages when cluster state changes. For example, registering
for all `MemberEvent`s, then using the `manager` to have a node leave the cluster will result in events
for the node going through the @ref:[Membership Lifecycle](cluster-membership.md#membership-lifecycle).

This example subscribes to a @scala[`subscriber: ActorRef[MemberEvent]`]@java[`ActorRef<MemberEvent> subscriber`]:

Scala
: @@snip [BasicClusterExampleSpec.scala](/akka-cluster-typed/src/test/scala/docs/akka/cluster/typed/BasicClusterExampleSpec.scala) { #cluster-subscribe }

Java
: @@snip [BasicClusterExampleTest.java](/akka-cluster-typed/src/test/java/jdocs/akka/cluster/typed/BasicClusterExampleTest.java) { #cluster-leave-example }
### Cluster State

Instead of subscribing to cluster events it can sometimes be convenient to only get the full membership state with
@scala[`Cluster(system).state`]@java[`Cluster.get(system).state()`]. Note that this state is not necessarily in sync with the events published to a
cluster subscription.

See @ref:[Cluster Membership](cluster-membership.md#member-events) for more information on member events specifically.
There are more types of change events; consult the API documentation
of classes that extend `akka.cluster.ClusterEvent.ClusterDomainEvent` for details about the events.
## Cluster Membership API

The `akka.cluster.seed-nodes` are initial contact points for [automatically](#joining-automatically-to-seed-nodes-with-cluster-bootstrap)
or [manually](#joining-configured-seed-nodes) joining a cluster.

After the joining process the seed nodes are not special and they participate in the cluster in exactly the same
way as other nodes.
### Joining configured seed nodes

When a new node is started it sends a message to all seed nodes and then sends a join command to the one that
answers first. If none of the seed nodes reply (they might not be started yet)
it retries this procedure until successful or shutdown.

You define the seed nodes in the [configuration](#configuration) file (application.conf):
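A sketch of such a configuration, reusing the host names and port from the system-property example below (adjust the `ClusterSystem` actor system name, hosts and port for your own setup):

```
akka.cluster.seed-nodes = [
  "akka://ClusterSystem@host1:2552",
  "akka://ClusterSystem@host2:2552"]
```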
This can also be defined as Java system properties when starting the JVM using the following syntax:

```
-Dakka.cluster.seed-nodes.0=akka://ClusterSystem@host1:2552
-Dakka.cluster.seed-nodes.1=akka://ClusterSystem@host2:2552
```
The seed nodes can be started in any order and it is not necessary to have all
seed nodes running, but the node configured as the **first element** in the `seed-nodes`
configuration list must be started when initially starting a cluster, otherwise the
other seed-nodes will not become initialized and no other node can join the cluster.
The reason for the special first seed node is to avoid forming separated islands when
starting from an empty cluster.
It is quickest to start all configured seed nodes at the same time (order doesn't matter),
otherwise it can take up to the configured `seed-node-timeout` until the nodes
can join.

Once more than two seed nodes have been started it is no problem to shut down the first
seed node. If the first seed node is restarted, it will first try to join the other
seed nodes in the existing cluster. Note that if you stop all seed nodes at the same time
and restart them with the same `seed-nodes` configuration they will join themselves and
form a new cluster instead of joining remaining nodes of the existing cluster. That is
likely not desired and can be avoided by listing several nodes as seed nodes for redundancy
and not stopping all of them at the same time.

Note that if you are going to start the nodes on different machines you need to specify the
ip-addresses or host names of the machines in `application.conf` instead of `127.0.0.1`.
### Joining automatically to seed nodes with Cluster Bootstrap

Automatic discovery of nodes for the joining process is available
using the open source Akka Management project's module,
@ref:[Cluster Bootstrap](../additional/operations.md#cluster-bootstrap).
Please refer to its documentation for more details.

### Joining programmatically to seed nodes

@@include[cluster.md](../includes/cluster.md) { #join-seeds-programmatic }
### Tuning joins

Unsuccessful attempts to contact seed nodes are automatically retried after the time period defined in
configuration property `seed-node-timeout`. An unsuccessful attempt to join a specific seed node is
automatically retried after the configured `retry-unsuccessful-join-after`. Retrying means that it
tries to contact all seed nodes and then joins the node that answers first. The first node in the list
of seed nodes will join itself if it cannot contact any of the other seed nodes within the
configured `seed-node-timeout`.

The joining of given seed nodes will by default be retried indefinitely until
a successful join. That process can be aborted if unsuccessful by configuring a
timeout. When aborted it will run @ref:[Coordinated Shutdown](../actors.md#coordinated-shutdown),
which by default will terminate the ActorSystem. CoordinatedShutdown can also be configured to exit
the JVM. It is useful to define this timeout if the `seed-nodes` are assembled
dynamically and a restart with new seed-nodes should be tried after unsuccessful
attempts.

```
akka.cluster.shutdown-after-unsuccessful-join-seed-nodes = 20s
akka.coordinated-shutdown.terminate-actor-system = on
```
If you don't configure seed nodes or use one of the join seed node functions you need to join the cluster manually,
which can be performed by using @ref:[JMX](../additional/operations.md#jmx) or @ref:[HTTP](../additional/operations.md#http).

You can join to any node in the cluster. It does not have to be configured as a seed node.
Note that you can only join to an existing cluster member, which means that for bootstrapping some
node must join itself, and then the following nodes could join them to make up a cluster.

An actor system can only join a cluster once. Additional attempts will be ignored.
When it has successfully joined it must be restarted to be able to join another
cluster or to join the same cluster again. It can use the same host name and port
after the restart. When it comes up as a new incarnation of an existing member in the cluster
and tries to join, the existing one will be removed from the cluster and then the new
incarnation will be allowed to join.
<a id="automatic-vs-manual-downing"></a>
### Downing

When a member is considered by the failure detector to be `unreachable` the
leader is not allowed to perform its duties, such as changing status of
new joining members to `Up`. The node must first become `reachable` again, or the
status of the unreachable member must be changed to `Down`. Changing status to `Down`
can be performed automatically or manually. By default it must be done manually, using
@ref:[JMX](../additional/operations.md#jmx) or @ref:[HTTP](../additional/operations.md#http).

It can also be performed programmatically with @scala[`Cluster(system).down(address)`]@java[`Cluster.get(system).down(address)`].

If a node is still running and sees itself as `Down` it will shut down. @ref:[Coordinated Shutdown](../actors.md#coordinated-shutdown) will automatically
run if `run-coordinated-shutdown-when-down` is set to `on` (the default), however the node will not try
to leave the cluster gracefully, so sharding and singleton migration will not occur.
A production solution for the downing problem is provided by
[Split Brain Resolver](http://developer.lightbend.com/docs/akka-commercial-addons/current/split-brain-resolver.html),
which is part of the [Lightbend Reactive Platform](http://www.lightbend.com/platform).
Even if you don't use the Reactive Platform, you should carefully read the [documentation](http://developer.lightbend.com/docs/akka-commercial-addons/current/split-brain-resolver.html)
of the Split Brain Resolver and make sure that the solution you are using handles the concerns
described there.
### Auto-downing (DO NOT USE)

There is an automatic downing feature that you should not use in production. For testing you can enable it with configuration:

```
akka.cluster.auto-down-unreachable-after = 120s
```

This means that the cluster leader member will change the `unreachable` node
status to `down` automatically after the configured time of unreachability.

This is a naïve approach to remove unreachable nodes from the cluster membership.
It can be useful during development but in a production environment it will eventually break down the cluster.
When a network partition occurs, both sides of the partition will see the other side as unreachable and remove it from the cluster.
This results in the formation of two separate, disconnected clusters (known as *Split Brain*).

This behaviour is not limited to network partitions. It can also occur if a node
in the cluster is overloaded, or experiences a long GC pause.
@@@ warning

We recommend against using the auto-down feature of Akka Cluster in production. It
has multiple undesirable consequences for production systems.

If you are using @ref:[Cluster Singleton](cluster-singleton.md) or @ref:[Cluster Sharding](cluster-sharding.md) it can break the contract provided by
those features. Both provide a guarantee that an actor will be unique in a cluster.
With the auto-down feature enabled, it is possible for multiple independent clusters
to form (*Split Brain*). When this happens the guaranteed uniqueness will no
longer be true resulting in undesirable behaviour in the system.

This is even more severe when @ref:[Akka Persistence](persistence.md) is used in
conjunction with Cluster Sharding. In this case, the lack of unique actors can
cause multiple actors to write to the same journal. Akka Persistence operates on a
single writer principle. Having multiple writers will corrupt the journal
and make it unusable.

Finally, even if you don't use features such as Persistence, Sharding, or Singletons,
auto-downing can lead the system to form multiple small clusters. These small
clusters will be independent from each other. They will be unable to communicate
and as a result you may experience performance degradation. Once this condition
occurs, it will require manual intervention in order to reform the cluster.

Because of these issues, auto-downing should **never** be used in a production environment.

@@@
### Leaving

There are two ways to remove a member from the cluster.

1. The recommended way to leave a cluster is a graceful exit, informing the cluster that a node shall leave.
   This can be performed using @ref:[JMX](../additional/operations.md#jmx) or @ref:[HTTP](../additional/operations.md#http).
   This method will offer faster hand off to peer nodes during node shutdown.
1. When a graceful exit is not possible, you can stop the actor system (or the JVM process, for example a SIGTERM sent from the environment). It will be detected
   as unreachable and removed after the automatic or manual downing.

The @ref:[Coordinated Shutdown](../actors.md#coordinated-shutdown) will automatically run when the cluster node sees itself as
`Exiting`, i.e. leaving from another node will trigger the shutdown process on the leaving node.
Tasks for graceful leaving of cluster including graceful shutdown of Cluster Singletons and
Cluster Sharding are added automatically when Akka Cluster is used, i.e. running the shutdown
process will also trigger the graceful leaving if it's not already in progress.

Normally this is handled automatically, but in case of network failures during this process it might still
be necessary to set the node's status to `Down` in order to complete the removal. For handling network failures
see [Split Brain Resolver](http://developer.lightbend.com/docs/akka-commercial-addons/current/split-brain-resolver.html),
part of the [Lightbend Reactive Platform](http://www.lightbend.com/platform).
## Serialization

Enable @ref:[serialization](../serialization.md) to send events between ActorSystems.
@ref:[Serialization with Jackson](../serialization-jackson.md) is a good choice in many cases, and our
recommendation if you don't have other preferences or constraints.

Actor references are typically included in the messages, since there is no `sender`.
To serialize actor references to/from string representation you would use the `ActorRefResolver`.
For example here's how a serializer could look for `Ping` and `Pong` messages:

Scala
: @@snip [PingSerializer.scala](/akka-cluster-typed/src/test/scala/docs/akka/cluster/typed/PingSerializer.scala) { #serializer }

Java
: @@snip [PingSerializerExampleTest.java](/akka-cluster-typed/src/test/java/jdocs/akka/cluster/typed/PingSerializerExampleTest.java) { #serializer }

You can look at the
@java[@extref[Cluster example project](samples:akka-samples-cluster-java)]
@scala[@extref[Cluster example project](samples:akka-samples-cluster-scala)]
to see what this looks like in practice.
## Node Roles

Not all nodes of a cluster need to perform the same function: there might be one sub-set which runs the web front-end,
one which runs the data access layer and one for the number-crunching. Deployment of actors, for example by cluster-aware
routers, can take node roles into account to achieve this distribution of responsibilities.

The node roles are defined in the configuration property named `akka.cluster.roles`
and typically defined in the start script as a system property or environment variable.

The roles are part of the membership information in `MemberEvent` that you can subscribe to.
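For example, a node that should serve the web front-end could be started with a configuration like this (the role name `frontend` is only an illustration):

```
akka.cluster.roles = ["frontend"]
```

The same value can be passed in the start script as a system property, e.g. `-Dakka.cluster.roles.0=frontend`.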
## Failure Detector

The nodes in the cluster monitor each other by sending heartbeats to detect if a node is
unreachable from the rest of the cluster. Please see:

* @ref:[Failure Detector specification](cluster-specification.md#failure-detector)
* @ref:[Phi Accrual Failure Detector](failure-detector.md) implementation
* [Using the Failure Detector](#using-the-failure-detector)
### Using the Failure Detector

Cluster uses the `akka.remote.PhiAccrualFailureDetector` failure detector by default, or you can provide your own by
implementing `akka.remote.FailureDetector` and configuring it:

```
akka.cluster.failure-detector.implementation-class = "com.example.CustomFailureDetector"
```

In the [Cluster Configuration](#configuration) you may want to adjust these
depending on your environment:

* When a *phi* value is considered to be a failure: `akka.cluster.failure-detector.threshold`
* Margin of error for sudden abnormalities: `akka.cluster.failure-detector.acceptable-heartbeat-pause`
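The `threshold` above is compared against a *phi* value computed as `phi = -log10(1 - F(timeSinceLastHeartbeat))`, where `F` is the cumulative distribution of observed heartbeat inter-arrival times. The following self-contained sketch shows how phi grows as a heartbeat becomes overdue; the class name and sample numbers are illustrative, and the logistic approximation of the normal CDF is an assumption mirroring the open source accrual failure detector implementation:

```java
// Illustrative phi calculation for an accrual failure detector.
// meanMillis and stdDevMillis would come from a sliding window of
// observed heartbeat intervals in a real detector.
class PhiAccrualSketch {
    static double phi(double timeSinceLastMillis, double meanMillis, double stdDevMillis) {
        double y = (timeSinceLastMillis - meanMillis) / stdDevMillis;
        // Logistic approximation of the normal CDF.
        double e = Math.exp(-y * (1.5976 + 0.070566 * y * y));
        // Two branches keep the computation numerically stable on both tails.
        if (timeSinceLastMillis > meanMillis) {
            return -Math.log10(e / (1.0 + e));
        } else {
            return -Math.log10(1.0 - 1.0 / (1.0 + e));
        }
    }
}
```

At exactly the mean interval phi is about 0.3, and it climbs steeply as the pause stretches several standard deviations past the mean, which is why a `threshold` of 8 tolerates normal jitter but reacts to a genuinely silent node.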
## How to test

Akka comes with and uses several types of testing strategies:

* @ref:[Testing](testing.md)
* @ref:[Multi Node Testing](../multi-node-testing.md)
* @ref:[Multi JVM Testing](../multi-jvm-testing.md)
## Configuration

There are several configuration properties for the cluster. Refer to the
@ref:[reference configuration](../general/configuration.md#config-akka-cluster) for full
configuration descriptions, default values and options.

### Cluster Info Logging

You can silence the logging of cluster events at info level with configuration property:

```
akka.cluster.log-info = off
```

You can enable verbose logging of cluster events at info level, e.g. for temporary troubleshooting, with configuration property:

```
akka.cluster.log-info-verbose = on
```
### Cluster Dispatcher

The cluster extension is implemented with actors. To protect them against
disturbance from user actors they are by default run on the internal dispatcher configured
under `akka.actor.internal-dispatcher`. The cluster actors can potentially be isolated even
further, onto their own dispatcher, using the setting `akka.cluster.use-dispatcher`,
or be made to run on the same dispatcher to keep the number of threads down.
### Configuration Compatibility Check

Creating a cluster is about deploying two or more nodes and making them behave as if they were one single application. Therefore it's extremely important that all nodes in a cluster are configured with compatible settings.

The Configuration Compatibility Check feature ensures that all nodes in a cluster have a compatible configuration. Whenever a new node joins an existing cluster, a subset of its configuration settings (only those that are required to be checked) is sent to the nodes in the cluster for verification. Once the configuration is checked on the cluster side, the cluster sends back its own set of required configuration settings. The joining node will then verify that it's compliant with the cluster configuration. The joining node will only proceed if all checks pass, on both sides.

New custom checkers can be added by extending `akka.cluster.JoinConfigCompatChecker` and including them in the configuration. Each checker must be associated with a unique key:

```
akka.cluster.configuration-compatibility-check.checkers {
  my-custom-config = "com.company.MyCustomJoinConfigCompatChecker"
}
```

@@@ note

Configuration Compatibility Check is enabled by default, but can be disabled by setting `akka.cluster.configuration-compatibility-check.enforce-on-join = off`. This is especially useful when performing rolling updates. Obviously this should only be done if a complete cluster shutdown isn't an option. A cluster with nodes with different configuration settings may lead to data loss or data corruption.

This setting should only be disabled on the joining nodes. The checks are always performed on both sides, and warnings are logged. In case of incompatibilities, it is the responsibility of the joining node to decide if the process should be interrupted or not.

If you are performing a rolling update on a cluster using Akka 2.5.9 or earlier (thus, not supporting this feature), the checks will not be performed, because the running cluster has no means to verify the configuration sent by the joining node, nor to send back its own configuration.

@@@

## Higher level Cluster tools

@@include[cluster.md](../includes/cluster.md) { #cluster-singleton }
See @ref:[Cluster Singleton](cluster-singleton.md).

@@include[cluster.md](../includes/cluster.md) { #cluster-sharding }
See @ref:[Cluster Sharding](cluster-sharding.md).

@@include[cluster.md](../includes/cluster.md) { #cluster-ddata }
See @ref:[Distributed Data](distributed-data.md).

@@include[cluster.md](../includes/cluster.md) { #cluster-pubsub }
Classic Pub Sub can be used by leveraging the `.toClassic` adapters.
See @ref:[Distributed Publish Subscribe in Cluster](../distributed-pub-sub.md). A typed API is tracked in @github[#26338](#26338).

@@include[cluster.md](../includes/cluster.md) { #cluster-multidc }
See @ref:[Cluster Multi-DC](../cluster-dc.md). A typed API for multi-dc sharding is tracked in @github[#27705](#27705).

72  akka-docs/src/main/paradox/typed/failure-detector.md  Normal file

@@ -0,0 +1,72 @@
# Phi Accrual Failure Detector

## Introduction

Remote DeathWatch uses heartbeat messages and the failure detector to

* detect network failures and JVM crashes
* generate the `Terminated` message to the watching actor on failure
* gracefully terminate watched actors

The heartbeat arrival times are interpreted by an implementation of
[The Phi Accrual Failure Detector](https://pdfs.semanticscholar.org/11ae/4c0c0d0c36dc177c1fff5eb84fa49aa3e1a8.pdf) by Hayashibara et al.

## Failure Detector Heartbeats

Heartbeats are sent every second by default, which is configurable. They are performed in a request/reply handshake, and the replies are input to the failure detector.

The suspicion level of failure is represented by a value called *phi*.
The basic idea of the phi failure detector is to express the value of *phi* on a scale that
is dynamically adjusted to reflect current network conditions.

The value of *phi* is calculated as:

```
phi = -log10(1 - F(timeSinceLastHeartbeat))
```

where F is the cumulative distribution function of a normal distribution with mean
and standard deviation estimated from historical heartbeat inter-arrival times.

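The formula above can be sketched numerically. This is a hedged illustration, not Akka's actual implementation: the logistic approximation of the normal CDF and the `Phi` class name are assumptions of this sketch.

```java
public class Phi {

    // Approximate normal CDF for a standardized value y
    // (logistic approximation; an assumption of this sketch).
    static double cdf(double y) {
        return 1.0 / (1.0 + Math.exp(-y * (1.5976 + 0.070566 * y * y)));
    }

    // phi = -log10(1 - F(timeSinceLastHeartbeat)), with F estimated from
    // the mean and standard deviation of historical inter-arrival times.
    static double phi(double timeSinceLastHeartbeatMs, double meanMs, double stdDevMs) {
        double y = (timeSinceLastHeartbeatMs - meanMs) / stdDevMs;
        return -Math.log10(1.0 - cdf(y));
    }

    public static void main(String[] args) {
        // Historical heartbeats: mean 1000 ms apart, standard deviation 200 ms.
        System.out.println(phi(1000, 1000, 200)); // right at the mean: low suspicion
        System.out.println(phi(1600, 1000, 200)); // 3 std devs late: suspicion rising
        System.out.println(phi(2000, 1000, 200)); // 5 std devs late: very high suspicion
    }
}
```

Note how *phi* grows without bound as the silence lengthens, rather than flipping from "alive" to "dead" at a fixed timeout.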
An accrual failure detector decouples monitoring and interpretation. That makes
it applicable to a wider range of scenarios and more suitable for building generic
failure detection services. The idea is that it keeps a history of failure
statistics, calculated from heartbeats received from other nodes, and
tries to make an educated guess, by taking multiple factors, and how they
accumulate over time, into account, about whether a
specific node is up or down. Rather than only answering "yes" or "no" to the
question "is the node down?" it returns a `phi` value representing the
likelihood that the node is down.

The following chart illustrates how *phi* increases with increasing time since the
previous heartbeat.



Phi is calculated from the mean and standard deviation of historical
inter-arrival times. The previous chart is an example for a standard deviation
of 200 ms. If the heartbeats arrive with less deviation the curve becomes steeper,
i.e. it is possible to determine failure more quickly. The curve looks like this for
a standard deviation of 100 ms.



To be able to survive sudden abnormalities, such as garbage collection pauses and
transient network failures, the failure detector is configured with a margin, which
you may want to adjust depending on your environment.
This is how the curve looks for `failure-detector.acceptable-heartbeat-pause` configured to
3 seconds.



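For example, a 3 second margin for the cluster failure detector could be set as follows (path shown as an assumption about which detector is being tuned; the remote watch failure detector has a corresponding setting under `akka.remote.watch-failure-detector`):

```
akka.cluster.failure-detector.acceptable-heartbeat-pause = 3 s
```
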
## Failure Detector Threshold

The `threshold` that is the basis for the calculation is configurable by the
user.

* A low `threshold` is prone to generate many false positives but ensures
a quick detection in the event of a real crash.
* Conversely, a high `threshold` generates fewer mistakes but needs more time to detect actual crashes.
* The default `threshold` is 8 and is appropriate for most situations. However in
cloud environments, such as Amazon EC2, the value could be increased to 12 in
order to account for network issues that sometimes occur on such platforms.

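In a cloud environment the threshold could, for instance, be raised like this (cluster failure detector path shown as an assumption; the remote watch failure detector has an analogous `threshold` setting):

```
akka.cluster.failure-detector.threshold = 12.0
```
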
@@ -4,8 +4,9 @@

 @@@ index

-* [cluster-specification](../common/cluster.md)
 * [cluster](cluster.md)
+* [cluster-specification](cluster-specification.md)
+* [cluster-membership](cluster-membership.md)
 * [distributed-data](distributed-data.md)
 * [cluster-singleton](cluster-singleton.md)
 * [cluster-sharding](cluster-sharding.md)
@@ -17,5 +18,6 @@
 * [remoting-artery](../remoting-artery.md)
 * [remoting](../remoting.md)
 * [coordination](../coordination.md)
+* [choosing-cluster](choosing-cluster.md)

 @@@

@@ -47,6 +47,8 @@ object Paradox {
         "index.html",
         // Page that recommends Alpakka:
         "camel.html",
+        // Page linked to from many others, but not in a TOC
+        "typed/failure-detector.html",
         // TODO page not linked to
         "fault-tolerance-sample.html"))