In a recent support case the 'manual cluster join required'
log message caused some confusion.
It turns out that the configuration we used to detect whether Cluster
Bootstrap is available has changed since
https://github.com/akka/akka-management/pull/476
Unfortunately I don't think we can detect whether Cluster
Bootstrap is actually enabled, since users may call
`ClusterBootstrap(system).start()` whenever they like.
Updated the logging to reflect that better.
* Ignore gossip deserialization failures
This should only happen during a rolling upgrade. It gives us the option to do
incompatible things in Gossip and have the old nodes ignore the
deserialization error.
* Review feedback
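A minimal sketch of the "ignore deserialization failures" idea, written in Python for illustration rather than taken from the Akka code. `decode_gossip` is a hypothetical stand-in for the real serializer: a node that cannot decode a gossip (for example one produced by a newer node during a rolling upgrade) logs and drops it instead of failing.

```python
import logging
from typing import Callable, Optional, TypeVar

T = TypeVar("T")
log = logging.getLogger("gossip")


def receive_gossip(payload: bytes, decode_gossip: Callable[[bytes], T]) -> Optional[T]:
    """Decode an incoming gossip payload, ignoring deserialization failures.

    decode_gossip is a hypothetical stand-in for the real serializer.
    Returning None means "ignore this gossip"; a later gossip that this
    node can understand will bring it up to date.
    """
    try:
        return decode_gossip(payload)
    except Exception as exc:  # e.g. an incompatible format from a newer node
        log.warning("Ignoring gossip that could not be deserialized: %s", exc)
        return None
```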
* when SBR downs the reachable side (minority) it's important
to quickly tell everybody to shut down
* send gossip directly to the downed node as a STONITH signal
* gossip to a few random nodes immediately when self is downed, which
always comes last in the SBR downing
* enable gossip speedup when there are downed members
* adjust StressSpec back to normal
* adjust TransitionSpec to the new behavior
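The target selection described above can be sketched as follows. This is a simplified Python model, not the Akka implementation: node names are plain strings and `fan_out` is a hypothetical parameter, not an Akka setting. Every downed node gets gossip directly, and when self is among the downed, a few random other nodes are added so the decision spreads before this node shuts down.

```python
import random
from typing import Iterable, List, Optional, Set


def gossip_targets_after_downing(
    self_node: str,
    members: Iterable[str],
    downed: Set[str],
    fan_out: int = 3,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Pick the nodes that should receive gossip right after a downing decision.

    fan_out is a hypothetical parameter used only in this sketch.
    """
    rng = rng or random.Random()
    others = sorted(m for m in members if m != self_node)
    # Gossip directly to every downed node (the STONITH-like signal).
    targets = [m for m in others if m in downed]
    if self_node in downed:
        # Self is downed last, so also tell a few random remaining nodes
        # before shutting down.
        rest = [m for m in others if m not in downed]
        targets += rng.sample(rest, min(fan_out, len(rest)))
    return targets
```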
* Config for when to move to WeaklyUp
* noticed when I was testing with the StressSpec that it's often moving nodes to WeaklyUp
in normal joining scenarios (also seen in Kubernetes testing)
* better to wait a bit longer, since moving to WeaklyUp requires a new convergence round
and makes the full Joining -> Up transition take longer
* changed existing config property to be a duration
* default 7s, previously it was 3s
* on => 7s
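A Python sketch of how the setting could be interpreted after this change, assuming the property is `akka.cluster.allow-weakly-up-members` as in Akka 2.6: `off` disables WeaklyUp, `on` means the new 7s default, and anything else is a duration. Only the `s` and `ms` suffixes are handled here; the real value is parsed by Typesafe Config, which accepts more formats.

```python
from datetime import timedelta
from typing import Optional

# The new default from this change (previously 3s).
DEFAULT_WEAKLY_UP_AFTER = timedelta(seconds=7)


def parse_weakly_up_setting(value: str) -> Optional[timedelta]:
    """Return how long to wait before moving Joining members to WeaklyUp,
    or None when WeaklyUp is disabled."""
    v = value.strip().lower()
    if v == "off":
        return None
    if v == "on":
        return DEFAULT_WEAKLY_UP_AFTER
    # Check "ms" before "s", since "500ms" also ends with "s".
    if v.endswith("ms"):
        return timedelta(milliseconds=int(v[:-2]))
    if v.endswith("s"):
        return timedelta(seconds=int(v[:-1]))
    raise ValueError(f"unsupported duration in this sketch: {value!r}")
```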
* Since DeathWatchNotification is sent over the control channel it may overtake
other messages that have been sent from the same actor before it stopped.
* It can be confusing that Terminated can't be used as an end-of-conversation marker.
* In classic Remoting we didn't have this problem because all messages were sent over
the same connection.
* don't send DeathWatchNotification when system is terminating
* when using Cluster we can rely on the other side publishing AddressTerminated
when the member has been removed
* it's actually already a race condition, which often results in the DeathWatchNotification
from the terminating side not being sent
* in DeathWatch.scala it will remove the watchedBy when receiving AddressTerminated, and that
may (sometimes) happen before tellWatchersWeDied
* same for Unwatch
* to avoid sending many Unwatch messages when watcher's ActorSystem is terminated
* the same race exists for Unwatch as for DeathWatchNotification, if RemoteWatcher's publishAddressTerminated
runs before the watcher is terminated
* config for the flush timeout, and possibility to disable
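The ordering fix can be modelled in Python with asyncio. This is a hypothetical sketch, not Akka's code: it assumes the flush is an acknowledged marker on the ordinary message channel (`flush` is a stand-in coroutine), that a timeout of `None` means flushing is disabled, and, as a simplification, that the notification is always skipped while the system is terminating.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def notify_watchers(
    flush: Callable[[], Awaitable[None]],
    send_notification: Callable[[], None],
    flush_timeout: Optional[float],
    system_terminating: bool,
) -> str:
    """Send a stopped actor's DeathWatchNotification after flushing.

    flush: hypothetical coroutine that completes when everything sent on the
    ordinary channel before this point has been acknowledged.
    flush_timeout: seconds to wait for the flush; None disables flushing.
    """
    if system_terminating:
        # Watchers in other cluster members get AddressTerminated once this
        # member is removed, so no notification is sent.
        return "skipped"
    outcome = "flush-disabled"
    if flush_timeout is not None:
        try:
            await asyncio.wait_for(flush(), timeout=flush_timeout)
            outcome = "flushed"
        except asyncio.TimeoutError:
            outcome = "timed-out"  # notify anyway rather than wait forever
    # The notification goes over the control channel only now, so it cannot
    # overtake earlier messages (unless the flush timed out).
    send_notification()
    return outcome
```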
* adjust default minimum for down-all-when-unstable
* when down-all-when-unstable=on it will be >= 4 seconds
* in case stable-after is tweaked to a low value such as 5 seconds
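A Python sketch of the adjusted minimum. The notes only state that with `down-all-when-unstable=on` the result is at least 4 seconds; deriving it as 3/4 of `stable-after` is an assumption made for illustration (the `fraction` parameter), and explicit durations are left out.

```python
from datetime import timedelta
from typing import Optional

MIN_DOWN_ALL_WHEN_UNSTABLE = timedelta(seconds=4)


def down_all_when_unstable(
    setting: str, stable_after: timedelta, fraction: float = 0.75
) -> Optional[timedelta]:
    """Effective down-all-when-unstable duration for the on/off settings.

    fraction is an assumed share of stable-after used for "on"; the 4 second
    floor keeps a short stable-after (e.g. 5s) from giving a tiny value.
    """
    if setting == "off":
        return None
    if setting == "on":
        return max(MIN_DOWN_ALL_WHEN_UNSTABLE, stable_after * fraction)
    raise ValueError("explicit durations are not handled in this sketch")
```

With `stable-after = 5s` the derived 3.75s is raised to the 4s floor, while with 20s the result is 15s.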
* will be used in rolling update features
* configured with akka.cluster.app-version
* reusing same implementation as ManifestInfo.Version
by moving that to akka.util.Version
* additional version test
* support dynver format, + separator, and commit number
* improve version parser
* lazy parse
* make Member.appVersion internal
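The ordering rules of `akka.util.Version` are not spelled out in these notes, so the following is a hypothetical Python sketch of a lazily parsed, comparable version that accepts the formats listed above: numeric parts, an optional `-qualifier`, and a dynver-style `+<commits>-<sha>` suffix. The specific rules (missing parts count as 0, a qualifier sorts before the plain release, more commits sort later) are assumptions.

```python
from functools import cached_property, total_ordering
from typing import Tuple


@total_ordering
class Version:
    """Comparable version string, parsed lazily on first comparison."""

    def __init__(self, raw: str) -> None:
        self.raw = raw  # nothing is parsed until the version is compared

    @cached_property
    def _key(self) -> Tuple[Tuple[int, ...], int, str, int]:
        base, _, build = self.raw.partition("+")  # dynver: "1.2.3+10-abcd1234"
        numbers, _, qualifier = base.partition("-")  # e.g. "1.2.3-M1"
        parts = [int(p) for p in numbers.split(".")]
        parts += [0] * (3 - len(parts))  # "1.2" is treated as "1.2.0"
        commits_str = build.split("-", 1)[0]
        commits = int(commits_str) if commits_str.isdigit() else 0
        # A plain release sorts after a qualified one with the same numbers.
        is_release = 0 if qualifier else 1
        return (tuple(parts), is_release, qualifier, commits)

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, Version):
            return NotImplemented
        return self._key == other._key

    def __lt__(self, other: "Version") -> bool:
        if not isinstance(other, Version):
            return NotImplemented
        return self._key < other._key

    def __hash__(self) -> int:
        return hash(self._key)

    def __repr__(self) -> str:
        return f"Version({self.raw!r})"
```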
* to only exercise membership
* remote deployed routers and supervision of remote deployed actors
are not priority, and that is what is sometimes failing
* Harden multi-dc joining, #29280
* failing test MultiDcJoinSpec
* require that all have seen the gossip seen for the first member in the other DC
* the test also revealed that gossip wasn't propagated between DCs when
the VectorClock was the same and only the seen set was different
* add a SHA-1 digest of the seen set in the GossipStatus to detect that they
are different and that full gossip should be exchanged
* comments
* another test
* mima version
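A Python sketch of the seen digest, not the Akka encoding: member addresses are plain strings, and they are sorted and separated before hashing so that two nodes with the same seen set compute the same SHA-1 regardless of iteration order. `should_exchange_full_gossip` is a hypothetical helper showing why the digest matters when the VectorClocks are equal.

```python
import hashlib
from typing import Iterable


def seen_digest(seen: Iterable[str]) -> bytes:
    """SHA-1 over the sorted seen set, independent of iteration order."""
    h = hashlib.sha1()
    for node in sorted(seen):
        h.update(node.encode("utf-8"))
        h.update(b"\x00")  # separator, so ("ab", "c") differs from ("a", "bc")
    return h.digest()


def should_exchange_full_gossip(
    same_vector_clock: bool, local_digest: bytes, remote_digest: bytes
) -> bool:
    """Even with equal VectorClocks, a differing seen digest means the
    seen sets diverged and full gossip should be exchanged."""
    return not same_vector_clock or local_digest != remote_digest
```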
* change package name to akka.cluster.sbr
* reference.conf has same config paths
* akka.cluster.sbr.SplitBrainResolverProvider instead of com.lightbend.akka.sbr.SplitBrainResolverProvider
* dependency from akka-cluster to akka-coordination, for lease strategy
* move TestLease to akka-coordination and use that in SBR tests
* remove keep-referee strategy
* use keep-majority by default
* review and adjust reference documentation
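With `akka.cluster.downing-provider-class = "akka.cluster.sbr.SplitBrainResolverProvider"`, keep-majority is now the default strategy. The decision it makes on each side of a partition can be sketched in Python as below. This is a simplified model, not the Akka implementation: addresses are plain strings compared lexically (Akka orders by host and port), and roles, leases and indirectly connected nodes are ignored. On a tie, the side containing the lowest address is kept.

```python
from typing import Set


def keep_majority(members: Set[str], unreachable: Set[str]) -> str:
    """Decision taken on the reachable side of a partition.

    Returns "down-unreachable" when this side survives, or "down-reachable"
    when this side (including self) should be downed.
    """
    reachable = members - unreachable
    if len(reachable) > len(unreachable):
        return "down-unreachable"
    if len(reachable) < len(unreachable):
        return "down-reachable"
    # Equal size: keep the side that contains the lowest address.
    lowest = min(members)
    return "down-unreachable" if lowest in reachable else "down-reachable"
```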
Co-authored-by: Johan Andrén <johan@markatta.com>
Co-authored-by: Johannes Rudolph <johannes.rudolph@gmail.com>
Co-authored-by: Christopher Batey <christopher.batey@gmail.com>
Co-authored-by: Arnout Engelen <github@bzzt.net>