* Add CopyrightHeader support for sbt-boilerplate plugin.
* Add CopyrightHeader support for `*.proto` files.
* Make the CopyrightHeader regex match both `–` (en dash) and `-` (hyphen).
* Add CopyrightHeader support for sbt build files.
* Update copyright from 2018 to 2019.
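This is a minimal sketch of a dash-tolerant header match, written in Python for illustration only. The header text `Copyright (C) <start>-<end> Lightbend Inc.` and the helper `update_year` are assumptions, not the actual CopyrightHeader configuration. The character class `[–-]` is the part that accepts either an en dash or a hyphen. Normalizing the dash to `-` when rewriting is also an assumption.

```python
import re

# Assumed header shape; the real CopyrightHeader template may differ.
# [–-] accepts either an en dash (U+2013) or an ASCII hyphen between the years.
HEADER_RE = re.compile(r"Copyright \(C\) (\d{4})[–-](\d{4}) Lightbend Inc\.")


def update_year(text, new_year):
    """Hypothetical helper: rewrite the end year of every matching header.

    The dash is normalized to '-' in the output (an assumption of this sketch).
    """
    return HEADER_RE.sub(
        lambda m: f"Copyright (C) {m.group(1)}-{new_year} Lightbend Inc.", text
    )


if __name__ == "__main__":
    print(update_year("Copyright (C) 2009\u20132018 Lightbend Inc.", 2019))
    print(update_year("Copyright (C) 2009-2018 Lightbend Inc.", 2019))
```

Both calls print `Copyright (C) 2009-2019 Lightbend Inc.`, so headers written with either dash are found by the same pattern.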
The intended behavior is:
* don't gossip to nodes marked as unreachable by self (heartbeat
messages are not getting through, so there is no point in trying to gossip).
* gossip is allowed to nodes marked as unreachable by others.
This doesn't change anything from how it worked before, but I think the
original intention before the multi-dc changes was to not gossip to
unreachable nodes at all, no matter who marked them.
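The rule above can be sketched as a filter on gossip targets. This is a hypothetical Python model, loosely based on the observer/subject records that cluster reachability tracks. `Record` and `gossip_targets` are illustrative names, not Akka's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    observer: str  # node that marked the subject as unreachable
    subject: str   # node considered unreachable by the observer


def gossip_targets(self_node, members, unreachable_records):
    """Return the members that self_node may gossip to (hypothetical model).

    Nodes marked unreachable by self_node are excluded, because our own
    heartbeats are not getting through. Nodes marked unreachable only by
    other observers remain valid targets.
    """
    unreachable_by_self = {
        r.subject for r in unreachable_records if r.observer == self_node
    }
    return [m for m in members if m != self_node and m not in unreachable_by_self]


if __name__ == "__main__":
    records = [Record("A", "B"), Record("C", "D")]
    print(gossip_targets("A", ["A", "B", "C", "D"], records))  # ['C', 'D']
```

Node `B` is skipped because `A` itself marked it unreachable. Node `D` is still a target because only `C` marked it.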
* Testing of singleton leaving
* gossip optimization: send the Exiting change to the two oldest nodes per role
* hardening ClusterSingletonManagerIsStuck restart, increase ClusterSingletonManagerIsStuck
* Introduce 'MemberDowned' member event
Compatibility note: MemberEvent is a sealed trait, so it is debatable whether
it is acceptable to introduce a new member.
* Be more conservative (more like leaving), add test
* Fixes #25489, where a cluster event for a previous state could override
the state set by the call to cluster.close, which sets it to Removed
* Fix case where Removed is used as a placeholder for unknown
* Detect that the joining node is 2.5.9 or earlier by an empty ConfigCheck
config in the InitJoin message, and then send back an Address, which was the
old representation of InitJoinAck
* Include akka.version in logging to facilitate troubleshooting
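Here is a hypothetical sketch of the version detection on the receiving side. `InitJoinAck`, `check_config` and the boolean `config_check_passed` field are simplified stand-ins. The compatibility rule inside `check_config` is an assumption. Only the branch described above reflects the change: an empty ConfigCheck means the joiner is 2.5.9 or earlier, so the reply is a bare Address.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InitJoinAck:
    address: str
    config_check_passed: bool  # hypothetical field standing in for the real result


def check_config(received, local):
    # Hypothetical compatibility rule: every key the joiner sent must match ours.
    return all(local.get(key) == value for key, value in received.items())


def reply_to_init_join(config_check, self_address, local_config):
    """Choose the reply shape based on the InitJoin's ConfigCheck (sketch)."""
    if not config_check:
        # Empty ConfigCheck: joiner is 2.5.9 or earlier, so reply with the
        # bare Address, the old representation of InitJoinAck.
        return self_address
    return InitJoinAck(self_address, check_config(config_check, local_config))


if __name__ == "__main__":
    local = {"akka.cluster.downing-provider-class": ""}
    print(reply_to_init_join({}, "akka://sys@host1:2552", local))
    print(reply_to_init_join(dict(local), "akka://sys@host1:2552", local))
```

Keeping the old reply shape for old joiners lets a rolling upgrade proceed without the 2.5.9 node having to understand the new InitJoinAck.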
The default is 5s, which means that if the first Read is lost and a test's
ddata does not have any secondary nodes to query, it will time out waiting
to get the state. For example, a Read that is ignored while durable state
is being loaded is then never retried.
* fix NPE in shutdownTransport
* perhaps because shutdown was called before the transport was started
* system.dispatcher is used in other places of the shutdown
* improve logging of compression advertisement progress
* adjust RestartFlow.withBackoff parameters
* quarantine after ActorSystemTerminating signal
(will cleanup compressions)
* Quarantine idle associations
* liveness checks by sending an extra HandshakeReq and updating
lastUsed when the reply is received
* conservative default value to survive network partitions, in
case no other messages are sent
* Adjust logging and QuarantinedEvent for harmless quarantine
* Harmless if it was via the shutdown signal or cluster leaving
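The idle-association handling can be sketched as a periodic check. This is a hypothetical Python model, not Artery's implementation. `quarantine_idle_after`, `check_idle` and the injectable `clock` are assumed names. The sketch also assumes that both outgoing messages and replies to the extra HandshakeReq count as use.

```python
import time


class Association:
    """Hypothetical model of idle tracking for one remote association."""

    def __init__(self, quarantine_idle_after, clock=time.monotonic):
        self.quarantine_idle_after = quarantine_idle_after  # seconds (assumed unit)
        self.clock = clock
        self.last_used = clock()
        self.quarantined = False

    def on_message_sent(self):
        # Assumption: ordinary traffic also keeps the association alive.
        self.last_used = self.clock()

    def on_handshake_rsp(self):
        # A reply to the extra liveness HandshakeReq proves the peer is alive.
        self.last_used = self.clock()

    def check_idle(self, send_handshake_req):
        """Called periodically: quarantine if idle too long, otherwise probe."""
        if self.quarantined:
            return
        if self.clock() - self.last_used >= self.quarantine_idle_after:
            # Harmless quarantine: nothing has used the association for a while.
            self.quarantined = True
        else:
            # Liveness probe; only a reply (or other traffic) updates last_used.
            send_handshake_req()
```

A generous `quarantine_idle_after` corresponds to the conservative default mentioned above. A network partition shorter than the threshold does not trigger the idle quarantine.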
Each build now produces over 40 MB of logs.
A lot of DEBUG logging was left on for test failures that have since been
fixed. Added an issue # for the ones that are still valid, or left it on
where the test verifies debug logging.
* Notice that the incarnation has changed in SystemMessageDelivery
and then reset the sequence number
* Take the incarnation number into account in the ClearSystemMessageDelivery
message
* Trigger quarantine earlier in ClusterRemoteWatcher if node with
same host:port joined
* Change quarantine-removed-node-after to 5s; it shouldn't be necessary
to delay it 30s
* test reproducer
* fix memory leak in SystemMessageDelivery
* initial set of tests for idle outbound associations, credit to mboogerd
* close inbound compression when quarantined, #23967
* make sure compressions for quarantined are removed in case they are lingering around
* also means that advertise will not be done for quarantined
* remove tombstone in InboundCompressions
* simplify async callbacks by using invokeWithFeedback
* compression for old incarnation, #24400
* it was fixed by the other previous changes
* also confirmed by running the SimpleClusterApp with TCP
as described in the ticket
* test with tcp and tls-tcp transport
* handle the stop signals differently for tcp transport because they
are converted to StreamTcpException
* cancel timers on shutdown
* share the top-level FR for all Association instances
* use linked queue for control and large streams, less memory usage
* remove quarantined idle Association completely after a configured delay
* note that shallow Association instances may still linger in the
heap because of cached references from RemoteActorRef, which may
be cached by LruBoundedCache (used by resolve actor ref).
Those are small, since the queues have been removed, and the cache
is bounded.
* Refactoring to separate the Aeron specific things, ArteryAeronUdpTransport
* move Aeron specific classes to akka.remote.artery.aeron package
* move Version to ArterySettings, and describe strategy for envelope header changes
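The incarnation handling in SystemMessageDelivery can be modelled as sequence numbers scoped to a peer incarnation. This is a hypothetical Python sketch. The class shape, `next_seq_no`, `clear`, and the convention that `0` means "no incarnation seen yet" are all assumptions. It shows two behaviors: the sequence is reset when a new incarnation is noticed, and a clear request is ignored unless it carries the current incarnation.

```python
class SystemMessageDelivery:
    """Hypothetical model: sequence numbers are scoped to a peer incarnation."""

    def __init__(self):
        self.incarnation = 0  # assumed convention: 0 = no incarnation seen yet
        self.seq_no = 0

    def next_seq_no(self, peer_incarnation):
        if peer_incarnation != self.incarnation:
            # The remote system was restarted: the old sequence is meaningless
            # to the new incarnation, so start numbering from scratch.
            self.incarnation = peer_incarnation
            self.seq_no = 0
        self.seq_no += 1
        return self.seq_no

    def clear(self, incarnation):
        """Model of ClearSystemMessageDelivery that takes the incarnation into account."""
        if incarnation != self.incarnation:
            return False  # stale request for an old incarnation, ignore it
        self.seq_no = 0
        return True
```

Carrying the incarnation in the clear request prevents a late request for an old incarnation from resetting state that now belongs to the new one.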
The test could fail by picking up the reachable event from the previous
unsplit, so a new probe is used.
Also change barrierCounter to split/unsplit so that it is easier to see
where the failure is when a barrier fails.
* When leaving/downing the last node in a DC it would not
be removed in another DC, since that was only done by the
leader in the owning DC (and that is gone).
* It should be ok to eagerly remove such nodes also by
leaders in other DCs.
* Note that gossip has already been sent out for the last node, so the
change will be spread to other DCs unless there is a network
partition, and for that we can't do anything. The node will be
replaced if it joins again.
There exists a race where a cluster node that is being downed sees itself
as the oldest node (as it has had the other nodes removed) and it
takes over the singleton manager, causing the real oldest node to go into
the End state, which means that cluster singletons never work again.
This fix simply prevents Member events from being given to the Cluster
Manager FSM during shutdown, instead relying on SelfExiting.
This also hardens the test by not downing the node that the current
sharding coordinator is running on as well as fixing a bug in the
probes.