* Test case covering changing shard id extractor with remember-entities
* This should do the trick
* Feedback addressed
* Docs and migration guide mention
* Correct logic to persist that an entity has moved off the shard
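For context, the "shard id extractor" is the `ShardRegion.ExtractShardId` function passed to `ClusterSharding.start`. A minimal sketch, where the `Envelope` message type and the shard count are assumptions:

```scala
import akka.cluster.sharding.ShardRegion

// Hypothetical message envelope carrying the entity id.
final case class Envelope(entityId: String, payload: Any)

val numberOfShards = 100 // assumed value

val extractEntityId: ShardRegion.ExtractEntityId = {
  case Envelope(id, payload) => (id, payload)
}

// Changing this function between restarts is what the test covers:
// with remember-entities, the entities must be recreated on the
// shards the new function derives.
val extractShardId: ShardRegion.ExtractShardId = {
  case Envelope(id, _) => (math.abs(id.hashCode) % numberOfShards).toString
}
```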
* when using remember entities with ddata mode the set of
shards was not saved in durable storage, and therefore the
remembered entities were not loaded until the first message
was sent to the shard
* the coordinator stores the set of shards in a durable GSet
* loaded when the coordinator is started and added to the State;
the rest is already taken care of via the unallocatedShards Set in
the State
* when new shards are allocated the durable GSet is updated if it
doesn't already contain the shard identifier
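A minimal sketch of the mode this fixes, i.e. remember-entities combined with the ddata state-store (setting names are from the sharding config, values are illustrative):

```scala
import com.typesafe.config.ConfigFactory

// With this fix the coordinator also writes the set of shards to the
// durable store, so remembered entities are restarted right after a
// full cluster restart, without waiting for a first message.
val shardingConfig = ConfigFactory.parseString("""
  akka.cluster.sharding {
    state-store-mode = ddata
    remember-entities = on
  }
""")
```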
* Lazy init of LmdbDurableStore, #22759
* to avoid creating files (and initializing db) when not needed,
e.g. cluster sharding that is not using remember entities
* enable MiMa against 2.5.0
* use OptionVal instead
* one Replicator per configured role
* log LMDB directory at startup
* clarify the importance of the LMDB directory
* use more than one key to support many entities
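A sketch of the durable store settings the items above concern; the directory value is an assumption, and each node must point at its own local LMDB directory (which is why it is now logged at startup):

```scala
import com.typesafe.config.ConfigFactory

// The store is initialized lazily, so no LMDB files are created
// unless durable keys are actually used.
val durableConfig = ConfigFactory.parseString("""
  akka.cluster.distributed-data.durable {
    # make all entries durable ("*" acts as a prefix wildcard)
    keys = ["*"]
    lmdb.dir = "target/ddata"
  }
""")
```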
* CoordinatedShutdown that can run tasks for configured phases in order (DAG); see the sketch below
* coordinate handover/shutdown of singleton with cluster exiting/shutdown
* phase config obj with depends-on list (see the config sketch after this group)
* integrate graceful leaving of sharding in coordinated shutdown
* add timeout and recover
* add some missing artery ports to tests
* leave via CoordinatedShutdown.run
* optionally exit-jvm in last phase
* run via jvm shutdown hook
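A minimal usage sketch of the extension as of the 2.5 API; the task and phase names are illustrative:

```scala
import akka.Done
import akka.actor.{ ActorSystem, CoordinatedShutdown }
import scala.concurrent.Future

val system = ActorSystem("example")

// Register a task in one of the predefined phases; the phases form a
// DAG and are run in dependency order, each with a timeout and an
// optional recover from task failures.
CoordinatedShutdown(system).addTask(
  CoordinatedShutdown.PhaseBeforeServiceUnbind, "log-shutdown") { () =>
  system.log.info("shutting down")
  Future.successful(Done)
}

// Runs all phases; this is also what the JVM shutdown hook and the
// graceful cluster leaving are wired through.
val done: Future[Done] = CoordinatedShutdown(system).run()
```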
* send ExitingConfirmed to the leader before shutdown of the Exiting
node, so the leader does not have to wait for the failure detector to
mark it as unreachable before removing it
* the unreachable signal is still kept as a safeguard in case the
message is lost or the leader dies
* PhaseClusterExiting vs MemberExited in ClusterSingletonManager
* terminate the ActorSystem when the cluster is shut down (via Down)
* add more predefined and custom phases
* reference documentation
* migration guide
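A sketch of the phase configuration with a depends-on list, plus the opt-in toggles mentioned above; the `my-cleanup` phase name is hypothetical:

```scala
import com.typesafe.config.ConfigFactory

val shutdownConfig = ConfigFactory.parseString("""
  akka.coordinated-shutdown {
    phases {
      # custom phase wired into the DAG after a predefined phase
      my-cleanup {
        depends-on = [service-requests-done]
        timeout = 10s
        recover = on
      }
    }
    # run the phases from a JVM shutdown hook
    run-by-jvm-shutdown-hook = on
    # optionally System.exit in the last phase
    exit-jvm = off
  }
""")
```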
* problem when the leader order was sys2, sys1, sys3:
sys3 could not perform its duties and move the Leaving sys1 to
Exiting because it was observing sys1 as unreachable
* exclude Leaving with exitingConfirmed from the convergence condition
* In the logs of the failing test we can see that the first node is removed
as expected and then comes back into the membership, which is possible in
case of a conflicting membership state merge. It is supposed to be
removed again by auto-down. That doesn't happen within the barrier-timeout.
* Provide shorter aliases for the ActorRefProviders #20649
* Use the new ActorRefProvider aliases throughout code and docs
* Cleaner alias replacement logic
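For illustration, the alias form in configuration (the aliases replace the fully qualified ActorRefProvider class names):

```scala
import com.typesafe.config.ConfigFactory

val providerConfig = ConfigFactory.parseString("""
  # previously: akka.cluster.ClusterActorRefProvider
  akka.actor.provider = cluster
  # the other aliases are "local" and "remote"
""")
```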
Previously a failure during e.g. MailboxType.create() would make the
user guardian fail, tearing down the whole system as a result. The cause
is a deep bug in handling ActorCell creation that we cannot really fix
anymore due to resulting changes in semantics, hence this fix only
targets top-level actors (where the observable difference is an
unambiguous improvement).
fixes #15947
Two new message pairs:
`GetShardRegionState`/`CurrentShardRegionState` allows for querying a region for its current shards and the current `EntityIds` of each
`GetClusterShardingStats`/`ClusterShardingStats` allows for querying the entire cluster for a summary of
the number of entities alive in each region and shard.
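A minimal sketch of using the new query messages with the ask pattern; `region` stands for the `ActorRef` of a started `ShardRegion`:

```scala
import akka.actor.ActorRef
import akka.cluster.sharding.ShardRegion
import akka.pattern.ask
import akka.util.Timeout
import scala.concurrent.Future
import scala.concurrent.duration._

implicit val timeout: Timeout = 5.seconds

def queries(region: ActorRef) = {
  // Shards hosted by this region instance and their entity ids:
  val regionState: Future[ShardRegion.CurrentShardRegionState] =
    (region ? ShardRegion.GetShardRegionState)
      .mapTo[ShardRegion.CurrentShardRegionState]

  // Per-region, per-shard entity counts for the whole cluster:
  val stats: Future[ShardRegion.ClusterShardingStats] =
    (region ? ShardRegion.GetClusterShardingStats(10.seconds))
      .mapTo[ShardRegion.ClusterShardingStats]

  (regionState, stats)
}
```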
For manual downing it is not needed. For auto-down it doesn't add any extra safety, since auto-down
does not handle network partitions anyway.
The setting is still useful if you implement downing strategies that handle network partitions,
e.g. by keeping the larger side of the partition and shutting down the smaller side.
Two improvements to the coordinator startup (state recovery) that
should make it operational faster and reduce the number of lost messages
during startup.
* Let the quick Terminated messages (those not involving failure detection)
be processed before starting to reply to GetShardHome.
* Consider regions that don't belong to the current cluster
to be terminated.
* avoid Down and Exiting members being used for joining
* delay shutdown of a Down member until the information has spread
to all reachable members, e.g. when downing several nodes via one node
* akka.cluster.down-removal-margin setting
Margin until shards or singletons that belonged to a
downed/removed partition are created in the surviving partition.
Used by singleton and sharding.
* remove the retry count parameters/settings for singleton in
favor of deriving those from the removal-margin
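A sketch of the new setting; the 20s value is an assumption and should be tuned to how quickly the downing strategy acts:

```scala
import com.typesafe.config.ConfigFactory

// Wait this long after a node is downed/removed before shards or
// singletons that lived on it are started on the surviving side.
val clusterConfig = ConfigFactory.parseString("""
  akka.cluster.down-removal-margin = 20s
""")
```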