Each build now produces over 40 MB of logs.
A lot of DEBUG logging was left on for test failures that have since been
fixed. Added an issue # for the ones that are still valid, or left the
logging on where the test itself verifies debug output.
* Some GetShardHome requests were ignored (by design) during
rebalance and were retried later.
* This optimization keeps track of such requests and replies
to them immediately after the rebalance has completed, so
the buffered messages in the region don't have to
wait for the next retry tick.
* use regionTerminationInProgress also during the update, since
not all GetShardHome requests are stashed
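A minimal sketch of the buffer-and-reply idea described above, using a plain actor and simplified, made-up message and field names (GetShardHome, ShardHome, RebalanceDone, deferred) rather than the real ShardCoordinator internals:

```scala
import akka.actor.{ Actor, ActorRef }

// Simplified, made-up messages; the real sharding protocol differs
final case class GetShardHome(shard: String)
final case class ShardHome(shard: String, region: ActorRef)
final case class RebalanceDone(shard: String, newRegion: ActorRef)

class CoordinatorLike extends Actor {
  private var rebalancing = Set.empty[String]                 // shards currently being rebalanced
  private var homes = Map.empty[String, ActorRef]             // shard -> region
  private var deferred = Map.empty[String, Vector[ActorRef]]  // requesters waiting for a rebalance

  def receive = {
    case GetShardHome(shard) if rebalancing(shard) =>
      // instead of silently ignoring the request, remember who asked
      deferred = deferred.updated(shard, deferred.getOrElse(shard, Vector.empty) :+ sender())

    case GetShardHome(shard) =>
      homes.get(shard).foreach(region => sender() ! ShardHome(shard, region))

    case RebalanceDone(shard, newRegion) =>
      rebalancing -= shard
      homes = homes.updated(shard, newRegion)
      // reply immediately so the region's buffered messages don't wait for the retry tick
      deferred.getOrElse(shard, Vector.empty).foreach(_ ! ShardHome(shard, newRegion))
      deferred -= shard
  }
}
```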
* Revert "fix entityPropsFactory id param, #21809"
This reverts commit cd7eae28f6.
* Revert "Merge pull request #24058 from talpr/talpr-24053-add-entity-id-to-sharding-props"
This reverts commit 8417e70460, reversing
changes made to 22e85f869d.
AFAICT there was nothing ensuring the order of messages sent to the
shard and the region, so first check that the passivation has happened
before sending another add in the test.
Refs #24013
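A rough, self-contained sketch of that kind of ordering check using the Akka TestKit; a plain actor and PoisonPill stand in here for the sharded entity and its passivation, and Add is a made-up message:

```scala
import akka.actor.{ Actor, ActorSystem, PoisonPill, Props }
import akka.testkit.{ TestKit, TestProbe }

// Stand-in for the sharded entity used in the real (multi-node) test
class Entity extends Actor { def receive = { case msg => sender() ! msg } }

object PassivationOrderExample extends App {
  implicit val system: ActorSystem = ActorSystem("example")
  val probe = TestProbe()

  val entity = system.actorOf(Props(new Entity), "entity-1")
  probe.watch(entity)

  // trigger passivation (in the real test this happens through the shard)
  entity ! PoisonPill

  // the important bit: confirm passivation has completed before the next
  // "add", since nothing orders messages sent to the shard vs. the region
  probe.expectTerminated(entity)

  // only now send the next message that re-creates the entity, e.g.
  // region ! Add("entity-1", 1)

  TestKit.shutdownActorSystem(system)
}
```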
* Sharding only within own team (coordinator is singleton)
* the ddata Replicator used by Sharding must also stay within the own team
* added support for a Set of roles in the ddata Replicator so that it can be used
by sharding to specify role + team
* a Sharding proxy can route to sharding in another team
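A rough sketch of how role-restricted sharding plus a proxy might be wired up with the classic ClusterSharding API; the entity, envelope, and role names are made up, the team aspect is configuration not shown here, and the system is assumed to be configured with the cluster provider:

```scala
import akka.actor.{ Actor, ActorSystem, Props }
import akka.cluster.sharding.{ ClusterSharding, ClusterShardingSettings, ShardRegion }

final case class Envelope(entityId: String, payload: Any)   // made-up message envelope
class Counter extends Actor { def receive = { case _ => } } // made-up entity

object ShardingRoleExample extends App {
  val system = ActorSystem("example") // assumed to use the cluster provider

  val extractEntityId: ShardRegion.ExtractEntityId = { case msg @ Envelope(id, _) => (id, msg) }
  val extractShardId: ShardRegion.ExtractShardId = {
    case Envelope(id, _) => (math.abs(id.hashCode) % 10).toString
  }

  // Start sharding only on nodes with the "sharding" role; the coordinator
  // singleton and the Replicator it uses stay within that group of nodes
  val region = ClusterSharding(system).start(
    typeName = "Counter",
    entityProps = Props(new Counter),
    settings = ClusterShardingSettings(system).withRole("sharding"),
    extractEntityId = extractEntityId,
    extractShardId = extractShardId)

  // Nodes that only need to send messages can use a proxy that routes to
  // the sharded entities hosted elsewhere (e.g. in another team)
  val proxy = ClusterSharding(system).startProxy(
    typeName = "Counter",
    role = Some("sharding"),
    extractEntityId = extractEntityId,
    extractShardId = extractShardId)
}
```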
* Test case covering changing shard id extractor with remember-entities
* This should do the trick
* Feedback addressed
* Docs and migration guide mention
* Correct logic to persist that an entity has moved off of the shard
* when using remember entities with ddata mode the set of
shards was not saved in durable storage and therefore the
remembered entities were not loaded until the first message
was sent to the shard
* the coordinator stores the set of shards in a durable GSet
* it is loaded when the coordinator is started and added to the State;
the rest is already taken care of via the unallocatedShards Set in
the State
* when new shards are allocated the durable GSet is updated if it
doesn't already contain the shard identifier
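A minimal sketch of the durable GSet mechanism, independent of the actual ShardCoordinator code and using the classic Distributed Data API as of Akka 2.5; the key name, consistency levels, and helper names are assumptions, and the key must also be matched by akka.cluster.distributed-data.durable.keys to actually be durable:

```scala
import scala.concurrent.duration._
import akka.actor.ActorSystem
import akka.cluster.Cluster
import akka.cluster.ddata.{ DistributedData, GSet, GSetKey }
import akka.cluster.ddata.Replicator.{ Get, ReadMajority, Update, WriteMajority }

object DurableShardSetSketch {
  val system = ActorSystem("example") // assumed cluster setup with durable ddata enabled
  implicit val cluster: Cluster = Cluster(system) // implicit node required by Update in this API
  private val replicator = DistributedData(system).replicator

  // One GSet per shard type holding the identifiers of all allocated shards
  // ("AllShards-Counter" is just an illustrative key name)
  private val AllShardsKey = GSetKey[String]("AllShards-Counter")

  // When a new shard is allocated, add its id to the durable set
  // (adding an already-present id to a GSet is harmless)
  def rememberShard(shardId: String): Unit =
    replicator ! Update(AllShardsKey, GSet.empty[String], WriteMajority(5.seconds))(_ + shardId)

  // At coordinator start-up, read the set back so remembered entities can be
  // started without waiting for the first message to each shard; the
  // GetSuccess reply would be handled by the requesting actor
  def loadShards(): Unit =
    replicator ! Get(AllShardsKey, ReadMajority(5.seconds))
}
```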
* Lazy init of LmdbDurableStore, #22759
* to avoid creating files (and initializing db) when not needed,
e.g. cluster sharding that is not using remember entities
* enable MiMa against 2.5.0
* use OptionVal instead
* one Replicator per configured role
* log LMDB directory at startup
* clarify the importance of the LMDB directory
* use more than one key to support many entities
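A hedged sketch of the relevant configuration; the directory path and key pattern are only examples, and with lazy initialization the LMDB files are only created once something durable is actually stored:

```scala
import akka.actor.ActorSystem
import com.typesafe.config.ConfigFactory

object DurableLmdbConfigExample extends App {
  // The directory must point at node-local storage that survives restarts of
  // the same node, otherwise the durable data cannot be loaded again
  val config = ConfigFactory.parseString("""
    akka.cluster.distributed-data.durable {
      keys = ["shard-*"]                  # which ddata keys are persisted (example pattern)
      lmdb.dir = "target/ddata-node-1"    # LMDB files are created here, lazily
    }
    """).withFallback(ConfigFactory.load())

  val system = ActorSystem("example", config)
}
```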
* CoordinatedShutdown that can run tasks for configured phases in order (DAG)
* coordinate handover/shutdown of singleton with cluster exiting/shutdown
* phase config obj with depends-on list (see the config sketch after this list)
* integrate graceful leaving of sharding in coordinated shutdown
* add timeout and recover
* add some missing artery ports to tests
* leave via CoordinatedShutdown.run
* optionally exit-jvm in last phase
* run via JVM shutdown hook
* send ExitingConfirmed to the leader before shutting down the Exiting
node, so we don't have to wait for the failure detector to mark it as
unreachable before removing it
* the unreachable signal is still kept as a safeguard in case the
message is lost or the leader dies
* PhaseClusterExiting vs MemberExited in ClusterSingletonManager
* terminate the ActorSystem when the cluster is shut down (via Down)
* add more predefined and custom phases
* reference documentation
* migration guide
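A sketch tying the pieces above together: a custom phase declared in config with a depends-on list, exit-jvm and the JVM shutdown hook enabled, and a task registered for the phase (the phase and task names are examples, not predefined ones):

```scala
import scala.concurrent.Future
import akka.Done
import akka.actor.{ ActorSystem, CoordinatedShutdown }
import com.typesafe.config.ConfigFactory

object CoordinatedShutdownExample extends App {
  val config = ConfigFactory.parseString("""
    akka.coordinated-shutdown {
      run-by-jvm-shutdown-hook = on      # run the whole sequence from a JVM shutdown hook
      exit-jvm = on                      # exit the JVM in the last phase
      phases {
        # custom phase, ordered via its depends-on list (the phases form a DAG)
        my-flush-phase {
          timeout = 10 s
          depends-on = [before-cluster-shutdown]
        }
      }
    }
    """).withFallback(ConfigFactory.load())

  val system = ActorSystem("example", config)

  // Register a task for the custom phase; tasks must complete (or time out,
  // with recover) before phases depending on this one are started
  CoordinatedShutdown(system).addTask("my-flush-phase", "flush-buffers") { () =>
    Future.successful(Done)
  }

  // The sequence can also be triggered programmatically instead of system.terminate():
  // CoordinatedShutdown(system).run()
}
```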
* problem when the leader order was sys2, sys1, sys3:
then sys3 could not perform its duties and move Leaving sys1 to
Exiting because it was observing sys1 as unreachable
* exclude Leaving with exitingConfirmed from the convergence condition
* In the logs of the failing test we can see that the first node is removed
as expected and then comes back into the membership, which is possible in
case of a conflicting membership state merge. It is supposed to be
removed again by auto-down. That doesn't happen within the barrier timeout.
* Provide shorter aliases for the ActorRefProviders #20649
* Use the new ActorRefProvider aliases throughout code and docs
* Cleaner alias replacement logic
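The aliases are config-level shorthands; a minimal example (the fully-qualified provider class names keep working as before):

```scala
import akka.actor.ActorSystem
import com.typesafe.config.ConfigFactory

object ProviderAliasExample extends App {
  // "local", "remote" and "cluster" are accepted as shorthand for the
  // corresponding fully-qualified ActorRefProvider class names
  val config = ConfigFactory.parseString("akka.actor.provider = cluster")
    .withFallback(ConfigFactory.load())

  val system = ActorSystem("example", config)
}
```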
Previously a failure during e.g. MailboxType.create() would make the
user guardian fail, tearing down the whole system as a result. The cause
is a deep bug in handling ActorCell creation that we cannot really fix
anymore due to resulting changes in semantics, hence this fix only
targets top-level actors (where the observable difference is an
unambiguous improvement).
fixes #15947