Two issues:
1) ShardRegion actor must stop itself when the node is shutting down,
i.e. when receiving MemberRemoved(selfAddress)
2) ShardCoordinator must not persist anything when the node is shutting
down. MemberRemoved of other shard regions will trigger Terminated,
which must not be persisted, because then the next coordinator will
replay those events and end up in the wrong state. This problem
announced itself when using leaving, as illustrated in the new test.
To solve the second issue I have added a new ClusterShuttingDown event
that is published before the MemberRemoved events. Note that Terminated
is triggered by MemberRemoved.
(cherry picked from commit 1b272c72597beece9d93f0054f4b58e3d25f9ae2)
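The shutdown handling can be illustrated with a small sketch. The actor below is not the actual ShardRegion/ShardCoordinator code; the actor name and the exact subscription set are assumptions, but it shows the intended ordering: ClusterShuttingDown arrives first and suppresses further persistence, and MemberRemoved(selfAddress) stops the actor.

```scala
import akka.actor.{ Actor, ActorLogging }
import akka.cluster.Cluster
import akka.cluster.ClusterEvent.{ ClusterShuttingDown, CurrentClusterState, MemberRemoved }

// Illustrative sketch only, not the real ShardRegion/ShardCoordinator implementation.
class ShutdownAwareRegion extends Actor with ActorLogging {
  val cluster = Cluster(context.system)
  var shuttingDown = false

  override def preStart(): Unit =
    cluster.subscribe(self, classOf[MemberRemoved], ClusterShuttingDown.getClass)

  override def postStop(): Unit =
    cluster.unsubscribe(self)

  def receive = {
    case _: CurrentClusterState => // initial snapshot, not needed here

    case ClusterShuttingDown =>
      // published before any MemberRemoved: from here on nothing must be persisted
      shuttingDown = true

    case MemberRemoved(member, _) if member.address == cluster.selfAddress =>
      // issue 1: this node itself is being removed, so stop instead of handing off
      context.stop(self)

    case MemberRemoved(member, _) =>
      // issue 2: other regions will terminate as a consequence; while shuttingDown
      // their Terminated must not be turned into persisted coordinator events
      if (!shuttingDown) log.debug("Region on {} removed", member.address)
  }
}
```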
Allow a roleOverride: Option[String] to be used when starting ClusterSharding for a given entry type. This allows the cluster role to be defined per entry type instead of requiring the role configuration to be all or nothing across all entry types.
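A usage sketch, assuming a `start` overload roughly like the one below (parameter names and order are illustrative, and `Counter` is a hypothetical entry actor):

```scala
import akka.actor.{ Actor, ActorSystem, Props }
import akka.contrib.pattern.{ ClusterSharding, ShardRegion }

class Counter extends Actor {
  var count = 0
  def receive = { case _ => count += 1 }
}

object CounterSharding {
  // envelope carrying the entry id
  case class Envelope(id: String, payload: Any)

  val idExtractor: ShardRegion.IdExtractor = {
    case Envelope(id, payload) => (id, payload)
  }

  val shardResolver: ShardRegion.ShardResolver = {
    case Envelope(id, _) => (math.abs(id.hashCode) % 10).toString
  }

  def start(system: ActorSystem) =
    ClusterSharding(system).start(
      typeName = "Counter",
      entryProps = Some(Props[Counter]),
      // only nodes with this role host Counter entries, regardless of the
      // global sharding role setting
      roleOverride = Some("counter-role"),
      idExtractor = idExtractor,
      shardResolver = shardResolver)
}
```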
- Move all entry-related logic out of the ShardRegion and into a
new dedicated child `Shard` actor.
- Shard actor persists entry started and passivated messages (see the sketch after this list).
- Non-passivated entries get restarted on termination.
- Shard Coordinator restarts shards on other regions upon region failure or handoff.
- Ensures shard rebalance restarts shards.
- Shard buffers messages for an entry after an EntryStarted is received until the state has been persisted.
- Shard likewise buffers messages after a Passivate is received until the state has been persisted.
- Shard will retry persisting state until it succeeds.
- Shard will restart entries automatically (after a backoff) if they were not passivated and remembering of entries is enabled.
- Added the Entry path change to the migration docs.
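A hedged sketch of the buffering behaviour described in the list above. Event and message names here (Msg, Passivate, EntryStarted, EntryStopped) are illustrative rather than the actual Shard protocol, and retries, handoff and remember-entries restarts are left out:

```scala
import akka.actor.{ ActorRef, Props }
import akka.persistence.PersistentActor

// Hypothetical message/event names; this is not the actual Shard implementation,
// only the buffer-until-persisted idea.
object ShardSketch {
  case class Msg(id: String, payload: Any)
  case class Passivate(id: String)
  case class EntryStarted(id: String)
  case class EntryStopped(id: String)
}

class ShardSketch(shardId: String, entryProps: Props) extends PersistentActor {
  import ShardSketch._

  override def persistenceId = s"shard-$shardId"

  var entries = Map.empty[String, ActorRef]
  // ids with a passivation in flight; their messages are buffered until
  // the EntryStopped event has been persisted
  var passivating = Map.empty[String, Vector[(Any, ActorRef)]]

  def receiveRecover = {
    case EntryStarted(id) => entries = entries.updated(id, startEntry(id))
    case EntryStopped(id) => entries -= id
  }

  def receiveCommand = {
    case Passivate(id) if entries.contains(id) =>
      passivating = passivating.updated(id, Vector.empty)
      persist(EntryStopped(id)) { evt =>
        context.stop(entries(evt.id))
        entries -= evt.id
        // re-deliver what arrived while the passivation was being persisted;
        // each message restarts the entry via the EntryStarted path below
        val buffered = passivating(evt.id)
        passivating -= evt.id
        buffered.foreach { case (m, snd) => self.tell(Msg(evt.id, m), snd) }
      }

    case Msg(id, payload) if passivating.contains(id) =>
      passivating = passivating.updated(id, passivating(id) :+ (payload -> sender()))

    case Msg(id, payload) if entries.contains(id) =>
      entries(id) forward payload

    case Msg(id, payload) =>
      // first message for this entry: persist EntryStarted before delivering
      persist(EntryStarted(id)) { evt =>
        val ref = startEntry(evt.id)
        entries = entries.updated(evt.id, ref)
        ref forward payload
      }
  }

  // children are not given fixed names in this sketch, to keep restarts simple
  def startEntry(id: String): ActorRef = context.actorOf(entryProps)
}
```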
* Add a supervision layer that will start the ShardCoordinator again after
  a configurable backoff duration (see the sketch below)
* Make the timeout of SharedLeveldbJournal configurable
* Include cause of PersistenceFailure in message of ActorKilledException
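As a rough illustration of the coordinator backoff mentioned in the first item above, a parent actor can watch the coordinator and recreate it after a configured delay. This is only a sketch of the general pattern under assumed names; the actual supervisor added in this change may be structured differently:

```scala
import scala.concurrent.duration._
import akka.actor.{ Actor, ActorRef, Props, Terminated }

// Hypothetical sketch of the backoff-restart idea for the ShardCoordinator.
class CoordinatorBackoffSupervisor(coordinatorProps: Props, backoff: FiniteDuration) extends Actor {
  import context.dispatcher

  private case object StartCoordinator

  var coordinator: Option[ActorRef] = None

  override def preStart(): Unit = self ! StartCoordinator

  def receive = {
    case StartCoordinator =>
      coordinator = Some(context.watch(context.actorOf(coordinatorProps)))

    case Terminated(ref) if coordinator.contains(ref) =>
      // coordinator stopped (e.g. persistence failure): try again after the backoff
      coordinator = None
      context.system.scheduler.scheduleOnce(backoff, self, StartCoordinator)

    case msg if coordinator.isDefined =>
      coordinator.get forward msg
    // messages arriving while the coordinator is down could be buffered here as well
  }
}
```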