Adds some level of cluster awareness to both the LeastShardAllocationStrategy implementations:
* #27368 prefer shard allocations on new nodes during rolling updates
* #27367 don't rebalance during rolling update
* #29554 don't rebalance when there are joining nodes
* #29553 don't allocate to leaving, downed, exiting and unreachable nodes
* When allocating, nodes that are joining, unreachable, or leaving are de-prioritized, to decrease the risk that a shard is allocated only to immediately need re-allocation on a different node.
* The rebalance in the LeastShardAllocationStrategy only compares the region
with the most shards to the one with the fewest, which makes the rebalance rather
slow. By default it rebalances only 1 shard at a time.
* This new strategy looks at all current allocations to find the optimal
number of shards per region and tries to adjust towards that value,
picking from all regions with more shards than the optimum (see the sketch after this list).
* Absolute and relative limits on how many shards can be rebalanced
in one round.
* It also doesn't start a new rebalance round until the previous one has
completed.
* unit tests
* second phase for fine-grained rebalance, because rounding means the first phase will not be perfect
* randomized unit test
* configuration settings
* docs
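A minimal sketch of the idea behind the new strategy, not the actual Akka implementation (names and structure here are illustrative):
```
// find the optimal shard count per region and pick shards to move from
// regions above it, capped by the absolute and relative limits
object RebalanceSketch {
  def shardsToRebalance(
      allocations: Map[String, Vector[String]], // region -> shard ids
      absoluteLimit: Int,
      relativeLimit: Double): Set[String] = {
    val totalShards = allocations.valuesIterator.map(_.size).sum
    if (allocations.isEmpty || totalShards == 0) Set.empty
    else {
      // phase one uses the rounded-up optimum; a second phase with the
      // rounded-down value evens out what rounding left behind
      val optimalPerRegion = math.ceil(totalShards.toDouble / allocations.size).toInt
      val limit = math.min(absoluteLimit, math.max(1, (relativeLimit * totalShards).toInt))
      allocations.valuesIterator
        .filter(_.size > optimalPerRegion)
        .flatMap(shards => shards.take(shards.size - optimalPerRegion))
        .take(limit)
        .toSet
    }
  }
}
```
A new round would only start once the shards picked in the previous round have completed their hand-off.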
* Forward Terminated from ShardCoordinator to RebalanceWorker,
avoiding the need for rebalance workers to watch shard regions, which is
expensive since there is one rebalance worker per shard
* Review feedback
* Allow entities to stop by terminating in sharding without remember entities #29383
We missed an allowed transition from running/active to stopped/NoState in the shard
when the logic was rewritten.
* Add a toggle to opt in to crashing the shard on illegal state transitions
The default is to log an error without crashing the shard and all its other entities; our tests have the toggle enabled.
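A sketch of enabling the toggle in test config; the exact key name is an assumption here, check akka-cluster-sharding's reference.conf:
```
import com.typesafe.config.ConfigFactory

// assumed key name; verify against the sharding reference.conf
val shardCrashToggle = ConfigFactory.parseString(
  "akka.cluster.sharding.fail-on-invalid-entity-state-transition = on")
```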
* Fix passivation when not using remember entities, fixing #29359 and possibly #27549
* when using down-removal-margin it could allocate to an
already terminated region
* the watch fix in PR #29092 solves this
* this is an "optimization" to avoid the regions that
have been terminated
* change package name to akka.cluster.sbr
* reference.conf has the same config paths
* akka.cluster.sbr.SplitBrainResolverProvider instead of com.lightbend.akka.sbr.SplitBrainResolverProvider
* dependency from akka-cluster to akka-coordination, for lease strategy
* move TestLease to akka-coordination and use that in SBR tests
* remove keep-referee strategy
* use keep-majority by default
* review and adjust reference documentation
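Enabling the moved-in SBR then looks like this sketch (keep-majority shown explicitly even though it is the default):
```
import com.typesafe.config.ConfigFactory

// the provider class now lives in akka.cluster.sbr instead of com.lightbend.akka.sbr
val sbrConfig = ConfigFactory.parseString("""
  akka.cluster.downing-provider-class = "akka.cluster.sbr.SplitBrainResolverProvider"
  akka.cluster.split-brain-resolver.active-strategy = keep-majority
  """)
```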
Co-authored-by: Johan Andrén <johan@markatta.com>
Co-authored-by: Johannes Rudolph <johannes.rudolph@gmail.com>
Co-authored-by: Christopher Batey <christopher.batey@gmail.com>
Co-authored-by: Arnout Engelen <github@bzzt.net>
* Possibility to prefer oldest in ddata writes and reads
* enabled for Cluster Sharding
* New ReadMajorityPlus and WriteMajorityPlus
* used by Cluster Sharding, with configuration
* also possible to define ReadAll in config
(cherry picked from commit 4ba835d328)
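A minimal sketch of the new consistency levels: on top of a majority they include `additional` nodes, which together with prefer-oldest makes it more likely that the oldest nodes are covered by reads and writes:
```
import scala.concurrent.duration._
import akka.actor.ActorSystem
import akka.cluster.ddata._
import akka.cluster.ddata.Replicator._

object MajorityPlusSketch extends App {
  implicit val system: ActorSystem = ActorSystem("sketch")
  implicit val node: SelfUniqueAddress = DistributedData(system).selfUniqueAddress

  val replicator = DistributedData(system).replicator
  val CounterKey = GCounterKey("counter")

  // a majority of nodes + 3 additional for writes, + 1 additional for reads
  val write = WriteMajorityPlus(5.seconds, additional = 3)
  val read = ReadMajorityPlus(3.seconds, additional = 1)

  // replies would go to the sender; none here, this only shows the consistency levels
  replicator ! Update(CounterKey, GCounter.empty, write)(_ :+ 1)
  replicator ! Get(CounterKey, read)
}
```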
* Add scalafix plugin for jdk 9.
* Add command alias sortImports.
* Exclude some sources from SortImports.
* Update SortImports to 0.4.0
* Sort imports with `sortImports` command.
* scalafix ExplicitNonNullaryApply prepare
+ Temporarily use com.sandinh:sbt-scalafix because of scalacenter/scalafix#1098
+ Add ExplicitNonNullaryApply rule to .scalafix.conf
+ Manually fix a NonNullaryApply case in DeathWatchSpec that causes
`fixall` to fail because the ExplicitNonNullaryApply rule incorrectly rewrites
`context unbecome` to `context unbecome()` instead of `context.unbecome()`
* scalafix ExplicitNonNullaryApply
Fixed by enabling only the ExplicitNonNullaryApply rule in .scalafix.conf, then:
```
% sbt -Dakka.build.scalaVersion=2.13.1
> fixall
```
* scalafmtAll
* Revert to ch.epfl.scala:sbt-scalafix
Co-authored-by: Bùi Việt Thành <thanhbv@sandinh.net>
* DData and Persistence based remember entities refactored
* Order methods in the order of init in the shard.
* Fix bad isolation between test cases that was causing problems
* Test coverage for remember entities store failures
* WithLogCapturing where applicable
* MiMa filters
* Timeouts from config for persistent remember entities
* Single method for deliver, less UTF-8 encoding
* Include detail on write failure
* Don't send message to dead letters if it is actually handled in BackOffSupervisor
* Backoff supervisor log format, plus use warning level for hitting max restarts
* actor/message based SPI
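Choosing between the two refactored stores is a config switch; a sketch with the keys from the 2.6 reference.conf:
```
import com.typesafe.config.ConfigFactory

// ddata is the default store; eventsourced keeps remembered entities via persistence
val rememberEntitiesConfig = ConfigFactory.parseString("""
  akka.cluster.sharding {
    remember-entities = on
    remember-entities-store = ddata   # or: eventsourced
  }
  """)
```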
* Missing assert that node had joined cluster
* Keep track of Leaving and Exiting members in ShardRegion and
attempt to register with the coordinator at several of the oldest nodes if
they have status Leaving or Exiting. Include all up to and
including the first member with status Up (see the sketch after these points).
* Sending to the wrong node doesn't matter; it will be suppressed as deadLetter.
* Same for the GracefulShutdownReq, which already had that intention by
sending to the 2 oldest.
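An illustrative sketch of that selection rule (helper name hypothetical, not the actual ShardRegion code):
```
import akka.cluster.{ Member, MemberStatus }

// hypothetical helper: coordinator candidates to try registering with, oldest first
def coordinatorCandidates(membersByAgeOldestFirst: List[Member]): List[Member] = {
  val firstUp = membersByAgeOldestFirst.indexWhere(_.status == MemberStatus.Up)
  if (firstUp == -1) membersByAgeOldestFirst // no Up member yet, try them all
  else membersByAgeOldestFirst.take(firstUp + 1)
}
```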
* Remove use of getClass in secondary constructors,
as this is no longer allowed on newer versions of Scala, and likely
didn't work correctly in the past either.
This might make selecting unique actor system names for test
actor systems a bit less reliable, but that didn't seem to be
critical anyway.
Thanks to @som-snytt for the heads-up and initial implementation
in #28353
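A hypothetical before/after showing the pattern (class names invented for illustration):
```
import akka.actor.ActorSystem
import akka.testkit.TestKit

// before (rejected by newer Scala): getClass in a secondary constructor
// refers to a not-yet-constructed `this`
//   class MySpec(system: ActorSystem) extends TestKit(system) {
//     def this() = this(ActorSystem(getClass.getSimpleName))
//   }

// after: derive the name without touching `this`
abstract class MySpecBase(name: String) extends TestKit(ActorSystem(name))
class MySpec extends MySpecBase("MySpec")
```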
* Avoid AkkaSpec.getCallerName in MultiNodeClusterShardingConfig
* StreamSpec can be abstract
* Use more sophisticated test class name logic
* scalafmt