Avoid false removals in ClusterReceptionist, #26284

* The scenario was (probably) that a node was restarted with the
  same host:port and then didn't join the same cluster. The DData
  Replicator in the original cluster would continue sending messages
  to the new incarnation, resulting in false removals.
* The fix is that the DData Replicator includes the system uid of the sending
  or target system in messages. If the recipient gets a message that is from/to
  an unknown system, it discards the message and thereby does not spread
  information across different clusters.
* Reproduced in ClusterReceptionistSpec
* Much hardening of other things in ClusterReceptionistSpec
* There are also some improvements to ClusterReceptionist to not leak
  Listing with refs of removed nodes.
* Use ClusterShuttingDown
* The reason for using the sender system uid instead of the target uid in messages
  like Read and Write is that the optimization of sending the same message
  to many destinations can then remain.
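
As a rough illustration of the uid check described in the list above, here is a minimal Scala sketch. The names `Write`, `Recipient`, `fromSystemUid` and `knownUids` are hypothetical and heavily simplified; they are not the actual DData Replicator classes or wire format. The sketch assumes the recipient tracks the system uids of the nodes it currently considers cluster members and drops anything sent by an unknown uid.

```scala
// Hypothetical, simplified message: carries the uid of the SENDING system,
// so one immutable instance can be sent unchanged to many destinations.
final case class Write(key: String, payload: String, fromSystemUid: Long)

// Hypothetical recipient: knownUids stands in for the uids of the nodes
// this system currently sees as members of its own cluster.
final class Recipient(knownUids: Set[Long]) {
  // Accept a message only if its sender incarnation is known; otherwise it
  // is discarded so state does not leak across different clusters.
  def accept(msg: Write): Boolean = knownUids.contains(msg.fromSystemUid)
}

object UidCheckExample {
  def main(args: Array[String]): Unit = {
    val write = Write("services", "listing", fromSystemUid = 1L)

    // Still in the same cluster as the sender (uid 1 is known).
    val sameCluster = new Recipient(Set(1L, 2L))
    // Restarted on the same host:port but joined another cluster.
    val otherCluster = new Recipient(Set(7L))

    // The same message instance is offered to both recipients.
    println(sameCluster.accept(write))  // true
    println(otherCluster.accept(write)) // false
  }
}
```

Putting the target uid in the message instead would require a distinct message per destination, which is the trade-off the last item in the list refers to.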
Patrik Nordwall 2019-02-21 09:09:20 +01:00
parent 3cbda93496
commit 825d90bf63
16 changed files with 1714 additions and 396 deletions

@@ -50,14 +50,14 @@ private object ReplicatedDataSerializer {
   @silent
   private final def compareKeys(t1: Any, t2: Any): Int = (t1, t2) match {
     case (k1: String, k2: String) => k1.compareTo(k2)
-    case (k1: String, k2) => -1
-    case (k1, k2: String) => 1
+    case (_: String, _) => -1
+    case (_, _: String) => 1
     case (k1: Int, k2: Int) => k1.compareTo(k2)
-    case (k1: Int, k2) => -1
-    case (k1, k2: Int) => 1
+    case (_: Int, _) => -1
+    case (_, _: Int) => 1
     case (k1: Long, k2: Long) => k1.compareTo(k2)
-    case (k1: Long, k2) => -1
-    case (k1, k2: Long) => 1
+    case (_: Long, _) => -1
+    case (_, _: Long) => 1
     case (k1: OtherMessage, k2: OtherMessage) => OtherMessageComparator.compare(k1, k2)
     case (k1, k2) =>
       throw new IllegalStateException(