Harden restart of Artery stream with inbound-lanes > 1, #23561

* When the Artery stream with PartitionHub is restarted, some lanes could be
  removed while they were still processing messages, resulting in an
  IndexOutOfBoundsException
* Added the possibility to drop messages in PartitionHub, which is then used by
  Artery
* Fixed some race conditions in SurviveInboundStreamRestartWithCompressionInFlightSpec
  when using inbound-lanes > 1
* The killSwitch in Artery was supposed to be triggered when one lane failed,
  but since it used Future.sequence it was never triggered unless it was the
  first lane that failed. Changed to firstCompletedOf.
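The lane-removal problem in the first two bullets can be illustrated with a toy model (hypothetical names such as `route` and the lane vector, not Artery's actual internals): when the number of lanes shrinks during a restart, a message still addressed to a removed lane would index out of bounds unless it is dropped instead.

```scala
object PartitionDropDemo {
  // Toy model: a message carries a lane index; if that lane no longer
  // exists (e.g. it was removed during a restart), drop the message
  // instead of indexing out of bounds.
  def route(lanes: Vector[String => Unit], idx: Int, msg: String): Boolean =
    if (idx >= 0 && idx < lanes.length) { lanes(idx)(msg); true }
    else false // dropped; lanes(idx) would have thrown IndexOutOfBoundsException

  def main(args: Array[String]): Unit = {
    var received = Vector.empty[String]
    val lanes = Vector[String => Unit](m => received :+= m, m => received :+= m)
    assert(route(lanes, 1, "a"))      // delivered to lane 1
    assert(!route(lanes, 3, "stale")) // lane 3 was removed: message dropped
    assert(received == Vector("a"))
    println("dropped stale message instead of throwing")
  }
}
```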
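The last bullet can be sketched with plain Scala futures (a standalone illustration, not Artery's code): a combined future built by chaining, as `Future.sequence` does, only observes a later lane's failure after the earlier lanes complete, so a failure in a non-first lane can go unnoticed while the first lane keeps running. `Future.firstCompletedOf` completes as soon as any lane completes, with success or failure, so it is a better trigger for a kill switch.

```scala
import scala.concurrent.{ Await, Future, Promise }
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.util.{ Failure, Try }

object FirstCompletedDemo {
  def main(args: Array[String]): Unit = {
    // lane 1: a healthy lane that never completes (streams run indefinitely)
    val lane1: Future[Unit] = Promise[Unit]().future
    // lane 2: a lane that fails right away
    val lane2: Future[Unit] = Future.failed(new RuntimeException("lane 2 failed"))

    // firstCompletedOf surfaces lane 2's failure immediately,
    // even though lane 1 never completes
    val first = Future.firstCompletedOf(Seq(lane1, lane2))
    Try(Await.result(first, 1.second)) match {
      case Failure(e) => println(s"killSwitch would fire: ${e.getMessage}")
      case other      => println(s"unexpected: $other")
    }
  }
}
```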
Patrik Nordwall 2017-08-22 17:59:19 +02:00
parent 9cb5849188
commit fc75f78468
8 changed files with 57 additions and 24 deletions


@@ -32,7 +32,8 @@ object SurviveInboundStreamRestartWithCompressionInFlightSpec extends MultiNodeC
       akka.loglevel = INFO
       akka.remote.artery {
         enabled = on
-        advanced {
+        advanced {
+          inbound-lanes = 4
           give-up-system-message-after = 4s
           compression.actor-refs.advertisement-interval = 300ms
           compression.manifests.advertisement-interval = 1 minute
@@ -118,15 +119,20 @@ abstract class SurviveInboundStreamRestartWithCompressionInFlightSpec extends Re
     }
     enterBarrier("inbound-failure-restart-first")
-    // we poke the remote system, awaiting its inbound stream recovery, when it should reply
-    awaitAssert(
-      {
-        sendToB ! "alive-again"
-        expectMsg(300.millis, s"${sendToB.path.name}-alive-again")
-      },
-      max = 5.seconds, interval = 500.millis)
     runOn(second) {
       sendToB.tell("trigger", ActorRef.noSender)
+      // when using inbound-lanes > 1 we can't be sure when it's done, another message (e.g. HandshakeReq)
+      // might have triggered the restart
+      Thread.sleep(2000)
+      // we poke the remote system, awaiting its inbound stream recovery, then it should reply
+      awaitAssert(
+        {
+          sendToB ! "alive-again"
+          expectMsg(300.millis, s"${sendToB.path.name}-alive-again")
+        },
+        max = 5.seconds, interval = 500.millis)
       // we continue sending messages using the "old table".
       // if a new table was being built, it would cause the b to be compressed as 1 causing a wrong reply to come back
       1 to 100 foreach { i ⇒ pingPong(sendToB, s"b$i") }