Reproducer (TransportFailSpec):
* watch from first to second node, i.e. sys msg with seq number 1
* trigger transport failure detection to tear down the connection
* the bug was that on the second node the ReliableDeliverySupervisor
  was stopped because the send buffer had not been used on that side,
  but stopping it also removed the receive buffer entry
* later, after the gating period had elapsed, another watch from the first
  to the second node, i.e. sys msg with seq number 2
* when that watch msg was received on the second node the receive buffer
  had been cleared, so it thought that seq number 1 was missing
  and therefore sent a nack to the first node
* when the first node received the nack it threw
IllegalStateException: Error encountered while processing system message
acknowledgement buffer: [2 {2}] ack: ACK[2, {1, 0}]
caused by: ResendUnfulfillableException: Unable to fulfill resend request since
negatively acknowledged payload is no longer in buffer
This was fixed by not stopping the ReliableDeliverySupervisor so that the
receive buffer was preserved.
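For illustration, a simplified sketch (not the actual akka.remote buffer
implementation) of why losing the receive-side state turns the valid seq
number 2 into a nack for seq number 1:

    // Hypothetical, simplified receive buffer: it nacks every seq number between
    // the last delivered one and the incoming one. When its state is reset (as
    // happened when the ReliableDeliverySupervisor was stopped), lastDelivered
    // falls back to 0 and an incoming seq 2 looks like seq 1 was lost.
    final case class SimpleReceiveBuffer(lastDelivered: Long = 0L) {
      def receive(seq: Long): (SimpleReceiveBuffer, Set[Long]) = {
        val missing = ((lastDelivered + 1) until seq).toSet
        (copy(lastDelivered = seq), missing)
      }
    }

    // second node had seen seq 1, but that state was dropped with the supervisor:
    val fresh = SimpleReceiveBuffer()            // lastDelivered back at 0
    val (_, nacked) = fresh.receive(seq = 2L)    // nacked == Set(1): the spurious nack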
Not necessary for fixing the issue, but the following config settings were adjusted:
* increased transport-failure-detector timeout to avoid tearing down the
connection too early
* reduced the quarantine-after-silence to clean up ReliableDeliverySupervisor
  actors earlier
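As a rough sketch of what such tuning can look like in a test config (the paths
are meant to mirror the classic remoting settings named above, and the values
are made up for illustration, not the ones used in TransportFailSpec):

    import com.typesafe.config.ConfigFactory

    // Illustrative values only; check akka-remote's reference.conf for the real keys.
    val tunedConfig = ConfigFactory.parseString("""
      akka.remote.transport-failure-detector.acceptable-heartbeat-pause = 60 s
      akka.remote.quarantine-after-silence = 10 s
      """)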
* We used the Array based toBinary but the ByteBuffer based fromBinary,
  and IntSerializer only uses the same format for both when the byte order
  is LITTLE_ENDIAN, which we didn't get from protobuf's asReadOnlyByteBuffer
  (see the sketch after this list)
* We can use the Array based methods in DaemonMsgCreateSerializer, since
  performance is not important here
* Added some more testing in PrimitivesSerializationSpec
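The underlying pitfall, sketched with plain java.nio rather than the serializer
API: a ByteBuffer reads BIG_ENDIAN by default, so an int written little-endian
into a byte array does not round trip unless the order is set explicitly.

    import java.nio.{ ByteBuffer, ByteOrder }

    val value = 1
    // Array based write, little-endian (like the Array based toBinary):
    val bytes = ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN).putInt(value).array()

    // ByteBuffer based read; a wrapped buffer (like protobuf's asReadOnlyByteBuffer)
    // is BIG_ENDIAN by default, so the int comes back wrong:
    val wrong = ByteBuffer.wrap(bytes).getInt()                                  // 16777216
    val right = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).getInt()   // 1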
* WIP early preview of moving compression ownership to the Decoder
* Compression table created in transport, but owned by Decoder
* Added test for restart of inbound stream
* =art snapshot not needed in HeavyHitters since owned by Decoder
Benchmarks revealed that busy spinning directly in the graph stage can
lead to an excessive increase in latency when multiple inbound lanes are
active (i.e. the inbound flow has an asynchronous boundary driving the
multiple lanes).
The new strategy is therefore:
For inbound-lanes > 1 or idle-cpu-level < 5: no spinning in the graph stage
For inbound-lanes = 1 and idle-cpu-level >= 6: 50 * settings.Advanced.IdleCpuLevel - 240
which means in general much less or no spinning at all.
Fixes #21365.
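A minimal sketch of that selection logic (SpinSettings and spinCount are
illustrative names, not the actual artery settings API):

    final case class SpinSettings(inboundLanes: Int, idleCpuLevel: Int)

    def spinCount(s: SpinSettings): Int =
      if (s.inboundLanes > 1 || s.idleCpuLevel < 5) 0           // multiple lanes or low idle-cpu-level: no spinning
      else if (s.idleCpuLevel >= 6) 50 * s.idleCpuLevel - 240   // single lane, high idle-cpu-level
      else 0                                                    // idle-cpu-level = 5: assumed no spinning

    spinCount(SpinSettings(inboundLanes = 1, idleCpuLevel = 7))   // 110
    spinCount(SpinSettings(inboundLanes = 4, idleCpuLevel = 10))  // 0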
* to be able to introduce new messages and still support rolling upgrades,
i.e. a cluster of mixed versions
* note that it's only catching NotSerializableException, which we already
use for unknown serializer ids and class manifests
* note that it does not catch it for system messages, since dropping a
  system message could result in infinite resending
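A rough sketch of the idea, with deserialize and logWarning as hypothetical
stand-ins for the real remoting pipeline: unknown user messages are logged and
dropped, while system messages are still allowed to fail.

    import java.io.NotSerializableException
    import scala.util.{ Failure, Success, Try }

    def deserialize(bytes: Array[Byte]): AnyRef = ???     // hypothetical
    def logWarning(msg: String): Unit = println(msg)      // hypothetical

    def deserializeInbound(bytes: Array[Byte], isSystemMessage: Boolean): Option[AnyRef] =
      Try(deserialize(bytes)) match {
        case Success(msg) => Some(msg)
        case Failure(e: NotSerializableException) if !isSystemMessage =>
          // unknown message from a newer node during a rolling upgrade: drop it
          logWarning(s"Dropping message that could not be deserialized: ${e.getMessage}")
          None
        case Failure(e) =>
          // system messages are not dropped: swallowing them here could lead to
          // infinite resending, so let the failure propagate instead
          throw e
      }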
* speedup ActorCreationPerfSpec
* reduce iterations in ConsistencySpec
* tag SupervisorHierarchySpec as LongRunningTest
* various small speedups and tagging in actor-tests
* speedup expectNoMsg in stream-tests
* tag FramingSpec, and reduce iterations
* speedup QueueSourceSpec
* tag some stream-tests
* reduce iterations in persistence.PerformanceSpec
* reduce iterations in some cluster perf tests
* tag RemoteWatcherSpec
* tag InterpreterStressSpec
* remove LongRunning from ClusterConsistentHashingRouterSpec
* sys property to disable multi-jvm tests in test
* actually disable multi-node tests in validatePullRequest
* doc sbt flags in CONTRIBUTING
* Minor cleanup in version calculation
* Table versions before tag-fields
* Single handling block for unknown compression table versions #21580
* Compression table versions use -1 only as special number #21448
* Align to 4 byte boundaries in header
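The 4 byte alignment is the usual round-up-to-a-multiple-of-four trick; a small
sketch, not the actual header code:

    // Round len up to the next multiple of 4 so the following header field
    // starts on a 4 byte boundary (works because 4 is a power of two).
    def alignTo4(len: Int): Int = (len + 3) & ~3

    alignTo4(5)  // 8
    alignTo4(8)  // 8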
* also support for serialization of exceptions, see
comment in reference.conf
* extract Throwable and Payload methods to helper classes
* add security checks before creating instance from class name
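A minimal sketch of such a check, with a made-up allow-list and helper name
(not the actual implementation or configuration):

    // Only instantiate classes whose names start with an allowed prefix; anything
    // else is rejected before Class.forName is ever called.
    val allowedClassPrefixes = Set("java.lang.", "akka.")   // illustrative only

    def createInstanceFor(className: String): AnyRef = {
      require(
        allowedClassPrefixes.exists(prefix => className.startsWith(prefix)),
        s"Not allowed to instantiate [$className]")
      Class.forName(className).getDeclaredConstructor().newInstance().asInstanceOf[AnyRef]
    }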