* The ThreadLocal Serialization.currentTransportInformation is used for serializing local
actor refs, but it's also useful when a serializer library, e.g. a custom serializer/deserializer
in Jackson, needs access to the current ActorSystem.
* We set this in a rather ad-hoc way from remoting and in some persistence plugins, but it's only
set for serialization and not deserialization, and it's easy for Persistence plugins or other
libraries to forget this when using Akka serialization directly.
* This change automatically sets the info when the ordinary serialize and deserialize
methods are used.
* It's also set when the LocalActorRefProvider is used, which wasn't always the case previously.
* Keep a cached instance of Serialization.Information in the provider to avoid
creating new instances all the time.
* Added optional Persistence TCK tests to verify that the plugin sets this itself
when it makes custom calls to the serializer (see the sketch below).
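A minimal sketch of the two code paths, using a hypothetical plugin class and assuming the Serialization.withTransportInformation helper is available in the Akka version at hand:

```scala
import akka.actor.ExtendedActorSystem
import akka.serialization.{ Serialization, SerializationExtension }

// Hypothetical plugin helper, only to illustrate the two code paths.
class MyJournalSerialization(system: ExtendedActorSystem) {
  private val serialization = SerializationExtension(system)

  // serialize/deserialize now set Serialization.currentTransportInformation
  // themselves, so local actor refs get the system's address automatically.
  def toBinary(event: AnyRef): Array[Byte] =
    serialization.serialize(event).get

  // A plugin that calls a serializer directly still has to set the info itself,
  // e.g. by wrapping the call (withTransportInformation assumed available).
  def toBinaryDirect(event: AnyRef): Array[Byte] =
    Serialization.withTransportInformation(system) { () =>
      serialization.findSerializerFor(event).toBinary(event)
    }
}
```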
Each build now produces over 40 MB of logs.
A lot of DEBUG logging was left enabled for test failures that have since been
fixed. Added an issue number for the ones that are still valid, or left the
logging on where the test itself verifies DEBUG output.
* fix memory leak in SystemMessageDelivery
* initial set of tests for idle outbound associations, credit to mboogerd
* close inbound compression when quarantined, #23967
* make sure compression tables for quarantined systems are removed in case they are lingering around
* this also means that compression advertisements will not be sent for quarantined systems
* remove tombstone in InboundCompressions
* simplify async callbacks by using invokeWithFeedback (see the sketch below)
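A minimal sketch of the invokeWithFeedback pattern, with a hypothetical stage name and an elided compression table; in real code the callback would be exposed through the stage's materialized value:

```scala
import scala.concurrent.Future
import akka.Done
import akka.stream.{ Attributes, FlowShape, Inlet, Outlet }
import akka.stream.stage.{ GraphStage, GraphStageLogic, InHandler, OutHandler }

// Hypothetical stage: the Future returned by invokeWithFeedback completes once
// the stage has handled the event (or fails if the stage has already stopped),
// which replaces hand-rolled ack plumbing around plain invoke().
class ClearCompressionStage extends GraphStage[FlowShape[String, String]] {
  val in = Inlet[String]("ClearCompression.in")
  val out = Outlet[String]("ClearCompression.out")
  override val shape = FlowShape(in, out)

  override def createLogic(inheritedAttributes: Attributes): GraphStageLogic =
    new GraphStageLogic(shape) with InHandler with OutHandler {
      private val clearCb = getAsyncCallback[Long] { originUid =>
        // drop the compression table for originUid here (elided)
        ()
      }

      // callers get feedback about when the event was actually processed
      def clearCompression(originUid: Long): Future[Done] =
        clearCb.invokeWithFeedback(originUid)

      override def onPush(): Unit = push(out, grab(in))
      override def onPull(): Unit = pull(in)
      setHandlers(in, out, this)
    }
}
```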
* compression for old incarnation, #24400
* it was fixed by the other previous changes
* also confirmed by running the SimpleClusterApp with TCP
as described in the ticket
* test with tcp and tls-tcp transport
* handle the stop signals differently for tcp transport because they
are converted to StreamTcpException
* cancel timers on shutdown
* share the top-level flight recorder for all Association instances
* use linked queue for control and large streams, less memory usage
* remove quarantined idle Association completely after a configured delay
* note that shallow Association instances may still linger on the
heap because of cached references from RemoteActorRef, which may
be cached by the LruBoundedCache (used when resolving actor refs).
Those are small, since the queues have been removed, and the cache
is bounded.
* transport config
* TCP specific classes in akka.remote.artery.tcp package
* TcpFraming stage that handles the additional streamId field and length-based framing
(see the framing sketch after this list). Credit to jrudolph for this clean solution,
which made it possible to use the same envelope header for Aeron and TCP.
* magic first bytes to detect invalid access
* drain SendQueue to deadLetters in postStop
* error handling, restart, inbound and outbound streams
* udp vs tcp in autoSelectPort
* TCP specific flight recorder events
* update reference documentation
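Not the actual TcpFraming stage, but a sketch of the length-based-framing half of the idea using akka-stream's Framing.lengthField; the real stage also checks the magic bytes and reads the streamId field, and the field sizes below are assumptions:

```scala
import java.nio.ByteOrder
import akka.NotUsed
import akka.stream.scaladsl.{ Flow, Framing }
import akka.util.ByteString

object FramingSketch {
  // Re-chunks an arbitrary TCP byte stream into complete frames based on a
  // length header, so downstream sees one ByteString per envelope.
  val chunkToFrames: Flow[ByteString, ByteString, NotUsed] =
    Framing.lengthField(
      fieldLength = 4,               // assumed 4-byte frame-length header
      fieldOffset = 0,
      maximumFrameLength = 256 * 1024,
      byteOrder = ByteOrder.LITTLE_ENDIAN)
}
```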
* Refactoring to separate the Aeron specific things, ArteryAeronUdpTransport
* move Aeron specific classes to akka.remote.artery.aeron package
* move Version to ArterySettings, and describe strategy for envelope header changes
This test is using the TCP ports assigned by the multi-jvm infrastructure (classic remoting)
for the Aeron UDP transport. Even though TCP and UDP ports are independent, and the same port
number can be used at the same time for TCP and UDP, there is no guarantee that the UDP port
is free just because the TCP port was (see the probe sketch below).
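A sketch of probing for a free UDP port directly instead of reusing the TCP port number; this is still inherently racy, but it avoids the TCP/UDP mismatch described above:

```scala
import java.net.DatagramSocket

object FreeUdpPort {
  // Bind a DatagramSocket to port 0 and let the OS pick a free UDP port.
  // Note: the port can still be taken again between close() and actual use.
  def pick(): Int = {
    val socket = new DatagramSocket(0)
    try socket.getLocalPort
    finally socket.close()
  }
}
```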
* DaemonMsgCreate is not a system message. We send it over the control
stream because remote deployment process depends on message ordering
for DaemonMsgCreate and Watch messages. That is all good.
* We were also sending DaemonMsgCreate over the ordinary message stream (all
outbound lanes) so that the first ordinary message sent to the ref would not
arrive before the actor is created. This is not needed,
since the retried resolve in the Decoder takes care of that anyway.
* Inbound lanes were not covered, but that is not needed either.
* The deduplication of DaemonMsgCreate messages in RemoteSystemDaemon is then
no longer needed.
* Added some more tests for these things.
* describe lanes in reference docs (a config sketch follows below)
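A config sketch for the lane settings; key names and defaults are quoted from memory of the Artery reference.conf and may differ between Akka versions:

```scala
import com.typesafe.config.ConfigFactory

object LaneConfigSketch {
  val laneConfig = ConfigFactory.parseString("""
    akka.remote.artery.advanced {
      outbound-lanes = 1  # parallel lanes per outbound association
      inbound-lanes = 4   # parallel lanes for the shared inbound stream
    }
    """)
}
```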
* When the Artery stream with PartitionHub is restarted, some lanes can be
removed while it is still processing messages, resulting in an
IndexOutOfBoundsException
* Added the possibility to drop messages in PartitionHub, which is then used by Artery
* Some race conditions in SurviveInboundStreamRestartWithCompressionInFlightSpec
when using inbound-lanes > 1
* The killSwitch in Artery was supposed to be triggered when one lane failed,
but since it used Future.sequence it was never triggered unless it was the
first lane that failed. Changed to firstCompletedOf (see the sketch below).
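A sketch of the difference, with a hypothetical helper name: Future.sequence does not surface a failure in a later lane while an earlier lane's future is still pending, whereas firstCompletedOf reacts to whichever lane finishes first:

```scala
import scala.concurrent.{ ExecutionContext, Future }
import scala.util.Failure

object LaneFailureSketch {
  // Trigger the abort callback as soon as any lane's completion future fails.
  def abortOnFirstLaneFailure(
      laneCompletions: Seq[Future[Unit]],
      abort: Throwable => Unit)(implicit ec: ExecutionContext): Unit =
    Future.firstCompletedOf(laneCompletions).onComplete {
      case Failure(cause) => abort(cause) // any failed lane aborts the whole stream
      case _              => ()           // normal completion, nothing to abort
    }
}
```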
* Retry creation of ActorSystem in remoting tests #23481
Remoting multi-jvm tests rely on setting port = 0, which selects an open
port. This has a race where two of the JVMs open/close the same port and
then configure their ActorSystem with it, so one of them fails to start
because the port is already in use. This adds a simple retry so that another
port is selected (see the retry sketch below).
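A minimal sketch of such a retry, with a hypothetical helper name and a deliberately broad catch:

```scala
import akka.actor.ActorSystem
import com.typesafe.config.Config

object RetrySketch {
  // With port = 0 two JVMs can race for the same ephemeral port; if startup
  // fails (typically a bind exception), simply try again a few times.
  def startWithRetry(name: String, config: Config, retries: Int = 3): ActorSystem =
    try ActorSystem(name, config)
    catch {
      case _: Exception if retries > 0 =>
        // most likely the chosen port was taken in the meantime
        startWithRetry(name, config, retries - 1)
    }
}
```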
* Looked into the alternatives described in the ticket, but
they were complicated, so ended up with simply including the
uid of the sending system in the hash for selecting the inbound
lane (see the sketch after this list). That should be good enough
until we have any real demand for something else.
* This means that different lanes can be used for ActorSelection messages
from different sending systems, which is good in a cluster, but the same lane
will be used for all messages originating from the same system.
* Added possibility to run the benchmarks with ActorSelection
* Added ActorSelection to the send consistency test
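A sketch of the lane selection idea (not the actual Artery implementation):

```scala
object LaneSelectionSketch {
  // Mix the sending system's uid into the lane hash so ActorSelection traffic
  // from different systems can land on different inbound lanes, while all
  // messages from one origin stay on the same lane and keep per-sender ordering.
  def inboundLaneFor(originUid: Long, inboundLanes: Int): Int =
    math.abs(java.lang.Long.hashCode(originUid) % inboundLanes)
}
```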
Reproducer (TransportFailSpec):
* watch from first to second node, i.e. sys msg with seq number 1
* trigger transport failure detection to tear down the connection
* the bug was that on the second node the ReliableDeliverySupervisor
was stopped because the send buffer had not been used on that side,
but that removed the receive buffer entry
* later, after the gating period had elapsed, another watch from the first to the second node,
i.e. sys msg with seq number 2
* when that watch msg was received on the second node the receive buffer
had been cleared, and therefore it thought that seq number 1 was missing
and sent a nack to the first node
* when the first node received the nack it threw
IllegalStateException: Error encountered while processing system message
acknowledgement buffer: [2 {2}] ack: ACK[2, {1, 0}]
caused by: ResendUnfulfillableException: Unable to fulfill resend request since
negatively acknowledged payload is no longer in buffer
This was fixed by not stopping the ReliableDeliverySupervisor so that the
receive buffer was preserved.
Not necessary for fixing the issue, but the following config settings were adjusted
(see the config sketch below):
* increased transport-failure-detector timeout to avoid tearing down the
connection too early
* reduce the quarantine-after-silence to cleanup ReliableDeliverySupervisor
actors earlier
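A config sketch of these two knobs; the classic remoting key names are from memory of reference.conf and may differ between Akka versions, so treat this only as an illustration:

```scala
import com.typesafe.config.ConfigFactory

object TransportFailSpecTuning {
  val tuned = ConfigFactory.parseString("""
    akka.remote {
      # tolerate a longer silence before the transport failure detector
      # tears the connection down
      transport-failure-detector.acceptable-heartbeat-pause = 30 s
      # clean up ReliableDeliverySupervisor actors sooner after silence
      quarantine-after-silence = 1 h
    }
    """)
}
```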
When the test failed we observed a second "PostStop" message.
This was a "PostStop" for the new, restarted actor, likely due to the newly
restarted ActorSystem being terminated at the end of the test.