Commit graph

28 commits

Author SHA1 Message Date
Patrik Nordwall
7fc7744049 Quarantine and cleanup idle associations, #24972
* fix NPE in shutdownTransport
  * perhaps because shutdown before started
  * system.dispatcher is used in other places of the shutdown
* improve logging of compression advertisment progress
* adjust RestartFlow.withBackoff parameters
* quarantine after ActorSystemTerminating signal
  (will cleanup compressions)
* Quarantine idle associations
  * liveness checks by sending extra HandshakeReq and update the
    lastUsed when reply received
  * concervative default value to survive network partition, in
    case no other messages are sent
* Adjust logging and QuarantinedEvent for harmless quarantine
  * Harmless if it was via the shutdown signal or cluster leaving
2018-05-22 13:10:30 +02:00
Patrik Nordwall
43dc381d59
Clear system messages sequence number for restarted node, #24847
* Notice that the incarnation has changed in SystemMessageDelivery
  and then reset the sequence number
* Take the incarnation number into account in the ClearSystemMessageDelivery
  message
* Trigger quarantine earlier in ClusterRemoteWatcher if node with
  same host:port joined
* Change quarantine-removed-node-after to 5s, shouldn't be necessary
  to delay it 30s
* test reproducer
2018-04-10 11:39:55 +02:00
Konrad `ktoso` Malawski
563c7fbcf0 Issue 24594: Integration with sbt-headers and initial header population 2018-03-13 15:45:55 +01:00
Christopher Batey
009214ae07
Update copyright to 2018 (#24241) 2018-01-04 17:26:29 +00:00
Philippus Baalman
6c7085252a extended copyright into 2017 2017-01-04 17:37:15 +01:00
Patrik Nordwall
136e64b253 use longUid in ClusterRemoteWatcher, #21594
* found by test failure in SurviveNetworkInstabilitySpec
2016-09-30 10:51:51 +02:00
Johan Andrén
0f376e751e Quarantine gracefully downed node after some time (#21534)
* New setting for quarantining after graceful leave
2016-09-28 14:04:58 +02:00
Johan Andrén
8ae0c9a888 Use long uid in artery remoting and cluster #20644 2016-09-26 15:34:59 +02:00
Patrik Nordwall
f1590a59b4 revert quarantine removed (leaving) cluster member, #21509 2016-09-21 17:27:34 +02:00
Patrik Nordwall
1926560e41 stop outbound streams when quarantined, #21407
* they can't be stopped immediately because we want to send
  some final message and we reply to inbound messages with `Quarantined`
* and improve logging
2016-09-21 14:38:13 +02:00
Patrik Nordwall
3b7a7dfa59 add reason param to quarantine method 2016-09-08 18:00:37 +02:00
Björn Antonsson
c66ce62d63 Update to a working version of Scalariform 2016-06-02 22:12:36 +02:00
Johan Andrén
62e30b3c08 Update copyrights and links to the new company name #19851 2016-02-23 12:58:39 +01:00
Prayag Verma
b7783968a0 =pro #19068 All copyrights ranges and single years updated to a range ending in 2016 2016-01-25 10:20:30 +01:00
Patrik Nordwall
c7c187f6b7 =clu replace Set -- with diff and ++ with union
* better performance according to
  https://docs.google.com/presentation/d/1Qjryxoe-fYEM8ZPhM-98LKfbhnRcn5eAEMNlVVnixsA/pub
2015-11-06 14:48:17 +01:00
Veiga Ortiz, Héctor
c08bc317e2 +clu #13584 Accept joining to be WeaklyUp during network split
* experimental feature, disabled by default
* Adding documentation to mention weakly up members.
  plus adding new diagram.
2015-09-04 12:44:47 +02:00
Thibaut Robert
12cbf83927 =rem improve remote watching mechanism
This improves the remote watching mechanism as follows: Watch requests
are intercepted by the RemoteWatcher and not sent on the wire,
excepted watches from the remoteWatcher itself.

RemoteWatcher is then in charge of forwarding DeathWatchNotification
messages to the watchers.

This reduces the number of watch message to one per watchee, even if
there are several watcher on the same watchee (instead of n+1 before).

Reversed watch messages, and watch on ref with undefinedUid are excluded from
interception by the RemoteWatcher and so are handled as before this commit.

In addition, the following changes are made:
- Keep watchers in a map watchee -> watchers for more efficient retrieval
(in a scala Multimap)
- Keep watchees in a map address -> watchee for more efficient retrieval
(in a scala Multimap)
- Use of InternalActorRef more thoroughly to avoid casts
- Rewatch use a standard watch message, as the distinction is longer needed
2015-05-13 14:10:35 +02:00
Julian Tescher
00f6a58e7c Changes all occurances of Typesafe copyright to extend to 2015 2015-03-10 14:12:19 -07:00
Patrik Nordwall
aa6bdd197e =rem #3870 Stop re-delivery of system messages when watching non-existing sys
* We don't know the UID so we can't quarantine, but we can stop the endpoint
  writer and drop outstanding system messages
2014-02-13 11:27:53 +01:00
Adam Voss
cce29dfa51 Changes all occurances of Typesafe copyright to extend to 2014. 2014-02-04 21:20:09 -06:00
Patrik Nordwall
07baf05bae harmonize MyActor.props pattern, see #3418 2013-05-30 14:50:46 +02:00
Patrik Nordwall
ee6e80d31a Add previousStatus in MemberRemoved, see #3252 2013-05-23 11:09:32 +02:00
Patrik Nordwall
7628889b43 Changed design of RemoteWatcher due to cleanup race, see #3265
* The problem was a race caused by HeartbeatReq sent out, and
  the watchee terminated immediately. That caused the RemoteWatcher
  peers watching each other without any other watch registered.
  It is racy.
* Instead of one-way heartbeats from the side beeing watched I
  changed to ping-pong style. That makes the problem go away
  and simplifies a lot of things in RemoteWatcher.
2013-05-04 17:35:12 +02:00
Patrik Nordwall
551e2d1321 Stop heartbeating when watching node crash, see #3265 2013-04-25 21:25:46 +02:00
Patrik Nordwall
49744e0b0f Only quarantine removed member that was unreachable, see #2594
* For graceful leaving and remove it should still be possible to
  communicate with the node after cluster removal.
* Otherwise the hand over in cluster singleton would break, for
  example.
* Also, skip selfAddress to avoid generation of AddressTerminated
  for the own node when removed from cluster.
2013-04-19 08:52:27 +02:00
Patrik Nordwall
58bd0a1460 Quarantine from ClusterRemoteWatcher also, see #2993
* This was an oversight in previous pull request
2013-04-18 16:55:02 +02:00
Patrik Nordwall
ee1b5879cf Race between termination and actorFor, see #3234 2013-04-17 21:03:16 +02:00
Patrik Nordwall
4606612bd1 Reliable remote supervision and death watch, see #2993
* RemoteWatcher that monitors node failures, with heartbeats
  and failure detector
* Move RemoteDeploymentWatcher from CARP to RARP
* ClusterRemoteWatcher that handles cluster nodes
* Update documentation
* UID in Heartbeat msg to be able to quarantine,
  actual implementation of quarantining will be implemented
  in ticket 2594
2013-04-17 19:42:51 +02:00