Commit graph

93 commits

Author SHA1 Message Date
Patrik Nordwall
ce329e48c1 =clu #3660 Simple speedup of gossip in early phase 2013-10-14 22:16:40 +02:00
Patrik Nordwall
d3f295e5fe Merge pull request #1738 from akka/wip-3612-join-self-patriknw
+clu #3612 Allow join to uninitialized node
2013-09-29 22:41:15 -07:00
Patrik Nordwall
cb42bf0785 +clu #3612 Allow join to uninitialized node
* join to self not needed when performing manual joining
2013-09-27 14:40:09 +02:00
Patrik Nordwall
d5b25cbbc6 !act #3583 Timer based auto-down
* Replace (deprecate) akka.cluster.auto-down config setting with
  akka.cluster.auto-down-unreachable-after
* AutoDown actor that keeps track of unreachable members
  and performs down from the leader node when they have been
  unreachable for the specified duration
* Migration guide
2013-09-27 14:32:03 +02:00
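The timer-based downing described in this commit can be illustrated with a minimal sketch. It is not Akka's AutoDown actor: the class `AutoDownTracker`, its method names, and the explicit `now` and `is_leader` arguments are hypothetical, chosen only to show the bookkeeping of "unreachable for the specified duration".

```python
class AutoDownTracker:
    """Hypothetical model of timer-based auto-down; not Akka's implementation."""

    def __init__(self, down_after_seconds):
        self.down_after_seconds = down_after_seconds
        self.unreachable_since = {}  # member -> time it was first seen unreachable

    def unreachable(self, member, now):
        # Keep the first timestamp if the same member is reported again.
        self.unreachable_since.setdefault(member, now)

    def reachable(self, member):
        # A member that becomes reachable again is no longer a downing candidate.
        self.unreachable_since.pop(member, None)

    def members_to_down(self, now, is_leader):
        # Only the leader node performs the down.
        if not is_leader:
            return []
        return sorted(
            member
            for member, since in self.unreachable_since.items()
            if now - since >= self.down_after_seconds
        )


tracker = AutoDownTracker(down_after_seconds=10)
tracker.unreachable("node-b", now=0)
tracker.unreachable("node-c", now=5)
tracker.reachable("node-c")
due = tracker.members_to_down(now=10, is_leader=True)
```

In this sketch only `node-b` is due for downing at `now=10`, because `node-c` became reachable again before its timer expired.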
Patrik Nordwall
dc9fe4f19c !clu #2307 Allow transition from unreachable to reachable
* Replace unreachable Set with Reachability table
* Unreachable members stay in member Set
* Downing a live member previously moved it to the unreachable Set,
  and it was then removed from there by the leader. That will not
  work when flipping back to reachable, so a Down member must
  be detected as unreachable before being removed. Similar
  to Exiting. Member shuts down itself if it sees itself as
  Down.
* Flip back to reachable when failure detector monitors it as
  available again
* ReachableMember event
* Can't ignore gossip from aggregated unreachable (see SurviveNetworkInstabilitySpec)
* Make use of ReachableMember event in cluster router
* End heartbeat when acknowledged, EndHeartbeatAck
* Remove nr-of-end-heartbeats from conf
* Full reachability info in JMX cluster status
* Don't use interval after unreachable for AccrualFailureDetector history
* Add QuarantinedEvent to remoting, used for Reachability.Terminated
* Prune reachability table when all reachable
* Update documentation
* Performance testing and optimizations
2013-09-11 13:10:29 +02:00
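The idea of replacing a plain unreachable Set with a table of observations can be sketched as follows. This is a simplified model under stated assumptions, not the Reachability class from this commit: the record layout (observer, subject) and all method names are hypothetical.

```python
class Reachability:
    """Hypothetical table of unreachable observations, keyed by (observer, subject)."""

    def __init__(self):
        self.records = {}  # (observer, subject) -> "unreachable"

    def unreachable(self, observer, subject):
        self.records[(observer, subject)] = "unreachable"

    def reachable(self, observer, subject):
        # Flipping back removes the record, so the table is empty
        # again once every subject is reachable.
        self.records.pop((observer, subject), None)

    def is_reachable(self, subject):
        # A subject is reachable only if no observer marks it unreachable.
        return all(s != subject for (_, s) in self.records)


table = Reachability()
table.unreachable("node-a", "node-c")
table.unreachable("node-b", "node-c")
table.reachable("node-a", "node-c")
partly_cleared = table.is_reachable("node-c")
table.reachable("node-b", "node-c")
fully_cleared = table.is_reachable("node-c")
```

Keeping one record per observer is what lets a member flip back to reachable: the subject counts as reachable again only after every observer that reported it has withdrawn its record.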
Björn Antonsson
bbad92c749 !clu #2320 Convert the seen table into something more efficient 2013-09-06 10:18:13 +02:00
Endre Sándor Varga
b566e9393d =act, rem, clu #3521: make serialize-messages work with core modules 2013-08-27 11:05:54 +02:00
Patrik Nordwall
4323a64183 =clu #3546 Change log level of gossip from unknown
* It is pretty normal when joining so users should not be worried
* Change to debug level
2013-08-16 15:29:05 +02:00
Patrik Nordwall
30d34e20bf Make Cluster.joinSeedNodes public, see #3468 2013-06-24 12:15:22 +02:00
Björn Antonsson
46966c25ea Merge pull request #1535 from akka/wip-3441-speed-up-cluster-gossip-processing-ban
Speed up cluster gossip processing #3441
2013-06-20 03:56:16 -07:00
Björn Antonsson
1adfcb8454 Speed up cluster gossip processing. See #3441
Check VectorClock for common case first and cache hashCodes. See #3441
Make ClusterDaemon a bit more testable. See #3441
Changing VectorClock and GossipOverview to TreeMaps. See #3441
Make VectorClock private[cluster] and remove unused code. See #3441
2013-06-20 11:36:24 +02:00
Patrik Nordwall
bc367aae96 Count vclock stats when published, not for each received gossip 2013-06-15 23:17:05 +02:00
Roland Kuhn
8df8541801 Merge pull request #1500 from akka/wip-3210-local-only-∂π
make LocalScope mean “purely local” and avoid Props serialization check,...
2013-05-30 08:03:32 -07:00
Roland
92db59183e make LocalScope mean “purely local” and avoid Props serialization check, see #3210 2013-05-29 23:36:39 +02:00
Patrik Nordwall
852be1b9bb Merge pull request #1489 from akka/wip-3192-fixme-patriknw
FIXME in cluster, see #3192
2013-05-28 07:17:16 -07:00
Patrik Nordwall
a323936299 Disable cluster stats by default, see #3348
* Add VectorClockStats
2013-05-28 16:15:57 +02:00
Patrik Nordwall
196a141976 FIXME in cluster, see #3192 2013-05-28 09:02:03 +02:00
Patrik Nordwall
28d1b1f187 Merge pull request #1480 from akka/wip-3388-HeartbeatReq-patriknw
Start heartbeatSender after Welcome, see #3388
2013-05-27 00:17:43 -07:00
Patrik Nordwall
ec1626b746 Start heartbeatSender after Welcome, see #3388
* Otherwise, if the Welcome message is lost, other nodes
  in the cluster will send HeartbeatRequest and it will start
  sending heartbeats without being a real member and the lost Welcome
  is not detected by the other members in the cluster
2013-05-24 15:38:28 +02:00
Patrik Nordwall
18a3b3facf Config of cluster info logging, see #3225 2013-05-23 13:36:35 +02:00
Patrik Nordwall
8f04b53ac7 Merge pull request #1443 from akka/wip-3359-auto-join-patriknw
Remove auto-join config, derive from seed-nodes, see #3359
2013-05-17 04:57:07 -07:00
Patrik Nordwall
ad1eaa6d4a Remove auto-join config, derive from seed-nodes, see #3359 2013-05-17 13:54:51 +02:00
Patrik Nordwall
a0a0f39613 Hardening of cluster member leaving path, see #3309
* Removed leader commands for Shutdown and Exit
* Member shuts itself down when it sees itself as Exiting
* Singleton cluster with status Exiting will shutdown itself,
  in case the Exiting gossip never arrives
* Exiting member is not part of the convergence check
* Exiting member is removed by leader (on convergence) when the
  exiting member is in the unreachable set, i.e. successfully shutdown
* Reverted the change made for #3266, i.e. Exiting is
  detected as unreachable again.
* Adjust ClusterSingletonManager to new Exiting behaviour
* Fix bug in HeartbeatSender, which caused it to continue to
  send heartbeats to removed nodes, instead of rebalancing
* Refactoring of leaderActions method
* Leaving section in docs
2013-05-17 11:39:49 +02:00
Patrik Nordwall
b8b65c9153 Cluster member age, and usage in singleton, see #3195
* Assign internal upNumber when member is moved to Up
* Public API Member.isOlder
* Change cluster singleton to use oldest member instead of leader
* Update samples and docs
2013-05-03 13:38:35 +02:00
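A minimal sketch of how member age could drive singleton placement, assuming a lower up number means the member was moved to Up earlier. The `Member` dataclass, the `up_number` field and the `oldest` helper are hypothetical stand-ins, not the Akka API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    address: str
    up_number: int  # hypothetical counter assigned when the member is moved to Up

    def is_older(self, other):
        # Assumption: a smaller up number means the member became Up earlier.
        return self.up_number < other.up_number


def oldest(members):
    """Pick the member that has been Up the longest."""
    return min(members, key=lambda m: m.up_number)


members = [Member("node-b", 2), Member("node-a", 1), Member("node-c", 3)]
singleton_host = oldest(members)
```

Unlike the leader, which can change as membership changes, the oldest member changes only when that member itself leaves the cluster, which is one plausible reason to host the singleton there.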
Björn Antonsson
539df2e98a Enforce mailbox types on System actors. See #3273 2013-05-03 11:05:32 +02:00
Patrik Nordwall
6635ac4032 Reduce amount of gossip data transferred in idle cluster, see #3279
* When seen same the gossip chat is initiated with a GossipStatus
  message containing the vclock only
* Remove conversation flag in GossipEnvelope
* Ordinary tell instead of actorSelection when replying
2013-05-02 19:17:09 +02:00
Patrik Nordwall
293c97c71d Quick fix for unreachable exiting, see #3266 2013-05-02 19:17:08 +02:00
dario.rexin
3e8597d94b more deprecation warnings removed 2013-04-26 13:54:10 +02:00
Patrik Nordwall
9e56ab6fe5 Disallow re-joining, see #2873
* Disallow join requests when already part of a cluster
* Remove wipe state when joining, since join can only be
  performed from empty state
* When trying to join, only accept gossip from that member
* Ignore gossips from unknown (and unreachable) members
* Make sure received gossip contains selfAddress
* Test join of fresh node with same host:port
* Remove JoinTwoClustersSpec
* Welcome message as reply to Join
* Retry unsuccessful join request
* AddressUidExtension
* Uid in cluster Member identifier
  To be able to distinguish nodes with same host:port
  after restart.
* Ignore gossip with wrong uid
* Renamed Remove command to Shutdown
* Use uid in vclock identifier
* Update sample, Member apply is private
* Disabled config duration syntax and cleanup of io settings
* Update documentation
2013-04-17 16:48:18 +02:00
Patrik Nordwall
3cfe8f28a2 Merge pull request #1324 from akka/wip-3209-remove-unreachable-patriknw
Don't send Remove command to unreachable, see #3209
2013-04-11 08:02:19 -07:00
Björn Antonsson
73f0f44ddb Protobuf serialization of cluster messages. See #1910 2013-04-11 10:09:05 +02:00
Patrik Nordwall
da621c502f Don't send Remove command to unreachable, see #3209 2013-04-09 21:06:48 +02:00
Patrik Nordwall
c77cdeb86b Merge pull request #1277 from akka/wip-3074-deprecate-actorFor-patriknw
Deprecate actorFor in favor of ActorSelection, see #3074
2013-04-08 11:48:48 -07:00
Patrik Nordwall
887af975ae Deprecate actorFor in favor of ActorSelection, see #3074
* Deprecate all actorFor methods
* resolveActorRef in provider
* Identify auto receive message
* Support ActorPath in actorSelection
* Support remote actor selections
* Additional tests of actor selection
* Update tests (keep most actorFor tests)
* Update samples to use actorSelection
* Updates to documentation
* Migration guide, including motivation
2013-04-08 18:11:52 +02:00
Patrik Nordwall
d2548285ac Cluster member status transition guards, see #2802 2013-04-08 09:29:00 +02:00
Björn Antonsson
5827a27b94 Make joining to the same node multiple times work, and reenable blackhole test. See #2930 2013-03-20 12:22:12 +01:00
Patrik Nordwall
7eac88f372 Cluster node roles, see #3049
* Config of node roles cluster.role
* Cluster router configurable with use-role
* RoleLeaderChanged event
* Cluster singleton per role
* Cluster only starts once all required per-role node
  counts are reached,
  role.<role-name>.min-nr-of-members config
* Update documentation and make use of the roles in the examples
2013-03-18 11:56:11 +01:00
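The per-role minimum check mentioned in this commit can be sketched as a simple count. The function name and the dictionary shapes are hypothetical; the sketch only shows the condition "all required per-role node counts are reached".

```python
from collections import Counter


def role_requirements_met(roles_by_member, min_members_by_role):
    """Return True when every role has at least its required number of members."""
    counts = Counter(
        role for roles in roles_by_member.values() for role in roles
    )
    return all(
        counts[role] >= minimum
        for role, minimum in min_members_by_role.items()
    )


members = {"node-a": {"frontend"}, "node-b": {"backend"}}
required = {"frontend": 1, "backend": 2}
ready_before = role_requirements_met(members, required)
members["node-c"] = {"backend"}
ready_after = role_requirements_met(members, required)
```

Here the requirement is unmet with a single backend node and met once a second backend node has joined.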
Viktor Klang (√)
05593f5dd8 Merge pull request #1230 from akka/wip-3076-gossip-merge-changes-ban
Don't increment vector-clock on merge and merge locally. See #3076
2013-03-12 08:49:30 -07:00
Patrik Nordwall
d98a7ef1e8 Cluster singleton failure due to down-removed, see #3130
* The scenario was that previous leader left.
* The problem was that the new leader got MemberRemoved
  before it got the HandOverDone and therefore missed the
  hand over data.
* Solved by not changing the singleton to leader when receiving
  MemberRemoved and instead do that on normal HandOverDone or
  in failure cases after retry timeout.
* The reason for this bug was the new transition from Down to
  Removed and that there is now no MemberDowned event. Previously
  this was only triggered by MemberDowned (not MemberRemoved) and
  that was safe because that was "always" preceded by unreachable.
* The new solution means that it will take longer for new singleton
  to startup in case of unreachable previous leader, but I don't
  want to trigger it on MemberUnreachable because it might in the
  future be possible to switch it back to reachable.
2013-03-11 12:37:35 +01:00
Björn Antonsson
7ed6b3d4ee Fixes according review. See #3076 2013-03-11 12:27:29 +01:00
Patrik Nordwall
01bfb9378e Logging of joining 2013-03-08 15:47:03 +01:00
Björn Antonsson
386bf87f0e Don't increment vector-clock on merge and merge locally. See #3076 2013-03-08 12:14:25 +01:00
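A minimal sketch of a vector clock merge that does not increment any counter, assuming clocks are plain mappings from node name to counter. The `merge` and `compare` functions are illustrative and are not the VectorClock class touched by this commit.

```python
def merge(a, b):
    """Pairwise maximum of two vector clocks; no counter is incremented."""
    return {node: max(a.get(node, 0), b.get(node, 0)) for node in a.keys() | b.keys()}


def compare(a, b):
    """Classify clock a relative to clock b."""
    nodes = a.keys() | b.keys()
    a_ahead = any(a.get(n, 0) > b.get(n, 0) for n in nodes)
    b_ahead = any(b.get(n, 0) > a.get(n, 0) for n in nodes)
    if a_ahead and b_ahead:
        return "concurrent"
    if a_ahead:
        return "after"
    if b_ahead:
        return "before"
    return "same"


left = {"node-a": 2, "node-b": 1}
right = {"node-a": 1, "node-b": 3}
merged = merge(left, right)
```

Because the merge is a pure pairwise maximum, two nodes that merge the same pair of clocks end up with identical versions, whereas incrementing on merge would make each merge result look like a new change.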
Patrik Nordwall
5c7747e7fa Transition from Down to Removed, see #3075 2013-03-07 14:02:42 +01:00
Björn Antonsson
78c3ca359a Fixes according to review. See #3115 2013-03-06 16:55:46 +01:00
Björn Antonsson
fad4289b1b Merge gossip seen table when versions are the same. See #3115 2013-03-05 12:49:35 +01:00
Patrik Nordwall
679c4d313d Support restart of first seed node, see #2854
* Try to first join other seed nodes before joining itself
2013-02-21 20:40:13 +01:00
Patrik Nordwall
b349ad8d87 Nodes not part of cluster have marked the Gossip as seen, see #3031
* Problem may occur when joining member with same hostname:port again,
  after downing.
* Reproduced with StressSpec exerciseJoinRemove with fixed port that
  joins and shuts down several times.
* Real solution for this will be covered by ticket #2788 by adding
  uid to member identifier, but as first step we need to support
  this scenario with current design.
* Use unique node identifier for vector clock to avoid mixup of
  old and new member instance.
* Support transition from Down to Joining in Gossip merge
* Don't gossip to unknown or unreachable members.
2013-02-12 21:55:08 +01:00
Patrik Nordwall
cab78e5174 Make cluster fault handling more robust, see #3030
* ClusterCoreDaemon and ClusterDomainEventPublisher can't be restarted
  because the state would be obsolete.
* Add extra supervisor level for ClusterCoreDaemon and
  ClusterDomainEventPublisher, which will shutdown the member
  on failure in children.
* Publish the final removed state on postStop in
  ClusterDomainEventPublisher. This also simplifies the removing
  process.
2013-02-12 21:55:08 +01:00
Patrik Nordwall
9dc124dacd Remove work-around for sending to broken connections, see #2909
* Previous work-around was introduced because Netty blocks when sending
  to broken connections. This is supposed to be solved by the non-blocking
  new remoting.
* Removed HeartbeatSender and CoreSender in cluster
* Added tests to verify that broken connections don't disturb live connection
2013-01-31 13:41:02 +01:00
Patrik Nordwall
5dc108567d Style change of def starting with if
* When a def starts with if and is not a oneliner the if
  should be on a new line.
* The reason is that it might be easy to miss the if when
  reading the code.
2013-01-18 13:28:49 +01:00