Commit graph

73 commits

Author SHA1 Message Date
Patrik Nordwall
8f04b53ac7 Merge pull request #1443 from akka/wip-3359-auto-join-patriknw
Remove auto-join config, derive from seed-nodes, see #3359
2013-05-17 04:57:07 -07:00
Patrik Nordwall
ad1eaa6d4a Remove auto-join config, derive from seed-nodes, see #3359 2013-05-17 13:54:51 +02:00
Patrik Nordwall
a0a0f39613 Hardening of cluster member leaving path, see #3309
* Removed leader commands for Shutdown and Exit
* Member shuts itself down when it sees itself as Exiting
* Singleton cluster with status Exiting will shut itself down,
  in case the Exiting gossip never arrives
* Exiting member is not part of the convergence check
* Exiting member is removed by leader (on convergence) when the
  exiting member is in the unreachable set, i.e. successfully shut down
* Reverted the change made for #3266, i.e. Exiting is
  detected as unreachable again.
* Adjust ClusterSingletonManager to new Exiting behaviour
* Fix bug in HeartbeatSender, which caused it to continue to
  send heartbeats to removed nodes, instead of rebalancing
* Refactoring of leaderActions method
* Leaving section in docs
2013-05-17 11:39:49 +02:00
Patrik Nordwall
b8b65c9153 Cluster member age, and usage in singleton, see #3195
* Assign internal upNumber when member is moved to Up
* Public API Member.isOlder
* Change cluster singleton to use oldest member instead of leader
* Update samples and docs
2013-05-03 13:38:35 +02:00
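
The "oldest member" rule from the commit above can be illustrated with a minimal sketch. It assumes that age is decided purely by the internal upNumber (lower means moved to Up earlier). `MemberSketch` and `OldestMember` are hypothetical names for illustration, not Akka's `Member` class or singleton implementation.

```scala
// Hypothetical, simplified stand-in for a cluster member; not Akka's Member class.
final case class MemberSketch(address: String, upNumber: Int) {
  // Assumption: a member is older when it was moved to Up earlier,
  // i.e. when it was assigned a lower upNumber.
  def isOlder(other: MemberSketch): Boolean = upNumber < other.upNumber
}

object OldestMember {
  // Returns the member that no other member is older than, or None for an empty cluster.
  def oldest(members: Seq[MemberSketch]): Option[MemberSketch] =
    members.reduceOption((a, b) => if (b.isOlder(a)) b else a)
}
```

Unlike the leader, which can change when a new node joins, the oldest member in this sketch only changes when that member itself is removed from the sequence.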
Björn Antonsson
539df2e98a Enforce mailbox types on System actors. See #3273 2013-05-03 11:05:32 +02:00
Patrik Nordwall
6635ac4032 Reduce amount of gossip data transferred in idle cluster, see #3279
* When seen same, the gossip chat is initiated with a GossipStatus
  message containing the vclock only
* Remove conversation flag in GossipEnvelope
* Ordinary tell instead of actorSelection when replying
2013-05-02 19:17:09 +02:00
Patrik Nordwall
293c97c71d Quick fix for unreachable exiting, see #3266 2013-05-02 19:17:08 +02:00
dario.rexin
3e8597d94b more deprecation warnings removed 2013-04-26 13:54:10 +02:00
Patrik Nordwall
9e56ab6fe5 Disallow re-joining, see #2873
* Disallow join requests when already part of a cluster
* Remove wipe state when joining, since join can only be
  performed from empty state
* When trying to join, only accept gossip from that member
* Ignore gossips from unknown (and unreachable) members
* Make sure received gossip contains selfAddress
* Test join of fresh node with same host:port
* Remove JoinTwoClustersSpec
* Welcome message as reply to Join
* Retry unsuccessful join request
* AddressUidExtension
* Uid in cluster Member identifier
  To be able to distinguish nodes with same host:port
  after restart.
* Ignore gossip with wrong uid
* Renamed Remove command to Shutdown
* Use uid in vclock identifier
* Update sample, Member apply is private
* Disabled config duration syntax and cleanup of io settings
* Update documentation
2013-04-17 16:48:18 +02:00
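
Why a uid in the member identifier helps can be shown with a small sketch. It assumes each incarnation of a node gets a fresh uid at startup. `UniqueAddressSketch` and `GossipFilter` are hypothetical names, and the acceptance rule is a simplification of "ignore gossip with wrong uid", not the actual cluster code.

```scala
// Hypothetical identifier: host:port plus a uid chosen per incarnation of the node.
final case class UniqueAddressSketch(hostPort: String, uid: Int)

object GossipFilter {
  // Accept gossip only from a sender whose host:port AND uid are known members.
  // A restarted node reuses host:port but carries a new uid, so its gossip is
  // ignored until it has joined again under the new identifier.
  def accept(known: Set[UniqueAddressSketch], from: UniqueAddressSketch): Boolean =
    known.contains(from)
}
```

With host:port alone as the identifier, the old and the new incarnation would be indistinguishable, which is the situation the commit addresses.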
Patrik Nordwall
3cfe8f28a2 Merge pull request #1324 from akka/wip-3209-remove-unreachable-patriknw
Don't send Remove command to unreachable, see #3209
2013-04-11 08:02:19 -07:00
Björn Antonsson
73f0f44ddb Protobuf serialization of cluster messages. See #1910 2013-04-11 10:09:05 +02:00
Patrik Nordwall
da621c502f Don't send Remove command to unreachable, see #3209 2013-04-09 21:06:48 +02:00
Patrik Nordwall
c77cdeb86b Merge pull request #1277 from akka/wip-3074-deprecate-actorFor-patriknw
Deprecate actorFor in favor of ActorSelection, see #3074
2013-04-08 11:48:48 -07:00
Patrik Nordwall
887af975ae Deprecate actorFor in favor of ActorSelection, see #3074
* Deprecate all actorFor methods
* resolveActorRef in provider
* Identify auto receive message
* Support ActorPath in actorSelection
* Support remote actor selections
* Additional tests of actor selection
* Update tests (keep most actorFor tests)
* Update samples to use actorSelection
* Updates to documentation
* Migration guide, including motivation
2013-04-08 18:11:52 +02:00
Patrik Nordwall
d2548285ac Cluster member status transition guards, see #2802 2013-04-08 09:29:00 +02:00
Björn Antonsson
5827a27b94 Make joining to the same node multiple times work, and reenable blackhole test. See #2930 2013-03-20 12:22:12 +01:00
Patrik Nordwall
7eac88f372 Cluster node roles, see #3049
* Config of node roles cluster.role
* Cluster router configurable with use-role
* RoleLeaderChanged event
* Cluster singleton per role
* Cluster only starts once all required per-role node
  counts are reached,
  role.<role-name>.min-nr-of-members config
* Update documentation and make use of the roles in the examples
2013-03-18 11:56:11 +01:00
Viktor Klang (√)
05593f5dd8 Merge pull request #1230 from akka/wip-3076-gossip-merge-changes-ban
Don't increment vector-clock on merge and merge locally. See #3076
2013-03-12 08:49:30 -07:00
Patrik Nordwall
d98a7ef1e8 Cluster singleton failure due to down-removed, see #3130
* The scenario was that previous leader left.
* The problem was that the new leader got MemberRemoved
  before it got the HandOverDone and therefore missed the
  hand over data.
* Solved by not changing the singleton to leader when receiving
  MemberRemoved and instead do that on normal HandOverDone or
  in failure cases after retry timeout.
* The reason for this bug was the new transition from Down to
  Removed and that there is now no MemberDowned event. Previously
  this was only triggered by MemberDowned (not MemberRemoved) and
  that was safe because that was "always" preceded by unreachable.
* The new solution means that it will take longer for new singleton
  to startup in case of unreachable previous leader, but I don't
  want to trigger it on MemberUnreachable because it might in the
  future be possible to switch it back to reachable.
2013-03-11 12:37:35 +01:00
Björn Antonsson
7ed6b3d4ee Fixes according to review. See #3076 2013-03-11 12:27:29 +01:00
Patrik Nordwall
01bfb9378e Logging of joining 2013-03-08 15:47:03 +01:00
Björn Antonsson
386bf87f0e Don't increment vector-clock on merge and merge locally. See #3076 2013-03-08 12:14:25 +01:00
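
The distinction between merging and incrementing a vector clock can be sketched as follows. This is a generic textbook vector clock under the assumption that nodes are identified by strings. `VClockSketch` is a hypothetical name and not Akka's `VectorClock` class.

```scala
// Hypothetical, minimal vector clock: one counter per node name.
final case class VClockSketch(versions: Map[String, Long]) {
  // Incrementing records a new local event for the given node.
  def increment(node: String): VClockSketch =
    VClockSketch(versions.updated(node, versions.getOrElse(node, 0L) + 1L))

  // Merging takes the pairwise maximum and, as in the commit title above,
  // does NOT increment any counter: merging alone is not a new event.
  def merge(that: VClockSketch): VClockSketch = {
    val nodes = versions.keySet ++ that.versions.keySet
    VClockSketch(nodes.map { n =>
      n -> math.max(versions.getOrElse(n, 0L), that.versions.getOrElse(n, 0L))
    }.toMap)
  }
}
```

Because this merge is commutative and idempotent, two nodes that merge the same pair of clocks end up with equal clocks, whereas incrementing on every merge would keep producing new versions.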
Patrik Nordwall
5c7747e7fa Transition from Down to Removed, see #3075 2013-03-07 14:02:42 +01:00
Björn Antonsson
78c3ca359a Fixes according to review. See #3115 2013-03-06 16:55:46 +01:00
Björn Antonsson
fad4289b1b Merge gossip seen table when versions are the same. See #3115 2013-03-05 12:49:35 +01:00
Patrik Nordwall
679c4d313d Support restart of first seed node, see #2854
* Try to first join other seed nodes before joining itself
2013-02-21 20:40:13 +01:00
Patrik Nordwall
b349ad8d87 Nodes not part of cluster have marked the Gossip as seen, see #3031
* Problem may occur when joining member with same hostname:port again,
  after downing.
* Reproduced with StressSpec exerciseJoinRemove with fixed port that
  joins and shuts down several times.
* Real solution for this will be covered by ticket #2788 by adding
  uid to member identifier, but as first step we need to support
  this scenario with current design.
* Use unique node identifier for vector clock to avoid mixup of
  old and new member instance.
* Support transition from Down to Joining in Gossip merge
* Don't gossip to unknown or unreachable members.
2013-02-12 21:55:08 +01:00
Patrik Nordwall
cab78e5174 Make cluster fault handling more robust, see #3030
* ClusterCoreDaemon and ClusterDomainEventPublisher can't be restarted
  because the state would be obsolete.
* Add extra supervisor level for ClusterCoreDaemon and
  ClusterDomainEventPublisher, which will shutdown the member
  on failure in children.
* Publish the final removed state on postStop in
  ClusterDomainEventPublisher. This also simplifies the removing
  process.
2013-02-12 21:55:08 +01:00
Patrik Nordwall
9dc124dacd Remove work-around for sending to broken connections, see #2909
* Previous work-around was introduced because Netty blocks when sending
  to broken connections. This is supposed to be solved by the non-blocking
  new remoting.
* Removed HeartbeatSender and CoreSender in cluster
* Added tests to verify that broken connections don't disturb live connection
2013-01-31 13:41:02 +01:00
Patrik Nordwall
5dc108567d Style change of def starting with if
* When a def starts with if and is not a oneliner the if
  should be on a new line.
* The reason is that it might be easy to miss the if when
  reading the code.
2013-01-18 13:28:49 +01:00
Patrik Nordwall
8b4e903e7d Detect failure when no heartbeats sent, see #2907
* Subscribe to InstantMemberEvent and start heartbeating when
  InstantMemberUp. Same for metrics.
* HeartbeatNodeRing data structure for bidirectional mapping of
  heartbeat sender and receiver. Not using ConsistentHash anymore.
  Node addresses are hashed to ensure that neighbors are spread out.
* HeartbeatRequest when receiver detects that it has not received
  expected heartbeats.
* New test InitialHeartbeatSpec that simulates the problem
* Add/remove some related conf properties
* Add some more logging to be able to diagnose potential problems
* Explicit config of nr-of-end-heartbeats
2013-01-18 12:54:09 +01:00
Viktor Klang (√)
6b638db65e Merge pull request #1006 from akka/wip-2879-copyright2013-√
#2879 - updating copyright info
2013-01-14 04:59:29 -08:00
Viktor Klang
adfeb2c1f0 #2879 - updating copyright info 2013-01-09 11:38:00 +01:00
Patrik Nordwall
943c438d5e Publish clean state when joining (PublishStart), see #2871
* The failure in JoinTwoClustersSpec was due to missing publishing
  of cluster events when clearing current state when joining
* This fix is in the right direction, but joining clusters like this
  will need some design thought, creating ticket 2873 for that
2013-01-08 19:32:36 +01:00
Björn Antonsson
a03460329d Change cluster MemberEvents to only be published on convergence. See #2692
Conflicts:
	akka-cluster/src/main/scala/akka/cluster/ClusterEvent.scala
	akka-cluster/src/main/scala/akka/cluster/ClusterJmx.scala
	akka-cluster/src/main/scala/akka/cluster/ClusterMetricsCollector.scala
	akka-cluster/src/main/scala/akka/cluster/ClusterReadView.scala
	akka-cluster/src/multi-jvm/scala/akka/cluster/MultiNodeClusterSpec.scala
	akka-docs/rst/cluster/cluster-usage-java.rst
	akka-docs/rst/cluster/cluster-usage-scala.rst
	akka-kernel/src/main/dist/bin/akka-cluster
2012-12-14 12:46:13 +01:00
Patrik Nordwall
44ab9f116f min-nr-of-members and registerOnMemberUp, see #2306
* Leader moves joining members to up when min-nr-of-members reached
* Tested by MinMembersBeforeUpSpec
* Used in factorial sample
* Docs
2012-12-12 14:00:06 +01:00
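
The min-nr-of-members rule from the commit above can be sketched as a pure function. It assumes only two statuses matter and that the leader promotes all joining members at once when the threshold is reached. `MinMembersSketch` and its members are hypothetical names, not the actual leader actions code.

```scala
object MinMembersSketch {
  sealed trait Status
  case object Joining extends Status
  case object Up extends Status

  // Leader action: if the cluster has at least minNrOfMembers members,
  // move every Joining member to Up; otherwise leave all members unchanged.
  def leaderAction(members: Map[String, Status], minNrOfMembers: Int): Map[String, Status] =
    if (members.size >= minNrOfMembers)
      members.map { case (address, status) =>
        val next: Status = if (status == Joining) Up else status
        address -> next
      }
    else members
}
```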
Patrik Nordwall
1df787d0c5 Incorporate review comments and cleanup isAvailable, see #2018
* Renamed isRunning to isTerminated (with negation of course)
* Removed Running from JMX API, since the mbean is deregistered anyway
* Cleanup isAvailable, isUnavailable
* Misc minor
2012-12-06 15:26:57 +01:00
Patrik Nordwall
1914be7069 Merge branch 'master' into wip-2547-metrics-router-patriknw
Conflicts:
	akka-actor/src/main/scala/akka/actor/Deployer.scala
	akka-cluster/src/main/scala/akka/cluster/ClusterMetricsCollector.scala
	akka-cluster/src/test/scala/akka/cluster/MetricsCollectorSpec.scala
2012-11-15 12:33:11 +01:00
Roland
bff79c2f94 Merge remote-tracking branch 'origin/master' into wip-2.10.0-RC1-∂π
- currently cheating: uses zeroMQ artifacts for scala 2.10M7
- fixed a bunch more wrong references to scala.concurrent.util
2012-10-15 16:18:52 +02:00
Roland
0f04239f67 move Duration classes according to scala 2.10 nightly and remove casts to FiniteDuration, see #2504 2012-10-11 15:18:10 -07:00
Patrik Nordwall
668d5a5013 Merge branch 'master' into wip-2284-heartbeat-scalability-patriknw
Conflicts:
	akka-cluster/src/main/scala/akka/cluster/ClusterDaemon.scala
2012-10-09 18:11:36 +02:00
Patrik Nordwall
1f3341713f Remove cluster.FixedRateTask, see #2606 2012-10-08 12:17:40 +02:00
Patrik Nordwall
3f73705abc Use consistent hash to heartbeat to a few nodes instead of all, see #2284
* Previously heartbeat messages were sent to all other members, i.e.
  each member was monitored by all other members in the cluster.
* This was the number one known scalability bottleneck, due to the
  number of interconnections.
* Limit sending of heartbeats to a few (5) members. Select and
  re-balance with consistent hashing algorithm when new members
  are added or removed.
* Send a few EndHeartbeat when ending send of Heartbeat messages.
2012-10-08 08:41:28 +02:00
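
The idea of heartbeating to only a few members can be sketched with a simplified hash ring. This is an assumption-laden illustration: it sorts plain address strings by `hashCode` and has no virtual nodes, so it is not Akka's `ConsistentHash` class. `HeartbeatRingSketch` and `receivers` are hypothetical names.

```scala
object HeartbeatRingSketch {
  // Returns up to n members that `self` should send heartbeats to:
  // the next n nodes after `self` on a ring ordered by address hash.
  def receivers(self: String, members: Set[String], n: Int): Seq[String] = {
    // Order by hash so that ring neighbours are unrelated to address order;
    // the address itself breaks ties between equal hashes.
    val ring = (members + self).toVector.sortBy(a => (a.hashCode, a))
    val idx = ring.indexOf(self)
    // Walk clockwise from self, wrapping around, and exclude self.
    val others = ring.drop(idx + 1) ++ ring.take(idx)
    others.take(n)
  }
}
```

With n = 5 each member monitors at most five others regardless of cluster size, and adding or removing a member only shifts the receivers of the nodes next to it on the ring.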
Patrik Nordwall
cecde67226 Move heartbeat sending out from ClusterCoreDaemon, see #2284 2012-10-08 08:41:28 +02:00
Patrik Nordwall
49b9ec6c2c Publish cluster metrics through the publisher actor.
* To avoid ordering surprises metrics should be published via
  the same actor that handles the subscriptions and publishes
  other cluster domain events.
* Added missing publish in case of removal of member
  (had a test failure for that)
2012-10-02 17:08:38 +02:00
Patrik Nordwall
51ff9ce6d1 Cluster.unsubscribe with class parameter, see #2567 2012-09-28 13:09:36 +02:00
Helena Edelson
dbce1c8b85 Cluster metrics internal API and cluster-wide transport of metrics data.
* Create Cluster Metrics API
* Create transport of relevant metrics data
Does not include load-balancing routers.
2012-09-24 13:07:11 -06:00
Roland
35b7a9e338 second round of FiniteDuration business, including cluster fixes
- make Scheduler only accept FiniteDuration, which has quite some
  knock-on effects
2012-09-18 09:58:30 +02:00
Patrik Nordwall
50d0efe7d4 Request send/publish of CurrentClusterState, see #2438
* Added publishCurrentClusterState and sendCurrentClusterState
* Removed Ping/Pong that was used for some tests, since awaitCond is
  now needed anyway, since publish to eventStream is done afterwards
2012-09-12 09:23:02 +02:00
Patrik Nordwall
83e7f5d6d6 Incorporate review comments, see #2473
* Also added ClusterSettings in constructor of ClusterDaemon,
  because that will be needed to decide if the metrics actor is
  to be started
2012-09-07 17:42:15 +02:00