From b0626f0562d828f1b17afd5f4c47eee5227bd7ff Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jonas=20Bone=CC=81r?= Date: Tue, 7 Feb 2012 14:20:51 +0100 Subject: [PATCH 01/72] Changes to cluster specification. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Added section on single-node cluster. - Changed seed nodes to deputy nodes. - Seed nodes are no longer used as contact points only to break logical partitions. Signed-off-by: Jonas Bonér --- akka-docs/cluster/cluster.rst | 114 +++++++++++++++++++++------------- 1 file changed, 72 insertions(+), 42 deletions(-) diff --git a/akka-docs/cluster/cluster.rst b/akka-docs/cluster/cluster.rst index af9f7e1d35..371cdf2615 100644 --- a/akka-docs/cluster/cluster.rst +++ b/akka-docs/cluster/cluster.rst @@ -57,16 +57,32 @@ These terms are used throughout the documentation. A mapping from partition path to a set of instance nodes (where the nodes are referred to by the ordinal position given the nodes in sorted order). +**leader** + A single node in the cluster that acts as the leader. Managing cluster convergence, + partitions, fail-over, rebalancing etc. + +**deputy nodes** + A set of nodes responsible for breaking logical partitions. + Membership ========== A cluster is made up of a set of member nodes. The identifier for each node is a -`hostname:port` pair. An Akka application is distributed over a cluster with +``hostname:port`` pair. An Akka application is distributed over a cluster with each node hosting some part of the application. Cluster membership and partitioning of the application are decoupled. A node could be a member of a cluster without hosting any actors. +Single-node Cluster +------------------- + +If a node does not have a preconfigured contact point to join in the Akka +configuration, then it is considered a single-node cluster and will +automatically transition from ``joining`` to ``up``. Single-node clusters +can later explicitly send a ``Join`` message to another node to form a N-node +cluster. It is also possible to link multiple N-node clusters by ``joining`` them. + Gossip ------ @@ -75,8 +91,8 @@ The cluster membership used in Akka is based on Amazon's `Dynamo`_ system and particularly the approach taken in Basho's' `Riak`_ distributed database. Cluster membership is communicated using a `Gossip Protocol`_, where the current state of the cluster is gossiped randomly through the cluster. Joining a cluster -is initiated by specifying a set of ``seed`` nodes with which to begin -gossiping. +is initiated by issuing a ``Join`` command to one of the nodes in the cluster to +join. .. _Gossip Protocol: http://en.wikipedia.org/wiki/Gossip_protocol .. _Dynamo: http://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf @@ -102,7 +118,7 @@ the `pruning algorithm`_ in Riak. .. _pruning algorithm: http://wiki.basho.com/Vector-Clocks.html#Vector-Clock-Pruning -Gossip convergence +Gossip Convergence ^^^^^^^^^^^^^^^^^^ Information about the cluster converges at certain points of time. This is when @@ -146,31 +162,45 @@ order to account for network issues that sometimes occur on such platforms. Leader ^^^^^^ -After gossip convergence a leader for the cluster can be determined. There is no -leader election process, the leader can always be recognised deterministically -by any node whenever there is gossip convergence. The leader is simply the first +After gossip convergence a ``leader`` for the cluster can be determined. 
There is no +``leader`` election process, the ``leader`` can always be recognised deterministically +by any node whenever there is gossip convergence. The ``leader`` is simply the first node in sorted order that is able to take the leadership role, where the only -allowed member states for a leader are ``up`` or ``leaving`` (see below for more +allowed member states for a ``leader`` are ``up`` or ``leaving`` (see below for more information about member states). -The role of the leader is to shift members in and out of the cluster, changing +The role of the ``leader`` is to shift members in and out of the cluster, changing ``joining`` members to the ``up`` state or ``exiting`` members to the ``removed`` state, and to schedule rebalancing across the cluster. Currently -leader actions are only triggered by receiving a new cluster state with gossip +``leader`` actions are only triggered by receiving a new cluster state with gossip convergence but it may also be possible for the user to explicitly rebalance the cluster by specifying migrations, or to rebalance the cluster automatically based on metrics from member nodes. Metrics may be spread using the gossip protocol or possibly more efficiently using a *random chord* method, where the -leader contacts several random nodes around the cluster ring and each contacted +``leader`` contacts several random nodes around the cluster ring and each contacted node gathers information from their immediate neighbours, giving a random sampling of load information. -The leader also has the power, if configured so, to "auto-down" a node that +The ``leader`` also has the power, if configured so, to "auto-down" a node that according to the Failure Detector is considered unreachable. This means setting the unreachable node status to ``down`` automatically. -Gossip protocol +Deputy Nodes +^^^^^^^^^^^^ + +After gossip convergence a set of ``deputy`` nodes for the cluster can be +determined. As with the ``leader``, there is no ``deputy`` election process, +the deputies can always be recognised deterministically by any node whenever there +is gossip convergence. The list of ``deputy`` nodes is simply the N - 1 number +of nodes (e.g. starting with the first node after the ``leader``) in sorted order. + +The nodes defined as ``deputy`` nodes are just regular member nodes whose only +"special role" is to help breaking logical partitions as seen in the gossip +algorithm defined below. + + +Gossip Protocol ^^^^^^^^^^^^^^^ A variation of *push-pull gossip* is used to reduce the amount of gossip @@ -186,14 +216,14 @@ nodes involved in a gossip exchange. Periodically, the default is every 1 second, each node chooses another random node to initiate a round of gossip with. The choice of node is random but can -also include extra gossiping for unreachable nodes, seed nodes, and nodes with +also include extra gossiping for unreachable nodes, ``deputy`` nodes, and nodes with either newer or older state versions. The gossip overview contains the current state version for all nodes and also a list of unreachable nodes. Whenever a node receives a gossip overview it updates the `Failure Detector`_ with the liveness information. -The nodes defined as ``seed`` nodes are just regular member nodes whose only +The nodes defined as ``deputy`` nodes are just regular member nodes whose only "special role" is to function as contact points in the cluster and to help breaking logical partitions as seen in the gossip algorithm defined below. 
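A minimal sketch of the deterministic ``leader`` and ``deputy`` selection described above, assuming a simplified ``Member`` type and a configured deputy count ``n`` (hypothetical names; the real code uses ``akka.actor.Address`` and ``MemberStatus``)::

    import scala.collection.immutable.SortedSet

    object LeaderSelection {
      // Simplified stand-in for the real Member(address, status); statuses are plain strings here.
      final case class Member(address: String, status: String)

      implicit val byAddress: Ordering[Member] = Ordering.by[Member, String](_.address)

      // The leader is the first member in sorted order whose status allows the
      // leadership role ("up" or "leaving"); every node computes the same result
      // once gossip has converged, so no election is needed.
      def leader(members: SortedSet[Member]): Option[Member] =
        members.find(m => m.status == "up" || m.status == "leaving")

      // Deputies are the n - 1 members that directly follow the leader in the
      // same sorted order.
      def deputies(members: SortedSet[Member], n: Int): Seq[Member] =
        leader(members) match {
          case Some(l) => members.toSeq.dropWhile(_ != l).drop(1).take(n - 1)
          case None    => Seq.empty
        }
    }

Because every node sorts the same member set, gossip convergence is all that is needed for all nodes to agree on the ``leader`` and the ``deputy`` nodes.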
@@ -204,9 +234,9 @@ During each round of gossip exchange the following process is used: 2. Gossip to random unreachable node with certain probability depending on the number of unreachable and live nodes -3. If the node gossiped to at (1) was not a ``seed`` node, or the number of live - nodes is less than number of seeds, gossip to random ``seed`` node with - certain probability depending on number of unreachable, seed, and live nodes. +3. If the node gossiped to at (1) was not a ``deputy`` node, or the number of live + nodes is less than number of ``deputy`` nodes, gossip to random ``deputy`` node with + certain probability depending on number of unreachable, ``deputy``, and live nodes. 4. Gossip to random node with newer or older state information, based on the current gossip overview, with some probability (?) @@ -260,18 +290,18 @@ Some of the other structures used are:: PartitionChangeStatus = Awaiting | Complete -Membership lifecycle +Membership Lifecycle -------------------- A node begins in the ``joining`` state. Once all nodes have seen that the new -node is joining (through gossip convergence) the leader will set the member +node is joining (through gossip convergence) the ``leader`` will set the member state to ``up`` and can start assigning partitions to the new node. If a node is leaving the cluster in a safe, expected manner then it switches to -the ``leaving`` state. The leader will reassign partitions across the cluster -(it is possible for a leaving node to itself be the leader). When all partition +the ``leaving`` state. The ``leader`` will reassign partitions across the cluster +(it is possible for a leaving node to itself be the ``leader``). When all partition handoff has completed then the node will change to the ``exiting`` state. Once -all nodes have seen the exiting state (convergence) the leader will remove the +all nodes have seen the exiting state (convergence) the ``leader`` will remove the node from the cluster, marking it as ``removed``. A node can also be removed forcefully by moving it directly to the ``removed`` @@ -279,7 +309,7 @@ state using the ``remove`` action. The cluster will rebalance based on the new cluster membership. If a node is unreachable then gossip convergence is not possible and therefore -any leader actions are also not possible (for instance, allowing a node to +any ``leader`` actions are also not possible (for instance, allowing a node to become a part of the cluster, or changing actor distribution). To be able to move forward the state of the unreachable nodes must be changed. If the unreachable node is experiencing only transient difficulties then it can be @@ -293,13 +323,13 @@ This means that nodes can join and leave the cluster at any point in time, e.g. provide cluster elasticity. -State diagram for the member states +State Diagram for the Member States ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. 
image:: images/member-states.png -Member states +Member States ^^^^^^^^^^^^^ - **joining** @@ -318,12 +348,12 @@ Member states marked as down/offline/unreachable -User actions +User Actions ^^^^^^^^^^^^ - **join** join a single node to a cluster - can be explicit or automatic on - startup if a list of seed nodes have been specified in the configuration + startup if a node to join have been specified in the configuration - **leave** tell a node to leave the cluster gracefully @@ -335,10 +365,10 @@ User actions remove a node from the cluster immediately -Leader actions +Leader Actions ^^^^^^^^^^^^^^ -The leader has the following duties: +The ``leader`` has the following duties: - shifting members in and out of the cluster @@ -364,7 +394,7 @@ set of nodes in the cluster. The actor at the head of the partition is referred to as the partition point. The mapping from partition path (actor address of the format "a/b/c") to instance nodes is stored in the partition table and is maintained as part of the cluster state through the gossip protocol. The -partition table is only updated by the leader node. Currently the only possible +partition table is only updated by the ``leader`` node. Currently the only possible partition points are *routed* actors. Routed actors can have an instance count greater than one. The instance count is @@ -375,7 +405,7 @@ Note that in the first implementation there may be a restriction such that only top-level partitions are possible (the highest possible partition points are used and sub-partitioning is not allowed). Still to be explored in more detail. -The cluster leader determines the current instance count for a partition based +The cluster ``leader`` determines the current instance count for a partition based on two axes: fault-tolerance and scaling. Fault-tolerance determines a minimum number of instances for a routed actor @@ -415,8 +445,8 @@ the following, with all instances on the same physical nodes as before:: B -> { 7, 9, 10 } C -> { 12, 14, 15, 1, 2 } -When rebalancing is required the leader will schedule handoffs, gossiping a set -of pending changes, and when each change is complete the leader will update the +When rebalancing is required the ``leader`` will schedule handoffs, gossiping a set +of pending changes, and when each change is complete the ``leader`` will update the partition table. @@ -436,7 +466,7 @@ the handoff), given a previous host node ``N1``, a new host node ``N2``, and an actor partition ``A`` to be migrated from ``N1`` to ``N2``, has this general structure: - 1. the leader sets a pending change for ``N1`` to handoff ``A`` to ``N2`` + 1. the ``leader`` sets a pending change for ``N1`` to handoff ``A`` to ``N2`` 2. ``N1`` notices the pending change and sends an initialization message to ``N2`` @@ -445,7 +475,7 @@ structure: 4. after receiving the ready message ``N1`` marks the change as complete and shuts down ``A`` - 5. the leader sees the migration is complete and updates the partition table + 5. the ``leader`` sees the migration is complete and updates the partition table 6. all nodes eventually see the new partitioning and use ``N2`` @@ -457,7 +487,7 @@ There are transition times in the handoff process where different approaches can be used to give different guarantees. 
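The simple handoff above can be modelled as a small message protocol; the message names below are assumptions drawn from the six steps (only ``PartitionChangeStatus`` with ``Awaiting`` and ``Complete`` appears in the structures listed earlier)::

    object HandoffProtocol {
      type Node = String           // e.g. "N1", "N2"
      type PartitionPath = String  // actor address of the form "a/b/c"

      // status of a pending change, as listed in the structures above
      sealed trait PartitionChangeStatus
      case object Awaiting extends PartitionChangeStatus
      case object Complete extends PartitionChangeStatus

      // 1. the leader gossips a pending change: oldHost is to hand off the partition to newHost
      final case class PendingChange(
        partition: PartitionPath, oldHost: Node, newHost: Node,
        status: PartitionChangeStatus = Awaiting)

      sealed trait HandoffMessage
      // 2. N1 notices the pending change and sends an initialization message to N2
      final case class InitializeHandoff(partition: PartitionPath) extends HandoffMessage
      // 3. N2 replies that it is ready to take over the partition
      final case class HandoffReady(partition: PartitionPath) extends HandoffMessage
      // 4. N1 then marks the change Complete and shuts down the actor; 5./6. the
      //    leader updates the partition table and all nodes eventually use N2
    }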
-Migration transition +Migration Transition ~~~~~~~~~~~~~~~~~~~~ The first transition starts when ``N1`` initiates the moving of ``A`` and ends @@ -480,7 +510,7 @@ buffered until the actor is ready, or the messages are simply dropped by terminating the actor and allowing the normal dead letter process to be used. -Update transition +Update Transition ~~~~~~~~~~~~~~~~~ The second transition begins when the migration is marked as complete and ends @@ -514,12 +544,12 @@ messages sent directly to ``N2`` before the acknowledgement has been forwarded that will be buffered. -Graceful handoff +Graceful Handoff ^^^^^^^^^^^^^^^^ A more complete process for graceful handoff would be: - 1. the leader sets a pending change for ``N1`` to handoff ``A`` to ``N2`` + 1. the ``leader`` sets a pending change for ``N1`` to handoff ``A`` to ``N2`` 2. ``N1`` notices the pending change and sends an initialization message to @@ -550,7 +580,7 @@ A more complete process for graceful handoff would be: becoming dead letters) - 5. the leader sees the migration is complete and updates the partition table + 5. the ``leader`` sees the migration is complete and updates the partition table 6. all nodes eventually see the new partitioning and use ``N2`` @@ -594,7 +624,7 @@ distributed datastore. See the next section for a rough outline on how the distributed datastore could be implemented. -Implementing a Dynamo-style distributed database on top of Akka Cluster +Implementing a Dynamo-style Distributed Database on top of Akka Cluster ----------------------------------------------------------------------- The missing pieces to implement a full Dynamo-style eventually consistent data From 20f74bd2843c23af49ada25ea5ecf23cbcdd0f76 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jonas=20Bone=CC=81r?= Date: Tue, 7 Feb 2012 16:53:49 +0100 Subject: [PATCH 02/72] Removed cluster seed nodes, added 'join.contact-point', changed joining phase, added singleton cluster mode plus misc other changes. 
MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Jonas Bonér --- .../src/main/resources/reference.conf | 20 +- .../scala/akka/cluster/ClusterSettings.scala | 17 +- .../main/scala/akka/cluster/Gossiper.scala | 191 +++++++++--------- .../main/scala/akka/cluster/VectorClock.scala | 12 +- .../akka/cluster/ClusterConfigSpec.scala | 10 +- akka-docs/cluster/cluster.rst | 9 +- 6 files changed, 137 insertions(+), 122 deletions(-) diff --git a/akka-cluster/src/main/resources/reference.conf b/akka-cluster/src/main/resources/reference.conf index e097d34f3e..53bf7a41eb 100644 --- a/akka-cluster/src/main/resources/reference.conf +++ b/akka-cluster/src/main/resources/reference.conf @@ -8,9 +8,18 @@ akka { cluster { - seed-nodes = [] - seed-node-connection-timeout = 30s - max-time-to-retry-joining-cluster = 30s + join { + # contact point on the form of "hostname:port" of a node to try to join + # leave as empty string if the node should be a singleton cluster + contact-point = "" + timeout = 30s + max-time-to-retry = 30s + } + + gossip { + initialDelay = 5s + frequency = 1s + } # accrual failure detection config failure-detector { @@ -24,10 +33,5 @@ akka { max-sample-size = 1000 } - - gossip { - initial-delay = 5s - frequency = 1s - } } } diff --git a/akka-cluster/src/main/scala/akka/cluster/ClusterSettings.scala b/akka-cluster/src/main/scala/akka/cluster/ClusterSettings.scala index e88c3ae72c..0a0697223b 100644 --- a/akka-cluster/src/main/scala/akka/cluster/ClusterSettings.scala +++ b/akka-cluster/src/main/scala/akka/cluster/ClusterSettings.scala @@ -16,11 +16,16 @@ class ClusterSettings(val config: Config, val systemName: String) { // cluster config section val FailureDetectorThreshold = getInt("akka.cluster.failure-detector.threshold") val FailureDetectorMaxSampleSize = getInt("akka.cluster.failure-detector.max-sample-size") - val SeedNodeConnectionTimeout = Duration(config.getMilliseconds("akka.cluster.seed-node-connection-timeout"), MILLISECONDS) - val MaxTimeToRetryJoiningCluster = Duration(config.getMilliseconds("akka.cluster.max-time-to-retry-joining-cluster"), MILLISECONDS) - val InitialDelayForGossip = Duration(getMilliseconds("akka.cluster.gossip.initial-delay"), MILLISECONDS) - val GossipFrequency = Duration(getMilliseconds("akka.cluster.gossip.frequency"), MILLISECONDS) - val SeedNodes = Set.empty[Address] ++ getStringList("akka.cluster.seed-nodes").asScala.collect { - case AddressExtractor(addr) ⇒ addr + + // join config + val JoinContactPoint: Option[Address] = getString("akka.cluster.join.contact-point") match { + case "" ⇒ None + case AddressExtractor(addr) ⇒ Some(addr) } + val JoinTimeout = Duration(config.getMilliseconds("akka.cluster.join.timeout"), MILLISECONDS) + val JoinMaxTimeToRetry = Duration(config.getMilliseconds("akka.cluster.join.max-time-to-retry"), MILLISECONDS) + + // gossip config + val GossipInitialDelay = Duration(getMilliseconds("akka.cluster.gossip.initialDelay"), MILLISECONDS) + val GossipFrequency = Duration(getMilliseconds("akka.cluster.gossip.frequency"), MILLISECONDS) } diff --git a/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala b/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala index bb15223842..47536ff5d2 100644 --- a/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala +++ b/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala @@ -32,6 +32,8 @@ trait NodeMembershipChangeListener { def memberDisconnected(member: Member) } +// FIXME create Protobuf messages out of all the Gossip stuff 
- but wait until the prototol is fully stablized. + /** * Base trait for all cluster messages. All ClusterMessage's are serializable. */ @@ -40,14 +42,13 @@ sealed trait ClusterMessage extends Serializable /** * Command to join the cluster. */ -case object JoinCluster extends ClusterMessage +case class Join(node: Address) extends ClusterMessage /** * Represents the state of the cluster; cluster ring membership, ring convergence, meta data - all versioned by a vector clock. */ case class Gossip( - version: VectorClock = VectorClock(), - member: Address, + member: Member, // sorted set of members with their status, sorted by name members: SortedSet[Member] = SortedSet.empty[Member](Ordering.fromLessThan[Member](_.address.toString > _.address.toString)), unavailableMembers: Set[Member] = Set.empty[Member], @@ -55,7 +56,9 @@ case class Gossip( seen: Map[Member, VectorClock] = Map.empty[Member, VectorClock], // for handoff //pendingChanges: Option[Vector[PendingPartitioningChange]] = None, - meta: Option[Map[String, Array[Byte]]] = None) + meta: Option[Map[String, Array[Byte]]] = None, + // vector clock version + version: VectorClock = VectorClock()) extends ClusterMessage // is a serializable cluster message with Versioned // has a vector clock as version @@ -69,13 +72,13 @@ case class Member(address: Address, status: MemberStatus) extends ClusterMessage * * Can be one of: Joining, Up, Leaving, Exiting and Down. */ -sealed trait MemberStatus extends ClusterMessage with Versioned +sealed trait MemberStatus extends ClusterMessage object MemberStatus { - case class Joining(version: VectorClock = VectorClock()) extends MemberStatus - case class Up(version: VectorClock = VectorClock()) extends MemberStatus - case class Leaving(version: VectorClock = VectorClock()) extends MemberStatus - case class Exiting(version: VectorClock = VectorClock()) extends MemberStatus - case class Down(version: VectorClock = VectorClock()) extends MemberStatus + case object Joining extends MemberStatus + case object Up extends MemberStatus + case object Leaving extends MemberStatus + case object Exiting extends MemberStatus + case object Down extends MemberStatus } // sealed trait PendingPartitioningStatus @@ -94,11 +97,9 @@ final class ClusterDaemon(system: ActorSystem, gossiper: Gossiper) extends Actor val log = Logging(system, "ClusterDaemon") def receive = { - case JoinCluster ⇒ sender ! gossiper.latestGossip - case gossip: Gossip ⇒ - gossiper.tell(gossip) - - case unknown ⇒ log.error("Unknown message sent to cluster daemon [" + unknown + "]") + case Join(address) ⇒ sender ! gossiper.latestGossip // TODO use address in Join(address) ? + case gossip: Gossip ⇒ gossiper.tell(gossip) + case unknown ⇒ log.error("Unknown message sent to cluster daemon [" + unknown + "]") } } @@ -113,8 +114,8 @@ final class ClusterDaemon(system: ActorSystem, gossiper: Gossiper) extends Actor *
  *   1) Gossip to random live member (if any)
  *   2) Gossip to random unreachable member with certain probability depending on number of unreachable and live members
- *   3) If the member gossiped to at (1) was not seed, or the number of live members is less than number of seeds,
- *       gossip to random seed with certain probability depending on number of unreachable, seed and live members.
+ *   3) If the member gossiped to at (1) was not a deputy, or the number of live members is less than the number of deputies,
+ *       gossip to a random deputy with certain probability depending on the number of unreachable, deputy and live members.
  * 
*/ case class Gossiper(remote: RemoteActorRefProvider, system: ActorSystemImpl) { @@ -132,22 +133,20 @@ case class Gossiper(remote: RemoteActorRefProvider, system: ActorSystemImpl) { val protocol = "akka" // TODO should this be hardcoded? val address = remote.transport.address - val memberFingerprint = address.## - val initialDelayForGossip = clusterSettings.InitialDelayForGossip + + val gossipInitialDelay = clusterSettings.GossipInitialDelay val gossipFrequency = clusterSettings.GossipFrequency - implicit val seedNodeConnectionTimeout = clusterSettings.SeedNodeConnectionTimeout + + implicit val joinTimeout = clusterSettings.JoinTimeout implicit val defaultTimeout = Timeout(remoteSettings.RemoteSystemDaemonAckTimeout) - // seed members - private val seeds: Set[Member] = { - if (clusterSettings.SeedNodes.isEmpty) throw new ConfigurationException( - "At least one seed member must be defined in the configuration [akka.cluster.seed-members]") - else clusterSettings.SeedNodes map (address ⇒ Member(address, MemberStatus.Up())) - } + private val contactPoint: Option[Member] = + clusterSettings.JoinContactPoint filter (_ != address) map (address ⇒ Member(address, MemberStatus.Up)) private val serialization = remote.serialization - private val failureDetector = new AccrualFailureDetector(system, clusterSettings.FailureDetectorThreshold, clusterSettings.FailureDetectorMaxSampleSize) + private val failureDetector = new AccrualFailureDetector( + system, clusterSettings.FailureDetectorThreshold, clusterSettings.FailureDetectorMaxSampleSize) private val isRunning = new AtomicBoolean(true) private val log = Logging(system, "Gossiper") @@ -162,12 +161,12 @@ case class Gossiper(remote: RemoteActorRefProvider, system: ActorSystemImpl) { log.info("Starting cluster Gossiper...") - // join the cluster by connecting to one of the seed members and retrieve current cluster state (Gossip) - joinCluster(clusterSettings.MaxTimeToRetryJoiningCluster fromNow) + // join the cluster by connecting to one of the deputy members and retrieve current cluster state (Gossip) + joinContactPoint(clusterSettings.JoinMaxTimeToRetry fromNow) // start periodic gossip and cluster scrutinization - val initateGossipCanceller = system.scheduler.schedule(initialDelayForGossip, gossipFrequency)(initateGossip()) - val scrutinizeCanceller = system.scheduler.schedule(initialDelayForGossip, gossipFrequency)(scrutinize()) + val initateGossipCanceller = system.scheduler.schedule(gossipInitialDelay, gossipFrequency)(initateGossip()) + val scrutinizeCanceller = system.scheduler.schedule(gossipInitialDelay, gossipFrequency)(scrutinize()) /** * Shuts down all connections to other members, the cluster daemon and the periodic gossip and cleanup tasks. 
@@ -196,7 +195,7 @@ case class Gossiper(remote: RemoteActorRefProvider, system: ActorSystemImpl) { final def tell(newGossip: Gossip) { val gossipingNode = newGossip.member - failureDetector heartbeat gossipingNode // update heartbeat in failure detector + failureDetector heartbeat gossipingNode.address // update heartbeat in failure detector // FIXME all below here is WRONG - redesign with cluster convergence in mind @@ -224,7 +223,7 @@ case class Gossiper(remote: RemoteActorRefProvider, system: ActorSystemImpl) { // println("---------- WON RACE - setting state") // // create connections for all new members in the latest gossip // (latestAvailableNodes + gossipingNode) foreach { member ⇒ - // setUpConnectionToNode(member) + // setUpConnectionTo(member) // oldState.memberMembershipChangeListeners foreach (_ memberConnected member) // notify listeners about the new members // } // } @@ -267,69 +266,43 @@ case class Gossiper(remote: RemoteActorRefProvider, system: ActorSystemImpl) { } /** - * Sets up remote connections to all the members in the argument list. + * Joins the pre-configured contact point and retrieves current gossip state. */ - private def connectToNodes(members: Seq[Member]) { - members foreach { member ⇒ - setUpConnectionToNode(member) - state.get.memberMembershipChangeListeners foreach (_ memberConnected member) // notify listeners about the new members - } - } - - // FIXME should shuffle list randomly before start traversing to avoid connecting to some member on every member - @tailrec - final private def connectToRandomNodeOf(members: Seq[Member]): ActorRef = { - members match { - case member :: rest ⇒ - setUpConnectionToNode(member) match { - case Some(connection) ⇒ connection - case None ⇒ connectToRandomNodeOf(rest) // recur if - } - case Nil ⇒ - throw new RemoteConnectionException( - "Could not establish connection to any of the members in the argument list") - } - } - - /** - * Joins the cluster by connecting to one of the seed members and retrieve current cluster state (Gossip). - */ - private def joinCluster(deadline: Deadline) { - val seedNodes = seedNodesWithoutMyself // filter out myself - - if (!seedNodes.isEmpty) { // if we have seed members to contact - connectToNodes(seedNodes) - + private def joinContactPoint(deadline: Deadline) { + def tryJoinContactPoint(connection: ActorRef, deadline: Deadline) { try { - log.info("Trying to join cluster through one of the seed members [{}]", seedNodes.mkString(", ")) - - Await.result(connectToRandomNodeOf(seedNodes) ? JoinCluster, seedNodeConnectionTimeout) match { + Await.result(connection ? Join(address), joinTimeout) match { case initialGossip: Gossip ⇒ // just sets/overwrites the state/gossip regardless of what it was before // since it should be treated as the initial state state.set(state.get copy (currentGossip = initialGossip)) - log.debug("Received initial gossip [{}] from seed member", initialGossip) + log.debug("Received initial gossip [{}]", initialGossip) case unknown ⇒ - throw new IllegalStateException("Expected initial gossip from seed, received [" + unknown + "]") + throw new IllegalStateException("Expected initial gossip but received [" + unknown + "]") } } catch { case e: Exception ⇒ - log.error( - "Could not join cluster through any of the seed members - retrying for another {} seconds", - deadline.timeLeft.toSeconds) + log.error("Could not join contact point node - retrying for another {} seconds", deadline.timeLeft.toSeconds) // retry joining the cluster unless // 1. Gossiper is shut down // 2. 
The connection time window has expired - if (isRunning.get) { - if (deadline.timeLeft.toMillis > 0) joinCluster(deadline) // recur - else throw new RemoteConnectionException( - "Could not join cluster (any of the seed members) - giving up after trying for " + - deadline.time.toSeconds + " seconds") - } + if (isRunning.get && deadline.timeLeft.toMillis > 0) tryJoinContactPoint(connection, deadline) // recur + else throw new RemoteConnectionException( + "Could not join contact point node - giving up after trying for " + deadline.time.toSeconds + " seconds") } } + + contactPoint match { + case None ⇒ log.info("Booting up in singleton cluster mode") + case Some(member) ⇒ + log.info("Trying to join contact point node defined in the configuration [{}]", member) + setUpConnectionTo(member) match { + case None ⇒ log.error("Could not set up connection to join contact point node defined in the configuration [{}]", member) + case Some(connection) ⇒ tryJoinContactPoint(connection, deadline) + } + } } /** @@ -346,7 +319,7 @@ case class Gossiper(remote: RemoteActorRefProvider, system: ActorSystemImpl) { val oldUnavailableMembersSize = oldUnavailableMembers.size // 1. gossip to alive members - val gossipedToSeed = + val shouldGossipToDeputy = if (oldUnavailableMembersSize > 0) gossipToRandomNodeOf(oldMembers) else false @@ -356,12 +329,13 @@ case class Gossiper(remote: RemoteActorRefProvider, system: ActorSystemImpl) { if (random.nextDouble() < probability) gossipToRandomNodeOf(oldUnavailableMembers) } - // 3. gossip to a seed for facilitating partition healing - if ((!gossipedToSeed || oldMembersSize < 1) && (seeds.head != address)) { - if (oldMembersSize == 0) gossipToRandomNodeOf(seeds) + // 3. gossip to a deputy nodes for facilitating partition healing + val deputies = deputyNodesWithoutMyself + if ((!shouldGossipToDeputy || oldMembersSize < 1) && (deputies.head != address)) { + if (oldMembersSize == 0) gossipToRandomNodeOf(deputies) else { val probability = 1.0 / oldMembersSize + oldUnavailableMembersSize - if (random.nextDouble() <= probability) gossipToRandomNodeOf(seeds) + if (random.nextDouble() <= probability) gossipToRandomNodeOf(deputies) } } } @@ -369,18 +343,25 @@ case class Gossiper(remote: RemoteActorRefProvider, system: ActorSystemImpl) { /** * Gossips to a random member in the set of members passed in as argument. * - * @return 'true' if it gossiped to a "seed" member. + * @return 'true' if it gossiped to a "deputy" member. */ - private def gossipToRandomNodeOf(members: Set[Member]): Boolean = { + private def gossipToRandomNodeOf(members: Seq[Member]): Boolean = { val peers = members filter (_.address != address) // filter out myself val peer = selectRandomNode(peers) val oldState = state.get val oldGossip = oldState.currentGossip // if connection can't be established/found => ignore it since the failure detector will take care of the potential problem - setUpConnectionToNode(peer) foreach { _ ! newGossip } - seeds exists (peer == _) + setUpConnectionTo(peer) foreach { _ ! newGossip } + deputyNodesWithoutMyself exists (peer == _) } + /** + * Gossips to a random member in the set of members passed in as argument. + * + * @return 'true' if it gossiped to a "deputy" member. + */ + private def gossipToRandomNodeOf(members: Set[Member]): Boolean = gossipToRandomNodeOf(members.toList) + /** * Scrutinizes the cluster; marks members detected by the failure detector as unavailable, and notifies all listeners * of the change in the cluster membership. 
@@ -413,7 +394,30 @@ case class Gossiper(remote: RemoteActorRefProvider, system: ActorSystemImpl) { } } - private def setUpConnectionToNode(member: Member): Option[ActorRef] = { + // FIXME should shuffle list randomly before start traversing to avoid connecting to some member on every member + @tailrec + final private def connectToRandomNodeOf(members: Seq[Member]): ActorRef = { + members match { + case member :: rest ⇒ + setUpConnectionTo(member) match { + case Some(connection) ⇒ connection + case None ⇒ connectToRandomNodeOf(rest) // recur if + } + case Nil ⇒ + throw new RemoteConnectionException( + "Could not establish connection to any of the members in the argument list") + } + } + + /** + * Sets up remote connections to all the members in the argument list. + */ + private def setUpConnectionsTo(members: Seq[Member]): Seq[Option[ActorRef]] = members map { setUpConnectionTo(_) } + + /** + * Sets up remote connection. + */ + private def setUpConnectionTo(member: Member): Option[ActorRef] = { val address = member.address try { Some( @@ -425,14 +429,13 @@ case class Gossiper(remote: RemoteActorRefProvider, system: ActorSystemImpl) { } } - private def newGossip(): Gossip = Gossip(member = address) + private def newGossip(): Gossip = Gossip(Member(address, MemberStatus.Joining)) // starts in Joining mode private def incrementVersionForGossip(from: Gossip): Gossip = { - val newVersion = from.version.increment(memberFingerprint, newTimestamp) - from copy (version = newVersion) + from copy (version = from.version.increment(memberFingerprint, newTimestamp)) } - private def seedNodesWithoutMyself: List[Member] = seeds.filter(_.address != address).toList + private def deputyNodesWithoutMyself: Seq[Member] = Seq.empty[Member] filter (_.address != address) // FIXME read in deputy nodes from gossip data - now empty seq - private def selectRandomNode(members: Set[Member]): Member = members.toList(random.nextInt(members.size)) + private def selectRandomNode(members: Seq[Member]): Member = members(random.nextInt(members.size)) } diff --git a/akka-cluster/src/main/scala/akka/cluster/VectorClock.scala b/akka-cluster/src/main/scala/akka/cluster/VectorClock.scala index ef1f1be490..d8d87db75b 100644 --- a/akka-cluster/src/main/scala/akka/cluster/VectorClock.scala +++ b/akka-cluster/src/main/scala/akka/cluster/VectorClock.scala @@ -30,11 +30,11 @@ object Versioned { /** * Representation of a Vector-based clock (counting clock), inspired by Lamport logical clocks. - * {{ + * {{{ * Reference: * 1) Leslie Lamport (1978). "Time, clocks, and the ordering of events in a distributed system". Communications of the ACM 21 (7): 558-565. * 2) Friedemann Mattern (1988). "Virtual Time and Global States of Distributed Systems". Workshop on Parallel and Distributed Algorithms: pp. 215-226 - * }} + * }}} */ case class VectorClock( versions: Vector[VectorClock.Entry] = Vector.empty[VectorClock.Entry], @@ -76,11 +76,11 @@ object VectorClock { /** * The result of comparing two vector clocks. * Either: - * {{ + * {{{ * 1) v1 is BEFORE v2 * 2) v1 is AFTER t2 * 3) v1 happens CONCURRENTLY to v2 - * }} + * }}} */ sealed trait Ordering case object Before extends Ordering @@ -97,11 +97,11 @@ object VectorClock { /** * Compare two vector clocks. The outcomes will be one of the following: *

- * {{ + * {{{ * 1. Clock 1 is BEFORE clock 2 if there exists an i such that c1(i) <= c(2) and there does not exist a j such that c1(j) > c2(j). * 2. Clock 1 is CONCURRENT to clock 2 if there exists an i, j such that c1(i) < c2(i) and c1(j) > c2(j). * 3. Clock 1 is AFTER clock 2 otherwise. - * }} + * }}} * * @param v1 The first VectorClock * @param v2 The second VectorClock diff --git a/akka-cluster/src/test/scala/akka/cluster/ClusterConfigSpec.scala b/akka-cluster/src/test/scala/akka/cluster/ClusterConfigSpec.scala index 240d1ad3ff..7f1b26e553 100644 --- a/akka-cluster/src/test/scala/akka/cluster/ClusterConfigSpec.scala +++ b/akka-cluster/src/test/scala/akka/cluster/ClusterConfigSpec.scala @@ -25,11 +25,13 @@ class ClusterConfigSpec extends AkkaSpec( import settings._ FailureDetectorThreshold must be(8) FailureDetectorMaxSampleSize must be(1000) - SeedNodeConnectionTimeout must be(30 seconds) - MaxTimeToRetryJoiningCluster must be(30 seconds) - InitialDelayForGossip must be(5 seconds) + + JoinContactPoint must be(None) + JoinTimeout must be(30 seconds) + JoinMaxTimeToRetry must be(30 seconds) + + GossipInitialDelay must be(5 seconds) GossipFrequency must be(1 second) - SeedNodes must be(Set()) } } } diff --git a/akka-docs/cluster/cluster.rst b/akka-docs/cluster/cluster.rst index 371cdf2615..c145456552 100644 --- a/akka-docs/cluster/cluster.rst +++ b/akka-docs/cluster/cluster.rst @@ -74,12 +74,13 @@ each node hosting some part of the application. Cluster membership and partitioning of the application are decoupled. A node could be a member of a cluster without hosting any actors. -Single-node Cluster -------------------- + +Singleton Cluster +----------------- If a node does not have a preconfigured contact point to join in the Akka -configuration, then it is considered a single-node cluster and will -automatically transition from ``joining`` to ``up``. Single-node clusters +configuration, then it is considered a singleton cluster (single node cluster) +and will automatically transition from ``joining`` to ``up``. Singleton clusters can later explicitly send a ``Join`` message to another node to form a N-node cluster. It is also possible to link multiple N-node clusters by ``joining`` them. From 5c22c30738d4bf5374147f63c42631de7a176be1 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jonas=20Bone=CC=81r?= Date: Wed, 8 Feb 2012 14:14:01 +0100 Subject: [PATCH 03/72] Completed singleton and N-node cluster boot up and joining phase. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit * Simplified node join phase. * Added tests for cluster node startup and joining, both for singleton cluster and 2-node cluster. * Fixed bug in cluster node address and cluster daemon lookup. * Changed some APIs. * Renamed 'contact-point' to 'node-to-join'. * Minor refactorings. 
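As an illustration of the renamed setting, a node joins an existing cluster node by giving its full URI, or stays a singleton cluster with an empty string (the example value is taken from the test added in this commit)::

    akka {
      cluster {
        # full URI of a node to join; "" means start as a singleton cluster
        node-to-join = "akka://NodeStartupSpec@localhost:5550"
      }
    }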
Signed-off-by: Jonas Bonér --- .../src/main/resources/reference.conf | 10 +- .../scala/akka/cluster/ClusterSettings.scala | 9 +- .../main/scala/akka/cluster/Gossiper.scala | 201 ++++++++++-------- .../akka/cluster/ClusterConfigSpec.scala | 6 +- .../scala/akka/cluster/NodeStartupSpec.scala | 90 ++++++++ .../scala/akka/remote/RemoteAddress.scala | 5 - 6 files changed, 202 insertions(+), 119 deletions(-) create mode 100644 akka-cluster/src/test/scala/akka/cluster/NodeStartupSpec.scala delete mode 100644 akka-remote/src/main/scala/akka/remote/RemoteAddress.scala diff --git a/akka-cluster/src/main/resources/reference.conf b/akka-cluster/src/main/resources/reference.conf index 53bf7a41eb..4d8c4a5e32 100644 --- a/akka-cluster/src/main/resources/reference.conf +++ b/akka-cluster/src/main/resources/reference.conf @@ -8,13 +8,9 @@ akka { cluster { - join { - # contact point on the form of "hostname:port" of a node to try to join - # leave as empty string if the node should be a singleton cluster - contact-point = "" - timeout = 30s - max-time-to-retry = 30s - } + # node to join - the full URI defined by a string on the form of "akka://system@hostname:port" + # leave as empty string if the node should be a singleton cluster + node-to-join = "" gossip { initialDelay = 5s diff --git a/akka-cluster/src/main/scala/akka/cluster/ClusterSettings.scala b/akka-cluster/src/main/scala/akka/cluster/ClusterSettings.scala index 0a0697223b..be3205148b 100644 --- a/akka-cluster/src/main/scala/akka/cluster/ClusterSettings.scala +++ b/akka-cluster/src/main/scala/akka/cluster/ClusterSettings.scala @@ -13,19 +13,12 @@ import akka.actor.AddressExtractor class ClusterSettings(val config: Config, val systemName: String) { import config._ - // cluster config section val FailureDetectorThreshold = getInt("akka.cluster.failure-detector.threshold") val FailureDetectorMaxSampleSize = getInt("akka.cluster.failure-detector.max-sample-size") - - // join config - val JoinContactPoint: Option[Address] = getString("akka.cluster.join.contact-point") match { + val NodeToJoin: Option[Address] = getString("akka.cluster.node-to-join") match { case "" ⇒ None case AddressExtractor(addr) ⇒ Some(addr) } - val JoinTimeout = Duration(config.getMilliseconds("akka.cluster.join.timeout"), MILLISECONDS) - val JoinMaxTimeToRetry = Duration(config.getMilliseconds("akka.cluster.join.max-time-to-retry"), MILLISECONDS) - - // gossip config val GossipInitialDelay = Duration(getMilliseconds("akka.cluster.gossip.initialDelay"), MILLISECONDS) val GossipFrequency = Duration(getMilliseconds("akka.cluster.gossip.frequency"), MILLISECONDS) } diff --git a/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala b/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala index 47536ff5d2..b134a9c54c 100644 --- a/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala +++ b/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala @@ -48,9 +48,9 @@ case class Join(node: Address) extends ClusterMessage * Represents the state of the cluster; cluster ring membership, ring convergence, meta data - all versioned by a vector clock. 
*/ case class Gossip( - member: Member, + self: Member, // sorted set of members with their status, sorted by name - members: SortedSet[Member] = SortedSet.empty[Member](Ordering.fromLessThan[Member](_.address.toString > _.address.toString)), + members: SortedSet[Member], unavailableMembers: Set[Member] = Set.empty[Member], // for ring convergence seen: Map[Member, VectorClock] = Map.empty[Member, VectorClock], @@ -97,8 +97,8 @@ final class ClusterDaemon(system: ActorSystem, gossiper: Gossiper) extends Actor val log = Logging(system, "ClusterDaemon") def receive = { - case Join(address) ⇒ sender ! gossiper.latestGossip // TODO use address in Join(address) ? - case gossip: Gossip ⇒ gossiper.tell(gossip) + case Join(address) ⇒ gossiper.joining(address) + case gossip: Gossip ⇒ gossiper.receive(gossip) case unknown ⇒ log.error("Unknown message sent to cluster daemon [" + unknown + "]") } } @@ -118,31 +118,30 @@ final class ClusterDaemon(system: ActorSystem, gossiper: Gossiper) extends Actor * gossip to random deputy with certain probability depending on number of unreachable, deputy and live members. * */ -case class Gossiper(remote: RemoteActorRefProvider, system: ActorSystemImpl) { +case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { /** * Represents the state for this Gossiper. Implemented using optimistic lockless concurrency, * all state is represented by this immutable case class and managed by an AtomicReference. */ private case class State( - currentGossip: Gossip, + latestGossip: Gossip, + isSingletonCluster: Boolean = true, // starts as singleton cluster memberMembershipChangeListeners: Set[NodeMembershipChangeListener] = Set.empty[NodeMembershipChangeListener]) val remoteSettings = new RemoteSettings(system.settings.config, system.name) val clusterSettings = new ClusterSettings(system.settings.config, system.name) - val protocol = "akka" // TODO should this be hardcoded? - val address = remote.transport.address - val memberFingerprint = address.## + val remoteAddress = remote.transport.address + val memberFingerprint = remoteAddress.## val gossipInitialDelay = clusterSettings.GossipInitialDelay val gossipFrequency = clusterSettings.GossipFrequency - implicit val joinTimeout = clusterSettings.JoinTimeout implicit val defaultTimeout = Timeout(remoteSettings.RemoteSystemDaemonAckTimeout) - private val contactPoint: Option[Member] = - clusterSettings.JoinContactPoint filter (_ != address) map (address ⇒ Member(address, MemberStatus.Up)) + private val nodeToJoin: Option[Member] = + clusterSettings.NodeToJoin filter (_ != remoteAddress) map (address ⇒ Member(address, MemberStatus.Joining)) private val serialization = remote.serialization private val failureDetector = new AccrualFailureDetector( @@ -154,31 +153,42 @@ case class Gossiper(remote: RemoteActorRefProvider, system: ActorSystemImpl) { // Is it right to put this guy under the /system path or should we have a top-level /cluster or something else...? 
private val clusterDaemon = system.systemActorOf(Props(new ClusterDaemon(system, this)), "cluster") - private val state = new AtomicReference[State](State(currentGossip = newGossip())) + + private val state = { + val member = Member(remoteAddress, MemberStatus.Joining) + val gossip = Gossip( + self = member, + members = SortedSet.empty[Member](Ordering.fromLessThan[Member](_.address.toString > _.address.toString)) + member) // add joining node as Joining + new AtomicReference[State](State(gossip)) + } // FIXME manage connections in some other way so we can delete the RemoteConnectionManager (SINCE IT SUCKS!!!) private val connectionManager = new RemoteConnectionManager(system, remote, failureDetector, Map.empty[Address, ActorRef]) - log.info("Starting cluster Gossiper...") + log.info("Node [{}] - Starting cluster Gossiper...", remoteAddress) - // join the cluster by connecting to one of the deputy members and retrieve current cluster state (Gossip) - joinContactPoint(clusterSettings.JoinMaxTimeToRetry fromNow) + // try to join the node defined in the 'akka.cluster.node-to-join' option + join() // start periodic gossip and cluster scrutinization - val initateGossipCanceller = system.scheduler.schedule(gossipInitialDelay, gossipFrequency)(initateGossip()) - val scrutinizeCanceller = system.scheduler.schedule(gossipInitialDelay, gossipFrequency)(scrutinize()) + val gossipCanceller = system.scheduler.schedule(gossipInitialDelay, gossipFrequency) { + gossip() + } + val scrutinizeCanceller = system.scheduler.schedule(gossipInitialDelay, gossipFrequency) { + scrutinize() + } /** * Shuts down all connections to other members, the cluster daemon and the periodic gossip and cleanup tasks. */ def shutdown() { if (isRunning.compareAndSet(true, false)) { - log.info("Shutting down Gossiper for [{}]...", address) + log.info("Node [{}] - Shutting down Gossiper", remoteAddress) try connectionManager.shutdown() finally { try system.stop(clusterDaemon) finally { - try initateGossipCanceller.cancel() finally { + try gossipCanceller.cancel() finally { try scrutinizeCanceller.cancel() finally { - log.info("Gossiper for [{}] is shut down", address) + log.info("Node [{}] - Gossiper is shut down", remoteAddress) } } } @@ -186,60 +196,90 @@ case class Gossiper(remote: RemoteActorRefProvider, system: ActorSystemImpl) { } } - def latestGossip: Gossip = state.get.currentGossip + /** + * Latest gossip. + */ + def latestGossip: Gossip = state.get.latestGossip /** - * Tell the gossiper some gossip. + * Member status for this node. + */ + def self: Member = latestGossip.self + + /** + * Is this node a singleton cluster? + */ + def isSingletonCluster: Boolean = state.get.isSingletonCluster + + /** + * New node joining. + */ + @tailrec + final def joining(node: Address) { + log.debug("Node [{}] - Node [{}] is joining", remoteAddress, node) + val oldState = state.get + val oldGossip = oldState.latestGossip + val oldMembers = oldGossip.members + val newGossip = oldGossip copy (members = oldMembers + Member(node, MemberStatus.Joining)) // add joining node as Joining + val newState = oldState copy (latestGossip = incrementVersionForGossip(newGossip)) + if (!state.compareAndSet(oldState, newState)) joining(node) // recur if we failed update + } + + /** + * Receive new gossip. 
*/ //@tailrec - final def tell(newGossip: Gossip) { - val gossipingNode = newGossip.member + final def receive(newGossip: Gossip) { + val from = newGossip.self + log.debug("Node [{}] - Receiving gossip from [{}]", remoteAddress, from.address) - failureDetector heartbeat gossipingNode.address // update heartbeat in failure detector + failureDetector heartbeat from.address // update heartbeat in failure detector + + // FIXME set flag state.isSingletonCluster = false (if true) // FIXME all below here is WRONG - redesign with cluster convergence in mind // val oldState = state.get // println("-------- NEW VERSION " + newGossip) - // println("-------- OLD VERSION " + oldState.currentGossip) - // val latestGossip = VectorClock.latestVersionOf(newGossip, oldState.currentGossip) - // println("-------- WINNING VERSION " + latestGossip) + // println("-------- OLD VERSION " + oldState.latestGossip) + // val gossip = VectorClock.latestVersionOf(newGossip, oldState.latestGossip) + // println("-------- WINNING VERSION " + gossip) - // val latestAvailableNodes = latestGossip.members - // val latestUnavailableNodes = latestGossip.unavailableMembers - // println("=======>>> gossipingNode: " + gossipingNode) + // val latestAvailableNodes = gossip.members + // val latestUnavailableNodes = gossip.unavailableMembers + // println("=======>>> myself: " + myself) // println("=======>>> latestAvailableNodes: " + latestAvailableNodes) - // if (!(latestAvailableNodes contains gossipingNode) && !(latestUnavailableNodes contains gossipingNode)) { + // if (!(latestAvailableNodes contains myself) && !(latestUnavailableNodes contains myself)) { // println("-------- NEW NODE") // // we have a new member - // val newGossip = latestGossip copy (availableNodes = latestAvailableNodes + gossipingNode) - // val newState = oldState copy (currentGossip = incrementVersionForGossip(newGossip)) + // val newGossip = gossip copy (availableNodes = latestAvailableNodes + myself) + // val newState = oldState copy (latestGossip = incrementVersionForGossip(newGossip)) // println("--------- new GOSSIP " + newGossip.members) // println("--------- new STATE " + newState) // // if we won the race then update else try again - // if (!state.compareAndSet(oldState, newState)) tell(newGossip) // recur + // if (!state.compareAndSet(oldState, newState)) receive(newGossip) // recur // else { // println("---------- WON RACE - setting state") // // create connections for all new members in the latest gossip - // (latestAvailableNodes + gossipingNode) foreach { member ⇒ + // (latestAvailableNodes + myself) foreach { member ⇒ // setUpConnectionTo(member) // oldState.memberMembershipChangeListeners foreach (_ memberConnected member) // notify listeners about the new members // } // } - // } else if (latestUnavailableNodes contains gossipingNode) { + // } else if (latestUnavailableNodes contains myself) { // // gossip from an old former dead member - // val newUnavailableMembers = latestUnavailableNodes - gossipingNode - // val newMembers = latestAvailableNodes + gossipingNode + // val newUnavailableMembers = latestUnavailableNodes - myself + // val newMembers = latestAvailableNodes + myself - // val newGossip = latestGossip copy (availableNodes = newMembers, unavailableNodes = newUnavailableMembers) - // val newState = oldState copy (currentGossip = incrementVersionForGossip(newGossip)) + // val newGossip = gossip copy (availableNodes = newMembers, unavailableNodes = newUnavailableMembers) + // val newState = oldState copy (latestGossip = 
incrementVersionForGossip(newGossip)) // // if we won the race then update else try again - // if (!state.compareAndSet(oldState, newState)) tell(newGossip) // recur - // else oldState.memberMembershipChangeListeners foreach (_ memberConnected gossipingNode) // notify listeners on successful update of state + // if (!state.compareAndSet(oldState, newState)) receive(newGossip) // recur + // else oldState.memberMembershipChangeListeners foreach (_ memberConnected myself) // notify listeners on successful update of state // } } @@ -268,49 +308,20 @@ case class Gossiper(remote: RemoteActorRefProvider, system: ActorSystemImpl) { /** * Joins the pre-configured contact point and retrieves current gossip state. */ - private def joinContactPoint(deadline: Deadline) { - def tryJoinContactPoint(connection: ActorRef, deadline: Deadline) { - try { - Await.result(connection ? Join(address), joinTimeout) match { - case initialGossip: Gossip ⇒ - // just sets/overwrites the state/gossip regardless of what it was before - // since it should be treated as the initial state - state.set(state.get copy (currentGossip = initialGossip)) - log.debug("Received initial gossip [{}]", initialGossip) - - case unknown ⇒ - throw new IllegalStateException("Expected initial gossip but received [" + unknown + "]") - } - } catch { - case e: Exception ⇒ - log.error("Could not join contact point node - retrying for another {} seconds", deadline.timeLeft.toSeconds) - - // retry joining the cluster unless - // 1. Gossiper is shut down - // 2. The connection time window has expired - if (isRunning.get && deadline.timeLeft.toMillis > 0) tryJoinContactPoint(connection, deadline) // recur - else throw new RemoteConnectionException( - "Could not join contact point node - giving up after trying for " + deadline.time.toSeconds + " seconds") - } - } - - contactPoint match { - case None ⇒ log.info("Booting up in singleton cluster mode") - case Some(member) ⇒ - log.info("Trying to join contact point node defined in the configuration [{}]", member) - setUpConnectionTo(member) match { - case None ⇒ log.error("Could not set up connection to join contact point node defined in the configuration [{}]", member) - case Some(connection) ⇒ tryJoinContactPoint(connection, deadline) - } + private def join() = nodeToJoin foreach { member ⇒ + setUpConnectionTo(member) foreach { connection ⇒ + val command = Join(remoteAddress) + log.info("Node [{}] - Sending [{}] to [{}] through connection [{}]", remoteAddress, command, member.address, connection) + connection ! command } } /** * Initates a new round of gossip. */ - private def initateGossip() { + private def gossip() { val oldState = state.get - val oldGossip = oldState.currentGossip + val oldGossip = oldState.latestGossip val oldMembers = oldGossip.members val oldMembersSize = oldMembers.size @@ -331,7 +342,7 @@ case class Gossiper(remote: RemoteActorRefProvider, system: ActorSystemImpl) { // 3. gossip to a deputy nodes for facilitating partition healing val deputies = deputyNodesWithoutMyself - if ((!shouldGossipToDeputy || oldMembersSize < 1) && (deputies.head != address)) { + if ((!shouldGossipToDeputy || oldMembersSize < 1) && !deputies.isEmpty) { if (oldMembersSize == 0) gossipToRandomNodeOf(deputies) else { val probability = 1.0 / oldMembersSize + oldUnavailableMembersSize @@ -341,17 +352,24 @@ case class Gossiper(remote: RemoteActorRefProvider, system: ActorSystemImpl) { } /** - * Gossips to a random member in the set of members passed in as argument. + * Gossips latest gossip to a member. 
+ */ + private def gossipTo(member: Member) { + setUpConnectionTo(member) foreach { _ ! latestGossip } + } + + /** + * Gossips latest gossip to a random member in the set of members passed in as argument. * * @return 'true' if it gossiped to a "deputy" member. */ private def gossipToRandomNodeOf(members: Seq[Member]): Boolean = { - val peers = members filter (_.address != address) // filter out myself + val peers = members filter (_.address != remoteAddress) // filter out myself val peer = selectRandomNode(peers) val oldState = state.get - val oldGossip = oldState.currentGossip + val oldGossip = oldState.latestGossip // if connection can't be established/found => ignore it since the failure detector will take care of the potential problem - setUpConnectionTo(peer) foreach { _ ! newGossip } + gossipTo(peer) deputyNodesWithoutMyself exists (peer == _) } @@ -369,7 +387,7 @@ case class Gossiper(remote: RemoteActorRefProvider, system: ActorSystemImpl) { @tailrec final private def scrutinize() { val oldState = state.get - val oldGossip = oldState.currentGossip + val oldGossip = oldState.latestGossip val oldMembers = oldGossip.members val oldUnavailableMembers = oldGossip.unavailableMembers @@ -380,7 +398,7 @@ case class Gossiper(remote: RemoteActorRefProvider, system: ActorSystemImpl) { val newUnavailableMembers = oldUnavailableMembers ++ newlyDetectedUnavailableMembers val newGossip = oldGossip copy (members = newMembers, unavailableMembers = newUnavailableMembers) - val newState = oldState copy (currentGossip = incrementVersionForGossip(newGossip)) + val newState = oldState copy (latestGossip = incrementVersionForGossip(newGossip)) // if we won the race then update else try again if (!state.compareAndSet(oldState, newState)) scrutinize() // recur @@ -420,22 +438,17 @@ case class Gossiper(remote: RemoteActorRefProvider, system: ActorSystemImpl) { private def setUpConnectionTo(member: Member): Option[ActorRef] = { val address = member.address try { - Some( - connectionManager.putIfAbsent( - address, - () ⇒ system.actorFor(RootActorPath(Address(protocol, system.name)) / "system" / "cluster"))) + Some(connectionManager.putIfAbsent(address, () ⇒ system.actorFor(RootActorPath(address) / "system" / "cluster"))) } catch { case e: Exception ⇒ None } } - private def newGossip(): Gossip = Gossip(Member(address, MemberStatus.Joining)) // starts in Joining mode - private def incrementVersionForGossip(from: Gossip): Gossip = { from copy (version = from.version.increment(memberFingerprint, newTimestamp)) } - private def deputyNodesWithoutMyself: Seq[Member] = Seq.empty[Member] filter (_.address != address) // FIXME read in deputy nodes from gossip data - now empty seq + private def deputyNodesWithoutMyself: Seq[Member] = Seq.empty[Member] filter (_.address != remoteAddress) // FIXME read in deputy nodes from gossip data - now empty seq private def selectRandomNode(members: Seq[Member]): Member = members(random.nextInt(members.size)) } diff --git a/akka-cluster/src/test/scala/akka/cluster/ClusterConfigSpec.scala b/akka-cluster/src/test/scala/akka/cluster/ClusterConfigSpec.scala index 7f1b26e553..78c836f0b5 100644 --- a/akka-cluster/src/test/scala/akka/cluster/ClusterConfigSpec.scala +++ b/akka-cluster/src/test/scala/akka/cluster/ClusterConfigSpec.scala @@ -25,11 +25,7 @@ class ClusterConfigSpec extends AkkaSpec( import settings._ FailureDetectorThreshold must be(8) FailureDetectorMaxSampleSize must be(1000) - - JoinContactPoint must be(None) - JoinTimeout must be(30 seconds) - JoinMaxTimeToRetry must be(30 
seconds) - + NodeToJoin must be(None) GossipInitialDelay must be(5 seconds) GossipFrequency must be(1 second) } diff --git a/akka-cluster/src/test/scala/akka/cluster/NodeStartupSpec.scala b/akka-cluster/src/test/scala/akka/cluster/NodeStartupSpec.scala new file mode 100644 index 0000000000..4f07650f62 --- /dev/null +++ b/akka-cluster/src/test/scala/akka/cluster/NodeStartupSpec.scala @@ -0,0 +1,90 @@ +/** + * Copyright (C) 2009-2011 Typesafe Inc. + */ +package akka.cluster + +import java.net.InetSocketAddress + +import akka.testkit._ +import akka.dispatch._ +import akka.actor._ +import akka.remote._ + +import com.typesafe.config._ + +class NodeStartupSpec extends AkkaSpec(""" + akka { + loglevel = "DEBUG" + } + """) with ImplicitSender { + + var gossiper0: Gossiper = _ + var gossiper1: Gossiper = _ + var node0: ActorSystemImpl = _ + var node1: ActorSystemImpl = _ + + try { + node0 = ActorSystem("NodeStartupSpec", ConfigFactory + .parseString(""" + akka { + actor.provider = "akka.remote.RemoteActorRefProvider" + remote.netty { + hostname = localhost + port=5550 + } + }""") + .withFallback(system.settings.config)) + .asInstanceOf[ActorSystemImpl] + val remote0 = node0.provider.asInstanceOf[RemoteActorRefProvider] + gossiper0 = Gossiper(node0, remote0) + + "A first cluster node with a 'node-to-join' config set to empty string" must { + "be in 'Joining' phase when started up" in { + val members = gossiper0.latestGossip.members + val joiningMember = members find (_.address.port.get == 5550) + joiningMember must be('defined) + joiningMember.get.status must be(MemberStatus.Joining) + } + + "be a singleton cluster when started up" in { + gossiper0.isSingletonCluster must be(true) + } + } + + node1 = ActorSystem("NodeStartupSpec", ConfigFactory + .parseString(""" + akka { + actor.provider = "akka.remote.RemoteActorRefProvider" + remote.netty { + hostname = localhost + port=5551 + } + cluster.node-to-join = "akka://NodeStartupSpec@localhost:5550" + }""") + .withFallback(system.settings.config)) + .asInstanceOf[ActorSystemImpl] + val remote1 = node1.provider.asInstanceOf[RemoteActorRefProvider] + gossiper1 = Gossiper(node1, remote1) + + "A second cluster node with a 'node-to-join' config defined" must { + "join the other node cluster as 'Joining' when sending a Join command" in { + Thread.sleep(1000) // give enough time for node1 to JOIN node0 + val members = gossiper0.latestGossip.members + val joiningMember = members find (_.address.port.get == 5551) + joiningMember must be('defined) + joiningMember.get.status must be(MemberStatus.Joining) + } + } + } catch { + case e: Exception ⇒ + e.printStackTrace + fail(e.toString) + } + + override def atTermination() { + gossiper0.shutdown() + node0.shutdown() + gossiper1.shutdown() + node1.shutdown() + } +} diff --git a/akka-remote/src/main/scala/akka/remote/RemoteAddress.scala b/akka-remote/src/main/scala/akka/remote/RemoteAddress.scala deleted file mode 100644 index f7274c2356..0000000000 --- a/akka-remote/src/main/scala/akka/remote/RemoteAddress.scala +++ /dev/null @@ -1,5 +0,0 @@ -/** - * Copyright (C) 2009-2012 Typesafe Inc. - */ -package akka.remote - From 089f50da0d1dd3b3ccde9557730112490df72a76 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jonas=20Bone=CC=81r?= Date: Tue, 7 Feb 2012 14:20:51 +0100 Subject: [PATCH 04/72] Changes to cluster specification. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Added section on single-node cluster. - Changed seed nodes to deputy nodes. 
- Seed nodes are no longer used as contact points only to break logical partitions. Signed-off-by: Jonas Bonér --- akka-docs/cluster/cluster.rst | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/akka-docs/cluster/cluster.rst b/akka-docs/cluster/cluster.rst index c145456552..40c84f3737 100644 --- a/akka-docs/cluster/cluster.rst +++ b/akka-docs/cluster/cluster.rst @@ -74,6 +74,15 @@ each node hosting some part of the application. Cluster membership and partitioning of the application are decoupled. A node could be a member of a cluster without hosting any actors. +Single-node Cluster +------------------- + +If a node does not have a preconfigured contact point to join in the Akka +configuration, then it is considered a single-node cluster and will +automatically transition from ``joining`` to ``up``. Single-node clusters +can later explicitly send a ``Join`` message to another node to form a N-node +cluster. It is also possible to link multiple N-node clusters by ``joining`` them. + Singleton Cluster ----------------- From 755408a52800e7fc8cd72b3e36179e5bfafd8554 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jonas=20Bone=CC=81r?= Date: Tue, 7 Feb 2012 16:53:49 +0100 Subject: [PATCH 05/72] Removed cluster seed nodes, added 'join.contact-point', changed joining phase, added singleton cluster mode plus misc other changes. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Jonas Bonér --- akka-cluster/src/main/resources/reference.conf | 10 ++++++++++ .../src/main/scala/akka/cluster/Gossiper.scala | 10 ++++++++++ .../test/scala/akka/cluster/ClusterConfigSpec.scala | 6 +++++- akka-docs/cluster/cluster.rst | 9 +++++---- 4 files changed, 30 insertions(+), 5 deletions(-) diff --git a/akka-cluster/src/main/resources/reference.conf b/akka-cluster/src/main/resources/reference.conf index 4d8c4a5e32..41a53ce52f 100644 --- a/akka-cluster/src/main/resources/reference.conf +++ b/akka-cluster/src/main/resources/reference.conf @@ -8,9 +8,19 @@ akka { cluster { +<<<<<<< HEAD # node to join - the full URI defined by a string on the form of "akka://system@hostname:port" # leave as empty string if the node should be a singleton cluster node-to-join = "" +======= + join { + # contact point on the form of "hostname:port" of a node to try to join + # leave as empty string if the node should be a singleton cluster + contact-point = "" + timeout = 30s + max-time-to-retry = 30s + } +>>>>>>> Removed cluster seed nodes, added 'join.contact-point', changed joining phase, added singleton cluster mode plus misc other changes. gossip { initialDelay = 5s diff --git a/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala b/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala index b134a9c54c..a082f29d7c 100644 --- a/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala +++ b/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala @@ -314,6 +314,16 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { log.info("Node [{}] - Sending [{}] to [{}] through connection [{}]", remoteAddress, command, member.address, connection) connection ! 
command } + + contactPoint match { + case None ⇒ log.info("Booting up in singleton cluster mode") + case Some(member) ⇒ + log.info("Trying to join contact point node defined in the configuration [{}]", member) + setUpConnectionTo(member) match { + case None ⇒ log.error("Could not set up connection to join contact point node defined in the configuration [{}]", member) + case Some(connection) ⇒ tryJoinContactPoint(connection, deadline) + } + } } /** diff --git a/akka-cluster/src/test/scala/akka/cluster/ClusterConfigSpec.scala b/akka-cluster/src/test/scala/akka/cluster/ClusterConfigSpec.scala index 78c836f0b5..7f1b26e553 100644 --- a/akka-cluster/src/test/scala/akka/cluster/ClusterConfigSpec.scala +++ b/akka-cluster/src/test/scala/akka/cluster/ClusterConfigSpec.scala @@ -25,7 +25,11 @@ class ClusterConfigSpec extends AkkaSpec( import settings._ FailureDetectorThreshold must be(8) FailureDetectorMaxSampleSize must be(1000) - NodeToJoin must be(None) + + JoinContactPoint must be(None) + JoinTimeout must be(30 seconds) + JoinMaxTimeToRetry must be(30 seconds) + GossipInitialDelay must be(5 seconds) GossipFrequency must be(1 second) } diff --git a/akka-docs/cluster/cluster.rst b/akka-docs/cluster/cluster.rst index 40c84f3737..64e49b898e 100644 --- a/akka-docs/cluster/cluster.rst +++ b/akka-docs/cluster/cluster.rst @@ -74,12 +74,13 @@ each node hosting some part of the application. Cluster membership and partitioning of the application are decoupled. A node could be a member of a cluster without hosting any actors. -Single-node Cluster -------------------- + +Singleton Cluster +----------------- If a node does not have a preconfigured contact point to join in the Akka -configuration, then it is considered a single-node cluster and will -automatically transition from ``joining`` to ``up``. Single-node clusters +configuration, then it is considered a singleton cluster (single node cluster) +and will automatically transition from ``joining`` to ``up``. Singleton clusters can later explicitly send a ``Join`` message to another node to form a N-node cluster. It is also possible to link multiple N-node clusters by ``joining`` them. From 24d5b4615f02b0543c3f4a20f7572b40782c58c4 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jonas=20Bone=CC=81r?= Date: Wed, 8 Feb 2012 14:14:01 +0100 Subject: [PATCH 06/72] Completed singleton and N-node cluster boot up and joining phase. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit * Simplified node join phase. * Added tests for cluster node startup and joining, both for singleton cluster and 2-node cluster. * Fixed bug in cluster node address and cluster daemon lookup. * Changed some APIs. * Renamed 'contact-point' to 'node-to-join'. * Minor refactorings. 
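As an illustration (system name, host and port below are placeholders only), a node joins an
existing node by setting the full node URI:

    akka.cluster.node-to-join = "akka://MySystem@host1:2552"

and stays a singleton cluster when the value is left as an empty string.
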
Signed-off-by: Jonas Bonér --- akka-cluster/src/main/resources/reference.conf | 10 ---------- .../test/scala/akka/cluster/ClusterConfigSpec.scala | 6 +----- 2 files changed, 1 insertion(+), 15 deletions(-) diff --git a/akka-cluster/src/main/resources/reference.conf b/akka-cluster/src/main/resources/reference.conf index 41a53ce52f..4d8c4a5e32 100644 --- a/akka-cluster/src/main/resources/reference.conf +++ b/akka-cluster/src/main/resources/reference.conf @@ -8,19 +8,9 @@ akka { cluster { -<<<<<<< HEAD # node to join - the full URI defined by a string on the form of "akka://system@hostname:port" # leave as empty string if the node should be a singleton cluster node-to-join = "" -======= - join { - # contact point on the form of "hostname:port" of a node to try to join - # leave as empty string if the node should be a singleton cluster - contact-point = "" - timeout = 30s - max-time-to-retry = 30s - } ->>>>>>> Removed cluster seed nodes, added 'join.contact-point', changed joining phase, added singleton cluster mode plus misc other changes. gossip { initialDelay = 5s diff --git a/akka-cluster/src/test/scala/akka/cluster/ClusterConfigSpec.scala b/akka-cluster/src/test/scala/akka/cluster/ClusterConfigSpec.scala index 7f1b26e553..78c836f0b5 100644 --- a/akka-cluster/src/test/scala/akka/cluster/ClusterConfigSpec.scala +++ b/akka-cluster/src/test/scala/akka/cluster/ClusterConfigSpec.scala @@ -25,11 +25,7 @@ class ClusterConfigSpec extends AkkaSpec( import settings._ FailureDetectorThreshold must be(8) FailureDetectorMaxSampleSize must be(1000) - - JoinContactPoint must be(None) - JoinTimeout must be(30 seconds) - JoinMaxTimeToRetry must be(30 seconds) - + NodeToJoin must be(None) GossipInitialDelay must be(5 seconds) GossipFrequency must be(1 second) } From 204bbc7b6408f59f9bf350b7e92a77912eec1952 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jonas=20Bone=CC=81r?= Date: Wed, 8 Feb 2012 15:11:06 +0100 Subject: [PATCH 07/72] Switching node status to Up if singleton cluster. Added 'switchStatusTo' method. Updated the test. Profit. 
MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Jonas Bonér --- .../main/scala/akka/cluster/Gossiper.scala | 38 ++++++++++++++++--- .../scala/akka/cluster/NodeStartupSpec.scala | 14 +++---- 2 files changed, 39 insertions(+), 13 deletions(-) diff --git a/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala b/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala index a082f29d7c..d848347736 100644 --- a/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala +++ b/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala @@ -156,9 +156,7 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { private val state = { val member = Member(remoteAddress, MemberStatus.Joining) - val gossip = Gossip( - self = member, - members = SortedSet.empty[Member](Ordering.fromLessThan[Member](_.address.toString > _.address.toString)) + member) // add joining node as Joining + val gossip = Gossip(self = member, members = SortedSet.empty[Member](memberOrdering) + member) new AtomicReference[State](State(gossip)) } @@ -168,7 +166,10 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { log.info("Node [{}] - Starting cluster Gossiper...", remoteAddress) // try to join the node defined in the 'akka.cluster.node-to-join' option - join() + nodeToJoin match { + case None ⇒ switchStatusTo(MemberStatus.Up) // if we are singleton cluster then we are already considered to be UP + case Some(member) ⇒ join(member) + } // start periodic gossip and cluster scrutinization val gossipCanceller = system.scheduler.schedule(gossipInitialDelay, gossipFrequency) { @@ -308,10 +309,10 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { /** * Joins the pre-configured contact point and retrieves current gossip state. */ - private def join() = nodeToJoin foreach { member ⇒ + private def join(member: Member) { setUpConnectionTo(member) foreach { connection ⇒ val command = Join(remoteAddress) - log.info("Node [{}] - Sending [{}] to [{}] through connection [{}]", remoteAddress, command, member.address, connection) + log.info("Node [{}] - Sending [{}] to [{}]", remoteAddress, command, member.address) connection ! command } @@ -361,6 +362,29 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { } } + @tailrec + final private def switchStatusTo(newStatus: MemberStatus) { + log.info("Node [{}] - Switching membership status to [{}]", remoteAddress, newStatus) + val oldState = state.get + val oldGossip = oldState.latestGossip + + val oldSelf = oldGossip.self + val oldMembers = oldGossip.members + + val newSelf = oldSelf copy (status = newStatus) + + val newMembersSet = oldMembers map { member ⇒ + if (member.address == remoteAddress) newSelf + else member + } + // ugly crap to work around bug in scala colletions ('val ss: SortedSet[Member] = SortedSet.empty[Member] ++ aSet' does not compile) + val newMembersSortedSet = SortedSet[Member](newMembersSet.toList: _*)(memberOrdering) + + val newGossip = oldGossip copy (self = newSelf, members = newMembersSortedSet) + val newState = oldState copy (latestGossip = incrementVersionForGossip(newGossip)) + if (!state.compareAndSet(oldState, newState)) switchStatusTo(newStatus) // recur if we failed update + } + /** * Gossips latest gossip to a member. 
*/ @@ -461,4 +485,6 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { private def deputyNodesWithoutMyself: Seq[Member] = Seq.empty[Member] filter (_.address != remoteAddress) // FIXME read in deputy nodes from gossip data - now empty seq private def selectRandomNode(members: Seq[Member]): Member = members(random.nextInt(members.size)) + + private def memberOrdering = Ordering.fromLessThan[Member](_.address.toString > _.address.toString) } diff --git a/akka-cluster/src/test/scala/akka/cluster/NodeStartupSpec.scala b/akka-cluster/src/test/scala/akka/cluster/NodeStartupSpec.scala index 4f07650f62..32bf0bc6b5 100644 --- a/akka-cluster/src/test/scala/akka/cluster/NodeStartupSpec.scala +++ b/akka-cluster/src/test/scala/akka/cluster/NodeStartupSpec.scala @@ -38,16 +38,16 @@ class NodeStartupSpec extends AkkaSpec(""" val remote0 = node0.provider.asInstanceOf[RemoteActorRefProvider] gossiper0 = Gossiper(node0, remote0) - "A first cluster node with a 'node-to-join' config set to empty string" must { - "be in 'Joining' phase when started up" in { + "A first cluster node with a 'node-to-join' config set to empty string (singleton cluster)" must { + "be a singleton cluster when started up" in { + gossiper0.isSingletonCluster must be(true) + } + + "be in 'Up' phase when started up" in { val members = gossiper0.latestGossip.members val joiningMember = members find (_.address.port.get == 5550) joiningMember must be('defined) - joiningMember.get.status must be(MemberStatus.Joining) - } - - "be a singleton cluster when started up" in { - gossiper0.isSingletonCluster must be(true) + joiningMember.get.status must be(MemberStatus.Up) } } From f3df5422872169082b30eefb85ae0a8e754859db Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jonas=20Bone=CC=81r?= Date: Wed, 8 Feb 2012 16:15:31 +0100 Subject: [PATCH 08/72] Skips gossipping and cluster scrutinization if singleton cluster. 
MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Jonas Bonér --- .../main/scala/akka/cluster/Gossiper.scala | 93 ++++++++++--------- 1 file changed, 49 insertions(+), 44 deletions(-) diff --git a/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala b/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala index d848347736..a74debcffe 100644 --- a/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala +++ b/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala @@ -138,6 +138,8 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { val gossipInitialDelay = clusterSettings.GossipInitialDelay val gossipFrequency = clusterSettings.GossipFrequency + implicit val memberOrdering = Ordering.fromLessThan[Member](_.address.toString > _.address.toString) + implicit val defaultTimeout = Timeout(remoteSettings.RemoteSystemDaemonAckTimeout) private val nodeToJoin: Option[Member] = @@ -156,7 +158,7 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { private val state = { val member = Member(remoteAddress, MemberStatus.Joining) - val gossip = Gossip(self = member, members = SortedSet.empty[Member](memberOrdering) + member) + val gossip = Gossip(self = member, members = SortedSet.empty[Member] + member) new AtomicReference[State](State(gossip)) } @@ -223,6 +225,9 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { val oldMembers = oldGossip.members val newGossip = oldGossip copy (members = oldMembers + Member(node, MemberStatus.Joining)) // add joining node as Joining val newState = oldState copy (latestGossip = incrementVersionForGossip(newGossip)) + + // FIXME set flag state.isSingletonCluster = false (if true) + if (!state.compareAndSet(oldState, newState)) joining(node) // recur if we failed update } @@ -332,32 +337,33 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { */ private def gossip() { val oldState = state.get - val oldGossip = oldState.latestGossip + if (!oldState.isSingletonCluster) { // do not gossip if we are a singleton cluster + val oldGossip = oldState.latestGossip + val oldMembers = oldGossip.members + val oldMembersSize = oldMembers.size - val oldMembers = oldGossip.members - val oldMembersSize = oldMembers.size + val oldUnavailableMembers = oldGossip.unavailableMembers + val oldUnavailableMembersSize = oldUnavailableMembers.size - val oldUnavailableMembers = oldGossip.unavailableMembers - val oldUnavailableMembersSize = oldUnavailableMembers.size + // 1. gossip to alive members + val shouldGossipToDeputy = + if (oldUnavailableMembersSize > 0) gossipToRandomNodeOf(oldMembers) + else false - // 1. gossip to alive members - val shouldGossipToDeputy = - if (oldUnavailableMembersSize > 0) gossipToRandomNodeOf(oldMembers) - else false + // 2. gossip to dead members + if (oldUnavailableMembersSize > 0) { + val probability: Double = oldUnavailableMembersSize / (oldMembersSize + 1) + if (random.nextDouble() < probability) gossipToRandomNodeOf(oldUnavailableMembers) + } - // 2. gossip to dead members - if (oldUnavailableMembersSize > 0) { - val probability: Double = oldUnavailableMembersSize / (oldMembersSize + 1) - if (random.nextDouble() < probability) gossipToRandomNodeOf(oldUnavailableMembers) - } - - // 3. 
gossip to a deputy nodes for facilitating partition healing - val deputies = deputyNodesWithoutMyself - if ((!shouldGossipToDeputy || oldMembersSize < 1) && !deputies.isEmpty) { - if (oldMembersSize == 0) gossipToRandomNodeOf(deputies) - else { - val probability = 1.0 / oldMembersSize + oldUnavailableMembersSize - if (random.nextDouble() <= probability) gossipToRandomNodeOf(deputies) + // 3. gossip to a deputy nodes for facilitating partition healing + val deputies = deputyNodesWithoutMyself + if ((!shouldGossipToDeputy || oldMembersSize < 1) && !deputies.isEmpty) { + if (oldMembersSize == 0) gossipToRandomNodeOf(deputies) + else { + val probability = 1.0 / oldMembersSize + oldUnavailableMembersSize + if (random.nextDouble() <= probability) gossipToRandomNodeOf(deputies) + } } } } @@ -378,7 +384,7 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { else member } // ugly crap to work around bug in scala colletions ('val ss: SortedSet[Member] = SortedSet.empty[Member] ++ aSet' does not compile) - val newMembersSortedSet = SortedSet[Member](newMembersSet.toList: _*)(memberOrdering) + val newMembersSortedSet = SortedSet[Member](newMembersSet.toList: _*) val newGossip = oldGossip copy (self = newSelf, members = newMembersSortedSet) val newState = oldState copy (latestGossip = incrementVersionForGossip(newGossip)) @@ -421,27 +427,28 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { @tailrec final private def scrutinize() { val oldState = state.get - val oldGossip = oldState.latestGossip + if (!oldState.isSingletonCluster) { // do not scrutinize if we are a singleton cluster + val oldGossip = oldState.latestGossip + val oldMembers = oldGossip.members + val oldUnavailableMembers = oldGossip.unavailableMembers + val newlyDetectedUnavailableMembers = oldMembers filterNot (member ⇒ failureDetector.isAvailable(member.address)) - val oldMembers = oldGossip.members - val oldUnavailableMembers = oldGossip.unavailableMembers - val newlyDetectedUnavailableMembers = oldMembers filterNot (member ⇒ failureDetector.isAvailable(member.address)) + if (!newlyDetectedUnavailableMembers.isEmpty) { // we have newly detected members marked as unavailable + val newMembers = oldMembers diff newlyDetectedUnavailableMembers + val newUnavailableMembers = oldUnavailableMembers ++ newlyDetectedUnavailableMembers - if (!newlyDetectedUnavailableMembers.isEmpty) { // we have newly detected members marked as unavailable - val newMembers = oldMembers diff newlyDetectedUnavailableMembers - val newUnavailableMembers = oldUnavailableMembers ++ newlyDetectedUnavailableMembers + val newGossip = oldGossip copy (members = newMembers, unavailableMembers = newUnavailableMembers) + val newState = oldState copy (latestGossip = incrementVersionForGossip(newGossip)) - val newGossip = oldGossip copy (members = newMembers, unavailableMembers = newUnavailableMembers) - val newState = oldState copy (latestGossip = incrementVersionForGossip(newGossip)) - - // if we won the race then update else try again - if (!state.compareAndSet(oldState, newState)) scrutinize() // recur - else { - // notify listeners on successful update of state - for { - deadNode ← newUnavailableMembers - listener ← oldState.memberMembershipChangeListeners - } listener memberDisconnected deadNode + // if we won the race then update else try again + if (!state.compareAndSet(oldState, newState)) scrutinize() // recur + else { + // notify listeners on successful update of state + for { + deadNode ← 
newUnavailableMembers + listener ← oldState.memberMembershipChangeListeners + } listener memberDisconnected deadNode + } } } } @@ -485,6 +492,4 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { private def deputyNodesWithoutMyself: Seq[Member] = Seq.empty[Member] filter (_.address != remoteAddress) // FIXME read in deputy nodes from gossip data - now empty seq private def selectRandomNode(members: Seq[Member]): Member = members(random.nextInt(members.size)) - - private def memberOrdering = Ordering.fromLessThan[Member](_.address.toString > _.address.toString) } From b1f3107dd7162b3bf89b16b83d5117a2883c8624 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jonas=20Bone=CC=81r?= Date: Thu, 9 Feb 2012 13:36:39 +0100 Subject: [PATCH 09/72] Refactored Gossip state and management. Introduced GossipOverview with convergence info, renamed some fields, added some new cluster commands. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Jonas Bonér --- .../main/scala/akka/cluster/Gossiper.scala | 130 +++++++++++------- 1 file changed, 78 insertions(+), 52 deletions(-) diff --git a/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala b/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala index a74debcffe..249546b0ad 100644 --- a/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala +++ b/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala @@ -45,22 +45,19 @@ sealed trait ClusterMessage extends Serializable case class Join(node: Address) extends ClusterMessage /** - * Represents the state of the cluster; cluster ring membership, ring convergence, meta data - all versioned by a vector clock. + * Command to leave the cluster. */ -case class Gossip( - self: Member, - // sorted set of members with their status, sorted by name - members: SortedSet[Member], - unavailableMembers: Set[Member] = Set.empty[Member], - // for ring convergence - seen: Map[Member, VectorClock] = Map.empty[Member, VectorClock], - // for handoff - //pendingChanges: Option[Vector[PendingPartitioningChange]] = None, - meta: Option[Map[String, Array[Byte]]] = None, - // vector clock version - version: VectorClock = VectorClock()) - extends ClusterMessage // is a serializable cluster message - with Versioned // has a vector clock as version +case class Leave(node: Address) extends ClusterMessage + +/** + * Command to mark node as temporay down. + */ +case class Down(node: Address) extends ClusterMessage + +/** + * Command to remove a node from the cluster immediately. + */ +case class Remove(node: Address) extends ClusterMessage /** * Represents the address and the current status of a cluster member node. @@ -81,25 +78,49 @@ object MemberStatus { case object Down extends MemberStatus } -// sealed trait PendingPartitioningStatus -// object PendingPartitioningStatus { -// case object Complete extends PendingPartitioningStatus -// case object Awaiting extends PendingPartitioningStatus +// sealed trait PartitioningStatus +// object PartitioningStatus { +// case object Complete extends PartitioningStatus +// case object Awaiting extends PartitioningStatus // } -// case class PendingPartitioningChange( -// owner: Address, -// nextOwner: Address, -// changes: Vector[VNodeMod], -// status: PendingPartitioningStatus) +// case class PartitioningChange( +// from: Address, +// to: Address, +// path: PartitionPath, +// status: PartitioningStatus) + +/** + * Represents the overview of the cluster, holds the cluster convergence table and unreachable nodes. 
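+ * The 'seen' table maps a member to the latest gossip version that member has seen, which is
+ * the basis for the convergence check; 'unreachable' holds the members currently marked as
+ * unreachable by the failure detector.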
+ */ +case class GossipOverview( + seen: Map[Member, VectorClock] = Map.empty[Member, VectorClock], + unreachable: Set[Member] = Set.empty[Member]) + +/** + * Represents the state of the cluster; cluster ring membership, ring convergence, meta data - all versioned by a vector clock. + */ +case class Gossip( + overview: GossipOverview = GossipOverview(), + self: Member, + members: SortedSet[Member], // sorted set of members with their status, sorted by name + //partitions: Tree[PartitionPath, Node] = Tree.empty[PartitionPath, Node], + //pending: Set[PartitioningChange] = Set.empty[PartitioningChange], + meta: Map[String, Array[Byte]] = Map.empty[String, Array[Byte]], + version: VectorClock = VectorClock()) // vector clock version + extends ClusterMessage // is a serializable cluster message + with Versioned // has a vector clock as version final class ClusterDaemon(system: ActorSystem, gossiper: Gossiper) extends Actor { val log = Logging(system, "ClusterDaemon") def receive = { - case Join(address) ⇒ gossiper.joining(address) - case gossip: Gossip ⇒ gossiper.receive(gossip) - case unknown ⇒ log.error("Unknown message sent to cluster daemon [" + unknown + "]") + case gossip: Gossip ⇒ gossiper.receive(gossip) + case Join(address) ⇒ gossiper.joining(address) + case Leave(address) ⇒ //gossiper.leaving(address) + case Down(address) ⇒ //gossiper.downing(address) + case Remove(address) ⇒ //gossiper.removing(address) + case unknown ⇒ log.error("Unknown message sent to cluster daemon [" + unknown + "]") } } @@ -185,8 +206,11 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { * Shuts down all connections to other members, the cluster daemon and the periodic gossip and cleanup tasks. */ def shutdown() { + + // FIXME Cheating. Can't just shut down. 
Node must first gossip an Leave command, wait for Leader to do proper Handoff and then await an Exit command before switching to Removed + if (isRunning.compareAndSet(true, false)) { - log.info("Node [{}] - Shutting down Gossiper", remoteAddress) + log.info("Node [{}] - Shutting down Gossiper and ClusterDaemon", remoteAddress) try connectionManager.shutdown() finally { try system.stop(clusterDaemon) finally { try gossipCanceller.cancel() finally { @@ -251,14 +275,14 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { // val gossip = VectorClock.latestVersionOf(newGossip, oldState.latestGossip) // println("-------- WINNING VERSION " + gossip) - // val latestAvailableNodes = gossip.members - // val latestUnavailableNodes = gossip.unavailableMembers + // val latestMembers = gossip.members + // val latestUnreachableMembers = gossip.overview.unreachable // println("=======>>> myself: " + myself) - // println("=======>>> latestAvailableNodes: " + latestAvailableNodes) - // if (!(latestAvailableNodes contains myself) && !(latestUnavailableNodes contains myself)) { + // println("=======>>> latestMembers: " + latestMembers) + // if (!(latestMembers contains myself) && !(latestUnreachableMembers contains myself)) { // println("-------- NEW NODE") // // we have a new member - // val newGossip = gossip copy (availableNodes = latestAvailableNodes + myself) + // val newGossip = gossip copy (availableNodes = latestMembers + myself) // val newState = oldState copy (latestGossip = incrementVersionForGossip(newGossip)) // println("--------- new GOSSIP " + newGossip.members) @@ -268,19 +292,19 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { // else { // println("---------- WON RACE - setting state") // // create connections for all new members in the latest gossip - // (latestAvailableNodes + myself) foreach { member ⇒ + // (latestMembers + myself) foreach { member ⇒ // setUpConnectionTo(member) // oldState.memberMembershipChangeListeners foreach (_ memberConnected member) // notify listeners about the new members // } // } - // } else if (latestUnavailableNodes contains myself) { + // } else if (latestUnreachableMembers contains myself) { // // gossip from an old former dead member - // val newUnavailableMembers = latestUnavailableNodes - myself - // val newMembers = latestAvailableNodes + myself + // val newUnreachableMembers = latestUnreachableMembers - myself + // val newMembers = latestMembers + myself - // val newGossip = gossip copy (availableNodes = newMembers, unavailableNodes = newUnavailableMembers) + // val newGossip = gossip copy (availableNodes = newMembers, unavailableNodes = newUnreachableMembers) // val newState = oldState copy (latestGossip = incrementVersionForGossip(newGossip)) // // if we won the race then update else try again @@ -342,18 +366,18 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { val oldMembers = oldGossip.members val oldMembersSize = oldMembers.size - val oldUnavailableMembers = oldGossip.unavailableMembers - val oldUnavailableMembersSize = oldUnavailableMembers.size + val oldUnreachableMembers = oldGossip.overview.unreachable + val oldUnreachableSize = oldUnreachableMembers.size // 1. gossip to alive members val shouldGossipToDeputy = - if (oldUnavailableMembersSize > 0) gossipToRandomNodeOf(oldMembers) + if (oldUnreachableSize > 0) gossipToRandomNodeOf(oldMembers) else false // 2. 
gossip to dead members - if (oldUnavailableMembersSize > 0) { - val probability: Double = oldUnavailableMembersSize / (oldMembersSize + 1) - if (random.nextDouble() < probability) gossipToRandomNodeOf(oldUnavailableMembers) + if (oldUnreachableSize > 0) { + val probability: Double = oldUnreachableSize / (oldMembersSize + 1) + if (random.nextDouble() < probability) gossipToRandomNodeOf(oldUnreachableMembers) } // 3. gossip to a deputy nodes for facilitating partition healing @@ -361,7 +385,7 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { if ((!shouldGossipToDeputy || oldMembersSize < 1) && !deputies.isEmpty) { if (oldMembersSize == 0) gossipToRandomNodeOf(deputies) else { - val probability = 1.0 / oldMembersSize + oldUnavailableMembersSize + val probability = 1.0 / oldMembersSize + oldUnreachableSize if (random.nextDouble() <= probability) gossipToRandomNodeOf(deputies) } } @@ -429,15 +453,17 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { val oldState = state.get if (!oldState.isSingletonCluster) { // do not scrutinize if we are a singleton cluster val oldGossip = oldState.latestGossip + val oldOverview = oldGossip.overview val oldMembers = oldGossip.members - val oldUnavailableMembers = oldGossip.unavailableMembers - val newlyDetectedUnavailableMembers = oldMembers filterNot (member ⇒ failureDetector.isAvailable(member.address)) + val oldUnreachableMembers = oldGossip.overview.unreachable + val newlyDetectedUnreachableMembers = oldMembers filterNot (member ⇒ failureDetector.isAvailable(member.address)) - if (!newlyDetectedUnavailableMembers.isEmpty) { // we have newly detected members marked as unavailable - val newMembers = oldMembers diff newlyDetectedUnavailableMembers - val newUnavailableMembers = oldUnavailableMembers ++ newlyDetectedUnavailableMembers + if (!newlyDetectedUnreachableMembers.isEmpty) { // we have newly detected members marked as unavailable + val newMembers = oldMembers diff newlyDetectedUnreachableMembers + val newUnreachableMembers = oldUnreachableMembers ++ newlyDetectedUnreachableMembers - val newGossip = oldGossip copy (members = newMembers, unavailableMembers = newUnavailableMembers) + val newOverview = oldOverview copy (unreachable = newUnreachableMembers) + val newGossip = oldGossip copy (overview = newOverview, members = newMembers) val newState = oldState copy (latestGossip = incrementVersionForGossip(newGossip)) // if we won the race then update else try again @@ -445,7 +471,7 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { else { // notify listeners on successful update of state for { - deadNode ← newUnavailableMembers + deadNode ← newUnreachableMembers listener ← oldState.memberMembershipChangeListeners } listener memberDisconnected deadNode } From 607eac90e352e7bad84f201c395658aba0ab033b Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jonas=20Bone=CC=81r?= Date: Thu, 9 Feb 2012 15:55:29 +0100 Subject: [PATCH 10/72] Added 'or' method to Versioned. 
MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Jonas Bonér --- .../src/main/scala/akka/cluster/VectorClock.scala | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/akka-cluster/src/main/scala/akka/cluster/VectorClock.scala b/akka-cluster/src/main/scala/akka/cluster/VectorClock.scala index d8d87db75b..13583cc120 100644 --- a/akka-cluster/src/main/scala/akka/cluster/VectorClock.scala +++ b/akka-cluster/src/main/scala/akka/cluster/VectorClock.scala @@ -13,12 +13,21 @@ class VectorClockException(message: String) extends AkkaException(message) */ trait Versioned { def version: VectorClock + + /** + * Returns the Versioned that have the latest version. + */ + def or(other: Versioned): Versioned = Versioned.latestVersionOf(this, other) } /** * Utility methods for comparing Versioned instances. */ object Versioned { + + /** + * Returns the Versioned that have the latest version. + */ def latestVersionOf[T <: Versioned](versioned1: T, versioned2: T): T = { (versioned1.version compare versioned2.version) match { case VectorClock.Before ⇒ versioned2 // version 1 is BEFORE (older), use version 2 From c0d18f807641972d5e46a38a8a0624e5bc43bd13 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jonas=20Bone=CC=81r?= Date: Thu, 9 Feb 2012 15:59:10 +0100 Subject: [PATCH 11/72] Implemented 'receive(newGossip)' plus misc other changes and fixes. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit * Implemented 'receive(newGossip)' * Added GossipEnvelope * Added MetaDataChangeListener * Changed MembershipChangeListener API * Changed most internal API to work with Address rather than Member * Added builder style API to Gossip for changing it in an immutable way * Moved 'self: Member' from Gossip to State Signed-off-by: Jonas Bonér --- .../main/scala/akka/cluster/Gossiper.scala | 241 +++++++++--------- .../scala/akka/cluster/NodeStartupSpec.scala | 2 +- 2 files changed, 118 insertions(+), 125 deletions(-) diff --git a/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala b/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala index 249546b0ad..783690a249 100644 --- a/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala +++ b/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala @@ -25,11 +25,19 @@ import scala.annotation.tailrec import com.google.protobuf.ByteString /** - * Interface for member membership change listener. + * Interface for membership change listener. */ -trait NodeMembershipChangeListener { - def memberConnected(member: Member) - def memberDisconnected(member: Member) +trait MembershipChangeListener { // FIXME add notification of MembershipChangeListener + def notify(members: SortedSet[Member]): Unit + // def memberConnected(member: Member): Unit + // def memberDisconnected(member: Member): Unit +} + +/** + * Interface for meta data change listener. + */ +trait MetaDataChangeListener { // FIXME add management and notification for MetaDataChangeListener + def notify(meta: Map[String, Array[Byte]]): Unit } // FIXME create Protobuf messages out of all the Gossip stuff - but wait until the prototol is fully stablized. @@ -64,6 +72,11 @@ case class Remove(node: Address) extends ClusterMessage */ case class Member(address: Address, status: MemberStatus) extends ClusterMessage +/** + * Envelope adding a sender address to the gossip. 
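+ * The receiving node uses the sender to update the failure detector heartbeat for that member.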
+ */ +case class GossipEnvelope(sender: Member, gossip: Gossip) extends ClusterMessage + /** * Defines the current status of a cluster member node * @@ -94,33 +107,49 @@ object MemberStatus { * Represents the overview of the cluster, holds the cluster convergence table and unreachable nodes. */ case class GossipOverview( - seen: Map[Member, VectorClock] = Map.empty[Member, VectorClock], - unreachable: Set[Member] = Set.empty[Member]) + seen: Map[Address, VectorClock] = Map.empty[Address, VectorClock], + unreachable: Set[Address] = Set.empty[Address]) /** * Represents the state of the cluster; cluster ring membership, ring convergence, meta data - all versioned by a vector clock. */ case class Gossip( overview: GossipOverview = GossipOverview(), - self: Member, members: SortedSet[Member], // sorted set of members with their status, sorted by name //partitions: Tree[PartitionPath, Node] = Tree.empty[PartitionPath, Node], //pending: Set[PartitioningChange] = Set.empty[PartitioningChange], meta: Map[String, Array[Byte]] = Map.empty[String, Array[Byte]], version: VectorClock = VectorClock()) // vector clock version extends ClusterMessage // is a serializable cluster message - with Versioned // has a vector clock as version + with Versioned { + + def addMember(member: Member): Gossip = { + if (members contains member) this + else this copy (members = members + member) + } + + /** + * Marks the gossip as seen by this node (remoteAddress) by updating the address entry in the 'gossip.overview.seen' + * Map with the VectorClock for the new gossip. + */ + def markAsSeenByThisNode(address: Address): Gossip = + this copy (overview = overview copy (seen = overview.seen + (address -> version))) + + def incrementVersion(memberFingerprint: Int): Gossip = { + this copy (version = version.increment(memberFingerprint, newTimestamp)) + } +} final class ClusterDaemon(system: ActorSystem, gossiper: Gossiper) extends Actor { val log = Logging(system, "ClusterDaemon") def receive = { - case gossip: Gossip ⇒ gossiper.receive(gossip) - case Join(address) ⇒ gossiper.joining(address) - case Leave(address) ⇒ //gossiper.leaving(address) - case Down(address) ⇒ //gossiper.downing(address) - case Remove(address) ⇒ //gossiper.removing(address) - case unknown ⇒ log.error("Unknown message sent to cluster daemon [" + unknown + "]") + case GossipEnvelope(sender, gossip) ⇒ gossiper.receive(sender, gossip) + case Join(address) ⇒ gossiper.joining(address) + case Leave(address) ⇒ //gossiper.leaving(address) + case Down(address) ⇒ //gossiper.downing(address) + case Remove(address) ⇒ //gossiper.removing(address) + case unknown ⇒ log.error("Unknown message sent to cluster daemon [" + unknown + "]") } } @@ -146,9 +175,10 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { * all state is represented by this immutable case class and managed by an AtomicReference. 
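   * Concurrent updates race through compareAndSet on the AtomicReference and are retried
   * (tail-recursively) when another update wins the race.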
*/ private case class State( + self: Member, latestGossip: Gossip, isSingletonCluster: Boolean = true, // starts as singleton cluster - memberMembershipChangeListeners: Set[NodeMembershipChangeListener] = Set.empty[NodeMembershipChangeListener]) + memberMembershipChangeListeners: Set[MembershipChangeListener] = Set.empty[MembershipChangeListener]) val remoteSettings = new RemoteSettings(system.settings.config, system.name) val clusterSettings = new ClusterSettings(system.settings.config, system.name) @@ -163,8 +193,7 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { implicit val defaultTimeout = Timeout(remoteSettings.RemoteSystemDaemonAckTimeout) - private val nodeToJoin: Option[Member] = - clusterSettings.NodeToJoin filter (_ != remoteAddress) map (address ⇒ Member(address, MemberStatus.Joining)) + private val nodeToJoin: Option[Address] = clusterSettings.NodeToJoin filter (_ != remoteAddress) private val serialization = remote.serialization private val failureDetector = new AccrualFailureDetector( @@ -179,8 +208,8 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { private val state = { val member = Member(remoteAddress, MemberStatus.Joining) - val gossip = Gossip(self = member, members = SortedSet.empty[Member] + member) - new AtomicReference[State](State(gossip)) + val gossip = Gossip(members = SortedSet.empty[Member] + member) + new AtomicReference[State](State(member, gossip)) } // FIXME manage connections in some other way so we can delete the RemoteConnectionManager (SINCE IT SUCKS!!!) @@ -190,8 +219,8 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { // try to join the node defined in the 'akka.cluster.node-to-join' option nodeToJoin match { - case None ⇒ switchStatusTo(MemberStatus.Up) // if we are singleton cluster then we are already considered to be UP - case Some(member) ⇒ join(member) + case None ⇒ switchStatusTo(MemberStatus.Up) // if we are singleton cluster then we are already considered to be UP + case Some(address) ⇒ join(address) } // start periodic gossip and cluster scrutinization @@ -231,7 +260,7 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { /** * Member status for this node. */ - def self: Member = latestGossip.self + def self: Member = state.get.self /** * Is this node a singleton cluster? @@ -248,7 +277,7 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { val oldGossip = oldState.latestGossip val oldMembers = oldGossip.members val newGossip = oldGossip copy (members = oldMembers + Member(node, MemberStatus.Joining)) // add joining node as Joining - val newState = oldState copy (latestGossip = incrementVersionForGossip(newGossip)) + val newState = oldState copy (latestGossip = newGossip.incrementVersion(memberFingerprint)) // FIXME set flag state.isSingletonCluster = false (if true) @@ -258,66 +287,37 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { /** * Receive new gossip. 
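   * The incoming gossip is merged with the locally stored gossip by keeping the version that
   * wins the vector clock comparison, marking it as seen by this node and bumping the version.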
*/ - //@tailrec - final def receive(newGossip: Gossip) { - val from = newGossip.self - log.debug("Node [{}] - Receiving gossip from [{}]", remoteAddress, from.address) + @tailrec + final def receive(sender: Member, newGossip: Gossip) { + log.debug("Node [{}] - Receiving gossip from [{}]", remoteAddress, sender.address) - failureDetector heartbeat from.address // update heartbeat in failure detector + failureDetector heartbeat sender.address // update heartbeat in failure detector // FIXME set flag state.isSingletonCluster = false (if true) - // FIXME all below here is WRONG - redesign with cluster convergence in mind + // FIXME check for convergence - if we have convergence then trigger the listeners - // val oldState = state.get - // println("-------- NEW VERSION " + newGossip) - // println("-------- OLD VERSION " + oldState.latestGossip) - // val gossip = VectorClock.latestVersionOf(newGossip, oldState.latestGossip) - // println("-------- WINNING VERSION " + gossip) + val oldState = state.get + val oldGossip = oldState.latestGossip - // val latestMembers = gossip.members - // val latestUnreachableMembers = gossip.overview.unreachable - // println("=======>>> myself: " + myself) - // println("=======>>> latestMembers: " + latestMembers) - // if (!(latestMembers contains myself) && !(latestUnreachableMembers contains myself)) { - // println("-------- NEW NODE") - // // we have a new member - // val newGossip = gossip copy (availableNodes = latestMembers + myself) - // val newState = oldState copy (latestGossip = incrementVersionForGossip(newGossip)) + val gossip = Versioned + .latestVersionOf(newGossip, oldGossip) + .addMember(self) // needed if newGossip won + .addMember(sender) // needed if oldGossip won + .markAsSeenByThisNode(remoteAddress) + .incrementVersion(memberFingerprint) - // println("--------- new GOSSIP " + newGossip.members) - // println("--------- new STATE " + newState) - // // if we won the race then update else try again - // if (!state.compareAndSet(oldState, newState)) receive(newGossip) // recur - // else { - // println("---------- WON RACE - setting state") - // // create connections for all new members in the latest gossip - // (latestMembers + myself) foreach { member ⇒ - // setUpConnectionTo(member) - // oldState.memberMembershipChangeListeners foreach (_ memberConnected member) // notify listeners about the new members - // } - // } + val newState = oldState copy (latestGossip = gossip) - // } else if (latestUnreachableMembers contains myself) { - // // gossip from an old former dead member - - // val newUnreachableMembers = latestUnreachableMembers - myself - // val newMembers = latestMembers + myself - - // val newGossip = gossip copy (availableNodes = newMembers, unavailableNodes = newUnreachableMembers) - // val newState = oldState copy (latestGossip = incrementVersionForGossip(newGossip)) - - // // if we won the race then update else try again - // if (!state.compareAndSet(oldState, newState)) receive(newGossip) // recur - // else oldState.memberMembershipChangeListeners foreach (_ memberConnected myself) // notify listeners on successful update of state - // } + // if we won the race then update else try again + if (!state.compareAndSet(oldState, newState)) receive(sender, newGossip) // recur if we fail the update } /** * Registers a listener to subscribe to cluster membership changes. 
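   * A registered MembershipChangeListener receives the full sorted member set through its
   * notify method (wiring up the actual notification is still marked FIXME).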
*/ @tailrec - final def registerListener(listener: NodeMembershipChangeListener) { + final def registerListener(listener: MembershipChangeListener) { val oldState = state.get val newListeners = oldState.memberMembershipChangeListeners + listener val newState = oldState copy (memberMembershipChangeListeners = newListeners) @@ -328,7 +328,7 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { * Unsubscribes to cluster membership changes. */ @tailrec - final def unregisterListener(listener: NodeMembershipChangeListener) { + final def unregisterListener(listener: MembershipChangeListener) { val oldState = state.get val newListeners = oldState.memberMembershipChangeListeners - listener val newState = oldState copy (memberMembershipChangeListeners = newListeners) @@ -338,10 +338,10 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { /** * Joins the pre-configured contact point and retrieves current gossip state. */ - private def join(member: Member) { - setUpConnectionTo(member) foreach { connection ⇒ + private def join(address: Address) { + setUpConnectionTo(address) foreach { connection ⇒ val command = Join(remoteAddress) - log.info("Node [{}] - Sending [{}] to [{}]", remoteAddress, command, member.address) + log.info("Node [{}] - Sending [{}] to [{}]", remoteAddress, command, address) connection ! command } @@ -366,23 +366,23 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { val oldMembers = oldGossip.members val oldMembersSize = oldMembers.size - val oldUnreachableMembers = oldGossip.overview.unreachable - val oldUnreachableSize = oldUnreachableMembers.size + val oldUnreachableAddresses = oldGossip.overview.unreachable + val oldUnreachableSize = oldUnreachableAddresses.size // 1. gossip to alive members - val shouldGossipToDeputy = - if (oldUnreachableSize > 0) gossipToRandomNodeOf(oldMembers) + val gossipedToDeputy = + if (oldUnreachableSize > 0) gossipToRandomNodeOf(oldMembers.toList map { _.address }) else false - // 2. gossip to dead members + // 2. gossip to unreachable members if (oldUnreachableSize > 0) { val probability: Double = oldUnreachableSize / (oldMembersSize + 1) - if (random.nextDouble() < probability) gossipToRandomNodeOf(oldUnreachableMembers) + if (random.nextDouble() < probability) gossipToRandomNodeOf(oldUnreachableAddresses.toList) } // 3. gossip to a deputy nodes for facilitating partition healing val deputies = deputyNodesWithoutMyself - if ((!shouldGossipToDeputy || oldMembersSize < 1) && !deputies.isEmpty) { + if ((!gossipedToDeputy || oldMembersSize < 1) && !deputies.isEmpty) { if (oldMembersSize == 0) gossipToRandomNodeOf(deputies) else { val probability = 1.0 / oldMembersSize + oldUnreachableSize @@ -392,13 +392,16 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { } } + /** + * Switches the state in the FSM. 
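+   * The member entry for this node is replaced by one carrying the new status and the
+   * gossip version is incremented.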
+ */ @tailrec final private def switchStatusTo(newStatus: MemberStatus) { log.info("Node [{}] - Switching membership status to [{}]", remoteAddress, newStatus) val oldState = state.get - val oldGossip = oldState.latestGossip + val oldSelf = oldState.self - val oldSelf = oldGossip.self + val oldGossip = oldState.latestGossip val oldMembers = oldGossip.members val newSelf = oldSelf copy (status = newStatus) @@ -410,16 +413,16 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { // ugly crap to work around bug in scala colletions ('val ss: SortedSet[Member] = SortedSet.empty[Member] ++ aSet' does not compile) val newMembersSortedSet = SortedSet[Member](newMembersSet.toList: _*) - val newGossip = oldGossip copy (self = newSelf, members = newMembersSortedSet) - val newState = oldState copy (latestGossip = incrementVersionForGossip(newGossip)) + val newGossip = oldGossip copy (members = newMembersSortedSet) incrementVersion memberFingerprint + val newState = oldState copy (self = newSelf, latestGossip = newGossip) if (!state.compareAndSet(oldState, newState)) switchStatusTo(newStatus) // recur if we failed update } /** - * Gossips latest gossip to a member. + * Gossips latest gossip to an address. */ - private def gossipTo(member: Member) { - setUpConnectionTo(member) foreach { _ ! latestGossip } + private def gossipTo(address: Address) { + setUpConnectionTo(address) foreach { _ ! GossipEnvelope(self, latestGossip) } } /** @@ -427,8 +430,8 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { * * @return 'true' if it gossiped to a "deputy" member. */ - private def gossipToRandomNodeOf(members: Seq[Member]): Boolean = { - val peers = members filter (_.address != remoteAddress) // filter out myself + private def gossipToRandomNodeOf(addresses: Seq[Address]): Boolean = { + val peers = addresses filter (_ != remoteAddress) // filter out myself val peer = selectRandomNode(peers) val oldState = state.get val oldGossip = oldState.latestGossip @@ -438,15 +441,7 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { } /** - * Gossips to a random member in the set of members passed in as argument. - * - * @return 'true' if it gossiped to a "deputy" member. - */ - private def gossipToRandomNodeOf(members: Set[Member]): Boolean = gossipToRandomNodeOf(members.toList) - - /** - * Scrutinizes the cluster; marks members detected by the failure detector as unavailable, and notifies all listeners - * of the change in the cluster membership. + * Scrutinizes the cluster; marks members detected by the failure detector as unavailable. 
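+   * Detected members are removed from the members set and their addresses are added to
+   * the gossip overview's unreachable set.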
*/ @tailrec final private def scrutinize() { @@ -455,25 +450,28 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { val oldGossip = oldState.latestGossip val oldOverview = oldGossip.overview val oldMembers = oldGossip.members - val oldUnreachableMembers = oldGossip.overview.unreachable - val newlyDetectedUnreachableMembers = oldMembers filterNot (member ⇒ failureDetector.isAvailable(member.address)) + val oldUnreachableAddresses = oldGossip.overview.unreachable - if (!newlyDetectedUnreachableMembers.isEmpty) { // we have newly detected members marked as unavailable + val newlyDetectedUnreachableMembers = oldMembers filterNot { member ⇒ failureDetector.isAvailable(member.address) } + val newlyDetectedUnreachableAddresses = newlyDetectedUnreachableMembers map { _.address } + + if (!newlyDetectedUnreachableAddresses.isEmpty) { // we have newly detected members marked as unavailable val newMembers = oldMembers diff newlyDetectedUnreachableMembers - val newUnreachableMembers = oldUnreachableMembers ++ newlyDetectedUnreachableMembers + val newUnreachableAddresses: Set[Address] = (oldUnreachableAddresses ++ newlyDetectedUnreachableAddresses) - val newOverview = oldOverview copy (unreachable = newUnreachableMembers) - val newGossip = oldGossip copy (overview = newOverview, members = newMembers) - val newState = oldState copy (latestGossip = incrementVersionForGossip(newGossip)) + val newOverview = oldOverview copy (unreachable = newUnreachableAddresses) + val newGossip = oldGossip copy (overview = newOverview, members = newMembers) incrementVersion memberFingerprint + val newState = oldState copy (latestGossip = newGossip) // if we won the race then update else try again if (!state.compareAndSet(oldState, newState)) scrutinize() // recur else { + // FIXME should only notify when there is a cluster convergence // notify listeners on successful update of state - for { - deadNode ← newUnreachableMembers - listener ← oldState.memberMembershipChangeListeners - } listener memberDisconnected deadNode + // for { + // deadNode ← newUnreachableAddresses + // listener ← oldState.memberMembershipChangeListeners + // } listener memberDisconnected deadNode } } } @@ -481,29 +479,28 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { // FIXME should shuffle list randomly before start traversing to avoid connecting to some member on every member @tailrec - final private def connectToRandomNodeOf(members: Seq[Member]): ActorRef = { - members match { - case member :: rest ⇒ - setUpConnectionTo(member) match { + final private def connectToRandomNodeOf(addresses: Seq[Address]): ActorRef = { + addresses match { + case address :: rest ⇒ + setUpConnectionTo(address) match { case Some(connection) ⇒ connection case None ⇒ connectToRandomNodeOf(rest) // recur if } case Nil ⇒ throw new RemoteConnectionException( - "Could not establish connection to any of the members in the argument list") + "Could not establish connection to any of the addresses in the argument list") } } /** - * Sets up remote connections to all the members in the argument list. + * Sets up remote connections to all the addresses in the argument list. */ - private def setUpConnectionsTo(members: Seq[Member]): Seq[Option[ActorRef]] = members map { setUpConnectionTo(_) } + private def setUpConnectionsTo(addresses: Seq[Address]): Seq[Option[ActorRef]] = addresses map setUpConnectionTo /** * Sets up remote connection. 
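   * Returns None when the connection cannot be established, leaving it to the caller to
   * skip the node and to the failure detector to handle it.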
*/ - private def setUpConnectionTo(member: Member): Option[ActorRef] = { - val address = member.address + private def setUpConnectionTo(address: Address): Option[ActorRef] = { try { Some(connectionManager.putIfAbsent(address, () ⇒ system.actorFor(RootActorPath(address) / "system" / "cluster"))) } catch { @@ -511,11 +508,7 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { } } - private def incrementVersionForGossip(from: Gossip): Gossip = { - from copy (version = from.version.increment(memberFingerprint, newTimestamp)) - } + private def deputyNodesWithoutMyself: Seq[Address] = Seq.empty[Address] filter (_ != remoteAddress) // FIXME read in deputy nodes from gossip data - now empty seq - private def deputyNodesWithoutMyself: Seq[Member] = Seq.empty[Member] filter (_.address != remoteAddress) // FIXME read in deputy nodes from gossip data - now empty seq - - private def selectRandomNode(members: Seq[Member]): Member = members(random.nextInt(members.size)) + private def selectRandomNode(addresses: Seq[Address]): Address = addresses(random nextInt addresses.size) } diff --git a/akka-cluster/src/test/scala/akka/cluster/NodeStartupSpec.scala b/akka-cluster/src/test/scala/akka/cluster/NodeStartupSpec.scala index 32bf0bc6b5..de59541dfa 100644 --- a/akka-cluster/src/test/scala/akka/cluster/NodeStartupSpec.scala +++ b/akka-cluster/src/test/scala/akka/cluster/NodeStartupSpec.scala @@ -14,7 +14,7 @@ import com.typesafe.config._ class NodeStartupSpec extends AkkaSpec(""" akka { - loglevel = "DEBUG" + loglevel = "INFO" } """) with ImplicitSender { From 5b37037ed1e317f3264009929d537df0caed1b00 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jonas=20Bone=CC=81r?= Date: Tue, 14 Feb 2012 20:40:32 +0100 Subject: [PATCH 12/72] Added NodeGossipingSpec for testing gossiping and cluster membership. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Jonas Bonér --- .../akka/cluster/NodeGossipingSpec.scala | 141 ++++++++++++++++++ 1 file changed, 141 insertions(+) create mode 100644 akka-cluster/src/test/scala/akka/cluster/NodeGossipingSpec.scala diff --git a/akka-cluster/src/test/scala/akka/cluster/NodeGossipingSpec.scala b/akka-cluster/src/test/scala/akka/cluster/NodeGossipingSpec.scala new file mode 100644 index 0000000000..a3cc492a23 --- /dev/null +++ b/akka-cluster/src/test/scala/akka/cluster/NodeGossipingSpec.scala @@ -0,0 +1,141 @@ +/** + * Copyright (C) 2009-2011 Typesafe Inc. 
+ */ +package akka.cluster + +import java.net.InetSocketAddress + +import akka.testkit._ +import akka.dispatch._ +import akka.actor._ +import akka.remote._ + +import com.typesafe.config._ + +class NodeGossipingSpec extends AkkaSpec(""" + akka { + loglevel = "DEBUG" + } + """) with ImplicitSender { + + var gossiper0: Gossiper = _ + var gossiper1: Gossiper = _ + var gossiper2: Gossiper = _ + + var node0: ActorSystemImpl = _ + var node1: ActorSystemImpl = _ + var node2: ActorSystemImpl = _ + + try { + "A set of connected cluster nodes" must { + "(when two nodes) start gossiping to each other so that both nodes gets the same gossip info" in { + node0 = ActorSystem("NodeGossipingSpec", ConfigFactory + .parseString(""" + akka { + actor.provider = "akka.remote.RemoteActorRefProvider" + remote.netty { + hostname = localhost + port=5550 + } + }""") + .withFallback(system.settings.config)) + .asInstanceOf[ActorSystemImpl] + val remote0 = node0.provider.asInstanceOf[RemoteActorRefProvider] + gossiper0 = Gossiper(node0, remote0) + + node1 = ActorSystem("NodeGossipingSpec", ConfigFactory + .parseString(""" + akka { + actor.provider = "akka.remote.RemoteActorRefProvider" + remote.netty { + hostname = localhost + port=5551 + } + cluster.node-to-join = "akka://NodeGossipingSpec@localhost:5550" + }""") + .withFallback(system.settings.config)) + .asInstanceOf[ActorSystemImpl] + val remote1 = node1.provider.asInstanceOf[RemoteActorRefProvider] + gossiper1 = Gossiper(node1, remote1) + + Thread.sleep(5000) + + val members0 = gossiper0.latestGossip.members.toArray + members0.size must be(2) + members0(0).address.port.get must be(5550) + members0(0).status must be(MemberStatus.Joining) + members0(1).address.port.get must be(5551) + members0(1).status must be(MemberStatus.Joining) + + val members1 = gossiper1.latestGossip.members.toArray + members1.size must be(2) + members1(0).address.port.get must be(5550) + members1(0).status must be(MemberStatus.Joining) + members1(1).address.port.get must be(5551) + members1(1).status must be(MemberStatus.Joining) + } + + "(when three nodes) start gossiping to each other so that both nodes gets the same gossip info" in { + node2 = ActorSystem("NodeGossipingSpec", ConfigFactory + .parseString(""" + akka { + actor.provider = "akka.remote.RemoteActorRefProvider" + remote.netty { + hostname = localhost + port=5552 + } + cluster.node-to-join = "akka://NodeGossipingSpec@localhost:5550" + }""") + .withFallback(system.settings.config)) + .asInstanceOf[ActorSystemImpl] + val remote2 = node2.provider.asInstanceOf[RemoteActorRefProvider] + gossiper2 = Gossiper(node2, remote2) + + Thread.sleep(10000) + + val members0 = gossiper0.latestGossip.members.toArray + val version = gossiper0.latestGossip.version + members0.size must be(3) + members0(0).address.port.get must be(5550) + members0(0).status must be(MemberStatus.Joining) + members0(1).address.port.get must be(5551) + members0(1).status must be(MemberStatus.Joining) + members0(2).address.port.get must be(5552) + members0(2).status must be(MemberStatus.Joining) + + val members1 = gossiper1.latestGossip.members.toArray + members1.size must be(3) + members1(0).address.port.get must be(5550) + members1(0).status must be(MemberStatus.Joining) + members1(1).address.port.get must be(5551) + members1(1).status must be(MemberStatus.Joining) + members1(2).address.port.get must be(5552) + members1(2).status must be(MemberStatus.Joining) + + val members2 = gossiper2.latestGossip.members.toArray + members2.size must be(3) + 
members2(0).address.port.get must be(5550) + members2(0).status must be(MemberStatus.Joining) + members2(1).address.port.get must be(5551) + members2(1).status must be(MemberStatus.Joining) + members2(2).address.port.get must be(5552) + members2(2).status must be(MemberStatus.Joining) + } + } + } catch { + case e: Exception ⇒ + e.printStackTrace + fail(e.toString) + } + + override def atTermination() { + gossiper0.shutdown() + node0.shutdown() + + gossiper1.shutdown() + node1.shutdown() + + gossiper2.shutdown() + node2.shutdown() + } +} From 6779cf06abf9b4e63bc4f20c9586294f50415c86 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jonas=20Bone=CC=81r?= Date: Tue, 14 Feb 2012 20:42:03 +0100 Subject: [PATCH 13/72] Rewrite of the VectorClock impl. Now with 'merge' support and slicker API. Also added more tests. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Jonas Bonér --- .../main/scala/akka/cluster/VectorClock.scala | 246 +++++++++------ .../scala/akka/cluster/VectorClockSpec.scala | 290 +++++++++++------- 2 files changed, 331 insertions(+), 205 deletions(-) diff --git a/akka-cluster/src/main/scala/akka/cluster/VectorClock.scala b/akka-cluster/src/main/scala/akka/cluster/VectorClock.scala index 13583cc120..e27215b23c 100644 --- a/akka-cluster/src/main/scala/akka/cluster/VectorClock.scala +++ b/akka-cluster/src/main/scala/akka/cluster/VectorClock.scala @@ -5,19 +5,21 @@ package akka.cluster import akka.AkkaException +import akka.event.Logging +import akka.actor.ActorSystem + +import System.{ currentTimeMillis ⇒ newTimestamp } +import java.security.MessageDigest +import java.util.concurrent.atomic.AtomicLong class VectorClockException(message: String) extends AkkaException(message) /** * Trait to be extended by classes that wants to be versioned using a VectorClock. */ -trait Versioned { +trait Versioned[T] { def version: VectorClock - - /** - * Returns the Versioned that have the latest version. - */ - def or(other: Versioned): Versioned = Versioned.latestVersionOf(this, other) + def +(node: VectorClock.Node): T } /** @@ -25,14 +27,104 @@ trait Versioned { */ object Versioned { + /** + * The result of comparing two Versioned objects. + * Either: + * {{{ + * 1) v1 is BEFORE v2 => Before + * 2) v1 is AFTER t2 => After + * 3) v1 happens CONCURRENTLY to v2 => Concurrent + * }}} + */ + sealed trait Ordering + case object Before extends Ordering + case object After extends Ordering + case object Concurrent extends Ordering + + /** + * Returns or 'Ordering' for the two 'Versioned' instances. + */ + def compare[T <: Versioned[T]](versioned1: Versioned[T], versioned2: Versioned[T]): Ordering = { + if (versioned1.version <> versioned2.version) Concurrent + else if (versioned1.version < versioned2.version) Before + else After + } + /** * Returns the Versioned that have the latest version. 
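Editorial note: for intuition, the three outcomes (Before, After, Concurrent) map onto the VectorClock operators introduced later in this patch (<, > and <>). A small sketch assuming that API; the clock-building sequence is illustrative only:

    import akka.cluster.VectorClock
    import akka.cluster.VectorClock.Node

    object OrderingSketch extends App {
      val base     = VectorClock() + Node("A")   // advanced on node A
      val extended = base + Node("B")            // strictly more history than 'base'
      val diverged = base + Node("C")            // advanced independently of 'extended'

      println(base < extended)      // true  -> 'base' is Before 'extended'
      println(extended > base)      // true  -> 'extended' is After 'base'
      println(extended <> diverged) // true  -> Concurrent, no causal relationship
    }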
*/ - def latestVersionOf[T <: Versioned](versioned1: T, versioned2: T): T = { - (versioned1.version compare versioned2.version) match { - case VectorClock.Before ⇒ versioned2 // version 1 is BEFORE (older), use version 2 - case VectorClock.After ⇒ versioned1 // version 1 is AFTER (newer), use version 1 - case VectorClock.Concurrent ⇒ versioned1 // can't establish a causal relationship between versions => conflict - keeping version 1 + def latestVersionOf[T <: Versioned[T]](versioned1: T, versioned2: T): T = { + compare(versioned1, versioned2) match { + case Concurrent ⇒ versioned2 + case Before ⇒ versioned2 + case After ⇒ versioned1 + } + } +} + +/** + * VectorClock module with helper classes and methods. + * + * Based on code from the 'vlock' VectorClock library by Coda Hale. + */ +object VectorClock { + + /** + * Hash representation of a versioned node name. + */ + class Node private (val name: String) extends Serializable { + override def hashCode = 0 + name.## + + override def equals(other: Any) = Node.unapply(this) == Node.unapply(other) + + override def toString = name.mkString("Node(", "", ")") + } + + object Node { + def apply(name: String): Node = new Node(hash(name)) + + def unapply(other: Any) = other match { + case x: Node ⇒ import x._; Some(name) + case _ ⇒ None + } + + private def hash(name: String): String = { + val digester = MessageDigest.getInstance("MD5") + digester update name.getBytes + digester.digest.map { h ⇒ "%02x".format(0xFF & h) }.mkString + } + } + + /** + * Timestamp representation a unique 'Ordered' timestamp. + */ + case class Timestamp private (time: Long) extends Ordered[Timestamp] { + def max(other: Timestamp) = { + if (this < other) other + else this + } + + def compare(other: Timestamp) = time compare other.time + + override def toString = "%016x" format time + } + + object Timestamp { + private val counter = new AtomicLong(newTimestamp) + + def zero(): Timestamp = Timestamp(0L) + + def apply(): Timestamp = { + var newTime: Long = 0L + while (newTime == 0) { + val last = counter.get + val current = newTimestamp + val next = if (current > last) current else last + 1 + if (counter.compareAndSet(last, next)) { + newTime = next + } + } + new Timestamp(newTime) } } } @@ -44,108 +136,68 @@ object Versioned { * 1) Leslie Lamport (1978). "Time, clocks, and the ordering of events in a distributed system". Communications of the ACM 21 (7): 558-565. * 2) Friedemann Mattern (1988). "Virtual Time and Global States of Distributed Systems". Workshop on Parallel and Distributed Algorithms: pp. 215-226 * }}} + * + * Based on code from the 'vlock' VectorClock library by Coda Hale. 
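Editorial note: the Timestamp factory above never returns the same value twice — it spins on an AtomicLong and falls back to last + 1 when the wall clock has not advanced. A tiny usage sketch under that assumption (the object name is hypothetical):

    import akka.cluster.VectorClock.Timestamp

    object TimestampSketch extends App {
      val t1 = Timestamp()
      val t2 = Timestamp()
      val t3 = Timestamp()

      // Strictly increasing even if all three calls land in the same millisecond.
      println(t1 < t2 && t2 < t3) // true
      println((t1 max t3) == t3)  // true, 'max' picks the later timestamp
    }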
*/ case class VectorClock( - versions: Vector[VectorClock.Entry] = Vector.empty[VectorClock.Entry], - timestamp: Long = System.currentTimeMillis) { + timestamp: VectorClock.Timestamp = VectorClock.Timestamp(), + versions: Map[VectorClock.Node, VectorClock.Timestamp] = Map.empty[VectorClock.Node, VectorClock.Timestamp]) + extends PartiallyOrdered[VectorClock] { + + // FIXME pruning of VectorClock history + import VectorClock._ - def compare(other: VectorClock): Ordering = VectorClock.compare(this, other) - - def increment(fingerprint: Int, timestamp: Long): VectorClock = { - val newVersions = - if (versions exists (entry ⇒ entry.fingerprint == fingerprint)) { - // update existing node entry - versions map { entry ⇒ - if (entry.fingerprint == fingerprint) entry.increment() - else entry - } - } else { - // create and append a new node entry - versions :+ Entry(fingerprint = fingerprint) - } - if (newVersions.size > MaxNrOfVersions) throw new VectorClockException("Max number of versions reached") - copy(versions = newVersions, timestamp = timestamp) - } - - def maxVersion: Long = versions.foldLeft(1L)((max, entry) ⇒ math.max(max, entry.version)) - - // FIXME Do we need to implement VectorClock.merge? - def merge(other: VectorClock): VectorClock = { - sys.error("Not implemented") - } -} - -/** - * Module with helper classes and methods. - */ -object VectorClock { - final val MaxNrOfVersions = Short.MaxValue - /** - * The result of comparing two vector clocks. - * Either: - * {{{ - * 1) v1 is BEFORE v2 - * 2) v1 is AFTER t2 - * 3) v1 happens CONCURRENTLY to v2 - * }}} + * Increment the version for the node passed as argument. Returns a new VectorClock. */ - sealed trait Ordering - case object Before extends Ordering - case object After extends Ordering - case object Concurrent extends Ordering + def +(node: Node): VectorClock = copy(versions = versions + (node -> Timestamp())) /** - * Versioned entry in a vector clock. + * Returns true if this and that are concurrent else false. */ - case class Entry(fingerprint: Int, version: Long = 1L) { - def increment(): Entry = copy(version = version + 1L) - } + def <>(that: VectorClock): Boolean = tryCompareTo(that) == None /** + * Returns true if this VectorClock has the same history as the 'that' VectorClock else false. + */ + def ==(that: VectorClock): Boolean = versions == that.versions + + /** + * For the 'PartiallyOrdered' trait, to allow natural comparisons using <, > and ==. + *

* Compare two vector clocks. The outcomes will be one of the following: *

* {{{ - * 1. Clock 1 is BEFORE clock 2 if there exists an i such that c1(i) <= c(2) and there does not exist a j such that c1(j) > c2(j). - * 2. Clock 1 is CONCURRENT to clock 2 if there exists an i, j such that c1(i) < c2(i) and c1(j) > c2(j). - * 3. Clock 1 is AFTER clock 2 otherwise. + * 1. Clock 1 is BEFORE (>) Clock 2 if there exists an i such that c1(i) <= c(2) and there does not exist a j such that c1(j) > c2(j). + * 2. Clock 1 is CONCURRENT (<>) to Clock 2 if there exists an i, j such that c1(i) < c2(i) and c1(j) > c2(j). + * 3. Clock 1 is AFTER (<) Clock 2 otherwise. * }}} - * - * @param v1 The first VectorClock - * @param v2 The second VectorClock */ - def compare(v1: VectorClock, v2: VectorClock): Ordering = { - if ((v1 eq null) || (v2 eq null)) throw new IllegalArgumentException("Can't compare null VectorClocks") - - // FIXME rewrite to functional style, now uses ugly imperative algorithm - - var v1Bigger, v2Bigger = false // We do two checks: v1 <= v2 and v2 <= v1 if both are true then - var p1, p2 = 0 - - while (p1 < v1.versions.size && p2 < v2.versions.size) { - val ver1 = v1.versions(p1) - val ver2 = v2.versions(p2) - if (ver1.fingerprint == ver2.fingerprint) { - if (ver1.version > ver2.version) v1Bigger = true - else if (ver2.version > ver1.version) v2Bigger = true - p1 += 1 - p2 += 1 - } else if (ver1.fingerprint > ver2.fingerprint) { - v2Bigger = true // Since ver1 is bigger that means it is missing a version that ver2 has - p2 += 1 - } else { - v1Bigger = true // This means ver2 is bigger which means it is missing a version ver1 has - p1 += 1 - } + def tryCompareTo[V >: VectorClock <% PartiallyOrdered[V]](vclock: V): Option[Int] = { + def compare(versions1: Map[Node, Timestamp], versions2: Map[Node, Timestamp]): Boolean = { + versions1.forall { case ((n, t)) ⇒ t <= versions2.getOrElse(n, Timestamp.zero) } && + (versions1.exists { case ((n, t)) ⇒ t < versions2.getOrElse(n, Timestamp.zero) } || + (versions1.size < versions2.size)) + } + vclock match { + case VectorClock(_, otherVersions) ⇒ + if (compare(versions, otherVersions)) Some(-1) + else if (compare(otherVersions, versions)) Some(1) + else if (versions == otherVersions) Some(0) + else None + case _ ⇒ None } - - if (p1 < v1.versions.size) v1Bigger = true - else if (p2 < v2.versions.size) v2Bigger = true - - if (!v1Bigger && !v2Bigger) Before // This is the case where they are equal, return BEFORE arbitrarily - else if (v1Bigger && !v2Bigger) After // This is the case where v1 is a successor clock to v2 - else if (!v1Bigger && v2Bigger) Before // This is the case where v2 is a successor clock to v1 - else Concurrent // This is the case where both clocks are parallel to one another } + + /** + * Merges this VectorClock with another VectorClock. E.g. merges its versioned history. 
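Editorial note: merging keeps, for every node, the larger of the two timestamps, so the result dominates both inputs even when they are concurrent. A short sketch assuming the API from this patch:

    import akka.cluster.VectorClock
    import akka.cluster.VectorClock.Node

    object MergeSketch extends App {
      val a = VectorClock() + Node("A") + Node("B") // advanced on A and B
      val b = VectorClock() + Node("B") + Node("C") // advanced on B and C, concurrently

      val merged = a merge b

      println(a <> b)                   // true  -> no causal order between them
      println(a < merged && b < merged) // true  -> the merge dominates both inputs
    }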
+ */ + def merge(that: VectorClock): VectorClock = { + val mergedVersions = scala.collection.mutable.Map.empty[Node, Timestamp] ++ that.versions + for ((node, time) ← versions) mergedVersions(node) = time max mergedVersions.getOrElse(node, time) + VectorClock(timestamp, Map.empty[Node, Timestamp] ++ mergedVersions) + } + + override def toString = versions.map { case ((n, t)) ⇒ n + " -> " + t }.mkString("VectorClock(", ", ", ")") } diff --git a/akka-cluster/src/test/scala/akka/cluster/VectorClockSpec.scala b/akka-cluster/src/test/scala/akka/cluster/VectorClockSpec.scala index df9cead7f8..65f2aa1d75 100644 --- a/akka-cluster/src/test/scala/akka/cluster/VectorClockSpec.scala +++ b/akka-cluster/src/test/scala/akka/cluster/VectorClockSpec.scala @@ -2,6 +2,7 @@ package akka.cluster import java.net.InetSocketAddress import akka.testkit.AkkaSpec +import akka.actor.ActorSystem class VectorClockSpec extends AkkaSpec { import VectorClock._ @@ -10,193 +11,266 @@ class VectorClockSpec extends AkkaSpec { "have zero versions when created" in { val clock = VectorClock() - clock.versions must be(Vector()) + clock.versions must be(Map()) } - "be able to add Entry if non-existing" in { - val clock1 = VectorClock() - clock1.versions must be(Vector()) - val clock2 = clock1.increment(1, System.currentTimeMillis) - val clock3 = clock2.increment(2, System.currentTimeMillis) - - clock3.versions must be(Vector(Entry(1, 1), Entry(2, 1))) - } - - "be able to increment version of existing Entry" in { - val clock1 = VectorClock() - val clock2 = clock1.increment(1, System.currentTimeMillis) - val clock3 = clock2.increment(2, System.currentTimeMillis) - val clock4 = clock3.increment(1, System.currentTimeMillis) - val clock5 = clock4.increment(2, System.currentTimeMillis) - val clock6 = clock5.increment(2, System.currentTimeMillis) - - clock6.versions must be(Vector(Entry(1, 2), Entry(2, 3))) - } - - "The empty clock should not happen before itself" in { + "not happen before itself" in { val clock1 = VectorClock() val clock2 = VectorClock() - clock1.compare(clock2) must not be (Concurrent) + clock1 <> clock2 must be(false) } - "not happen before an identical clock" in { + "pass misc comparison test 1" in { val clock1_1 = VectorClock() - val clock2_1 = clock1_1.increment(1, System.currentTimeMillis) - val clock3_1 = clock2_1.increment(2, System.currentTimeMillis) - val clock4_1 = clock3_1.increment(1, System.currentTimeMillis) + val clock2_1 = clock1_1 + Node("1") + val clock3_1 = clock2_1 + Node("2") + val clock4_1 = clock3_1 + Node("1") val clock1_2 = VectorClock() - val clock2_2 = clock1_2.increment(1, System.currentTimeMillis) - val clock3_2 = clock2_2.increment(2, System.currentTimeMillis) - val clock4_2 = clock3_2.increment(1, System.currentTimeMillis) + val clock2_2 = clock1_2 + Node("1") + val clock3_2 = clock2_2 + Node("2") + val clock4_2 = clock3_2 + Node("1") - clock4_1.compare(clock4_2) must not be (Concurrent) + clock4_1 <> clock4_2 must be(false) } - "happen before an identical clock with a single additional event" in { + "pass misc comparison test 2" in { val clock1_1 = VectorClock() - val clock2_1 = clock1_1.increment(1, System.currentTimeMillis) - val clock3_1 = clock2_1.increment(2, System.currentTimeMillis) - val clock4_1 = clock3_1.increment(1, System.currentTimeMillis) + val clock2_1 = clock1_1 + Node("1") + val clock3_1 = clock2_1 + Node("2") + val clock4_1 = clock3_1 + Node("1") val clock1_2 = VectorClock() - val clock2_2 = clock1_2.increment(1, System.currentTimeMillis) - val clock3_2 = 
clock2_2.increment(2, System.currentTimeMillis) - val clock4_2 = clock3_2.increment(1, System.currentTimeMillis) - val clock5_2 = clock4_2.increment(3, System.currentTimeMillis) + val clock2_2 = clock1_2 + Node("1") + val clock3_2 = clock2_2 + Node("2") + val clock4_2 = clock3_2 + Node("1") + val clock5_2 = clock4_2 + Node("3") - clock4_1.compare(clock5_2) must be(Before) + clock4_1 < clock5_2 must be(true) } - "Two clocks with different events should be concurrent: 1" in { + "pass misc comparison test 3" in { var clock1_1 = VectorClock() - val clock2_1 = clock1_1.increment(1, System.currentTimeMillis) + val clock2_1 = clock1_1 + Node("1") val clock1_2 = VectorClock() - val clock2_2 = clock1_2.increment(2, System.currentTimeMillis) + val clock2_2 = clock1_2 + Node("2") - clock2_1.compare(clock2_2) must be(Concurrent) + clock2_1 <> clock2_2 must be(true) } - "Two clocks with different events should be concurrent: 2" in { + "pass misc comparison test 4" in { val clock1_3 = VectorClock() - val clock2_3 = clock1_3.increment(1, System.currentTimeMillis) - val clock3_3 = clock2_3.increment(2, System.currentTimeMillis) - val clock4_3 = clock3_3.increment(1, System.currentTimeMillis) + val clock2_3 = clock1_3 + Node("1") + val clock3_3 = clock2_3 + Node("2") + val clock4_3 = clock3_3 + Node("1") val clock1_4 = VectorClock() - val clock2_4 = clock1_4.increment(1, System.currentTimeMillis) - val clock3_4 = clock2_4.increment(1, System.currentTimeMillis) - val clock4_4 = clock3_4.increment(3, System.currentTimeMillis) + val clock2_4 = clock1_4 + Node("1") + val clock3_4 = clock2_4 + Node("1") + val clock4_4 = clock3_4 + Node("3") - clock4_3.compare(clock4_4) must be(Concurrent) + clock4_3 <> clock4_4 must be(true) } - ".." in { + "pass misc comparison test 5" in { val clock1_1 = VectorClock() - val clock2_1 = clock1_1.increment(2, System.currentTimeMillis) - val clock3_1 = clock2_1.increment(2, System.currentTimeMillis) + val clock2_1 = clock1_1 + Node("2") + val clock3_1 = clock2_1 + Node("2") val clock1_2 = VectorClock() - val clock2_2 = clock1_2.increment(1, System.currentTimeMillis) - val clock3_2 = clock2_2.increment(2, System.currentTimeMillis) - val clock4_2 = clock3_2.increment(2, System.currentTimeMillis) - val clock5_2 = clock4_2.increment(3, System.currentTimeMillis) + val clock2_2 = clock1_2 + Node("1") + val clock3_2 = clock2_2 + Node("2") + val clock4_2 = clock3_2 + Node("2") + val clock5_2 = clock4_2 + Node("3") - clock3_1.compare(clock5_2) must be(Before) + clock3_1 < clock5_2 must be(true) + clock5_2 > clock3_1 must be(true) } - "..." 
in { + "pass misc comparison test 6" in { val clock1_1 = VectorClock() - val clock2_1 = clock1_1.increment(1, System.currentTimeMillis) - val clock3_1 = clock2_1.increment(2, System.currentTimeMillis) - val clock4_1 = clock3_1.increment(2, System.currentTimeMillis) - val clock5_1 = clock4_1.increment(3, System.currentTimeMillis) + val clock2_1 = clock1_1 + Node("1") + val clock3_1 = clock2_1 + Node("2") val clock1_2 = VectorClock() - val clock2_2 = clock1_2.increment(2, System.currentTimeMillis) - val clock3_2 = clock2_2.increment(2, System.currentTimeMillis) + val clock2_2 = clock1_2 + Node("1") + val clock3_2 = clock2_2 + Node("1") - clock5_1.compare(clock3_2) must be(After) + clock3_1 <> clock3_2 must be(true) + clock3_2 <> clock3_1 must be(true) + } + + "pass misc comparison test 7" in { + val clock1_1 = VectorClock() + val clock2_1 = clock1_1 + Node("1") + val clock3_1 = clock2_1 + Node("2") + val clock4_1 = clock3_1 + Node("2") + val clock5_1 = clock4_1 + Node("3") + + val clock1_2 = VectorClock() + val clock2_2 = clock1_2 + Node("2") + val clock3_2 = clock2_2 + Node("2") + + clock5_1 <> clock3_2 must be(true) + clock3_2 <> clock5_1 must be(true) + } + + "correctly merge two clocks" in { + val node1 = Node("1") + val node2 = Node("2") + val node3 = Node("3") + + val clock1_1 = VectorClock() + val clock2_1 = clock1_1 + node1 + val clock3_1 = clock2_1 + node2 + val clock4_1 = clock3_1 + node2 + val clock5_1 = clock4_1 + node3 + + val clock1_2 = VectorClock() + val clock2_2 = clock1_2 + node2 + val clock3_2 = clock2_2 + node2 + + val merged1 = clock3_2 merge clock5_1 + merged1.versions.size must be(3) + merged1.versions.contains(node1) must be(true) + merged1.versions.contains(node2) must be(true) + merged1.versions.contains(node3) must be(true) + + val merged2 = clock5_1 merge clock3_2 + merged2.versions.size must be(3) + merged2.versions.contains(node1) must be(true) + merged2.versions.contains(node2) must be(true) + merged2.versions.contains(node3) must be(true) + + clock3_2 < merged1 must be(true) + clock5_1 < merged1 must be(true) + + clock3_2 < merged2 must be(true) + clock5_1 < merged2 must be(true) + + merged1 == merged2 must be(true) + } + + "pass blank clock incrementing" in { + val node1 = Node("1") + val node2 = Node("2") + val node3 = Node("3") + + val v1 = VectorClock() + val v2 = VectorClock() + + val vv1 = v1 + node1 + val vv2 = v2 + node2 + + (vv1 > v1) must equal(true) + (vv2 > v2) must equal(true) + + (vv1 > v2) must equal(true) + (vv2 > v1) must equal(true) + + (vv2 > vv1) must equal(false) + (vv1 > vv2) must equal(false) + } + + "pass merging behavior" in { + val node1 = Node("1") + val node2 = Node("2") + val node3 = Node("3") + + val a = VectorClock() + val b = VectorClock() + + val a1 = a + node1 + val b1 = b + node2 + + var a2 = a1 + node1 + var c = a2.merge(b1) + var c1 = c + node3 + + (c1 > a2) must equal(true) + (c1 > b1) must equal(true) } } - "A Versioned" must { - class TestVersioned(val version: VectorClock = VectorClock()) extends Versioned { - def increment(v: Int, time: Long) = new TestVersioned(version.increment(v, time)) + "An instance of Versioned" must { + class TestVersioned(val version: VectorClock = VectorClock()) extends Versioned[TestVersioned] { + def +(node: Node): TestVersioned = new TestVersioned(version + node) } + import Versioned.latestVersionOf + "have zero versions when created" in { val versioned = new TestVersioned() - versioned.version.versions must be(Vector()) + versioned.version.versions must be(Map()) } "happen before an 
identical versioned with a single additional event" in { val versioned1_1 = new TestVersioned() - val versioned2_1 = versioned1_1.increment(1, System.currentTimeMillis) - val versioned3_1 = versioned2_1.increment(2, System.currentTimeMillis) - val versioned4_1 = versioned3_1.increment(1, System.currentTimeMillis) + val versioned2_1 = versioned1_1 + Node("1") + val versioned3_1 = versioned2_1 + Node("2") + val versioned4_1 = versioned3_1 + Node("1") val versioned1_2 = new TestVersioned() - val versioned2_2 = versioned1_2.increment(1, System.currentTimeMillis) - val versioned3_2 = versioned2_2.increment(2, System.currentTimeMillis) - val versioned4_2 = versioned3_2.increment(1, System.currentTimeMillis) - val versioned5_2 = versioned4_2.increment(3, System.currentTimeMillis) + val versioned2_2 = versioned1_2 + Node("1") + val versioned3_2 = versioned2_2 + Node("2") + val versioned4_2 = versioned3_2 + Node("1") + val versioned5_2 = versioned4_2 + Node("3") - Versioned.latestVersionOf[TestVersioned](versioned4_1, versioned5_2) must be(versioned5_2) + latestVersionOf[TestVersioned](versioned4_1, versioned5_2) must be(versioned5_2) } - "Two versioneds with different events should be concurrent: 1" in { + "pass misc comparison test 1" in { var versioned1_1 = new TestVersioned() - val versioned2_1 = versioned1_1.increment(1, System.currentTimeMillis) + val versioned2_1 = versioned1_1 + Node("1") val versioned1_2 = new TestVersioned() - val versioned2_2 = versioned1_2.increment(2, System.currentTimeMillis) + val versioned2_2 = versioned1_2 + Node("2") - Versioned.latestVersionOf[TestVersioned](versioned2_1, versioned2_2) must be(versioned2_1) + latestVersionOf[TestVersioned](versioned2_1, versioned2_2) must be(versioned2_2) } - "Two versioneds with different events should be concurrent: 2" in { + "pass misc comparison test 2" in { val versioned1_3 = new TestVersioned() - val versioned2_3 = versioned1_3.increment(1, System.currentTimeMillis) - val versioned3_3 = versioned2_3.increment(2, System.currentTimeMillis) - val versioned4_3 = versioned3_3.increment(1, System.currentTimeMillis) + val versioned2_3 = versioned1_3 + Node("1") + val versioned3_3 = versioned2_3 + Node("2") + val versioned4_3 = versioned3_3 + Node("1") val versioned1_4 = new TestVersioned() - val versioned2_4 = versioned1_4.increment(1, System.currentTimeMillis) - val versioned3_4 = versioned2_4.increment(1, System.currentTimeMillis) - val versioned4_4 = versioned3_4.increment(3, System.currentTimeMillis) + val versioned2_4 = versioned1_4 + Node("1") + val versioned3_4 = versioned2_4 + Node("1") + val versioned4_4 = versioned3_4 + Node("3") - Versioned.latestVersionOf[TestVersioned](versioned4_3, versioned4_4) must be(versioned4_3) + latestVersionOf[TestVersioned](versioned4_3, versioned4_4) must be(versioned4_4) } - "be earlier than another versioned if it has an older version" in { + "pass misc comparison test 3" in { val versioned1_1 = new TestVersioned() - val versioned2_1 = versioned1_1.increment(2, System.currentTimeMillis) - val versioned3_1 = versioned2_1.increment(2, System.currentTimeMillis) + val versioned2_1 = versioned1_1 + Node("2") + val versioned3_1 = versioned2_1 + Node("2") val versioned1_2 = new TestVersioned() - val versioned2_2 = versioned1_2.increment(1, System.currentTimeMillis) - val versioned3_2 = versioned2_2.increment(2, System.currentTimeMillis) - val versioned4_2 = versioned3_2.increment(2, System.currentTimeMillis) - val versioned5_2 = versioned4_2.increment(3, System.currentTimeMillis) + val 
versioned2_2 = versioned1_2 + Node("1") + val versioned3_2 = versioned2_2 + Node("2") + val versioned4_2 = versioned3_2 + Node("2") + val versioned5_2 = versioned4_2 + Node("3") - Versioned.latestVersionOf[TestVersioned](versioned3_1, versioned5_2) must be(versioned5_2) + latestVersionOf[TestVersioned](versioned3_1, versioned5_2) must be(versioned5_2) } - "be later than another versioned if it has an newer version" in { + "pass misc comparison test 4" in { val versioned1_1 = new TestVersioned() - val versioned2_1 = versioned1_1.increment(1, System.currentTimeMillis) - val versioned3_1 = versioned2_1.increment(2, System.currentTimeMillis) - val versioned4_1 = versioned3_1.increment(2, System.currentTimeMillis) - val versioned5_1 = versioned4_1.increment(3, System.currentTimeMillis) + val versioned2_1 = versioned1_1 + Node("1") + val versioned3_1 = versioned2_1 + Node("2") + val versioned4_1 = versioned3_1 + Node("2") + val versioned5_1 = versioned4_1 + Node("3") val versioned1_2 = new TestVersioned() - val versioned2_2 = versioned1_2.increment(2, System.currentTimeMillis) - val versioned3_2 = versioned2_2.increment(2, System.currentTimeMillis) + val versioned2_2 = versioned1_2 + Node("2") + val versioned3_2 = versioned2_2 + Node("2") - Versioned.latestVersionOf[TestVersioned](versioned5_1, versioned3_2) must be(versioned5_1) + latestVersionOf[TestVersioned](versioned5_1, versioned3_2) must be(versioned3_2) } } } From 678c03d52638deed5bd15d1ee10163f6aadd6bb5 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jonas=20Bone=CC=81r?= Date: Tue, 14 Feb 2012 20:50:12 +0100 Subject: [PATCH 14/72] Finalized initial cluster membership and merging of vector clocks and gossips in case of concurrent cluster updates. Plus misc other fixes. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit * Finalized initial cluster membership. * Added merging of vector clocks and gossips in case of concurrent cluster updates. 
* Added toString methods to all cluster protocol classes * Fixed bugs in incrementation of vector clocks * Added updates of 'seen' table for cluster convergence * Revamped to use new VectorClock impl * Refactored Gossip.State Signed-off-by: Jonas Bonér --- .../akka/cluster/AccrualFailureDetector.scala | 4 +- .../main/scala/akka/cluster/Gossiper.scala | 309 +++++++++++------- 2 files changed, 195 insertions(+), 118 deletions(-) diff --git a/akka-cluster/src/main/scala/akka/cluster/AccrualFailureDetector.scala b/akka-cluster/src/main/scala/akka/cluster/AccrualFailureDetector.scala index 379bf98a6b..cebc518bcf 100644 --- a/akka-cluster/src/main/scala/akka/cluster/AccrualFailureDetector.scala +++ b/akka-cluster/src/main/scala/akka/cluster/AccrualFailureDetector.scala @@ -145,7 +145,9 @@ class AccrualFailureDetector(system: ActorSystem, val threshold: Int = 8, val ma val mean = oldState.failureStats.get(connection).getOrElse(FailureStats()).mean PhiFactor * timestampDiff / mean } - log.debug("Phi value [{}] and threshold [{}] for connection [{}] ", phi, threshold, connection) + + // FIXME sometimes we get "Phi value [Infinity]" fix it + if (phi > 0.0) log.debug("Phi value [{}] and threshold [{}] for connection [{}] ", phi, threshold, connection) // only log if PHI value is starting to get interesting phi } diff --git a/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala b/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala index 783690a249..480d6c7461 100644 --- a/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala +++ b/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala @@ -17,7 +17,6 @@ import java.util.concurrent.atomic.{ AtomicReference, AtomicBoolean } import java.util.concurrent.TimeUnit._ import java.util.concurrent.TimeoutException import java.security.SecureRandom -import System.{ currentTimeMillis ⇒ newTimestamp } import scala.collection.immutable.{ Map, SortedSet } import scala.annotation.tailrec @@ -104,11 +103,17 @@ object MemberStatus { // status: PartitioningStatus) /** - * Represents the overview of the cluster, holds the cluster convergence table and unreachable nodes. + * Represents the overview of the cluster, holds the cluster convergence table and set with unreachable nodes. */ case class GossipOverview( seen: Map[Address, VectorClock] = Map.empty[Address, VectorClock], - unreachable: Set[Address] = Set.empty[Address]) + unreachable: Set[Address] = Set.empty[Address]) { + + override def toString = + "GossipOverview(seen = [" + seen.mkString(", ") + + "], unreachable = [" + unreachable.mkString(", ") + + "])" +} /** * Represents the state of the cluster; cluster ring membership, ring convergence, meta data - all versioned by a vector clock. @@ -121,9 +126,14 @@ case class Gossip( meta: Map[String, Array[Byte]] = Map.empty[String, Array[Byte]], version: VectorClock = VectorClock()) // vector clock version extends ClusterMessage // is a serializable cluster message - with Versioned { + with Versioned[Gossip] { - def addMember(member: Member): Gossip = { + /** + * Increments the version for this 'Node'. + */ + def +(node: VectorClock.Node): Gossip = copy(version = version + node) + + def +(member: Member): Gossip = { if (members contains member) this else this copy (members = members + member) } @@ -132,14 +142,19 @@ case class Gossip( * Marks the gossip as seen by this node (remoteAddress) by updating the address entry in the 'gossip.overview.seen' * Map with the VectorClock for the new gossip. 
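Editorial note: the commit message above mentions updates of the 'seen' table for cluster convergence — once every member's entry in that table carries the current version, all nodes have observed the same cluster state. A hypothetical helper illustrating that check (not part of this patch):

    import akka.cluster.Gossip

    object ConvergenceSketch {
      // Converged when every member has marked the current gossip version as seen.
      // Assumes the Gossip/GossipOverview shapes introduced in this patch.
      def hasConverged(gossip: Gossip): Boolean =
        gossip.members.forall { member ⇒
          gossip.overview.seen.get(member.address).exists(_ == gossip.version)
        }
    }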
*/ - def markAsSeenByThisNode(address: Address): Gossip = + def seen(address: Address): Gossip = this copy (overview = overview copy (seen = overview.seen + (address -> version))) - def incrementVersion(memberFingerprint: Int): Gossip = { - this copy (version = version.increment(memberFingerprint, newTimestamp)) - } + override def toString = + "Gossip(" + + "overview = " + overview + + ", members = [" + members.mkString(", ") + + "], meta = [" + meta.mkString(", ") + + "], version = " + version + + ")" } +// FIXME add FSM trait? final class ClusterDaemon(system: ActorSystem, gossiper: Gossiper) extends Actor { val log = Logging(system, "ClusterDaemon") @@ -153,6 +168,9 @@ final class ClusterDaemon(system: ActorSystem, gossiper: Gossiper) extends Actor } } +// FIXME Cluster public API should be an Extension +// FIXME Add cluster Node class and refactor out all non-gossip related stuff out of Gossiper + /** * This module is responsible for Gossiping cluster information. The abstraction maintains the list of live * and dead members. Periodically i.e. every 1 second this module chooses a random member and initiates a round @@ -177,19 +195,18 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { private case class State( self: Member, latestGossip: Gossip, - isSingletonCluster: Boolean = true, // starts as singleton cluster memberMembershipChangeListeners: Set[MembershipChangeListener] = Set.empty[MembershipChangeListener]) val remoteSettings = new RemoteSettings(system.settings.config, system.name) val clusterSettings = new ClusterSettings(system.settings.config, system.name) val remoteAddress = remote.transport.address - val memberFingerprint = remoteAddress.## + val selfNode = VectorClock.Node(remoteAddress.toString) val gossipInitialDelay = clusterSettings.GossipInitialDelay val gossipFrequency = clusterSettings.GossipFrequency - implicit val memberOrdering = Ordering.fromLessThan[Member](_.address.toString > _.address.toString) + implicit val memberOrdering = Ordering.fromLessThan[Member](_.address.toString < _.address.toString) implicit val defaultTimeout = Timeout(remoteSettings.RemoteSystemDaemonAckTimeout) @@ -204,53 +221,38 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { private val random = SecureRandom.getInstance("SHA1PRNG") // Is it right to put this guy under the /system path or should we have a top-level /cluster or something else...? + // FIXME should be defined as a router so we get concurrency here private val clusterDaemon = system.systemActorOf(Props(new ClusterDaemon(system, this)), "cluster") private val state = { val member = Member(remoteAddress, MemberStatus.Joining) - val gossip = Gossip(members = SortedSet.empty[Member] + member) + val gossip = Gossip(members = SortedSet.empty[Member] + member) + selfNode // add me as member and update my vector clock new AtomicReference[State](State(member, gossip)) } // FIXME manage connections in some other way so we can delete the RemoteConnectionManager (SINCE IT SUCKS!!!) 
private val connectionManager = new RemoteConnectionManager(system, remote, failureDetector, Map.empty[Address, ActorRef]) + import Versioned.latestVersionOf + log.info("Node [{}] - Starting cluster Gossiper...", remoteAddress) // try to join the node defined in the 'akka.cluster.node-to-join' option - nodeToJoin match { - case None ⇒ switchStatusTo(MemberStatus.Up) // if we are singleton cluster then we are already considered to be UP - case Some(address) ⇒ join(address) - } + nodeToJoin foreach join - // start periodic gossip and cluster scrutinization + // start periodic gossip to random nodes in cluster val gossipCanceller = system.scheduler.schedule(gossipInitialDelay, gossipFrequency) { gossip() } + + // start periodic cluster scrutinization (moving nodes condemned by the failure detector to unreachable list) val scrutinizeCanceller = system.scheduler.schedule(gossipInitialDelay, gossipFrequency) { scrutinize() } - /** - * Shuts down all connections to other members, the cluster daemon and the periodic gossip and cleanup tasks. - */ - def shutdown() { - - // FIXME Cheating. Can't just shut down. Node must first gossip an Leave command, wait for Leader to do proper Handoff and then await an Exit command before switching to Removed - - if (isRunning.compareAndSet(true, false)) { - log.info("Node [{}] - Shutting down Gossiper and ClusterDaemon", remoteAddress) - try connectionManager.shutdown() finally { - try system.stop(clusterDaemon) finally { - try gossipCanceller.cancel() finally { - try scrutinizeCanceller.cancel() finally { - log.info("Node [{}] - Gossiper is shut down", remoteAddress) - } - } - } - } - } - } + // ====================================================== + // ===================== PUBLIC API ===================== + // ====================================================== /** * Latest gossip. @@ -265,52 +267,90 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { /** * Is this node a singleton cluster? */ - def isSingletonCluster: Boolean = state.get.isSingletonCluster + def isSingletonCluster: Boolean = isSingletonCluster(state.get) + + /** + * Shuts down all connections to other members, the cluster daemon and the periodic gossip and cleanup tasks. + */ + def shutdown() { + + // FIXME Cheating for now. Can't just shut down. Node must first gossip an Leave command, wait for Leader to do proper Handoff and then await an Exit command before switching to Removed + + if (isRunning.compareAndSet(true, false)) { + log.info("Node [{}] - Shutting down Gossiper and ClusterDaemon...", remoteAddress) + + try connectionManager.shutdown() finally { + try system.stop(clusterDaemon) finally { + try gossipCanceller.cancel() finally { + try scrutinizeCanceller.cancel() finally { + log.info("Node [{}] - Gossiper and ClusterDaemon shut down successfully", remoteAddress) + } + } + } + } + } + } /** * New node joining. 
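Editorial note: on startup the Gossiper simply does 'nodeToJoin foreach join', so whether a node joins an existing cluster or starts as a singleton is decided entirely by configuration. A minimal config sketch in the style of the specs in this series; the system name, host and ports are illustrative:

    import com.typesafe.config.ConfigFactory

    object JoinConfigSketch {
      // With 'node-to-join' set, the node contacts that address when the Gossiper
      // starts; leave it out and the node starts as a singleton cluster.
      val joiningNodeConfig = ConfigFactory.parseString("""
        akka {
          actor.provider = "akka.remote.RemoteActorRefProvider"
          remote.netty {
            hostname = localhost
            port = 5551
          }
          cluster.node-to-join = "akka://MyClusterSystem@localhost:5550"
        }
        """)
    }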
*/ @tailrec final def joining(node: Address) { - log.debug("Node [{}] - Node [{}] is joining", remoteAddress, node) - val oldState = state.get - val oldGossip = oldState.latestGossip - val oldMembers = oldGossip.members - val newGossip = oldGossip copy (members = oldMembers + Member(node, MemberStatus.Joining)) // add joining node as Joining - val newState = oldState copy (latestGossip = newGossip.incrementVersion(memberFingerprint)) + log.info("Node [{}] - Node [{}] is joining", remoteAddress, node) - // FIXME set flag state.isSingletonCluster = false (if true) + val localState = state.get + val localGossip = localState.latestGossip + val localMembers = localGossip.members - if (!state.compareAndSet(oldState, newState)) joining(node) // recur if we failed update + val newMembers = localMembers + Member(node, MemberStatus.Joining) // add joining node as Joining + val newGossip = localGossip copy (members = newMembers) + + val versionedGossip = newGossip + selfNode + val seenVersionedGossip = versionedGossip seen remoteAddress + + val newState = localState copy (latestGossip = seenVersionedGossip) + + if (!state.compareAndSet(localState, newState)) joining(node) // recur if we failed update } /** * Receive new gossip. */ @tailrec - final def receive(sender: Member, newGossip: Gossip) { + final def receive(sender: Member, remoteGossip: Gossip) { log.debug("Node [{}] - Receiving gossip from [{}]", remoteAddress, sender.address) failureDetector heartbeat sender.address // update heartbeat in failure detector - // FIXME set flag state.isSingletonCluster = false (if true) - // FIXME check for convergence - if we have convergence then trigger the listeners - val oldState = state.get - val oldGossip = oldState.latestGossip + val localState = state.get + val localGossip = localState.latestGossip - val gossip = Versioned - .latestVersionOf(newGossip, oldGossip) - .addMember(self) // needed if newGossip won - .addMember(sender) // needed if oldGossip won - .markAsSeenByThisNode(remoteAddress) - .incrementVersion(memberFingerprint) + val winningGossip = + if (remoteGossip.version <> localGossip.version) { + // concurrent + val mergedGossip = merge(remoteGossip, localGossip) + val versionedMergedGossip = mergedGossip + selfNode - val newState = oldState copy (latestGossip = gossip) + log.debug("Can't establish a causal relationship between \"remote\" gossip [{}] and \"local\" gossip [{}] - merging them into [{}]", + remoteGossip, localGossip, versionedMergedGossip) + + versionedMergedGossip + + } else if (remoteGossip.version < localGossip.version) { + // local gossip is newer + localGossip + + } else { + // remote gossip is newer + remoteGossip + } + + val newState = localState copy (latestGossip = winningGossip seen remoteAddress) // if we won the race then update else try again - if (!state.compareAndSet(oldState, newState)) receive(sender, newGossip) // recur if we fail the update + if (!state.compareAndSet(localState, newState)) receive(sender, remoteGossip) // recur if we fail the update } /** @@ -318,10 +358,10 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { */ @tailrec final def registerListener(listener: MembershipChangeListener) { - val oldState = state.get - val newListeners = oldState.memberMembershipChangeListeners + listener - val newState = oldState copy (memberMembershipChangeListeners = newListeners) - if (!state.compareAndSet(oldState, newState)) registerListener(listener) // recur + val localState = state.get + val newListeners = 
localState.memberMembershipChangeListeners + listener + val newState = localState copy (memberMembershipChangeListeners = newListeners) + if (!state.compareAndSet(localState, newState)) registerListener(listener) // recur } /** @@ -329,12 +369,16 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { */ @tailrec final def unregisterListener(listener: MembershipChangeListener) { - val oldState = state.get - val newListeners = oldState.memberMembershipChangeListeners - listener - val newState = oldState copy (memberMembershipChangeListeners = newListeners) - if (!state.compareAndSet(oldState, newState)) unregisterListener(listener) // recur + val localState = state.get + val newListeners = localState.memberMembershipChangeListeners - listener + val newState = localState copy (memberMembershipChangeListeners = newListeners) + if (!state.compareAndSet(localState, newState)) unregisterListener(listener) // recur } + // ======================================================== + // ===================== INTERNAL API ===================== + // ======================================================== + /** * Joins the pre-configured contact point and retrieves current gossip state. */ @@ -360,69 +404,90 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { * Initates a new round of gossip. */ private def gossip() { - val oldState = state.get - if (!oldState.isSingletonCluster) { // do not gossip if we are a singleton cluster - val oldGossip = oldState.latestGossip - val oldMembers = oldGossip.members - val oldMembersSize = oldMembers.size + val localState = state.get + val localGossip = localState.latestGossip + val localMembers = localGossip.members - val oldUnreachableAddresses = oldGossip.overview.unreachable - val oldUnreachableSize = oldUnreachableAddresses.size + if (!isSingletonCluster(localState)) { // do not gossip if we are a singleton cluster + log.debug("Node [{}] - Initiating new round of gossip", remoteAddress) + + val localGossip = localState.latestGossip + val localMembers = localGossip.members + val localMembersSize = localMembers.size + + val localUnreachableAddresses = localGossip.overview.unreachable + val localUnreachableSize = localUnreachableAddresses.size // 1. gossip to alive members - val gossipedToDeputy = - if (oldUnreachableSize > 0) gossipToRandomNodeOf(oldMembers.toList map { _.address }) - else false + val gossipedToDeputy = gossipToRandomNodeOf(localMembers.toList map { _.address }) // 2. gossip to unreachable members - if (oldUnreachableSize > 0) { - val probability: Double = oldUnreachableSize / (oldMembersSize + 1) - if (random.nextDouble() < probability) gossipToRandomNodeOf(oldUnreachableAddresses.toList) + if (localUnreachableSize > 0) { + val probability: Double = localUnreachableSize / (localMembersSize + 1) + if (random.nextDouble() < probability) gossipToRandomNodeOf(localUnreachableAddresses.toList) } // 3. 
gossip to a deputy nodes for facilitating partition healing val deputies = deputyNodesWithoutMyself - if ((!gossipedToDeputy || oldMembersSize < 1) && !deputies.isEmpty) { - if (oldMembersSize == 0) gossipToRandomNodeOf(deputies) + if ((!gossipedToDeputy || localMembersSize < 1) && !deputies.isEmpty) { + if (localMembersSize == 0) gossipToRandomNodeOf(deputies) else { - val probability = 1.0 / oldMembersSize + oldUnreachableSize + val probability = 1.0 / localMembersSize + localUnreachableSize if (random.nextDouble() <= probability) gossipToRandomNodeOf(deputies) } } } } + /** + * Merges two Gossip instances including membership tables, meta-data tables and the VectorClock histories. + */ + private def merge(gossip1: Gossip, gossip2: Gossip): Gossip = { + val mergedVClock = gossip1.version merge gossip2.version + val mergedMembers = gossip1.members union gossip2.members + val mergedMeta = gossip1.meta ++ gossip2.meta + Gossip(gossip2.overview, mergedMembers, mergedMeta, mergedVClock) + } + /** * Switches the state in the FSM. */ @tailrec final private def switchStatusTo(newStatus: MemberStatus) { log.info("Node [{}] - Switching membership status to [{}]", remoteAddress, newStatus) - val oldState = state.get - val oldSelf = oldState.self - val oldGossip = oldState.latestGossip - val oldMembers = oldGossip.members + val localState = state.get + val localSelf = localState.self - val newSelf = oldSelf copy (status = newStatus) + val localGossip = localState.latestGossip + val localMembers = localGossip.members - val newMembersSet = oldMembers map { member ⇒ + val newSelf = localSelf copy (status = newStatus) + val newMembersSet = localMembers map { member ⇒ if (member.address == remoteAddress) newSelf else member } + // ugly crap to work around bug in scala colletions ('val ss: SortedSet[Member] = SortedSet.empty[Member] ++ aSet' does not compile) val newMembersSortedSet = SortedSet[Member](newMembersSet.toList: _*) + val newGossip = localGossip copy (members = newMembersSortedSet) - val newGossip = oldGossip copy (members = newMembersSortedSet) incrementVersion memberFingerprint - val newState = oldState copy (self = newSelf, latestGossip = newGossip) - if (!state.compareAndSet(oldState, newState)) switchStatusTo(newStatus) // recur if we failed update + val versionedGossip = newGossip + selfNode + val seenVersionedGossip = versionedGossip seen remoteAddress + + val newState = localState copy (self = newSelf, latestGossip = seenVersionedGossip) + + if (!state.compareAndSet(localState, newState)) switchStatusTo(newStatus) // recur if we failed update } /** * Gossips latest gossip to an address. */ private def gossipTo(address: Address) { - setUpConnectionTo(address) foreach { _ ! GossipEnvelope(self, latestGossip) } + setUpConnectionTo(address) foreach { connection ⇒ + log.debug("Node [{}] - Gossiping to [{}]", remoteAddress, address) + connection ! 
GossipEnvelope(self, latestGossip) + } } /** @@ -433,8 +498,8 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { private def gossipToRandomNodeOf(addresses: Seq[Address]): Boolean = { val peers = addresses filter (_ != remoteAddress) // filter out myself val peer = selectRandomNode(peers) - val oldState = state.get - val oldGossip = oldState.latestGossip + val localState = state.get + val localGossip = localState.latestGossip // if connection can't be established/found => ignore it since the failure detector will take care of the potential problem gossipTo(peer) deputyNodesWithoutMyself exists (peer == _) @@ -445,32 +510,39 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { */ @tailrec final private def scrutinize() { - val oldState = state.get - if (!oldState.isSingletonCluster) { // do not scrutinize if we are a singleton cluster - val oldGossip = oldState.latestGossip - val oldOverview = oldGossip.overview - val oldMembers = oldGossip.members - val oldUnreachableAddresses = oldGossip.overview.unreachable + val localState = state.get - val newlyDetectedUnreachableMembers = oldMembers filterNot { member ⇒ failureDetector.isAvailable(member.address) } + if (!isSingletonCluster(localState)) { // do not scrutinize if we are a singleton cluster + + val localGossip = localState.latestGossip + val localOverview = localGossip.overview + val localMembers = localGossip.members + val localUnreachableAddresses = localGossip.overview.unreachable + + val newlyDetectedUnreachableMembers = localMembers filterNot { member ⇒ failureDetector.isAvailable(member.address) } val newlyDetectedUnreachableAddresses = newlyDetectedUnreachableMembers map { _.address } if (!newlyDetectedUnreachableAddresses.isEmpty) { // we have newly detected members marked as unavailable - val newMembers = oldMembers diff newlyDetectedUnreachableMembers - val newUnreachableAddresses: Set[Address] = (oldUnreachableAddresses ++ newlyDetectedUnreachableAddresses) - val newOverview = oldOverview copy (unreachable = newUnreachableAddresses) - val newGossip = oldGossip copy (overview = newOverview, members = newMembers) incrementVersion memberFingerprint - val newState = oldState copy (latestGossip = newGossip) + val newMembers = localMembers diff newlyDetectedUnreachableMembers + val newUnreachableAddresses: Set[Address] = (localUnreachableAddresses ++ newlyDetectedUnreachableAddresses) + + val newOverview = localOverview copy (unreachable = newUnreachableAddresses) + val newGossip = localGossip copy (overview = newOverview, members = newMembers) + + val versionedGossip = newGossip + selfNode + val seenVersionedGossip = versionedGossip seen remoteAddress + + val newState = localState copy (latestGossip = seenVersionedGossip) // if we won the race then update else try again - if (!state.compareAndSet(oldState, newState)) scrutinize() // recur + if (!state.compareAndSet(localState, newState)) scrutinize() // recur else { // FIXME should only notify when there is a cluster convergence // notify listeners on successful update of state // for { // deadNode ← newUnreachableAddresses - // listener ← oldState.memberMembershipChangeListeners + // listener ← localState.memberMembershipChangeListeners // } listener memberDisconnected deadNode } } @@ -481,14 +553,16 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { @tailrec final private def connectToRandomNodeOf(addresses: Seq[Address]): ActorRef = { addresses match { + case address :: rest ⇒ 
setUpConnectionTo(address) match { case Some(connection) ⇒ connection - case None ⇒ connectToRandomNodeOf(rest) // recur if + case None ⇒ connectToRandomNodeOf(rest) // recur - if we could not set up a connection - try next address } + case Nil ⇒ throw new RemoteConnectionException( - "Could not establish connection to any of the addresses in the argument list") + "Could not establish connection to any of the addresses in the argument list [" + addresses.mkString(", ") + "]") } } @@ -500,15 +574,16 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { /** * Sets up remote connection. */ - private def setUpConnectionTo(address: Address): Option[ActorRef] = { - try { - Some(connectionManager.putIfAbsent(address, () ⇒ system.actorFor(RootActorPath(address) / "system" / "cluster"))) - } catch { - case e: Exception ⇒ None + private def setUpConnectionTo(address: Address): Option[ActorRef] = Option { + // FIXME no need for using a factory here - remove connectionManager + try connectionManager.putIfAbsent(address, () ⇒ system.actorFor(RootActorPath(address) / "system" / "cluster")) catch { + case e: Exception ⇒ null } } private def deputyNodesWithoutMyself: Seq[Address] = Seq.empty[Address] filter (_ != remoteAddress) // FIXME read in deputy nodes from gossip data - now empty seq private def selectRandomNode(addresses: Seq[Address]): Address = addresses(random nextInt addresses.size) + + private def isSingletonCluster(currentState: State): Boolean = currentState.latestGossip.members.size == 1 } From 510d7788842ad274fd76d43342db39bd6a269748 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jonas=20Bone=CC=81r?= Date: Tue, 14 Feb 2012 20:53:46 +0100 Subject: [PATCH 15/72] Renamed NodeGossipingSpec to NodeMembershipSpec since it is testing consistency of the cluster node membership table MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Jonas Bonér --- ...eGossipingSpec.scala => NodeMembershipSpec.scala} | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) rename akka-cluster/src/test/scala/akka/cluster/{NodeGossipingSpec.scala => NodeMembershipSpec.scala} (91%) diff --git a/akka-cluster/src/test/scala/akka/cluster/NodeGossipingSpec.scala b/akka-cluster/src/test/scala/akka/cluster/NodeMembershipSpec.scala similarity index 91% rename from akka-cluster/src/test/scala/akka/cluster/NodeGossipingSpec.scala rename to akka-cluster/src/test/scala/akka/cluster/NodeMembershipSpec.scala index a3cc492a23..f25a7d60a7 100644 --- a/akka-cluster/src/test/scala/akka/cluster/NodeGossipingSpec.scala +++ b/akka-cluster/src/test/scala/akka/cluster/NodeMembershipSpec.scala @@ -12,7 +12,7 @@ import akka.remote._ import com.typesafe.config._ -class NodeGossipingSpec extends AkkaSpec(""" +class NodeMembershipSpec extends AkkaSpec(""" akka { loglevel = "DEBUG" } @@ -29,7 +29,7 @@ class NodeGossipingSpec extends AkkaSpec(""" try { "A set of connected cluster nodes" must { "(when two nodes) start gossiping to each other so that both nodes gets the same gossip info" in { - node0 = ActorSystem("NodeGossipingSpec", ConfigFactory + node0 = ActorSystem("NodeMembershipSpec", ConfigFactory .parseString(""" akka { actor.provider = "akka.remote.RemoteActorRefProvider" @@ -43,7 +43,7 @@ class NodeGossipingSpec extends AkkaSpec(""" val remote0 = node0.provider.asInstanceOf[RemoteActorRefProvider] gossiper0 = Gossiper(node0, remote0) - node1 = ActorSystem("NodeGossipingSpec", ConfigFactory + node1 = ActorSystem("NodeMembershipSpec", ConfigFactory 
.parseString(""" akka { actor.provider = "akka.remote.RemoteActorRefProvider" @@ -51,7 +51,7 @@ class NodeGossipingSpec extends AkkaSpec(""" hostname = localhost port=5551 } - cluster.node-to-join = "akka://NodeGossipingSpec@localhost:5550" + cluster.node-to-join = "akka://NodeMembershipSpec@localhost:5550" }""") .withFallback(system.settings.config)) .asInstanceOf[ActorSystemImpl] @@ -76,7 +76,7 @@ class NodeGossipingSpec extends AkkaSpec(""" } "(when three nodes) start gossiping to each other so that both nodes gets the same gossip info" in { - node2 = ActorSystem("NodeGossipingSpec", ConfigFactory + node2 = ActorSystem("NodeMembershipSpec", ConfigFactory .parseString(""" akka { actor.provider = "akka.remote.RemoteActorRefProvider" @@ -84,7 +84,7 @@ class NodeGossipingSpec extends AkkaSpec(""" hostname = localhost port=5552 } - cluster.node-to-join = "akka://NodeGossipingSpec@localhost:5550" + cluster.node-to-join = "akka://NodeMembershipSpec@localhost:5550" }""") .withFallback(system.settings.config)) .asInstanceOf[ActorSystemImpl] From 43cafb0d2e273ea47ee23a4a2003f88edab70310 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jonas=20Bone=CC=81r?= Date: Tue, 14 Feb 2012 22:01:59 +0100 Subject: [PATCH 16/72] Disabling out erroneous cluster 'scrutinize' service until fixed and proper tests are written. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Jonas Bonér --- akka-cluster/src/main/scala/akka/cluster/Gossiper.scala | 4 +++- .../src/test/scala/akka/cluster/NodeStartupSpec.scala | 3 ++- 2 files changed, 5 insertions(+), 2 deletions(-) diff --git a/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala b/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala index 480d6c7461..7dfb65b193 100644 --- a/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala +++ b/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala @@ -247,7 +247,9 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { // start periodic cluster scrutinization (moving nodes condemned by the failure detector to unreachable list) val scrutinizeCanceller = system.scheduler.schedule(gossipInitialDelay, gossipFrequency) { - scrutinize() + + // FIXME fix problems with 'scrutinize' + //scrutinize() } // ====================================================== diff --git a/akka-cluster/src/test/scala/akka/cluster/NodeStartupSpec.scala b/akka-cluster/src/test/scala/akka/cluster/NodeStartupSpec.scala index de59541dfa..1f5ff985db 100644 --- a/akka-cluster/src/test/scala/akka/cluster/NodeStartupSpec.scala +++ b/akka-cluster/src/test/scala/akka/cluster/NodeStartupSpec.scala @@ -47,7 +47,7 @@ class NodeStartupSpec extends AkkaSpec(""" val members = gossiper0.latestGossip.members val joiningMember = members find (_.address.port.get == 5550) joiningMember must be('defined) - joiningMember.get.status must be(MemberStatus.Up) + joiningMember.get.status must be(MemberStatus.Joining) } } @@ -84,6 +84,7 @@ class NodeStartupSpec extends AkkaSpec(""" override def atTermination() { gossiper0.shutdown() node0.shutdown() + gossiper1.shutdown() node1.shutdown() } From b7a6a648abd0b3e0c7c932ac06d66c248a302bd1 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jonas=20Bone=CC=81r?= Date: Wed, 15 Feb 2012 15:51:27 +0100 Subject: [PATCH 17/72] Fixed bug in failure detector which also fixes bug in cluster scrutinize service. Also added test case for the bug. 
MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Jonas Bonér --- .../main/scala/akka/cluster/AccrualFailureDetector.scala | 7 ++++--- akka-cluster/src/main/scala/akka/cluster/Gossiper.scala | 6 ++---- .../scala/akka/cluster/AccrualFailureDetectorSpec.scala | 9 ++++++++- 3 files changed, 14 insertions(+), 8 deletions(-) diff --git a/akka-cluster/src/main/scala/akka/cluster/AccrualFailureDetector.scala b/akka-cluster/src/main/scala/akka/cluster/AccrualFailureDetector.scala index cebc518bcf..8ee9f857a0 100644 --- a/akka-cluster/src/main/scala/akka/cluster/AccrualFailureDetector.scala +++ b/akka-cluster/src/main/scala/akka/cluster/AccrualFailureDetector.scala @@ -143,11 +143,12 @@ class AccrualFailureDetector(system: ActorSystem, val threshold: Int = 8, val ma else { val timestampDiff = newTimestamp - oldTimestamp.get val mean = oldState.failureStats.get(connection).getOrElse(FailureStats()).mean - PhiFactor * timestampDiff / mean + if (mean == 0.0D) 0.0D + else PhiFactor * timestampDiff / mean } - // FIXME sometimes we get "Phi value [Infinity]" fix it - if (phi > 0.0) log.debug("Phi value [{}] and threshold [{}] for connection [{}] ", phi, threshold, connection) // only log if PHI value is starting to get interesting + // only log if PHI value is starting to get interesting + if (phi > 0.0D) log.debug("Phi value [{}] and threshold [{}] for connection [{}] ", phi, threshold, connection) phi } diff --git a/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala b/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala index 7dfb65b193..f37c9294de 100644 --- a/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala +++ b/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala @@ -247,9 +247,7 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { // start periodic cluster scrutinization (moving nodes condemned by the failure detector to unreachable list) val scrutinizeCanceller = system.scheduler.schedule(gossipInitialDelay, gossipFrequency) { - - // FIXME fix problems with 'scrutinize' - //scrutinize() + scrutinize() } // ====================================================== @@ -527,7 +525,7 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { if (!newlyDetectedUnreachableAddresses.isEmpty) { // we have newly detected members marked as unavailable val newMembers = localMembers diff newlyDetectedUnreachableMembers - val newUnreachableAddresses: Set[Address] = (localUnreachableAddresses ++ newlyDetectedUnreachableAddresses) + val newUnreachableAddresses: Set[Address] = localUnreachableAddresses ++ newlyDetectedUnreachableAddresses val newOverview = localOverview copy (unreachable = newUnreachableAddresses) val newGossip = localGossip copy (overview = newOverview, members = newMembers) diff --git a/akka-cluster/src/test/scala/akka/cluster/AccrualFailureDetectorSpec.scala b/akka-cluster/src/test/scala/akka/cluster/AccrualFailureDetectorSpec.scala index 4aab105273..5f93a8ddf2 100644 --- a/akka-cluster/src/test/scala/akka/cluster/AccrualFailureDetectorSpec.scala +++ b/akka-cluster/src/test/scala/akka/cluster/AccrualFailureDetectorSpec.scala @@ -9,7 +9,14 @@ class AccrualFailureDetectorSpec extends AkkaSpec(""" """) { "An AccrualFailureDetector" must { - val conn = Address("akka", "", "localhost", 2552) + val conn = Address("akka", "", Some("localhost"), Some(2552)) + val conn2 = Address("akka", "", Some("localhost"), Some(2553)) + + "return phi value of 0.0D on startup for each address" 
in { + val fd = new AccrualFailureDetector(system) + fd.phi(conn) must be(0.0D) + fd.phi(conn2) must be(0.0D) + } "mark node as available after a series of successful heartbeats" in { val fd = new AccrualFailureDetector(system) From fc030763a638879efeacc02cbe08c10dc278d094 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jonas=20Bone=CC=81r?= Date: Thu, 16 Feb 2012 11:20:51 +0100 Subject: [PATCH 18/72] Merged with master --- .../src/main/scala/akka/cluster/Gossiper.scala | 16 +++------------- .../cluster/AccrualFailureDetectorSpec.scala | 4 ++-- 2 files changed, 5 insertions(+), 15 deletions(-) diff --git a/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala b/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala index f37c9294de..bb1e19e746 100644 --- a/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala +++ b/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala @@ -238,7 +238,7 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { log.info("Node [{}] - Starting cluster Gossiper...", remoteAddress) // try to join the node defined in the 'akka.cluster.node-to-join' option - nodeToJoin foreach join + join() // start periodic gossip to random nodes in cluster val gossipCanceller = system.scheduler.schedule(gossipInitialDelay, gossipFrequency) { @@ -382,22 +382,12 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { /** * Joins the pre-configured contact point and retrieves current gossip state. */ - private def join(address: Address) { + private def join() = nodeToJoin foreach { address ⇒ setUpConnectionTo(address) foreach { connection ⇒ val command = Join(remoteAddress) - log.info("Node [{}] - Sending [{}] to [{}]", remoteAddress, command, address) + log.info("Node [{}] - Sending [{}] to [{}] through connection [{}]", remoteAddress, command, address, connection) connection ! command } - - contactPoint match { - case None ⇒ log.info("Booting up in singleton cluster mode") - case Some(member) ⇒ - log.info("Trying to join contact point node defined in the configuration [{}]", member) - setUpConnectionTo(member) match { - case None ⇒ log.error("Could not set up connection to join contact point node defined in the configuration [{}]", member) - case Some(connection) ⇒ tryJoinContactPoint(connection, deadline) - } - } } /** diff --git a/akka-cluster/src/test/scala/akka/cluster/AccrualFailureDetectorSpec.scala b/akka-cluster/src/test/scala/akka/cluster/AccrualFailureDetectorSpec.scala index 5f93a8ddf2..034f582e0d 100644 --- a/akka-cluster/src/test/scala/akka/cluster/AccrualFailureDetectorSpec.scala +++ b/akka-cluster/src/test/scala/akka/cluster/AccrualFailureDetectorSpec.scala @@ -9,8 +9,8 @@ class AccrualFailureDetectorSpec extends AkkaSpec(""" """) { "An AccrualFailureDetector" must { - val conn = Address("akka", "", Some("localhost"), Some(2552)) - val conn2 = Address("akka", "", Some("localhost"), Some(2553)) + val conn = Address("akka", "", "localhost", 2552) + val conn2 = Address("akka", "", "localhost", 2553) "return phi value of 0.0D on startup for each address" in { val fd = new AccrualFailureDetector(system) From 5dd19c0f3456948f14cd920c130d8a629a60235a Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jonas=20Bone=CC=81r?= Date: Thu, 16 Feb 2012 14:48:40 +0100 Subject: [PATCH 19/72] Fixed error in merge. 
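The Gossiper and AccrualFailureDetector code in these patches share one concurrency idiom: all mutable state lives behind a single AtomicReference to an immutable case class, and every update is a read-copy-compareAndSet loop that recurs when it loses the race. A minimal stand-alone sketch of that idiom follows (State and addMember are illustrative names, not the cluster's actual state class):

    import java.util.concurrent.atomic.AtomicReference
    import scala.annotation.tailrec

    object CasRetrySketch {
      // Immutable snapshot, in the spirit of the Gossiper's State case class
      final case class State(version: Long, members: Set[String])

      private val state = new AtomicReference(State(0L, Set.empty))

      // Optimistic lock-free update: build a new snapshot from the old one and
      // retry from a fresh read if another thread won the compareAndSet race.
      @tailrec
      def addMember(member: String): Unit = {
        val oldState = state.get
        val newState = oldState.copy(version = oldState.version + 1, members = oldState.members + member)
        if (!state.compareAndSet(oldState, newState)) addMember(member) // recur
      }

      def main(args: Array[String]): Unit = {
        addMember("akka://node1@localhost:5551")
        println(state.get)
      }
    }
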
MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Jonas Bonér --- .../scala/akka/cluster/NodeStartupSpec.scala | 59 ++++++++++--------- 1 file changed, 30 insertions(+), 29 deletions(-) diff --git a/akka-cluster/src/test/scala/akka/cluster/NodeStartupSpec.scala b/akka-cluster/src/test/scala/akka/cluster/NodeStartupSpec.scala index 1f5ff985db..6ccb8491c1 100644 --- a/akka-cluster/src/test/scala/akka/cluster/NodeStartupSpec.scala +++ b/akka-cluster/src/test/scala/akka/cluster/NodeStartupSpec.scala @@ -24,22 +24,23 @@ class NodeStartupSpec extends AkkaSpec(""" var node1: ActorSystemImpl = _ try { - node0 = ActorSystem("NodeStartupSpec", ConfigFactory - .parseString(""" - akka { - actor.provider = "akka.remote.RemoteActorRefProvider" - remote.netty { - hostname = localhost - port=5550 - } - }""") - .withFallback(system.settings.config)) - .asInstanceOf[ActorSystemImpl] - val remote0 = node0.provider.asInstanceOf[RemoteActorRefProvider] - gossiper0 = Gossiper(node0, remote0) - "A first cluster node with a 'node-to-join' config set to empty string (singleton cluster)" must { + node0 = ActorSystem("NodeStartupSpec", ConfigFactory + .parseString(""" + akka { + actor.provider = "akka.remote.RemoteActorRefProvider" + remote.netty { + hostname = localhost + port=5550 + } + }""") + .withFallback(system.settings.config)) + .asInstanceOf[ActorSystemImpl] + val remote0 = node0.provider.asInstanceOf[RemoteActorRefProvider] + gossiper0 = Gossiper(node0, remote0) + "be a singleton cluster when started up" in { + Thread.sleep(1000) gossiper0.isSingletonCluster must be(true) } @@ -51,23 +52,23 @@ class NodeStartupSpec extends AkkaSpec(""" } } - node1 = ActorSystem("NodeStartupSpec", ConfigFactory - .parseString(""" - akka { - actor.provider = "akka.remote.RemoteActorRefProvider" - remote.netty { - hostname = localhost - port=5551 - } - cluster.node-to-join = "akka://NodeStartupSpec@localhost:5550" - }""") - .withFallback(system.settings.config)) - .asInstanceOf[ActorSystemImpl] - val remote1 = node1.provider.asInstanceOf[RemoteActorRefProvider] - gossiper1 = Gossiper(node1, remote1) - "A second cluster node with a 'node-to-join' config defined" must { "join the other node cluster as 'Joining' when sending a Join command" in { + node1 = ActorSystem("NodeStartupSpec", ConfigFactory + .parseString(""" + akka { + actor.provider = "akka.remote.RemoteActorRefProvider" + remote.netty { + hostname = localhost + port=5551 + } + cluster.node-to-join = "akka://NodeStartupSpec@localhost:5550" + }""") + .withFallback(system.settings.config)) + .asInstanceOf[ActorSystemImpl] + val remote1 = node1.provider.asInstanceOf[RemoteActorRefProvider] + gossiper1 = Gossiper(node1, remote1) + Thread.sleep(1000) // give enough time for node1 to JOIN node0 val members = gossiper0.latestGossip.members val joiningMember = members find (_.address.port.get == 5551) From c37012c23b444a507d0cbc91a33bbbf6773e28d9 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jonas=20Bone=CC=81r?= Date: Sat, 18 Feb 2012 17:40:58 +0100 Subject: [PATCH 20/72] Removed printed stack trace from remote client/server errors. Just annoying when client hangs retrying and does not provide any real value since they are the same every time. 
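With stack traces no longer embedded in each event's toString, the remote lifecycle events stay compact but remain available to programmatic consumers via the event stream. As a rough, hedged illustration of consuming them (RemoteEventLogger is a hypothetical listener written for this note, not part of the patch), something along these lines could re-log each event at the level the event declares for itself:

    import akka.actor.{ Actor, ActorSystem, Props }
    import akka.remote.RemoteLifeCycleEvent

    // Hypothetical subscriber that re-logs remote lifecycle events at their own
    // declared level (e.g. RemoteClientError at error level).
    class RemoteEventLogger extends Actor {
      def receive = {
        case event: RemoteLifeCycleEvent ⇒
          context.system.log.log(event.logLevel, "{}", event)
      }
    }

    object RemoteEventLoggerApp {
      def main(args: Array[String]): Unit = {
        val system = ActorSystem("RemoteEvents")
        val listener = system.actorOf(Props[RemoteEventLogger], "remote-event-logger")
        system.eventStream.subscribe(listener, classOf[RemoteLifeCycleEvent])
      }
    }
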
MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Jonas Bonér --- akka-actor/src/main/scala/akka/actor/ActorCell.scala | 2 +- .../src/main/scala/akka/actor/ActorRefProvider.scala | 10 +++++----- .../src/main/scala/akka/remote/RemoteTransport.scala | 10 +++++----- 3 files changed, 11 insertions(+), 11 deletions(-) diff --git a/akka-actor/src/main/scala/akka/actor/ActorCell.scala b/akka-actor/src/main/scala/akka/actor/ActorCell.scala index a5331e9f49..5bafed3881 100644 --- a/akka-actor/src/main/scala/akka/actor/ActorCell.scala +++ b/akka-actor/src/main/scala/akka/actor/ActorCell.scala @@ -286,7 +286,7 @@ private[akka] class ActorCell( final def start(): Unit = { /* - * Create the mailbox and enqueue the Create() message to ensure that + * Create the mailbox and enqueue the Create() message to ensure that * this is processed before anything else. */ mailbox = dispatcher.createMailbox(this) diff --git a/akka-actor/src/main/scala/akka/actor/ActorRefProvider.scala b/akka-actor/src/main/scala/akka/actor/ActorRefProvider.scala index a176c6271d..cfa1a75ca3 100644 --- a/akka-actor/src/main/scala/akka/actor/ActorRefProvider.scala +++ b/akka-actor/src/main/scala/akka/actor/ActorRefProvider.scala @@ -501,30 +501,30 @@ class LocalActorRefProvider( def actorFor(ref: InternalActorRef, path: String): InternalActorRef = path match { case RelativeActorPath(elems) ⇒ if (elems.isEmpty) { - log.debug("look-up of empty path string '{}' fails (per definition)", path) + log.debug("look-up of empty path string [{}] fails (per definition)", path) deadLetters } else if (elems.head.isEmpty) actorFor(rootGuardian, elems.tail) else actorFor(ref, elems) case ActorPathExtractor(address, elems) if address == rootPath.address ⇒ actorFor(rootGuardian, elems) case _ ⇒ - log.debug("look-up of unknown path '{}' failed", path) + log.warning("look-up of unknown path [{}] failed", path) deadLetters } def actorFor(path: ActorPath): InternalActorRef = if (path.root == rootPath) actorFor(rootGuardian, path.elements) else { - log.debug("look-up of foreign ActorPath '{}' failed", path) + log.warning("look-up of foreign ActorPath [{}] failed", path) deadLetters } def actorFor(ref: InternalActorRef, path: Iterable[String]): InternalActorRef = if (path.isEmpty) { - log.debug("look-up of empty path sequence fails (per definition)") + log.warning("look-up of empty path sequence fails (per definition)") deadLetters } else ref.getChild(path.iterator) match { case Nobody ⇒ - log.debug("look-up of path sequence '{}' failed", path) + log.warning("look-up of path sequence [{}] failed", path) new EmptyLocalActorRef(system.provider, ref.path / path, eventStream) case x ⇒ x } diff --git a/akka-remote/src/main/scala/akka/remote/RemoteTransport.scala b/akka-remote/src/main/scala/akka/remote/RemoteTransport.scala index 256451bc0a..ff4644e081 100644 --- a/akka-remote/src/main/scala/akka/remote/RemoteTransport.scala +++ b/akka-remote/src/main/scala/akka/remote/RemoteTransport.scala @@ -32,7 +32,7 @@ case class RemoteClientError( @BeanProperty remoteAddress: Address) extends RemoteClientLifeCycleEvent { override def logLevel = Logging.ErrorLevel override def toString = - "RemoteClientError@" + remoteAddress + ": Error[" + AkkaException.toStringWithStackTrace(cause) + "]" + "RemoteClientError@" + remoteAddress + ": Error[" + cause + "]" } case class RemoteClientDisconnected( @@ -76,7 +76,7 @@ case class RemoteClientWriteFailed( override def toString = "RemoteClientWriteFailed@" + remoteAddress + ": 
MessageClass[" + (if (request ne null) request.getClass.getName else "no message") + - "] Error[" + AkkaException.toStringWithStackTrace(cause) + "]" + "] Error[" + cause + "]" } /** @@ -103,7 +103,7 @@ case class RemoteServerError( @BeanProperty remote: RemoteTransport) extends RemoteServerLifeCycleEvent { override def logLevel = Logging.ErrorLevel override def toString = - "RemoteServerError@" + remote + "] Error[" + AkkaException.toStringWithStackTrace(cause) + "]" + "RemoteServerError@" + remote + "] Error[" + cause + "]" } case class RemoteServerClientConnected( @@ -143,7 +143,7 @@ case class RemoteServerWriteFailed( "RemoteServerWriteFailed@" + remote + ": ClientAddress[" + remoteAddress + "] MessageClass[" + (if (request ne null) request.getClass.getName else "no message") + - "] Error[" + AkkaException.toStringWithStackTrace(cause) + "]" + "] Error[" + cause + "]" } /** @@ -203,7 +203,7 @@ abstract class RemoteTransport { protected[akka] def notifyListeners(message: RemoteLifeCycleEvent): Unit = { system.eventStream.publish(message) - system.log.log(message.logLevel, "REMOTE: {}", message) + system.log.log(message.logLevel, "{}", message) } override def toString = address.toString From 5ffe186af0d66ed2562250c53d3aeb2c572e8d3d Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jonas=20Bone=CC=81r?= Date: Sat, 18 Feb 2012 17:45:21 +0100 Subject: [PATCH 21/72] Added testkit time ratio sensitive durations. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Jonas Bonér --- .../src/test/scala/akka/cluster/NodeMembershipSpec.scala | 9 +++++---- .../src/test/scala/akka/cluster/NodeStartupSpec.scala | 5 +++-- 2 files changed, 8 insertions(+), 6 deletions(-) diff --git a/akka-cluster/src/test/scala/akka/cluster/NodeMembershipSpec.scala b/akka-cluster/src/test/scala/akka/cluster/NodeMembershipSpec.scala index f25a7d60a7..a2106fc6da 100644 --- a/akka-cluster/src/test/scala/akka/cluster/NodeMembershipSpec.scala +++ b/akka-cluster/src/test/scala/akka/cluster/NodeMembershipSpec.scala @@ -1,5 +1,5 @@ /** - * Copyright (C) 2009-2011 Typesafe Inc. + * Copyright (C) 2009-2012 Typesafe Inc. 
*/ package akka.cluster @@ -9,12 +9,13 @@ import akka.testkit._ import akka.dispatch._ import akka.actor._ import akka.remote._ +import akka.util.duration._ import com.typesafe.config._ class NodeMembershipSpec extends AkkaSpec(""" akka { - loglevel = "DEBUG" + loglevel = "INFO" } """) with ImplicitSender { @@ -58,7 +59,7 @@ class NodeMembershipSpec extends AkkaSpec(""" val remote1 = node1.provider.asInstanceOf[RemoteActorRefProvider] gossiper1 = Gossiper(node1, remote1) - Thread.sleep(5000) + Thread.sleep(10.seconds.dilated.toMillis) val members0 = gossiper0.latestGossip.members.toArray members0.size must be(2) @@ -91,7 +92,7 @@ class NodeMembershipSpec extends AkkaSpec(""" val remote2 = node2.provider.asInstanceOf[RemoteActorRefProvider] gossiper2 = Gossiper(node2, remote2) - Thread.sleep(10000) + Thread.sleep(10.seconds.dilated.toMillis) val members0 = gossiper0.latestGossip.members.toArray val version = gossiper0.latestGossip.version diff --git a/akka-cluster/src/test/scala/akka/cluster/NodeStartupSpec.scala b/akka-cluster/src/test/scala/akka/cluster/NodeStartupSpec.scala index 6ccb8491c1..9066f6eaae 100644 --- a/akka-cluster/src/test/scala/akka/cluster/NodeStartupSpec.scala +++ b/akka-cluster/src/test/scala/akka/cluster/NodeStartupSpec.scala @@ -9,6 +9,7 @@ import akka.testkit._ import akka.dispatch._ import akka.actor._ import akka.remote._ +import akka.util.duration._ import com.typesafe.config._ @@ -40,7 +41,7 @@ class NodeStartupSpec extends AkkaSpec(""" gossiper0 = Gossiper(node0, remote0) "be a singleton cluster when started up" in { - Thread.sleep(1000) + Thread.sleep(1.seconds.dilated.toMillis) gossiper0.isSingletonCluster must be(true) } @@ -69,7 +70,7 @@ class NodeStartupSpec extends AkkaSpec(""" val remote1 = node1.provider.asInstanceOf[RemoteActorRefProvider] gossiper1 = Gossiper(node1, remote1) - Thread.sleep(1000) // give enough time for node1 to JOIN node0 + Thread.sleep(1.seconds.dilated.toMillis) // give enough time for node1 to JOIN node0 val members = gossiper0.latestGossip.members val joiningMember = members find (_.address.port.get == 5551) joiningMember must be('defined) From 2313fc91ff0a603254181540949e95f1bf198bd5 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jonas=20Bone=CC=81r?= Date: Sat, 18 Feb 2012 17:48:07 +0100 Subject: [PATCH 22/72] Fixed remaining issues with gossip based failure detection and removal of unreachable nodes. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit * Completed gossip based failure detection. * Completed removal of unreachable nodes according to failure detector. * Added passing tests. * Misc other fixes, more logging, more comments. Signed-off-by: Jonas Bonér --- .../akka/cluster/AccrualFailureDetector.scala | 70 +++--- .../main/scala/akka/cluster/Gossiper.scala | 23 +- .../cluster/AccrualFailureDetectorSpec.scala | 10 +- .../GossipingAccrualFailureDetectorSpec.scala | 199 ++++++++++-------- 4 files changed, 173 insertions(+), 129 deletions(-) diff --git a/akka-cluster/src/main/scala/akka/cluster/AccrualFailureDetector.scala b/akka-cluster/src/main/scala/akka/cluster/AccrualFailureDetector.scala index 8ee9f857a0..e0d7cae052 100644 --- a/akka-cluster/src/main/scala/akka/cluster/AccrualFailureDetector.scala +++ b/akka-cluster/src/main/scala/akka/cluster/AccrualFailureDetector.scala @@ -23,14 +23,17 @@ import System.{ currentTimeMillis ⇒ newTimestamp } *

* Default threshold is 8, but can be configured in the Akka config. */ -class AccrualFailureDetector(system: ActorSystem, val threshold: Int = 8, val maxSampleSize: Int = 1000) { +class AccrualFailureDetector(system: ActorSystem, address: Address, val threshold: Int = 8, val maxSampleSize: Int = 1000) { private final val PhiFactor = 1.0 / math.log(10.0) - private case class FailureStats(mean: Double = 0.0D, variance: Double = 0.0D, deviation: Double = 0.0D) - private val log = Logging(system, "FailureDetector") + /** + * Holds the failure statistics for a specific node Address. + */ + private case class FailureStats(mean: Double = 0.0D, variance: Double = 0.0D, deviation: Double = 0.0D) + /** * Implement using optimistic lockless concurrency, all state is represented * by this immutable case class and managed by an AtomicReference. @@ -54,22 +57,26 @@ class AccrualFailureDetector(system: ActorSystem, val threshold: Int = 8, val ma */ @tailrec final def heartbeat(connection: Address) { - log.debug("Heartbeat from connection [{}] ", connection) - val oldState = state.get + log.debug("Node [{}] - Heartbeat from connection [{}] ", address, connection) + val oldState = state.get + val oldFailureStats = oldState.failureStats + val oldTimestamps = oldState.timestamps val latestTimestamp = oldState.timestamps.get(connection) + if (latestTimestamp.isEmpty) { // this is heartbeat from a new connection // add starter records for this new connection - val failureStats = oldState.failureStats + (connection -> FailureStats()) - val intervalHistory = oldState.intervalHistory + (connection -> Vector.empty[Long]) - val timestamps = oldState.timestamps + (connection -> newTimestamp) + val newFailureStats = oldFailureStats + (connection -> FailureStats()) + val newIntervalHistory = oldState.intervalHistory + (connection -> Vector.empty[Long]) + val newTimestamps = oldTimestamps + (connection -> newTimestamp) - val newState = oldState copy (version = oldState.version + 1, - failureStats = failureStats, - intervalHistory = intervalHistory, - timestamps = timestamps) + val newState = oldState copy ( + version = oldState.version + 1, + failureStats = newFailureStats, + intervalHistory = newIntervalHistory, + timestamps = newTimestamps) // if we won the race then update else try again if (!state.compareAndSet(oldState, newState)) heartbeat(connection) // recur @@ -79,7 +86,7 @@ class AccrualFailureDetector(system: ActorSystem, val threshold: Int = 8, val ma val timestamp = newTimestamp val interval = timestamp - latestTimestamp.get - val timestamps = oldState.timestamps + (connection -> timestamp) // record new timestamp + val newTimestamps = oldTimestamps + (connection -> timestamp) // record new timestamp var newIntervalsForConnection = oldState.intervalHistory.get(connection).getOrElse(Vector.empty[Long]) :+ interval // append the new interval to history @@ -89,36 +96,33 @@ class AccrualFailureDetector(system: ActorSystem, val threshold: Int = 8, val ma newIntervalsForConnection = newIntervalsForConnection drop 0 } - val failureStats = + val newFailureStats = if (newIntervalsForConnection.size > 1) { - val mean: Double = newIntervalsForConnection.sum / newIntervalsForConnection.size.toDouble - - val oldFailureStats = oldState.failureStats.get(connection).getOrElse(FailureStats()) + val newMean: Double = newIntervalsForConnection.sum / newIntervalsForConnection.size.toDouble + val oldConnectionFailureStats = oldFailureStats.get(connection).getOrElse(throw new IllegalStateException("Can't calculate new failure 
statistics due to missing heartbeat history")) val deviationSum = newIntervalsForConnection .map(_.toDouble) - .foldLeft(0.0D)((x, y) ⇒ x + (y - mean)) + .foldLeft(0.0D)((x, y) ⇒ x + (y - newMean)) - val variance: Double = deviationSum / newIntervalsForConnection.size.toDouble - val deviation: Double = math.sqrt(variance) + val newVariance: Double = deviationSum / newIntervalsForConnection.size.toDouble + val newDeviation: Double = math.sqrt(newVariance) - val newFailureStats = oldFailureStats copy (mean = mean, - deviation = deviation, - variance = variance) + val newFailureStats = oldConnectionFailureStats copy (mean = newMean, deviation = newDeviation, variance = newVariance) + oldFailureStats + (connection -> newFailureStats) - oldState.failureStats + (connection -> newFailureStats) } else { - oldState.failureStats + oldFailureStats } - val intervalHistory = oldState.intervalHistory + (connection -> newIntervalsForConnection) + val newIntervalHistory = oldState.intervalHistory + (connection -> newIntervalsForConnection) val newState = oldState copy (version = oldState.version + 1, - failureStats = failureStats, - intervalHistory = intervalHistory, - timestamps = timestamps) + failureStats = newFailureStats, + intervalHistory = newIntervalHistory, + timestamps = newTimestamps) // if we won the race then update else try again if (!state.compareAndSet(oldState, newState)) heartbeat(connection) // recur @@ -138,17 +142,21 @@ class AccrualFailureDetector(system: ActorSystem, val threshold: Int = 8, val ma def phi(connection: Address): Double = { val oldState = state.get val oldTimestamp = oldState.timestamps.get(connection) + val phi = if (oldTimestamp.isEmpty) 0.0D // treat unmanaged connections, e.g. with zero heartbeats, as healthy connections else { val timestampDiff = newTimestamp - oldTimestamp.get - val mean = oldState.failureStats.get(connection).getOrElse(FailureStats()).mean + + val stats = oldState.failureStats.get(connection) + val mean = stats.getOrElse(throw new IllegalStateException("Can't calculate Failure Detector Phi value for a node that have no heartbeat history")).mean + if (mean == 0.0D) 0.0D else PhiFactor * timestampDiff / mean } // only log if PHI value is starting to get interesting - if (phi > 0.0D) log.debug("Phi value [{}] and threshold [{}] for connection [{}] ", phi, threshold, connection) + if (phi > 0.0D) log.debug("Node [{}] - Phi value [{}] and threshold [{}] for connection [{}] ", address, phi, threshold, connection) phi } diff --git a/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala b/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala index bb1e19e746..73575efec7 100644 --- a/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala +++ b/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala @@ -210,11 +210,12 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { implicit val defaultTimeout = Timeout(remoteSettings.RemoteSystemDaemonAckTimeout) + val failureDetector = new AccrualFailureDetector( + system, remoteAddress, clusterSettings.FailureDetectorThreshold, clusterSettings.FailureDetectorMaxSampleSize) + private val nodeToJoin: Option[Address] = clusterSettings.NodeToJoin filter (_ != remoteAddress) private val serialization = remote.serialization - private val failureDetector = new AccrualFailureDetector( - system, clusterSettings.FailureDetectorThreshold, clusterSettings.FailureDetectorMaxSampleSize) private val isRunning = new AtomicBoolean(true) private val log = Logging(system, "Gossiper") @@ -279,12 
+280,10 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { if (isRunning.compareAndSet(true, false)) { log.info("Node [{}] - Shutting down Gossiper and ClusterDaemon...", remoteAddress) - try connectionManager.shutdown() finally { - try system.stop(clusterDaemon) finally { - try gossipCanceller.cancel() finally { - try scrutinizeCanceller.cancel() finally { - log.info("Node [{}] - Gossiper and ClusterDaemon shut down successfully", remoteAddress) - } + try system.stop(clusterDaemon) finally { + try gossipCanceller.cancel() finally { + try scrutinizeCanceller.cancel() finally { + log.info("Node [{}] - Gossiper and ClusterDaemon shut down successfully", remoteAddress) } } } @@ -298,6 +297,8 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { final def joining(node: Address) { log.info("Node [{}] - Node [{}] is joining", remoteAddress, node) + failureDetector heartbeat node // update heartbeat in failure detector + val localState = state.get val localGossip = localState.latestGossip val localMembers = localGossip.members @@ -475,7 +476,7 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { */ private def gossipTo(address: Address) { setUpConnectionTo(address) foreach { connection ⇒ - log.debug("Node [{}] - Gossiping to [{}]", remoteAddress, address) + log.debug("Node [{}] - Gossiping to [{}]", remoteAddress, connection) connection ! GossipEnvelope(self, latestGossip) } } @@ -496,7 +497,7 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { } /** - * Scrutinizes the cluster; marks members detected by the failure detector as unavailable. + * Scrutinizes the cluster; marks members detected by the failure detector as unreachable. */ @tailrec final private def scrutinize() { @@ -517,6 +518,8 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { val newMembers = localMembers diff newlyDetectedUnreachableMembers val newUnreachableAddresses: Set[Address] = localUnreachableAddresses ++ newlyDetectedUnreachableAddresses + log.info("Node [{}] - Marking node(s) an unreachable [{}]", remoteAddress, newlyDetectedUnreachableAddresses.mkString(", ")) + val newOverview = localOverview copy (unreachable = newUnreachableAddresses) val newGossip = localGossip copy (overview = newOverview, members = newMembers) diff --git a/akka-cluster/src/test/scala/akka/cluster/AccrualFailureDetectorSpec.scala b/akka-cluster/src/test/scala/akka/cluster/AccrualFailureDetectorSpec.scala index 034f582e0d..2e00c72ad1 100644 --- a/akka-cluster/src/test/scala/akka/cluster/AccrualFailureDetectorSpec.scala +++ b/akka-cluster/src/test/scala/akka/cluster/AccrualFailureDetectorSpec.scala @@ -13,13 +13,13 @@ class AccrualFailureDetectorSpec extends AkkaSpec(""" val conn2 = Address("akka", "", "localhost", 2553) "return phi value of 0.0D on startup for each address" in { - val fd = new AccrualFailureDetector(system) + val fd = new AccrualFailureDetector(system, conn) fd.phi(conn) must be(0.0D) fd.phi(conn2) must be(0.0D) } "mark node as available after a series of successful heartbeats" in { - val fd = new AccrualFailureDetector(system) + val fd = new AccrualFailureDetector(system, conn) fd.heartbeat(conn) @@ -34,7 +34,7 @@ class AccrualFailureDetectorSpec extends AkkaSpec(""" // FIXME how should we deal with explicit removal of connection? 
- if triggered as failure then we have a problem in boostrap - see line 142 in AccrualFailureDetector "mark node as dead after explicit removal of connection" ignore { - val fd = new AccrualFailureDetector(system) + val fd = new AccrualFailureDetector(system, conn) fd.heartbeat(conn) @@ -52,7 +52,7 @@ class AccrualFailureDetectorSpec extends AkkaSpec(""" } "mark node as dead if heartbeat are missed" in { - val fd = new AccrualFailureDetector(system, threshold = 3) + val fd = new AccrualFailureDetector(system, conn, threshold = 3) fd.heartbeat(conn) @@ -70,7 +70,7 @@ class AccrualFailureDetectorSpec extends AkkaSpec(""" } "mark node as available if it starts heartbeat again after being marked dead due to detection of failure" in { - val fd = new AccrualFailureDetector(system, threshold = 3) + val fd = new AccrualFailureDetector(system, conn, threshold = 3) fd.heartbeat(conn) diff --git a/akka-cluster/src/test/scala/akka/cluster/GossipingAccrualFailureDetectorSpec.scala b/akka-cluster/src/test/scala/akka/cluster/GossipingAccrualFailureDetectorSpec.scala index 6366a9f65e..413ab7e537 100644 --- a/akka-cluster/src/test/scala/akka/cluster/GossipingAccrualFailureDetectorSpec.scala +++ b/akka-cluster/src/test/scala/akka/cluster/GossipingAccrualFailureDetectorSpec.scala @@ -1,95 +1,128 @@ -// /** -// * Copyright (C) 2009-2011 Typesafe Inc. -// */ -// package akka.cluster +/** + * Copyright (C) 2009-2012 Typesafe Inc. + */ +package akka.cluster -// import java.net.InetSocketAddress +import akka.testkit._ +import akka.dispatch._ +import akka.actor._ +import akka.remote._ +import akka.util.duration._ -// import akka.testkit._ -// import akka.dispatch._ -// import akka.actor._ -// import com.typesafe.config._ +import com.typesafe.config._ -// class GossipingAccrualFailureDetectorSpec extends AkkaSpec(""" -// akka { -// loglevel = "INFO" -// actor.provider = "akka.remote.RemoteActorRefProvider" +import java.net.InetSocketAddress -// remote.server.hostname = localhost -// remote.server.port = 5550 -// remote.failure-detector.threshold = 3 -// cluster.seed-nodes = ["akka://localhost:5551"] -// } -// """) with ImplicitSender { +class GossipingAccrualFailureDetectorSpec extends AkkaSpec(""" + akka { + loglevel = "INFO" + cluster.failure-detector.threshold = 3 + actor.debug.lifecycle = on + actor.debug.autoreceive = on + } + """) with ImplicitSender { -// val conn1 = Address("akka", system.systemName, Some("localhost"), Some(5551)) -// val node1 = ActorSystem("GossiperSpec", ConfigFactory -// .parseString("akka { remote.server.port=5551, cluster.use-cluster = on }") -// .withFallback(system.settings.config)) -// val remote1 = -// node1.asInstanceOf[ActorSystemImpl] -// .provider.asInstanceOf[RemoteActorRefProvider] -// .remote -// val gossiper1 = remote1.gossiper -// val fd1 = remote1.failureDetector -// gossiper1 must be('defined) + var gossiper1: Gossiper = _ + var gossiper2: Gossiper = _ + var gossiper3: Gossiper = _ -// val conn2 = RemoteNettyAddress("localhost", 5552) -// val node2 = ActorSystem("GossiperSpec", ConfigFactory -// .parseString("akka { remote.server.port=5552, cluster.use-cluster = on }") -// .withFallback(system.settings.config)) -// val remote2 = -// node2.asInstanceOf[ActorSystemImpl] -// .provider.asInstanceOf[RemoteActorRefProvider] -// .remote -// val gossiper2 = remote2.gossiper -// val fd2 = remote2.failureDetector -// gossiper2 must be('defined) + var node1: ActorSystemImpl = _ + var node2: ActorSystemImpl = _ + var node3: ActorSystemImpl = _ -// val conn3 = 
RemoteNettyAddress("localhost", 5553) -// val node3 = ActorSystem("GossiperSpec", ConfigFactory -// .parseString("akka { remote.server.port=5553, cluster.use-cluster = on }") -// .withFallback(system.settings.config)) -// val remote3 = -// node3.asInstanceOf[ActorSystemImpl] -// .provider.asInstanceOf[RemoteActorRefProvider] -// .remote -// val gossiper3 = remote3.gossiper -// val fd3 = remote3.failureDetector -// gossiper3 must be('defined) + try { + "A Gossip-driven Failure Detector" must { -// "A Gossip-driven Failure Detector" must { + // ======= NODE 1 ======== + node1 = ActorSystem("node1", ConfigFactory + .parseString(""" + akka { + actor.provider = "akka.remote.RemoteActorRefProvider" + remote.netty { + hostname = localhost + port=5550 + } + }""") + .withFallback(system.settings.config)) + .asInstanceOf[ActorSystemImpl] + val remote1 = node1.provider.asInstanceOf[RemoteActorRefProvider] + gossiper1 = Gossiper(node1, remote1) + val fd1 = gossiper1.failureDetector + val address1 = gossiper1.self.address -// "receive gossip heartbeats so that all healthy nodes in the cluster are marked 'available'" ignore { -// Thread.sleep(5000) // let them gossip for 10 seconds -// fd1.isAvailable(conn2) must be(true) -// fd1.isAvailable(conn3) must be(true) -// fd2.isAvailable(conn1) must be(true) -// fd2.isAvailable(conn3) must be(true) -// fd3.isAvailable(conn1) must be(true) -// fd3.isAvailable(conn2) must be(true) -// } + // ======= NODE 2 ======== + node2 = ActorSystem("node2", ConfigFactory + .parseString(""" + akka { + actor.provider = "akka.remote.RemoteActorRefProvider" + remote.netty { + hostname = localhost + port = 5551 + } + cluster.node-to-join = "akka://node1@localhost:5550" + }""") + .withFallback(system.settings.config)) + .asInstanceOf[ActorSystemImpl] + val remote2 = node2.provider.asInstanceOf[RemoteActorRefProvider] + gossiper2 = Gossiper(node2, remote2) + val fd2 = gossiper2.failureDetector + val address2 = gossiper2.self.address -// "mark node as 'unavailable' if a node in the cluster is shut down and its heartbeats stops" ignore { -// // kill node 3 -// gossiper3.get.shutdown() -// node3.shutdown() -// Thread.sleep(5000) // let them gossip for 10 seconds + // ======= NODE 3 ======== + node3 = ActorSystem("node3", ConfigFactory + .parseString(""" + akka { + actor.provider = "akka.remote.RemoteActorRefProvider" + remote.netty { + hostname = localhost + port=5552 + } + cluster.node-to-join = "akka://node1@localhost:5550" + }""") + .withFallback(system.settings.config)) + .asInstanceOf[ActorSystemImpl] + val remote3 = node3.provider.asInstanceOf[RemoteActorRefProvider] + gossiper3 = Gossiper(node3, remote3) + val fd3 = gossiper3.failureDetector + val address3 = gossiper3.self.address -// fd1.isAvailable(conn2) must be(true) -// fd1.isAvailable(conn3) must be(false) -// fd2.isAvailable(conn1) must be(true) -// fd2.isAvailable(conn3) must be(false) -// } -// } + "receive gossip heartbeats so that all healthy nodes in the cluster are marked 'available'" in { + println("Let the nodes gossip for a while...") + Thread.sleep(30.seconds.dilated.toMillis) // let them gossip for 30 seconds + fd1.isAvailable(address2) must be(true) + fd1.isAvailable(address3) must be(true) + fd2.isAvailable(address1) must be(true) + fd2.isAvailable(address3) must be(true) + fd3.isAvailable(address1) must be(true) + fd3.isAvailable(address2) must be(true) + } -// override def atTermination() { -// gossiper1.get.shutdown() -// gossiper2.get.shutdown() -// gossiper3.get.shutdown() -// node1.shutdown() -// 
node2.shutdown() -// node3.shutdown() -// // FIXME Ordering problem - If we shut down the ActorSystem before the Gossiper then we get an IllegalStateException -// } -// } + "mark node as 'unavailable' if a node in the cluster is shut down (and its heartbeats stops)" in { + // shut down node3 + gossiper3.shutdown() + node3.shutdown() + println("Give the remaning nodes time to detect failure...") + Thread.sleep(30.seconds.dilated.toMillis) // give them 30 seconds to detect failure of node3 + fd1.isAvailable(address2) must be(true) + fd1.isAvailable(address3) must be(false) + fd2.isAvailable(address1) must be(true) + fd2.isAvailable(address3) must be(false) + } + } + } catch { + case e: Exception ⇒ + e.printStackTrace + fail(e.toString) + } + + override def atTermination() { + gossiper1.shutdown() + node1.shutdown() + + gossiper2.shutdown() + node2.shutdown() + + gossiper3.shutdown() + node3.shutdown() + } +} From 8d48c1eed9307055d3b50634fdc5c59364b0810b Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jonas=20Bone=CC=81r?= Date: Sat, 18 Feb 2012 22:14:53 +0100 Subject: [PATCH 23/72] Added support for checking for Cluster Convergence and completed support for MembershipChangeListener (including tests). MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Jonas Bonér --- .../main/scala/akka/cluster/Gossiper.scala | 49 ++++-- .../MembershipChangeListenerSpec.scala | 144 ++++++++++++++++++ .../akka/cluster/NodeMembershipSpec.scala | 24 ++- 3 files changed, 200 insertions(+), 17 deletions(-) create mode 100644 akka-cluster/src/test/scala/akka/cluster/MembershipChangeListenerSpec.scala diff --git a/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala b/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala index 73575efec7..313f052e2a 100644 --- a/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala +++ b/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala @@ -26,10 +26,8 @@ import com.google.protobuf.ByteString /** * Interface for membership change listener. 
*/ -trait MembershipChangeListener { // FIXME add notification of MembershipChangeListener +trait MembershipChangeListener { def notify(members: SortedSet[Member]): Unit - // def memberConnected(member: Member): Unit - // def memberDisconnected(member: Member): Unit } /** @@ -312,6 +310,11 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { val newState = localState copy (latestGossip = seenVersionedGossip) if (!state.compareAndSet(localState, newState)) joining(node) // recur if we failed update + else { + if (convergence(newState.latestGossip).isDefined) { + newState.memberMembershipChangeListeners map { _ notify newMembers } // FIXME should check for cluster convergence before triggering listeners + } + } } /** @@ -323,8 +326,6 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { failureDetector heartbeat sender.address // update heartbeat in failure detector - // FIXME check for convergence - if we have convergence then trigger the listeners - val localState = state.get val localGossip = localState.latestGossip @@ -334,7 +335,8 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { val mergedGossip = merge(remoteGossip, localGossip) val versionedMergedGossip = mergedGossip + selfNode - log.debug("Can't establish a causal relationship between \"remote\" gossip [{}] and \"local\" gossip [{}] - merging them into [{}]", + log.debug( + "Can't establish a causal relationship between \"remote\" gossip [{}] and \"local\" gossip [{}] - merging them into [{}]", remoteGossip, localGossip, versionedMergedGossip) versionedMergedGossip @@ -352,6 +354,11 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { // if we won the race then update else try again if (!state.compareAndSet(localState, newState)) receive(sender, remoteGossip) // recur if we fail the update + else { + if (convergence(newState.latestGossip).isDefined) { + newState.memberMembershipChangeListeners map { _ notify newState.latestGossip.members } // FIXME should check for cluster convergence before triggering listeners + } + } } /** @@ -376,6 +383,13 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { if (!state.compareAndSet(localState, newState)) unregisterListener(listener) // recur } + /** + * Checks if we have a cluster convergence. + * + * @returns Some(convergedGossip) if convergence have been reached and None if not + */ + def convergence: Option[Gossip] = convergence(latestGossip) + // ======================================================== // ===================== INTERNAL API ===================== // ======================================================== @@ -531,17 +545,28 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { // if we won the race then update else try again if (!state.compareAndSet(localState, newState)) scrutinize() // recur else { - // FIXME should only notify when there is a cluster convergence - // notify listeners on successful update of state - // for { - // deadNode ← newUnreachableAddresses - // listener ← localState.memberMembershipChangeListeners - // } listener memberDisconnected deadNode + if (convergence(newState.latestGossip).isDefined) { + newState.memberMembershipChangeListeners map { _ notify newMembers } // FIXME should check for cluster convergence before triggering listeners + } } } } } + /** + * Checks if we have a cluster convergence. 
+ * + * @returns Some(convergedGossip) if convergence have been reached and None if not + */ + private def convergence(gossip: Gossip): Option[Gossip] = { + val seen = gossip.overview.seen + val views = Set.empty[VectorClock] ++ seen.values + if (views.size == 1) { + log.debug("Node [{}] - Cluster convergence reached", remoteAddress) + Some(gossip) + } else None + } + // FIXME should shuffle list randomly before start traversing to avoid connecting to some member on every member @tailrec final private def connectToRandomNodeOf(addresses: Seq[Address]): ActorRef = { diff --git a/akka-cluster/src/test/scala/akka/cluster/MembershipChangeListenerSpec.scala b/akka-cluster/src/test/scala/akka/cluster/MembershipChangeListenerSpec.scala new file mode 100644 index 0000000000..74de663697 --- /dev/null +++ b/akka-cluster/src/test/scala/akka/cluster/MembershipChangeListenerSpec.scala @@ -0,0 +1,144 @@ +/** + * Copyright (C) 2009-2012 Typesafe Inc. + */ +package akka.cluster + +import akka.testkit._ +import akka.dispatch._ +import akka.actor._ +import akka.remote._ +import akka.util.duration._ + +import java.net.InetSocketAddress +import java.util.concurrent.{ CountDownLatch, TimeUnit } + +import scala.collection.immutable.SortedSet + +import com.typesafe.config._ + +class MembershipChangeListenerSpec extends AkkaSpec(""" + akka { + loglevel = "INFO" + } + """) with ImplicitSender { + + var gossiper0: Gossiper = _ + var gossiper1: Gossiper = _ + var gossiper2: Gossiper = _ + + var node0: ActorSystemImpl = _ + var node1: ActorSystemImpl = _ + var node2: ActorSystemImpl = _ + + try { + "A set of connected cluster nodes" must { + "(when two nodes) after cluster convergence updates the membership table then all MembershipChangeListeners should be triggered" in { + node0 = ActorSystem("node0", ConfigFactory + .parseString(""" + akka { + actor.provider = "akka.remote.RemoteActorRefProvider" + remote.netty { + hostname = localhost + port=5550 + } + }""") + .withFallback(system.settings.config)) + .asInstanceOf[ActorSystemImpl] + val remote0 = node0.provider.asInstanceOf[RemoteActorRefProvider] + gossiper0 = Gossiper(node0, remote0) + + node1 = ActorSystem("node1", ConfigFactory + .parseString(""" + akka { + actor.provider = "akka.remote.RemoteActorRefProvider" + remote.netty { + hostname = localhost + port=5551 + } + cluster.node-to-join = "akka://node0@localhost:5550" + }""") + .withFallback(system.settings.config)) + .asInstanceOf[ActorSystemImpl] + val remote1 = node1.provider.asInstanceOf[RemoteActorRefProvider] + gossiper1 = Gossiper(node1, remote1) + + val latch = new CountDownLatch(2) + + gossiper0.registerListener(new MembershipChangeListener { + def notify(members: SortedSet[Member]) { + latch.countDown() + } + }) + gossiper1.registerListener(new MembershipChangeListener { + def notify(members: SortedSet[Member]) { + latch.countDown() + } + }) + + latch.await(10.seconds.dilated.toMillis, TimeUnit.MILLISECONDS) + + // check cluster convergence + gossiper0.convergence must be('defined) + gossiper1.convergence must be('defined) + } + + "(when three nodes) after cluster convergence updates the membership table then all MembershipChangeListeners should be triggered" in { + + // ======= NODE 2 ======== + node2 = ActorSystem("node2", ConfigFactory + .parseString(""" + akka { + actor.provider = "akka.remote.RemoteActorRefProvider" + remote.netty { + hostname = localhost + port=5552 + } + cluster.node-to-join = "akka://node0@localhost:5550" + }""") + .withFallback(system.settings.config)) + 
.asInstanceOf[ActorSystemImpl] + val remote2 = node2.provider.asInstanceOf[RemoteActorRefProvider] + gossiper2 = Gossiper(node2, remote2) + + val latch = new CountDownLatch(3) + gossiper0.registerListener(new MembershipChangeListener { + def notify(members: SortedSet[Member]) { + latch.countDown() + } + }) + gossiper1.registerListener(new MembershipChangeListener { + def notify(members: SortedSet[Member]) { + latch.countDown() + } + }) + gossiper2.registerListener(new MembershipChangeListener { + def notify(members: SortedSet[Member]) { + latch.countDown() + } + }) + + latch.await(10.seconds.dilated.toMillis, TimeUnit.MILLISECONDS) + + // check cluster convergence + gossiper0.convergence must be('defined) + gossiper1.convergence must be('defined) + gossiper2.convergence must be('defined) + } + } + } catch { + case e: Exception ⇒ + e.printStackTrace + fail(e.toString) + } + + override def atTermination() { + gossiper0.shutdown() + node0.shutdown() + + gossiper1.shutdown() + node1.shutdown() + + gossiper2.shutdown() + node2.shutdown() + } +} diff --git a/akka-cluster/src/test/scala/akka/cluster/NodeMembershipSpec.scala b/akka-cluster/src/test/scala/akka/cluster/NodeMembershipSpec.scala index a2106fc6da..5fc062f517 100644 --- a/akka-cluster/src/test/scala/akka/cluster/NodeMembershipSpec.scala +++ b/akka-cluster/src/test/scala/akka/cluster/NodeMembershipSpec.scala @@ -30,7 +30,9 @@ class NodeMembershipSpec extends AkkaSpec(""" try { "A set of connected cluster nodes" must { "(when two nodes) start gossiping to each other so that both nodes gets the same gossip info" in { - node0 = ActorSystem("NodeMembershipSpec", ConfigFactory + + // ======= NODE 0 ======== + node0 = ActorSystem("node0", ConfigFactory .parseString(""" akka { actor.provider = "akka.remote.RemoteActorRefProvider" @@ -44,7 +46,8 @@ class NodeMembershipSpec extends AkkaSpec(""" val remote0 = node0.provider.asInstanceOf[RemoteActorRefProvider] gossiper0 = Gossiper(node0, remote0) - node1 = ActorSystem("NodeMembershipSpec", ConfigFactory + // ======= NODE 1 ======== + node1 = ActorSystem("node1", ConfigFactory .parseString(""" akka { actor.provider = "akka.remote.RemoteActorRefProvider" @@ -52,7 +55,7 @@ class NodeMembershipSpec extends AkkaSpec(""" hostname = localhost port=5551 } - cluster.node-to-join = "akka://NodeMembershipSpec@localhost:5550" + cluster.node-to-join = "akka://node0@localhost:5550" }""") .withFallback(system.settings.config)) .asInstanceOf[ActorSystemImpl] @@ -61,6 +64,10 @@ class NodeMembershipSpec extends AkkaSpec(""" Thread.sleep(10.seconds.dilated.toMillis) + // check cluster convergence + gossiper0.convergence must be('defined) + gossiper1.convergence must be('defined) + val members0 = gossiper0.latestGossip.members.toArray members0.size must be(2) members0(0).address.port.get must be(5550) @@ -77,7 +84,9 @@ class NodeMembershipSpec extends AkkaSpec(""" } "(when three nodes) start gossiping to each other so that both nodes gets the same gossip info" in { - node2 = ActorSystem("NodeMembershipSpec", ConfigFactory + + // ======= NODE 2 ======== + node2 = ActorSystem("node2", ConfigFactory .parseString(""" akka { actor.provider = "akka.remote.RemoteActorRefProvider" @@ -85,7 +94,7 @@ class NodeMembershipSpec extends AkkaSpec(""" hostname = localhost port=5552 } - cluster.node-to-join = "akka://NodeMembershipSpec@localhost:5550" + cluster.node-to-join = "akka://node0@localhost:5550" }""") .withFallback(system.settings.config)) .asInstanceOf[ActorSystemImpl] @@ -94,6 +103,11 @@ class NodeMembershipSpec extends 
AkkaSpec(""" Thread.sleep(10.seconds.dilated.toMillis) + // check cluster convergence + gossiper0.convergence must be('defined) + gossiper1.convergence must be('defined) + gossiper2.convergence must be('defined) + val members0 = gossiper0.latestGossip.members.toArray val version = gossiper0.latestGossip.version members0.size must be(3) From 2de4c3b6b8430ac39d14cc24ded952e36c83124f Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jonas=20Bone=CC=81r?= Date: Sun, 19 Feb 2012 21:18:16 +0100 Subject: [PATCH 24/72] Created test tag LongRunningTest ("long-running") for excluding long running (cluster) tests from standard suite. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Jonas Bonér --- .../GossipingAccrualFailureDetectorSpec.scala | 16 ++++++++-------- .../cluster/MembershipChangeListenerSpec.scala | 16 ++++++++-------- .../scala/akka/cluster/NodeMembershipSpec.scala | 16 ++++++++-------- .../scala/akka/cluster/NodeStartupSpec.scala | 8 ++++---- .../src/test/scala/akka/testkit/AkkaSpec.scala | 1 + project/AkkaBuild.scala | 2 +- 6 files changed, 30 insertions(+), 29 deletions(-) diff --git a/akka-cluster/src/test/scala/akka/cluster/GossipingAccrualFailureDetectorSpec.scala b/akka-cluster/src/test/scala/akka/cluster/GossipingAccrualFailureDetectorSpec.scala index 413ab7e537..8939c4d728 100644 --- a/akka-cluster/src/test/scala/akka/cluster/GossipingAccrualFailureDetectorSpec.scala +++ b/akka-cluster/src/test/scala/akka/cluster/GossipingAccrualFailureDetectorSpec.scala @@ -86,7 +86,7 @@ class GossipingAccrualFailureDetectorSpec extends AkkaSpec(""" val fd3 = gossiper3.failureDetector val address3 = gossiper3.self.address - "receive gossip heartbeats so that all healthy nodes in the cluster are marked 'available'" in { + "receive gossip heartbeats so that all healthy nodes in the cluster are marked 'available'" taggedAs LongRunningTest in { println("Let the nodes gossip for a while...") Thread.sleep(30.seconds.dilated.toMillis) // let them gossip for 30 seconds fd1.isAvailable(address2) must be(true) @@ -97,7 +97,7 @@ class GossipingAccrualFailureDetectorSpec extends AkkaSpec(""" fd3.isAvailable(address2) must be(true) } - "mark node as 'unavailable' if a node in the cluster is shut down (and its heartbeats stops)" in { + "mark node as 'unavailable' if a node in the cluster is shut down (and its heartbeats stops)" taggedAs LongRunningTest in { // shut down node3 gossiper3.shutdown() node3.shutdown() @@ -116,13 +116,13 @@ class GossipingAccrualFailureDetectorSpec extends AkkaSpec(""" } override def atTermination() { - gossiper1.shutdown() - node1.shutdown() + if (gossiper1 ne null) gossiper1.shutdown() + if (node1 ne null) node1.shutdown() - gossiper2.shutdown() - node2.shutdown() + if (gossiper2 ne null) gossiper2.shutdown() + if (node2 ne null) node2.shutdown() - gossiper3.shutdown() - node3.shutdown() + if (gossiper3 ne null) gossiper3.shutdown() + if (node3 ne null) node3.shutdown() } } diff --git a/akka-cluster/src/test/scala/akka/cluster/MembershipChangeListenerSpec.scala b/akka-cluster/src/test/scala/akka/cluster/MembershipChangeListenerSpec.scala index 74de663697..e168b7caee 100644 --- a/akka-cluster/src/test/scala/akka/cluster/MembershipChangeListenerSpec.scala +++ b/akka-cluster/src/test/scala/akka/cluster/MembershipChangeListenerSpec.scala @@ -32,7 +32,7 @@ class MembershipChangeListenerSpec extends AkkaSpec(""" try { "A set of connected cluster nodes" must { - "(when two nodes) after cluster convergence updates the membership table then 
all MembershipChangeListeners should be triggered" in { + "(when two nodes) after cluster convergence updates the membership table then all MembershipChangeListeners should be triggered" taggedAs LongRunningTest in { node0 = ActorSystem("node0", ConfigFactory .parseString(""" akka { @@ -82,7 +82,7 @@ class MembershipChangeListenerSpec extends AkkaSpec(""" gossiper1.convergence must be('defined) } - "(when three nodes) after cluster convergence updates the membership table then all MembershipChangeListeners should be triggered" in { + "(when three nodes) after cluster convergence updates the membership table then all MembershipChangeListeners should be triggered" taggedAs LongRunningTest in { // ======= NODE 2 ======== node2 = ActorSystem("node2", ConfigFactory @@ -132,13 +132,13 @@ class MembershipChangeListenerSpec extends AkkaSpec(""" } override def atTermination() { - gossiper0.shutdown() - node0.shutdown() + if (gossiper0 ne null) gossiper0.shutdown() + if (node0 ne null) node0.shutdown() - gossiper1.shutdown() - node1.shutdown() + if (gossiper1 ne null) gossiper1.shutdown() + if (node1 ne null) node1.shutdown() - gossiper2.shutdown() - node2.shutdown() + if (gossiper2 ne null) gossiper2.shutdown() + if (node2 ne null) node2.shutdown() } } diff --git a/akka-cluster/src/test/scala/akka/cluster/NodeMembershipSpec.scala b/akka-cluster/src/test/scala/akka/cluster/NodeMembershipSpec.scala index 5fc062f517..2ce0a1d449 100644 --- a/akka-cluster/src/test/scala/akka/cluster/NodeMembershipSpec.scala +++ b/akka-cluster/src/test/scala/akka/cluster/NodeMembershipSpec.scala @@ -29,7 +29,7 @@ class NodeMembershipSpec extends AkkaSpec(""" try { "A set of connected cluster nodes" must { - "(when two nodes) start gossiping to each other so that both nodes gets the same gossip info" in { + "(when two nodes) start gossiping to each other so that both nodes gets the same gossip info" taggedAs LongRunningTest in { // ======= NODE 0 ======== node0 = ActorSystem("node0", ConfigFactory @@ -83,7 +83,7 @@ class NodeMembershipSpec extends AkkaSpec(""" members1(1).status must be(MemberStatus.Joining) } - "(when three nodes) start gossiping to each other so that both nodes gets the same gossip info" in { + "(when three nodes) start gossiping to each other so that both nodes gets the same gossip info" taggedAs LongRunningTest in { // ======= NODE 2 ======== node2 = ActorSystem("node2", ConfigFactory @@ -144,13 +144,13 @@ class NodeMembershipSpec extends AkkaSpec(""" } override def atTermination() { - gossiper0.shutdown() - node0.shutdown() + if (gossiper0 ne null) gossiper0.shutdown() + if (node0 ne null) node0.shutdown() - gossiper1.shutdown() - node1.shutdown() + if (gossiper1 ne null) gossiper1.shutdown() + if (node1 ne null) node1.shutdown() - gossiper2.shutdown() - node2.shutdown() + if (gossiper2 ne null) gossiper2.shutdown() + if (node2 ne null) node2.shutdown() } } diff --git a/akka-cluster/src/test/scala/akka/cluster/NodeStartupSpec.scala b/akka-cluster/src/test/scala/akka/cluster/NodeStartupSpec.scala index 9066f6eaae..b3805b7946 100644 --- a/akka-cluster/src/test/scala/akka/cluster/NodeStartupSpec.scala +++ b/akka-cluster/src/test/scala/akka/cluster/NodeStartupSpec.scala @@ -84,10 +84,10 @@ class NodeStartupSpec extends AkkaSpec(""" } override def atTermination() { - gossiper0.shutdown() - node0.shutdown() + if (gossiper0 ne null) gossiper0.shutdown() + if (node0 ne null) node0.shutdown() - gossiper1.shutdown() - node1.shutdown() + if (gossiper1 ne null) gossiper1.shutdown() + if (node1 ne null) 
node1.shutdown() } } diff --git a/akka-testkit/src/test/scala/akka/testkit/AkkaSpec.scala b/akka-testkit/src/test/scala/akka/testkit/AkkaSpec.scala index 95ce267320..3a0f02c79a 100644 --- a/akka-testkit/src/test/scala/akka/testkit/AkkaSpec.scala +++ b/akka-testkit/src/test/scala/akka/testkit/AkkaSpec.scala @@ -20,6 +20,7 @@ import akka.dispatch.Dispatchers import akka.pattern.ask object TimingTest extends Tag("timing") +object LongRunningTest extends Tag("long-running") object AkkaSpec { val testConf: Config = ConfigFactory.parseString(""" diff --git a/project/AkkaBuild.scala b/project/AkkaBuild.scala index 905e345dec..f543aac783 100644 --- a/project/AkkaBuild.scala +++ b/project/AkkaBuild.scala @@ -343,7 +343,7 @@ object AkkaBuild extends Build { val excludeTestTags = SettingKey[Seq[String]]("exclude-test-tags") val includeTestTags = SettingKey[Seq[String]]("include-test-tags") - val defaultExcludedTags = Seq("timing") + val defaultExcludedTags = Seq("timing", "long-running") lazy val defaultSettings = baseSettings ++ formatSettings ++ Seq( resolvers += "Typesafe Repo" at "http://repo.typesafe.com/typesafe/releases/", From 663ca721b55a90fb43928bc8898283fea934c20f Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jonas=20Bone=CC=81r?= Date: Mon, 20 Feb 2012 15:26:12 +0100 Subject: [PATCH 25/72] Split up ClusterDaemon into ClusterGossipDaemon (routed with configurable N instances) and ClusterCommandDaemon (shortly to be an FSM). Removed ConnectionManager crap. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Jonas Bonér --- .../src/main/resources/reference.conf | 3 + .../scala/akka/cluster/ClusterSettings.scala | 1 + .../main/scala/akka/cluster/Gossiper.scala | 103 +++++++++--------- .../akka/cluster/ClusterConfigSpec.scala | 1 + .../MembershipChangeListenerSpec.scala | 4 + 5 files changed, 59 insertions(+), 53 deletions(-) diff --git a/akka-cluster/src/main/resources/reference.conf b/akka-cluster/src/main/resources/reference.conf index 4d8c4a5e32..871026a275 100644 --- a/akka-cluster/src/main/resources/reference.conf +++ b/akka-cluster/src/main/resources/reference.conf @@ -12,6 +12,9 @@ akka { # leave as empty string if the node should be a singleton cluster node-to-join = "" + # the number of gossip daemon actors + nr-of-gossip-daemons = 4 + gossip { initialDelay = 5s frequency = 1s diff --git a/akka-cluster/src/main/scala/akka/cluster/ClusterSettings.scala b/akka-cluster/src/main/scala/akka/cluster/ClusterSettings.scala index be3205148b..c9cb9ede4a 100644 --- a/akka-cluster/src/main/scala/akka/cluster/ClusterSettings.scala +++ b/akka-cluster/src/main/scala/akka/cluster/ClusterSettings.scala @@ -21,4 +21,5 @@ class ClusterSettings(val config: Config, val systemName: String) { } val GossipInitialDelay = Duration(getMilliseconds("akka.cluster.gossip.initialDelay"), MILLISECONDS) val GossipFrequency = Duration(getMilliseconds("akka.cluster.gossip.frequency"), MILLISECONDS) + val NrOfGossipDaemons = getInt("akka.cluster.nr-of-gossip-daemons") } diff --git a/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala b/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala index 313f052e2a..b3e7df27bf 100644 --- a/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala +++ b/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala @@ -7,6 +7,7 @@ package akka.cluster import akka.actor._ import akka.actor.Status._ import akka.remote._ +import akka.routing._ import akka.event.Logging import akka.dispatch.Await import akka.pattern.ask @@ -119,7 +120,7 @@ 
case class GossipOverview( case class Gossip( overview: GossipOverview = GossipOverview(), members: SortedSet[Member], // sorted set of members with their status, sorted by name - //partitions: Tree[PartitionPath, Node] = Tree.empty[PartitionPath, Node], + //partitions: Tree[PartitionPath, Node] = Tree.empty[PartitionPath, Node], // name/partition service //pending: Set[PartitioningChange] = Set.empty[PartitioningChange], meta: Map[String, Array[Byte]] = Map.empty[String, Array[Byte]], version: VectorClock = VectorClock()) // vector clock version @@ -152,16 +153,32 @@ case class Gossip( ")" } -// FIXME add FSM trait? -final class ClusterDaemon(system: ActorSystem, gossiper: Gossiper) extends Actor { - val log = Logging(system, "ClusterDaemon") +// FIXME ClusterCommandDaemon with FSM trait +/** + * Single instance. FSM managing the different cluster nodes states. + * Serialized access to Gossiper. + */ +final class ClusterCommandDaemon(system: ActorSystem, gossiper: Gossiper) extends Actor { + val log = Logging(system, "ClusterCommandDaemon") + + def receive = { + case Join(address) ⇒ gossiper.joining(address) + case Leave(address) ⇒ //gossiper.leaving(address) + case Down(address) ⇒ //gossiper.downing(address) + case Remove(address) ⇒ //gossiper.removing(address) + case unknown ⇒ log.error("Unknown message sent to cluster daemon [" + unknown + "]") + } +} + +/** + * Pooled and routed wit N number of configurable instances. + * Concurrent access to Gossiper. + */ +final class ClusterGossipDaemon(system: ActorSystem, gossiper: Gossiper) extends Actor { + val log = Logging(system, "ClusterGossipDaemon") def receive = { case GossipEnvelope(sender, gossip) ⇒ gossiper.receive(sender, gossip) - case Join(address) ⇒ gossiper.joining(address) - case Leave(address) ⇒ //gossiper.leaving(address) - case Down(address) ⇒ //gossiper.downing(address) - case Remove(address) ⇒ //gossiper.removing(address) case unknown ⇒ log.error("Unknown message sent to cluster daemon [" + unknown + "]") } } @@ -211,6 +228,7 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { val failureDetector = new AccrualFailureDetector( system, remoteAddress, clusterSettings.FailureDetectorThreshold, clusterSettings.FailureDetectorMaxSampleSize) + private val nrOfGossipDaemons = clusterSettings.NrOfGossipDaemons private val nodeToJoin: Option[Address] = clusterSettings.NodeToJoin filter (_ != remoteAddress) private val serialization = remote.serialization @@ -221,7 +239,11 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { // Is it right to put this guy under the /system path or should we have a top-level /cluster or something else...? // FIXME should be defined as a router so we get concurrency here - private val clusterDaemon = system.systemActorOf(Props(new ClusterDaemon(system, this)), "cluster") + private val clusterCommandDaemon = system.systemActorOf( + Props(new ClusterCommandDaemon(system, this)), "clusterCommand") + + private val clusterGossipDaemon = system.systemActorOf( + Props(new ClusterGossipDaemon(system, this)).withRouter(RoundRobinRouter(nrOfGossipDaemons)), "clusterGossip") private val state = { val member = Member(remoteAddress, MemberStatus.Joining) @@ -229,9 +251,6 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { new AtomicReference[State](State(member, gossip)) } - // FIXME manage connections in some other way so we can delete the RemoteConnectionManager (SINCE IT SUCKS!!!) 
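The intent of the daemon split introduced in this patch is that node-state commands stay serialized through one ClusterCommandDaemon instance, while gossip processing can run concurrently behind a RoundRobinRouter pool. A minimal sketch of that deployment pattern only, with a hypothetical Worker actor that is not part of the patch:

    import akka.actor._
    import akka.routing.RoundRobinRouter

    class Worker extends Actor {
      def receive = {
        case _ ⇒ () // process the message
      }
    }

    object RouterSketch extends App {
      val system = ActorSystem("sketch")
      // one instance: messages are handled strictly one at a time (serialized access)
      val serialized = system.actorOf(Props[Worker], "command")
      // four instances behind a round-robin router: messages are spread over the pool
      val pooled = system.actorOf(Props[Worker].withRouter(RoundRobinRouter(4)), "gossip")
    }
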
- private val connectionManager = new RemoteConnectionManager(system, remote, failureDetector, Map.empty[Address, ActorRef]) - import Versioned.latestVersionOf log.info("Node [{}] - Starting cluster Gossiper...", remoteAddress) @@ -278,10 +297,12 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { if (isRunning.compareAndSet(true, false)) { log.info("Node [{}] - Shutting down Gossiper and ClusterDaemon...", remoteAddress) - try system.stop(clusterDaemon) finally { - try gossipCanceller.cancel() finally { - try scrutinizeCanceller.cancel() finally { - log.info("Node [{}] - Gossiper and ClusterDaemon shut down successfully", remoteAddress) + try system.stop(clusterCommandDaemon) finally { + try system.stop(clusterGossipDaemon) finally { + try gossipCanceller.cancel() finally { + try scrutinizeCanceller.cancel() finally { + log.info("Node [{}] - Gossiper and ClusterDaemon shut down successfully", remoteAddress) + } } } } @@ -398,11 +419,10 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { * Joins the pre-configured contact point and retrieves current gossip state. */ private def join() = nodeToJoin foreach { address ⇒ - setUpConnectionTo(address) foreach { connection ⇒ - val command = Join(remoteAddress) - log.info("Node [{}] - Sending [{}] to [{}] through connection [{}]", remoteAddress, command, address, connection) - connection ! command - } + val connection = clusterCommandConnectionFor(address) + val command = Join(remoteAddress) + log.info("Node [{}] - Sending [{}] to [{}] through connection [{}]", remoteAddress, command, address, connection) + connection ! command } /** @@ -489,10 +509,9 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { * Gossips latest gossip to an address. */ private def gossipTo(address: Address) { - setUpConnectionTo(address) foreach { connection ⇒ - log.debug("Node [{}] - Gossiping to [{}]", remoteAddress, connection) - connection ! GossipEnvelope(self, latestGossip) - } + val connection = clusterGossipConnectionFor(address) + log.debug("Node [{}] - Gossiping to [{}]", remoteAddress, connection) + connection ! GossipEnvelope(self, latestGossip) } /** @@ -567,37 +586,15 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { } else None } - // FIXME should shuffle list randomly before start traversing to avoid connecting to some member on every member - @tailrec - final private def connectToRandomNodeOf(addresses: Seq[Address]): ActorRef = { - addresses match { - - case address :: rest ⇒ - setUpConnectionTo(address) match { - case Some(connection) ⇒ connection - case None ⇒ connectToRandomNodeOf(rest) // recur - if we could not set up a connection - try next address - } - - case Nil ⇒ - throw new RemoteConnectionException( - "Could not establish connection to any of the addresses in the argument list [" + addresses.mkString(", ") + "]") - } - } + /** + * Sets up cluster command connection. + */ + private def clusterCommandConnectionFor(address: Address): ActorRef = system.actorFor(RootActorPath(address) / "system" / "clusterCommand") /** - * Sets up remote connections to all the addresses in the argument list. + * Sets up cluster gossip connection. */ - private def setUpConnectionsTo(addresses: Seq[Address]): Seq[Option[ActorRef]] = addresses map setUpConnectionTo - - /** - * Sets up remote connection. 
- */ - private def setUpConnectionTo(address: Address): Option[ActorRef] = Option { - // FIXME no need for using a factory here - remove connectionManager - try connectionManager.putIfAbsent(address, () ⇒ system.actorFor(RootActorPath(address) / "system" / "cluster")) catch { - case e: Exception ⇒ null - } - } + private def clusterGossipConnectionFor(address: Address): ActorRef = system.actorFor(RootActorPath(address) / "system" / "clusterGossip") private def deputyNodesWithoutMyself: Seq[Address] = Seq.empty[Address] filter (_ != remoteAddress) // FIXME read in deputy nodes from gossip data - now empty seq diff --git a/akka-cluster/src/test/scala/akka/cluster/ClusterConfigSpec.scala b/akka-cluster/src/test/scala/akka/cluster/ClusterConfigSpec.scala index 78c836f0b5..2afbc7efc0 100644 --- a/akka-cluster/src/test/scala/akka/cluster/ClusterConfigSpec.scala +++ b/akka-cluster/src/test/scala/akka/cluster/ClusterConfigSpec.scala @@ -28,6 +28,7 @@ class ClusterConfigSpec extends AkkaSpec( NodeToJoin must be(None) GossipInitialDelay must be(5 seconds) GossipFrequency must be(1 second) + NrOfGossipDaemons must be(4) } } } diff --git a/akka-cluster/src/test/scala/akka/cluster/MembershipChangeListenerSpec.scala b/akka-cluster/src/test/scala/akka/cluster/MembershipChangeListenerSpec.scala index e168b7caee..a82bbe4d5e 100644 --- a/akka-cluster/src/test/scala/akka/cluster/MembershipChangeListenerSpec.scala +++ b/akka-cluster/src/test/scala/akka/cluster/MembershipChangeListenerSpec.scala @@ -77,6 +77,8 @@ class MembershipChangeListenerSpec extends AkkaSpec(""" latch.await(10.seconds.dilated.toMillis, TimeUnit.MILLISECONDS) + Thread.sleep(10.seconds.dilated.toMillis) + // check cluster convergence gossiper0.convergence must be('defined) gossiper1.convergence must be('defined) @@ -119,6 +121,8 @@ class MembershipChangeListenerSpec extends AkkaSpec(""" latch.await(10.seconds.dilated.toMillis, TimeUnit.MILLISECONDS) + Thread.sleep(10.seconds.dilated.toMillis) + // check cluster convergence gossiper0.convergence must be('defined) gossiper1.convergence must be('defined) From a37c5f3c9a69304d3b380fa7cc1ad3e7aefde0ea Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jonas=20Bone=CC=81r?= Date: Mon, 20 Feb 2012 15:45:50 +0100 Subject: [PATCH 26/72] Renamed Gossiper to Node (and selfNode to vclockNode). MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Jonas Bonér --- .../cluster/{Gossiper.scala => Node.scala} | 44 ++++++------ .../GossipingAccrualFailureDetectorSpec.scala | 66 +++++++++--------- .../MembershipChangeListenerSpec.scala | 66 +++++++++--------- .../akka/cluster/NodeMembershipSpec.scala | 68 +++++++++---------- .../scala/akka/cluster/NodeStartupSpec.scala | 30 ++++---- 5 files changed, 137 insertions(+), 137 deletions(-) rename akka-cluster/src/main/scala/akka/cluster/{Gossiper.scala => Node.scala} (93%) diff --git a/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala b/akka-cluster/src/main/scala/akka/cluster/Node.scala similarity index 93% rename from akka-cluster/src/main/scala/akka/cluster/Gossiper.scala rename to akka-cluster/src/main/scala/akka/cluster/Node.scala index b3e7df27bf..0eaa6b1d16 100644 --- a/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala +++ b/akka-cluster/src/main/scala/akka/cluster/Node.scala @@ -156,35 +156,35 @@ case class Gossip( // FIXME ClusterCommandDaemon with FSM trait /** * Single instance. FSM managing the different cluster nodes states. - * Serialized access to Gossiper. + * Serialized access to Node. 
*/ -final class ClusterCommandDaemon(system: ActorSystem, gossiper: Gossiper) extends Actor { +final class ClusterCommandDaemon(system: ActorSystem, node: Node) extends Actor { val log = Logging(system, "ClusterCommandDaemon") def receive = { - case Join(address) ⇒ gossiper.joining(address) - case Leave(address) ⇒ //gossiper.leaving(address) - case Down(address) ⇒ //gossiper.downing(address) - case Remove(address) ⇒ //gossiper.removing(address) + case Join(address) ⇒ node.joining(address) + case Leave(address) ⇒ //node.leaving(address) + case Down(address) ⇒ //node.downing(address) + case Remove(address) ⇒ //node.removing(address) case unknown ⇒ log.error("Unknown message sent to cluster daemon [" + unknown + "]") } } /** * Pooled and routed wit N number of configurable instances. - * Concurrent access to Gossiper. + * Concurrent access to Node. */ -final class ClusterGossipDaemon(system: ActorSystem, gossiper: Gossiper) extends Actor { +final class ClusterGossipDaemon(system: ActorSystem, node: Node) extends Actor { val log = Logging(system, "ClusterGossipDaemon") def receive = { - case GossipEnvelope(sender, gossip) ⇒ gossiper.receive(sender, gossip) + case GossipEnvelope(sender, gossip) ⇒ node.receive(sender, gossip) case unknown ⇒ log.error("Unknown message sent to cluster daemon [" + unknown + "]") } } // FIXME Cluster public API should be an Extension -// FIXME Add cluster Node class and refactor out all non-gossip related stuff out of Gossiper +// FIXME Add cluster Node class and refactor out all non-gossip related stuff out of Node /** * This module is responsible for Gossiping cluster information. The abstraction maintains the list of live @@ -201,10 +201,10 @@ final class ClusterGossipDaemon(system: ActorSystem, gossiper: Gossiper) extends * gossip to random deputy with certain probability depending on number of unreachable, deputy and live members. * */ -case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { +case class Node(system: ActorSystemImpl, remote: RemoteActorRefProvider) { /** - * Represents the state for this Gossiper. Implemented using optimistic lockless concurrency, + * Represents the state for this Node. Implemented using optimistic lockless concurrency, * all state is represented by this immutable case class and managed by an AtomicReference. */ private case class State( @@ -216,7 +216,7 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { val clusterSettings = new ClusterSettings(system.settings.config, system.name) val remoteAddress = remote.transport.address - val selfNode = VectorClock.Node(remoteAddress.toString) + val vclockNode = VectorClock.Node(remoteAddress.toString) val gossipInitialDelay = clusterSettings.GossipInitialDelay val gossipFrequency = clusterSettings.GossipFrequency @@ -234,7 +234,7 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { private val serialization = remote.serialization private val isRunning = new AtomicBoolean(true) - private val log = Logging(system, "Gossiper") + private val log = Logging(system, "Node") private val random = SecureRandom.getInstance("SHA1PRNG") // Is it right to put this guy under the /system path or should we have a top-level /cluster or something else...? 
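As the receive block above shows, after the rename a Join message lands on the ClusterCommandDaemon, which delegates to node.joining. A node starts that handshake by resolving the contact point's command daemon under its /system/clusterCommand path, which is the same lookup the join() helper in this patch performs. A rough sketch of the flow, assuming some ActorSystem in scope as `system`, the Join command message from this cluster module, and made-up addresses:

    import akka.actor._

    // made-up addresses, for illustration only
    val myAddress      = Address("akka", "system1", "localhost", 5551)
    val contactAddress = Address("akka", "system0", "localhost", 5550)

    // resolve the remote command daemon by its system actor path and send the Join
    val commandDaemon = system.actorFor(RootActorPath(contactAddress) / "system" / "clusterCommand")
    commandDaemon ! Join(myAddress) // handled above as: case Join(address) ⇒ node.joining(address)
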
@@ -247,13 +247,13 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { private val state = { val member = Member(remoteAddress, MemberStatus.Joining) - val gossip = Gossip(members = SortedSet.empty[Member] + member) + selfNode // add me as member and update my vector clock + val gossip = Gossip(members = SortedSet.empty[Member] + member) + vclockNode // add me as member and update my vector clock new AtomicReference[State](State(member, gossip)) } import Versioned.latestVersionOf - log.info("Node [{}] - Starting cluster Gossiper...", remoteAddress) + log.info("Node [{}] - Starting cluster Node...", remoteAddress) // try to join the node defined in the 'akka.cluster.node-to-join' option join() @@ -295,13 +295,13 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { // FIXME Cheating for now. Can't just shut down. Node must first gossip an Leave command, wait for Leader to do proper Handoff and then await an Exit command before switching to Removed if (isRunning.compareAndSet(true, false)) { - log.info("Node [{}] - Shutting down Gossiper and ClusterDaemon...", remoteAddress) + log.info("Node [{}] - Shutting down Node and ClusterDaemon...", remoteAddress) try system.stop(clusterCommandDaemon) finally { try system.stop(clusterGossipDaemon) finally { try gossipCanceller.cancel() finally { try scrutinizeCanceller.cancel() finally { - log.info("Node [{}] - Gossiper and ClusterDaemon shut down successfully", remoteAddress) + log.info("Node [{}] - Node and ClusterDaemon shut down successfully", remoteAddress) } } } @@ -325,7 +325,7 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { val newMembers = localMembers + Member(node, MemberStatus.Joining) // add joining node as Joining val newGossip = localGossip copy (members = newMembers) - val versionedGossip = newGossip + selfNode + val versionedGossip = newGossip + vclockNode val seenVersionedGossip = versionedGossip seen remoteAddress val newState = localState copy (latestGossip = seenVersionedGossip) @@ -354,7 +354,7 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { if (remoteGossip.version <> localGossip.version) { // concurrent val mergedGossip = merge(remoteGossip, localGossip) - val versionedMergedGossip = mergedGossip + selfNode + val versionedMergedGossip = mergedGossip + vclockNode log.debug( "Can't establish a causal relationship between \"remote\" gossip [{}] and \"local\" gossip [{}] - merging them into [{}]", @@ -497,7 +497,7 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { val newMembersSortedSet = SortedSet[Member](newMembersSet.toList: _*) val newGossip = localGossip copy (members = newMembersSortedSet) - val versionedGossip = newGossip + selfNode + val versionedGossip = newGossip + vclockNode val seenVersionedGossip = versionedGossip seen remoteAddress val newState = localState copy (self = newSelf, latestGossip = seenVersionedGossip) @@ -556,7 +556,7 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { val newOverview = localOverview copy (unreachable = newUnreachableAddresses) val newGossip = localGossip copy (overview = newOverview, members = newMembers) - val versionedGossip = newGossip + selfNode + val versionedGossip = newGossip + vclockNode val seenVersionedGossip = versionedGossip seen remoteAddress val newState = localState copy (latestGossip = seenVersionedGossip) diff --git 
a/akka-cluster/src/test/scala/akka/cluster/GossipingAccrualFailureDetectorSpec.scala b/akka-cluster/src/test/scala/akka/cluster/GossipingAccrualFailureDetectorSpec.scala index 8939c4d728..e92b21dbfb 100644 --- a/akka-cluster/src/test/scala/akka/cluster/GossipingAccrualFailureDetectorSpec.scala +++ b/akka-cluster/src/test/scala/akka/cluster/GossipingAccrualFailureDetectorSpec.scala @@ -22,19 +22,19 @@ class GossipingAccrualFailureDetectorSpec extends AkkaSpec(""" } """) with ImplicitSender { - var gossiper1: Gossiper = _ - var gossiper2: Gossiper = _ - var gossiper3: Gossiper = _ + var node1: Node = _ + var node2: Node = _ + var node3: Node = _ - var node1: ActorSystemImpl = _ - var node2: ActorSystemImpl = _ - var node3: ActorSystemImpl = _ + var system1: ActorSystemImpl = _ + var system2: ActorSystemImpl = _ + var system3: ActorSystemImpl = _ try { "A Gossip-driven Failure Detector" must { // ======= NODE 1 ======== - node1 = ActorSystem("node1", ConfigFactory + system1 = ActorSystem("system1", ConfigFactory .parseString(""" akka { actor.provider = "akka.remote.RemoteActorRefProvider" @@ -45,13 +45,13 @@ class GossipingAccrualFailureDetectorSpec extends AkkaSpec(""" }""") .withFallback(system.settings.config)) .asInstanceOf[ActorSystemImpl] - val remote1 = node1.provider.asInstanceOf[RemoteActorRefProvider] - gossiper1 = Gossiper(node1, remote1) - val fd1 = gossiper1.failureDetector - val address1 = gossiper1.self.address + val remote1 = system1.provider.asInstanceOf[RemoteActorRefProvider] + node1 = Node(system1, remote1) + val fd1 = node1.failureDetector + val address1 = node1.self.address // ======= NODE 2 ======== - node2 = ActorSystem("node2", ConfigFactory + system2 = ActorSystem("system2", ConfigFactory .parseString(""" akka { actor.provider = "akka.remote.RemoteActorRefProvider" @@ -59,17 +59,17 @@ class GossipingAccrualFailureDetectorSpec extends AkkaSpec(""" hostname = localhost port = 5551 } - cluster.node-to-join = "akka://node1@localhost:5550" + cluster.node-to-join = "akka://system1@localhost:5550" }""") .withFallback(system.settings.config)) .asInstanceOf[ActorSystemImpl] - val remote2 = node2.provider.asInstanceOf[RemoteActorRefProvider] - gossiper2 = Gossiper(node2, remote2) - val fd2 = gossiper2.failureDetector - val address2 = gossiper2.self.address + val remote2 = system2.provider.asInstanceOf[RemoteActorRefProvider] + node2 = Node(system2, remote2) + val fd2 = node2.failureDetector + val address2 = node2.self.address // ======= NODE 3 ======== - node3 = ActorSystem("node3", ConfigFactory + system3 = ActorSystem("system3", ConfigFactory .parseString(""" akka { actor.provider = "akka.remote.RemoteActorRefProvider" @@ -77,17 +77,17 @@ class GossipingAccrualFailureDetectorSpec extends AkkaSpec(""" hostname = localhost port=5552 } - cluster.node-to-join = "akka://node1@localhost:5550" + cluster.node-to-join = "akka://system1@localhost:5550" }""") .withFallback(system.settings.config)) .asInstanceOf[ActorSystemImpl] - val remote3 = node3.provider.asInstanceOf[RemoteActorRefProvider] - gossiper3 = Gossiper(node3, remote3) - val fd3 = gossiper3.failureDetector - val address3 = gossiper3.self.address + val remote3 = system3.provider.asInstanceOf[RemoteActorRefProvider] + node3 = Node(system3, remote3) + val fd3 = node3.failureDetector + val address3 = node3.self.address - "receive gossip heartbeats so that all healthy nodes in the cluster are marked 'available'" taggedAs LongRunningTest in { - println("Let the nodes gossip for a while...") + "receive gossip heartbeats so that 
all healthy systems in the cluster are marked 'available'" taggedAs LongRunningTest in { + println("Let the systems gossip for a while...") Thread.sleep(30.seconds.dilated.toMillis) // let them gossip for 30 seconds fd1.isAvailable(address2) must be(true) fd1.isAvailable(address3) must be(true) @@ -97,12 +97,12 @@ class GossipingAccrualFailureDetectorSpec extends AkkaSpec(""" fd3.isAvailable(address2) must be(true) } - "mark node as 'unavailable' if a node in the cluster is shut down (and its heartbeats stops)" taggedAs LongRunningTest in { - // shut down node3 - gossiper3.shutdown() + "mark system as 'unavailable' if a system in the cluster is shut down (and its heartbeats stops)" taggedAs LongRunningTest in { + // shut down system3 node3.shutdown() - println("Give the remaning nodes time to detect failure...") - Thread.sleep(30.seconds.dilated.toMillis) // give them 30 seconds to detect failure of node3 + system3.shutdown() + println("Give the remaning systems time to detect failure...") + Thread.sleep(30.seconds.dilated.toMillis) // give them 30 seconds to detect failure of system3 fd1.isAvailable(address2) must be(true) fd1.isAvailable(address3) must be(false) fd2.isAvailable(address1) must be(true) @@ -116,13 +116,13 @@ class GossipingAccrualFailureDetectorSpec extends AkkaSpec(""" } override def atTermination() { - if (gossiper1 ne null) gossiper1.shutdown() if (node1 ne null) node1.shutdown() + if (system1 ne null) system1.shutdown() - if (gossiper2 ne null) gossiper2.shutdown() if (node2 ne null) node2.shutdown() + if (system2 ne null) system2.shutdown() - if (gossiper3 ne null) gossiper3.shutdown() if (node3 ne null) node3.shutdown() + if (system3 ne null) system3.shutdown() } } diff --git a/akka-cluster/src/test/scala/akka/cluster/MembershipChangeListenerSpec.scala b/akka-cluster/src/test/scala/akka/cluster/MembershipChangeListenerSpec.scala index a82bbe4d5e..197fa22b71 100644 --- a/akka-cluster/src/test/scala/akka/cluster/MembershipChangeListenerSpec.scala +++ b/akka-cluster/src/test/scala/akka/cluster/MembershipChangeListenerSpec.scala @@ -22,18 +22,18 @@ class MembershipChangeListenerSpec extends AkkaSpec(""" } """) with ImplicitSender { - var gossiper0: Gossiper = _ - var gossiper1: Gossiper = _ - var gossiper2: Gossiper = _ + var node0: Node = _ + var node1: Node = _ + var node2: Node = _ - var node0: ActorSystemImpl = _ - var node1: ActorSystemImpl = _ - var node2: ActorSystemImpl = _ + var system0: ActorSystemImpl = _ + var system1: ActorSystemImpl = _ + var system2: ActorSystemImpl = _ try { - "A set of connected cluster nodes" must { - "(when two nodes) after cluster convergence updates the membership table then all MembershipChangeListeners should be triggered" taggedAs LongRunningTest in { - node0 = ActorSystem("node0", ConfigFactory + "A set of connected cluster systems" must { + "(when two systems) after cluster convergence updates the membership table then all MembershipChangeListeners should be triggered" taggedAs LongRunningTest in { + system0 = ActorSystem("system0", ConfigFactory .parseString(""" akka { actor.provider = "akka.remote.RemoteActorRefProvider" @@ -44,10 +44,10 @@ class MembershipChangeListenerSpec extends AkkaSpec(""" }""") .withFallback(system.settings.config)) .asInstanceOf[ActorSystemImpl] - val remote0 = node0.provider.asInstanceOf[RemoteActorRefProvider] - gossiper0 = Gossiper(node0, remote0) + val remote0 = system0.provider.asInstanceOf[RemoteActorRefProvider] + node0 = Node(system0, remote0) - node1 = ActorSystem("node1", ConfigFactory + 
system1 = ActorSystem("system1", ConfigFactory .parseString(""" akka { actor.provider = "akka.remote.RemoteActorRefProvider" @@ -55,21 +55,21 @@ class MembershipChangeListenerSpec extends AkkaSpec(""" hostname = localhost port=5551 } - cluster.node-to-join = "akka://node0@localhost:5550" + cluster.node-to-join = "akka://system0@localhost:5550" }""") .withFallback(system.settings.config)) .asInstanceOf[ActorSystemImpl] - val remote1 = node1.provider.asInstanceOf[RemoteActorRefProvider] - gossiper1 = Gossiper(node1, remote1) + val remote1 = system1.provider.asInstanceOf[RemoteActorRefProvider] + node1 = Node(system1, remote1) val latch = new CountDownLatch(2) - gossiper0.registerListener(new MembershipChangeListener { + node0.registerListener(new MembershipChangeListener { def notify(members: SortedSet[Member]) { latch.countDown() } }) - gossiper1.registerListener(new MembershipChangeListener { + node1.registerListener(new MembershipChangeListener { def notify(members: SortedSet[Member]) { latch.countDown() } @@ -80,14 +80,14 @@ class MembershipChangeListenerSpec extends AkkaSpec(""" Thread.sleep(10.seconds.dilated.toMillis) // check cluster convergence - gossiper0.convergence must be('defined) - gossiper1.convergence must be('defined) + node0.convergence must be('defined) + node1.convergence must be('defined) } - "(when three nodes) after cluster convergence updates the membership table then all MembershipChangeListeners should be triggered" taggedAs LongRunningTest in { + "(when three systems) after cluster convergence updates the membership table then all MembershipChangeListeners should be triggered" taggedAs LongRunningTest in { // ======= NODE 2 ======== - node2 = ActorSystem("node2", ConfigFactory + system2 = ActorSystem("system2", ConfigFactory .parseString(""" akka { actor.provider = "akka.remote.RemoteActorRefProvider" @@ -95,25 +95,25 @@ class MembershipChangeListenerSpec extends AkkaSpec(""" hostname = localhost port=5552 } - cluster.node-to-join = "akka://node0@localhost:5550" + cluster.node-to-join = "akka://system0@localhost:5550" }""") .withFallback(system.settings.config)) .asInstanceOf[ActorSystemImpl] - val remote2 = node2.provider.asInstanceOf[RemoteActorRefProvider] - gossiper2 = Gossiper(node2, remote2) + val remote2 = system2.provider.asInstanceOf[RemoteActorRefProvider] + node2 = Node(system2, remote2) val latch = new CountDownLatch(3) - gossiper0.registerListener(new MembershipChangeListener { + node0.registerListener(new MembershipChangeListener { def notify(members: SortedSet[Member]) { latch.countDown() } }) - gossiper1.registerListener(new MembershipChangeListener { + node1.registerListener(new MembershipChangeListener { def notify(members: SortedSet[Member]) { latch.countDown() } }) - gossiper2.registerListener(new MembershipChangeListener { + node2.registerListener(new MembershipChangeListener { def notify(members: SortedSet[Member]) { latch.countDown() } @@ -124,9 +124,9 @@ class MembershipChangeListenerSpec extends AkkaSpec(""" Thread.sleep(10.seconds.dilated.toMillis) // check cluster convergence - gossiper0.convergence must be('defined) - gossiper1.convergence must be('defined) - gossiper2.convergence must be('defined) + node0.convergence must be('defined) + node1.convergence must be('defined) + node2.convergence must be('defined) } } } catch { @@ -136,13 +136,13 @@ class MembershipChangeListenerSpec extends AkkaSpec(""" } override def atTermination() { - if (gossiper0 ne null) gossiper0.shutdown() if (node0 ne null) node0.shutdown() + if (system0 ne null) 
system0.shutdown() - if (gossiper1 ne null) gossiper1.shutdown() if (node1 ne null) node1.shutdown() + if (system1 ne null) system1.shutdown() - if (gossiper2 ne null) gossiper2.shutdown() if (node2 ne null) node2.shutdown() + if (system2 ne null) system2.shutdown() } } diff --git a/akka-cluster/src/test/scala/akka/cluster/NodeMembershipSpec.scala b/akka-cluster/src/test/scala/akka/cluster/NodeMembershipSpec.scala index 2ce0a1d449..dc24485507 100644 --- a/akka-cluster/src/test/scala/akka/cluster/NodeMembershipSpec.scala +++ b/akka-cluster/src/test/scala/akka/cluster/NodeMembershipSpec.scala @@ -19,20 +19,20 @@ class NodeMembershipSpec extends AkkaSpec(""" } """) with ImplicitSender { - var gossiper0: Gossiper = _ - var gossiper1: Gossiper = _ - var gossiper2: Gossiper = _ + var node0: Node = _ + var node1: Node = _ + var node2: Node = _ - var node0: ActorSystemImpl = _ - var node1: ActorSystemImpl = _ - var node2: ActorSystemImpl = _ + var system0: ActorSystemImpl = _ + var system1: ActorSystemImpl = _ + var system2: ActorSystemImpl = _ try { - "A set of connected cluster nodes" must { - "(when two nodes) start gossiping to each other so that both nodes gets the same gossip info" taggedAs LongRunningTest in { + "A set of connected cluster systems" must { + "(when two systems) start gossiping to each other so that both systems gets the same gossip info" taggedAs LongRunningTest in { // ======= NODE 0 ======== - node0 = ActorSystem("node0", ConfigFactory + system0 = ActorSystem("system0", ConfigFactory .parseString(""" akka { actor.provider = "akka.remote.RemoteActorRefProvider" @@ -43,11 +43,11 @@ class NodeMembershipSpec extends AkkaSpec(""" }""") .withFallback(system.settings.config)) .asInstanceOf[ActorSystemImpl] - val remote0 = node0.provider.asInstanceOf[RemoteActorRefProvider] - gossiper0 = Gossiper(node0, remote0) + val remote0 = system0.provider.asInstanceOf[RemoteActorRefProvider] + node0 = Node(system0, remote0) // ======= NODE 1 ======== - node1 = ActorSystem("node1", ConfigFactory + system1 = ActorSystem("system1", ConfigFactory .parseString(""" akka { actor.provider = "akka.remote.RemoteActorRefProvider" @@ -55,27 +55,27 @@ class NodeMembershipSpec extends AkkaSpec(""" hostname = localhost port=5551 } - cluster.node-to-join = "akka://node0@localhost:5550" + cluster.node-to-join = "akka://system0@localhost:5550" }""") .withFallback(system.settings.config)) .asInstanceOf[ActorSystemImpl] - val remote1 = node1.provider.asInstanceOf[RemoteActorRefProvider] - gossiper1 = Gossiper(node1, remote1) + val remote1 = system1.provider.asInstanceOf[RemoteActorRefProvider] + node1 = Node(system1, remote1) Thread.sleep(10.seconds.dilated.toMillis) // check cluster convergence - gossiper0.convergence must be('defined) - gossiper1.convergence must be('defined) + node0.convergence must be('defined) + node1.convergence must be('defined) - val members0 = gossiper0.latestGossip.members.toArray + val members0 = node0.latestGossip.members.toArray members0.size must be(2) members0(0).address.port.get must be(5550) members0(0).status must be(MemberStatus.Joining) members0(1).address.port.get must be(5551) members0(1).status must be(MemberStatus.Joining) - val members1 = gossiper1.latestGossip.members.toArray + val members1 = node1.latestGossip.members.toArray members1.size must be(2) members1(0).address.port.get must be(5550) members1(0).status must be(MemberStatus.Joining) @@ -83,10 +83,10 @@ class NodeMembershipSpec extends AkkaSpec(""" members1(1).status must be(MemberStatus.Joining) } - "(when 
three nodes) start gossiping to each other so that both nodes gets the same gossip info" taggedAs LongRunningTest in { + "(when three systems) start gossiping to each other so that both systems gets the same gossip info" taggedAs LongRunningTest in { // ======= NODE 2 ======== - node2 = ActorSystem("node2", ConfigFactory + system2 = ActorSystem("system2", ConfigFactory .parseString(""" akka { actor.provider = "akka.remote.RemoteActorRefProvider" @@ -94,22 +94,22 @@ class NodeMembershipSpec extends AkkaSpec(""" hostname = localhost port=5552 } - cluster.node-to-join = "akka://node0@localhost:5550" + cluster.node-to-join = "akka://system0@localhost:5550" }""") .withFallback(system.settings.config)) .asInstanceOf[ActorSystemImpl] - val remote2 = node2.provider.asInstanceOf[RemoteActorRefProvider] - gossiper2 = Gossiper(node2, remote2) + val remote2 = system2.provider.asInstanceOf[RemoteActorRefProvider] + node2 = Node(system2, remote2) Thread.sleep(10.seconds.dilated.toMillis) // check cluster convergence - gossiper0.convergence must be('defined) - gossiper1.convergence must be('defined) - gossiper2.convergence must be('defined) + node0.convergence must be('defined) + node1.convergence must be('defined) + node2.convergence must be('defined) - val members0 = gossiper0.latestGossip.members.toArray - val version = gossiper0.latestGossip.version + val members0 = node0.latestGossip.members.toArray + val version = node0.latestGossip.version members0.size must be(3) members0(0).address.port.get must be(5550) members0(0).status must be(MemberStatus.Joining) @@ -118,7 +118,7 @@ class NodeMembershipSpec extends AkkaSpec(""" members0(2).address.port.get must be(5552) members0(2).status must be(MemberStatus.Joining) - val members1 = gossiper1.latestGossip.members.toArray + val members1 = node1.latestGossip.members.toArray members1.size must be(3) members1(0).address.port.get must be(5550) members1(0).status must be(MemberStatus.Joining) @@ -127,7 +127,7 @@ class NodeMembershipSpec extends AkkaSpec(""" members1(2).address.port.get must be(5552) members1(2).status must be(MemberStatus.Joining) - val members2 = gossiper2.latestGossip.members.toArray + val members2 = node2.latestGossip.members.toArray members2.size must be(3) members2(0).address.port.get must be(5550) members2(0).status must be(MemberStatus.Joining) @@ -144,13 +144,13 @@ class NodeMembershipSpec extends AkkaSpec(""" } override def atTermination() { - if (gossiper0 ne null) gossiper0.shutdown() if (node0 ne null) node0.shutdown() + if (system0 ne null) system0.shutdown() - if (gossiper1 ne null) gossiper1.shutdown() if (node1 ne null) node1.shutdown() + if (system1 ne null) system1.shutdown() - if (gossiper2 ne null) gossiper2.shutdown() if (node2 ne null) node2.shutdown() + if (system2 ne null) system2.shutdown() } } diff --git a/akka-cluster/src/test/scala/akka/cluster/NodeStartupSpec.scala b/akka-cluster/src/test/scala/akka/cluster/NodeStartupSpec.scala index b3805b7946..3d98260c4d 100644 --- a/akka-cluster/src/test/scala/akka/cluster/NodeStartupSpec.scala +++ b/akka-cluster/src/test/scala/akka/cluster/NodeStartupSpec.scala @@ -19,14 +19,14 @@ class NodeStartupSpec extends AkkaSpec(""" } """) with ImplicitSender { - var gossiper0: Gossiper = _ - var gossiper1: Gossiper = _ - var node0: ActorSystemImpl = _ - var node1: ActorSystemImpl = _ + var node0: Node = _ + var node1: Node = _ + var system0: ActorSystemImpl = _ + var system1: ActorSystemImpl = _ try { "A first cluster node with a 'node-to-join' config set to empty string (singleton 
cluster)" must { - node0 = ActorSystem("NodeStartupSpec", ConfigFactory + system0 = ActorSystem("NodeStartupSpec", ConfigFactory .parseString(""" akka { actor.provider = "akka.remote.RemoteActorRefProvider" @@ -37,16 +37,16 @@ class NodeStartupSpec extends AkkaSpec(""" }""") .withFallback(system.settings.config)) .asInstanceOf[ActorSystemImpl] - val remote0 = node0.provider.asInstanceOf[RemoteActorRefProvider] - gossiper0 = Gossiper(node0, remote0) + val remote0 = system0.provider.asInstanceOf[RemoteActorRefProvider] + node0 = Node(system0, remote0) "be a singleton cluster when started up" in { Thread.sleep(1.seconds.dilated.toMillis) - gossiper0.isSingletonCluster must be(true) + node0.isSingletonCluster must be(true) } "be in 'Up' phase when started up" in { - val members = gossiper0.latestGossip.members + val members = node0.latestGossip.members val joiningMember = members find (_.address.port.get == 5550) joiningMember must be('defined) joiningMember.get.status must be(MemberStatus.Joining) @@ -55,7 +55,7 @@ class NodeStartupSpec extends AkkaSpec(""" "A second cluster node with a 'node-to-join' config defined" must { "join the other node cluster as 'Joining' when sending a Join command" in { - node1 = ActorSystem("NodeStartupSpec", ConfigFactory + system1 = ActorSystem("NodeStartupSpec", ConfigFactory .parseString(""" akka { actor.provider = "akka.remote.RemoteActorRefProvider" @@ -67,11 +67,11 @@ class NodeStartupSpec extends AkkaSpec(""" }""") .withFallback(system.settings.config)) .asInstanceOf[ActorSystemImpl] - val remote1 = node1.provider.asInstanceOf[RemoteActorRefProvider] - gossiper1 = Gossiper(node1, remote1) + val remote1 = system1.provider.asInstanceOf[RemoteActorRefProvider] + node1 = Node(system1, remote1) Thread.sleep(1.seconds.dilated.toMillis) // give enough time for node1 to JOIN node0 - val members = gossiper0.latestGossip.members + val members = node0.latestGossip.members val joiningMember = members find (_.address.port.get == 5551) joiningMember must be('defined) joiningMember.get.status must be(MemberStatus.Joining) @@ -84,10 +84,10 @@ class NodeStartupSpec extends AkkaSpec(""" } override def atTermination() { - if (gossiper0 ne null) gossiper0.shutdown() if (node0 ne null) node0.shutdown() + if (system0 ne null) system0.shutdown() - if (gossiper1 ne null) gossiper1.shutdown() if (node1 ne null) node1.shutdown() + if (system1 ne null) system1.shutdown() } } From 0c405606e3ebfb7352ac7e0c67309840e9153b93 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jonas=20Bone=CC=81r?= Date: Mon, 20 Feb 2012 17:22:07 +0100 Subject: [PATCH 27/72] Added support for "leader election", the isLeader method and leader election tests. Also fixed bug in scrutinizer not maintaining the 'seen' map. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Jonas Bonér --- .../src/main/scala/akka/cluster/Node.scala | 13 +- .../akka/cluster/LeaderElectionSpec.scala | 155 ++++++++++++++++++ 2 files changed, 167 insertions(+), 1 deletion(-) create mode 100644 akka-cluster/src/test/scala/akka/cluster/LeaderElectionSpec.scala diff --git a/akka-cluster/src/main/scala/akka/cluster/Node.scala b/akka-cluster/src/main/scala/akka/cluster/Node.scala index 0eaa6b1d16..bcb9d1ecbc 100644 --- a/akka-cluster/src/main/scala/akka/cluster/Node.scala +++ b/akka-cluster/src/main/scala/akka/cluster/Node.scala @@ -282,6 +282,14 @@ case class Node(system: ActorSystemImpl, remote: RemoteActorRefProvider) { */ def self: Member = state.get.self + /** + * Is this node the leader? 
+ */ + def isLeader: Boolean = { + val currentState = state.get + remoteAddress == currentState.latestGossip.members.head.address + } + /** * Is this node a singleton cluster? */ @@ -540,6 +548,7 @@ case class Node(system: ActorSystemImpl, remote: RemoteActorRefProvider) { val localGossip = localState.latestGossip val localOverview = localGossip.overview + val localSeen = localOverview.seen val localMembers = localGossip.members val localUnreachableAddresses = localGossip.overview.unreachable @@ -553,7 +562,9 @@ case class Node(system: ActorSystemImpl, remote: RemoteActorRefProvider) { log.info("Node [{}] - Marking node(s) an unreachable [{}]", remoteAddress, newlyDetectedUnreachableAddresses.mkString(", ")) - val newOverview = localOverview copy (unreachable = newUnreachableAddresses) + val newSeen = newUnreachableAddresses.foldLeft(localSeen)((currentSeen, address) ⇒ currentSeen - address) + + val newOverview = localOverview copy (seen = newSeen, unreachable = newUnreachableAddresses) val newGossip = localGossip copy (overview = newOverview, members = newMembers) val versionedGossip = newGossip + vclockNode diff --git a/akka-cluster/src/test/scala/akka/cluster/LeaderElectionSpec.scala b/akka-cluster/src/test/scala/akka/cluster/LeaderElectionSpec.scala new file mode 100644 index 0000000000..dc0d8632a1 --- /dev/null +++ b/akka-cluster/src/test/scala/akka/cluster/LeaderElectionSpec.scala @@ -0,0 +1,155 @@ +/** + * Copyright (C) 2009-2012 Typesafe Inc. + */ +package akka.cluster + +import akka.testkit._ +import akka.dispatch._ +import akka.actor._ +import akka.remote._ +import akka.util.duration._ + +import com.typesafe.config._ + +import java.net.InetSocketAddress + +class LeaderElectionSpec extends AkkaSpec(""" + akka { + loglevel = "DEBUG" + actor.debug.lifecycle = on + actor.debug.autoreceive = on + cluster.failure-detector.threshold = 3 + } + """) with ImplicitSender { + + var node1: Node = _ + var node2: Node = _ + var node3: Node = _ + + var system1: ActorSystemImpl = _ + var system2: ActorSystemImpl = _ + var system3: ActorSystemImpl = _ + + try { + "A cluster of three nodes" must { + + // ======= NODE 1 ======== + system1 = ActorSystem("system1", ConfigFactory + .parseString(""" + akka { + actor.provider = "akka.remote.RemoteActorRefProvider" + remote.netty { + hostname = localhost + port=5550 + } + }""") + .withFallback(system.settings.config)) + .asInstanceOf[ActorSystemImpl] + val remote1 = system1.provider.asInstanceOf[RemoteActorRefProvider] + node1 = Node(system1, remote1) + val fd1 = node1.failureDetector + val address1 = node1.self.address + + // ======= NODE 2 ======== + system2 = ActorSystem("system2", ConfigFactory + .parseString(""" + akka { + actor.provider = "akka.remote.RemoteActorRefProvider" + remote.netty { + hostname = localhost + port = 5551 + } + cluster.node-to-join = "akka://system1@localhost:5550" + }""") + .withFallback(system.settings.config)) + .asInstanceOf[ActorSystemImpl] + val remote2 = system2.provider.asInstanceOf[RemoteActorRefProvider] + node2 = Node(system2, remote2) + val fd2 = node2.failureDetector + val address2 = node2.self.address + + // ======= NODE 3 ======== + system3 = ActorSystem("system3", ConfigFactory + .parseString(""" + akka { + actor.provider = "akka.remote.RemoteActorRefProvider" + remote.netty { + hostname = localhost + port=5552 + } + cluster.node-to-join = "akka://system1@localhost:5550" + }""") + .withFallback(system.settings.config)) + .asInstanceOf[ActorSystemImpl] + val remote3 = 
system3.provider.asInstanceOf[RemoteActorRefProvider] + node3 = Node(system3, remote3) + val fd3 = node3.failureDetector + val address3 = node3.self.address + + "be able to 'elect' a single leader" taggedAs LongRunningTest in { + + println("Give the system time to converge...") + Thread.sleep(30.seconds.dilated.toMillis) // let them gossip for 30 seconds + + // check cluster convergence + node1.convergence must be('defined) + node2.convergence must be('defined) + node3.convergence must be('defined) + + // check leader + node1.isLeader must be(true) + node2.isLeader must be(false) + node3.isLeader must be(false) + } + + "be able to 're-elect' a single leader after leader has left" taggedAs LongRunningTest in { + + // shut down system1 - the leader + node1.shutdown() + system1.shutdown() + + println("Give the system time to converge...") + Thread.sleep(30.seconds.dilated.toMillis) // give them 30 seconds to detect failure of system3 + + // check cluster convergence + node2.convergence must be('defined) + node3.convergence must be('defined) + + // check leader + node2.isLeader must be(true) + node3.isLeader must be(false) + } + + "be able to 're-elect' a single leader after leader has left (again, leaving a single node)" taggedAs LongRunningTest in { + + // shut down system1 - the leader + node2.shutdown() + system2.shutdown() + + println("Give the system time to converge...") + Thread.sleep(30.seconds.dilated.toMillis) // give them 30 seconds to detect failure of system3 + + // check cluster convergence + node3.convergence must be('defined) + + // check leader + node3.isLeader must be(true) + } + } + } catch { + case e: Exception ⇒ + e.printStackTrace + fail(e.toString) + } + + override def atTermination() { + if (node1 ne null) node1.shutdown() + if (system1 ne null) system1.shutdown() + + if (node2 ne null) node2.shutdown() + if (system2 ne null) system2.shutdown() + + if (node3 ne null) node3.shutdown() + if (system3 ne null) system3.shutdown() + } +} From c7fd6870f80834794a3bc37141693ab9231bca19 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jonas=20Bone=CC=81r?= Date: Sat, 1 Jan 2011 01:50:33 +0100 Subject: [PATCH 28/72] Added support for 'deputy-nodes'. 
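Note that the "election" exercised by LeaderElectionSpec above involves no voting: members are kept in a SortedSet, so every converged node sees the same head element, and the new isLeader check simply compares the node's own address against that head. A toy illustration of the rule, using plain strings in place of real Member entries:

    import scala.collection.immutable.SortedSet

    // made-up addresses standing in for sorted cluster members
    val members = SortedSet(
      "akka://system1@localhost:5550",
      "akka://system2@localhost:5551",
      "akka://system3@localhost:5552")

    def isLeader(myAddress: String): Boolean = myAddress == members.head

    isLeader("akka://system1@localhost:5550") // true  - first node in sorted order
    isLeader("akka://system2@localhost:5551") // false
    // remove the current leader and the next node in sorted order takes over deterministically
    (members - "akka://system1@localhost:5550").head // "akka://system2@localhost:5551"
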
MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit * Added 'nr-of-deputy-nodes' config option * Added fetching of current deputy node addresses * Minor refactorings Signed-off-by: Jonas Bonér --- .../src/main/resources/reference.conf | 1 + .../scala/akka/cluster/ClusterSettings.scala | 1 + .../src/main/scala/akka/cluster/Node.scala | 87 +++++++++---------- .../akka/cluster/ClusterConfigSpec.scala | 1 + 4 files changed, 44 insertions(+), 46 deletions(-) diff --git a/akka-cluster/src/main/resources/reference.conf b/akka-cluster/src/main/resources/reference.conf index 871026a275..58b8b0185f 100644 --- a/akka-cluster/src/main/resources/reference.conf +++ b/akka-cluster/src/main/resources/reference.conf @@ -14,6 +14,7 @@ akka { # the number of gossip daemon actors nr-of-gossip-daemons = 4 + nr-of-deputy-nodes = 3 gossip { initialDelay = 5s diff --git a/akka-cluster/src/main/scala/akka/cluster/ClusterSettings.scala b/akka-cluster/src/main/scala/akka/cluster/ClusterSettings.scala index c9cb9ede4a..8365faacc7 100644 --- a/akka-cluster/src/main/scala/akka/cluster/ClusterSettings.scala +++ b/akka-cluster/src/main/scala/akka/cluster/ClusterSettings.scala @@ -22,4 +22,5 @@ class ClusterSettings(val config: Config, val systemName: String) { val GossipInitialDelay = Duration(getMilliseconds("akka.cluster.gossip.initialDelay"), MILLISECONDS) val GossipFrequency = Duration(getMilliseconds("akka.cluster.gossip.frequency"), MILLISECONDS) val NrOfGossipDaemons = getInt("akka.cluster.nr-of-gossip-daemons") + val NrOfDeputyNodes = getInt("akka.cluster.nr-of-deputy-nodes") } diff --git a/akka-cluster/src/main/scala/akka/cluster/Node.scala b/akka-cluster/src/main/scala/akka/cluster/Node.scala index bcb9d1ecbc..bb8bec4d31 100644 --- a/akka-cluster/src/main/scala/akka/cluster/Node.scala +++ b/akka-cluster/src/main/scala/akka/cluster/Node.scala @@ -184,7 +184,6 @@ final class ClusterGossipDaemon(system: ActorSystem, node: Node) extends Actor { } // FIXME Cluster public API should be an Extension -// FIXME Add cluster Node class and refactor out all non-gossip related stuff out of Node /** * This module is responsible for Gossiping cluster information. The abstraction maintains the list of live @@ -228,6 +227,7 @@ case class Node(system: ActorSystemImpl, remote: RemoteActorRefProvider) { val failureDetector = new AccrualFailureDetector( system, remoteAddress, clusterSettings.FailureDetectorThreshold, clusterSettings.FailureDetectorMaxSampleSize) + private val nrOfDeputyNodes = clusterSettings.NrOfDeputyNodes private val nrOfGossipDaemons = clusterSettings.NrOfGossipDaemons private val nodeToJoin: Option[Address] = clusterSettings.NodeToJoin filter (_ != remoteAddress) @@ -237,8 +237,6 @@ case class Node(system: ActorSystemImpl, remote: RemoteActorRefProvider) { private val log = Logging(system, "Node") private val random = SecureRandom.getInstance("SHA1PRNG") - // Is it right to put this guy under the /system path or should we have a top-level /cluster or something else...? 
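Both new knobs are plain Typesafe Config entries, so they can be overridden in an application's configuration and read back through ClusterSettings just like the existing gossip options. A short sketch, assuming the reference.conf defaults added in this patch are on the classpath:

    import com.typesafe.config.ConfigFactory
    import akka.cluster.ClusterSettings

    val config = ConfigFactory.parseString("akka.cluster.nr-of-deputy-nodes = 5")
      .withFallback(ConfigFactory.load()) // falls back to the reference.conf defaults

    val settings = new ClusterSettings(config, "MySystem")
    settings.NrOfGossipDaemons // 4 - the reference.conf default
    settings.NrOfDeputyNodes   // 5 - the override above
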
- // FIXME should be defined as a router so we get concurrency here private val clusterCommandDaemon = system.systemActorOf( Props(new ClusterCommandDaemon(system, this)), "clusterCommand") @@ -259,12 +257,12 @@ case class Node(system: ActorSystemImpl, remote: RemoteActorRefProvider) { join() // start periodic gossip to random nodes in cluster - val gossipCanceller = system.scheduler.schedule(gossipInitialDelay, gossipFrequency) { + private val gossipCanceller = system.scheduler.schedule(gossipInitialDelay, gossipFrequency) { gossip() } // start periodic cluster scrutinization (moving nodes condemned by the failure detector to unreachable list) - val scrutinizeCanceller = system.scheduler.schedule(gossipInitialDelay, gossipFrequency) { + private val scrutinizeCanceller = system.scheduler.schedule(gossipInitialDelay, gossipFrequency) { scrutinize() } @@ -295,6 +293,13 @@ case class Node(system: ActorSystemImpl, remote: RemoteActorRefProvider) { */ def isSingletonCluster: Boolean = isSingletonCluster(state.get) + /** + * Checks if we have a cluster convergence. + * + * @returns Some(convergedGossip) if convergence have been reached and None if not + */ + def convergence: Option[Gossip] = convergence(latestGossip) + /** * Shuts down all connections to other members, the cluster daemon and the periodic gossip and cleanup tasks. */ @@ -317,11 +322,37 @@ case class Node(system: ActorSystemImpl, remote: RemoteActorRefProvider) { } } + /** + * Registers a listener to subscribe to cluster membership changes. + */ + @tailrec + final def registerListener(listener: MembershipChangeListener) { + val localState = state.get + val newListeners = localState.memberMembershipChangeListeners + listener + val newState = localState copy (memberMembershipChangeListeners = newListeners) + if (!state.compareAndSet(localState, newState)) registerListener(listener) // recur + } + + /** + * Unsubscribes to cluster membership changes. + */ + @tailrec + final def unregisterListener(listener: MembershipChangeListener) { + val localState = state.get + val newListeners = localState.memberMembershipChangeListeners - listener + val newState = localState copy (memberMembershipChangeListeners = newListeners) + if (!state.compareAndSet(localState, newState)) unregisterListener(listener) // recur + } + + // ======================================================== + // ===================== INTERNAL API ===================== + // ======================================================== + /** * New node joining. */ @tailrec - final def joining(node: Address) { + private[cluster] final def joining(node: Address) { log.info("Node [{}] - Node [{}] is joining", remoteAddress, node) failureDetector heartbeat node // update heartbeat in failure detector @@ -350,7 +381,7 @@ case class Node(system: ActorSystemImpl, remote: RemoteActorRefProvider) { * Receive new gossip. */ @tailrec - final def receive(sender: Member, remoteGossip: Gossip) { + private[cluster] final def receive(sender: Member, remoteGossip: Gossip) { log.debug("Node [{}] - Receiving gossip from [{}]", remoteAddress, sender.address) failureDetector heartbeat sender.address // update heartbeat in failure detector @@ -390,39 +421,6 @@ case class Node(system: ActorSystemImpl, remote: RemoteActorRefProvider) { } } - /** - * Registers a listener to subscribe to cluster membership changes. 
- */ - @tailrec - final def registerListener(listener: MembershipChangeListener) { - val localState = state.get - val newListeners = localState.memberMembershipChangeListeners + listener - val newState = localState copy (memberMembershipChangeListeners = newListeners) - if (!state.compareAndSet(localState, newState)) registerListener(listener) // recur - } - - /** - * Unsubscribes to cluster membership changes. - */ - @tailrec - final def unregisterListener(listener: MembershipChangeListener) { - val localState = state.get - val newListeners = localState.memberMembershipChangeListeners - listener - val newState = localState copy (memberMembershipChangeListeners = newListeners) - if (!state.compareAndSet(localState, newState)) unregisterListener(listener) // recur - } - - /** - * Checks if we have a cluster convergence. - * - * @returns Some(convergedGossip) if convergence have been reached and None if not - */ - def convergence: Option[Gossip] = convergence(latestGossip) - - // ======================================================== - // ===================== INTERNAL API ===================== - // ======================================================== - /** * Joins the pre-configured contact point and retrieves current gossip state. */ @@ -461,7 +459,7 @@ case class Node(system: ActorSystemImpl, remote: RemoteActorRefProvider) { } // 3. gossip to a deputy nodes for facilitating partition healing - val deputies = deputyNodesWithoutMyself + val deputies = deputyNodes if ((!gossipedToDeputy || localMembersSize < 1) && !deputies.isEmpty) { if (localMembersSize == 0) gossipToRandomNodeOf(deputies) else { @@ -530,11 +528,8 @@ case class Node(system: ActorSystemImpl, remote: RemoteActorRefProvider) { private def gossipToRandomNodeOf(addresses: Seq[Address]): Boolean = { val peers = addresses filter (_ != remoteAddress) // filter out myself val peer = selectRandomNode(peers) - val localState = state.get - val localGossip = localState.latestGossip - // if connection can't be established/found => ignore it since the failure detector will take care of the potential problem gossipTo(peer) - deputyNodesWithoutMyself exists (peer == _) + deputyNodes exists (peer == _) } /** @@ -607,7 +602,7 @@ case class Node(system: ActorSystemImpl, remote: RemoteActorRefProvider) { */ private def clusterGossipConnectionFor(address: Address): ActorRef = system.actorFor(RootActorPath(address) / "system" / "clusterGossip") - private def deputyNodesWithoutMyself: Seq[Address] = Seq.empty[Address] filter (_ != remoteAddress) // FIXME read in deputy nodes from gossip data - now empty seq + private def deputyNodes: Seq[Address] = state.get.latestGossip.members.toSeq map (_.address) drop 1 take nrOfDeputyNodes filter (_ != remoteAddress) private def selectRandomNode(addresses: Seq[Address]): Address = addresses(random nextInt addresses.size) diff --git a/akka-cluster/src/test/scala/akka/cluster/ClusterConfigSpec.scala b/akka-cluster/src/test/scala/akka/cluster/ClusterConfigSpec.scala index 2afbc7efc0..6668044f33 100644 --- a/akka-cluster/src/test/scala/akka/cluster/ClusterConfigSpec.scala +++ b/akka-cluster/src/test/scala/akka/cluster/ClusterConfigSpec.scala @@ -29,6 +29,7 @@ class ClusterConfigSpec extends AkkaSpec( GossipInitialDelay must be(5 seconds) GossipFrequency must be(1 second) NrOfGossipDaemons must be(4) + NrOfDeputyNodes must be(3) } } } From 082649fffb833c4e05d3cff4f36aa0be85040ecf Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jonas=20Bone=CC=81r?= Date: Wed, 22 Feb 2012 18:40:16 +0100 Subject: [PATCH 29/72] 
Turned cluster Node into an Extension. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Jonas Bonér --- .../src/main/scala/akka/cluster/Node.scala | 32 ++++++++++++++++--- .../GossipingAccrualFailureDetectorSpec.scala | 6 ++-- .../akka/cluster/LeaderElectionSpec.scala | 6 ++-- .../MembershipChangeListenerSpec.scala | 6 ++-- .../akka/cluster/NodeMembershipSpec.scala | 6 ++-- .../scala/akka/cluster/NodeStartupSpec.scala | 4 +-- 6 files changed, 41 insertions(+), 19 deletions(-) diff --git a/akka-cluster/src/main/scala/akka/cluster/Node.scala b/akka-cluster/src/main/scala/akka/cluster/Node.scala index bb8bec4d31..33c1ad840e 100644 --- a/akka-cluster/src/main/scala/akka/cluster/Node.scala +++ b/akka-cluster/src/main/scala/akka/cluster/Node.scala @@ -170,6 +170,8 @@ final class ClusterCommandDaemon(system: ActorSystem, node: Node) extends Actor } } +// FIXME create package object with implicit conversion that enables: system.node + /** * Pooled and routed wit N number of configurable instances. * Concurrent access to Node. @@ -183,7 +185,22 @@ final class ClusterGossipDaemon(system: ActorSystem, node: Node) extends Actor { } } -// FIXME Cluster public API should be an Extension +/** + * Node Extension Id and factory for creating Node extension. + * Example: + * {{{ + * val node = NodeExtension(system) + * + * if (node.isLeader) { ... } + * }}} + */ +object NodeExtension extends ExtensionId[Node] with ExtensionIdProvider { + override def get(system: ActorSystem): Node = super.get(system) + + override def lookup = NodeExtension + + override def createExtension(system: ExtendedActorSystem): Node = new Node(system.asInstanceOf[ActorSystemImpl]) // not nice but need API in ActorSystemImpl inside Node +} /** * This module is responsible for Gossiping cluster information. The abstraction maintains the list of live @@ -200,7 +217,12 @@ final class ClusterGossipDaemon(system: ActorSystem, node: Node) extends Actor { * gossip to random deputy with certain probability depending on number of unreachable, deputy and live members. * */ -case class Node(system: ActorSystemImpl, remote: RemoteActorRefProvider) { +class Node(system: ActorSystemImpl) extends Extension { + + if (!system.provider.isInstanceOf[RemoteActorRefProvider]) + throw new ConfigurationException("ActorSystem[" + system + "] needs to have a 'RemoteActorRefProvider' enabled in the configuration") + + val remote: RemoteActorRefProvider = system.provider.asInstanceOf[RemoteActorRefProvider] /** * Represents the state for this Node. 
Implemented using optimistic lockless concurrency, @@ -372,7 +394,7 @@ case class Node(system: ActorSystemImpl, remote: RemoteActorRefProvider) { if (!state.compareAndSet(localState, newState)) joining(node) // recur if we failed update else { if (convergence(newState.latestGossip).isDefined) { - newState.memberMembershipChangeListeners map { _ notify newMembers } // FIXME should check for cluster convergence before triggering listeners + newState.memberMembershipChangeListeners map { _ notify newMembers } } } } @@ -416,7 +438,7 @@ case class Node(system: ActorSystemImpl, remote: RemoteActorRefProvider) { if (!state.compareAndSet(localState, newState)) receive(sender, remoteGossip) // recur if we fail the update else { if (convergence(newState.latestGossip).isDefined) { - newState.memberMembershipChangeListeners map { _ notify newState.latestGossip.members } // FIXME should check for cluster convergence before triggering listeners + newState.memberMembershipChangeListeners map { _ notify newState.latestGossip.members } } } } @@ -571,7 +593,7 @@ case class Node(system: ActorSystemImpl, remote: RemoteActorRefProvider) { if (!state.compareAndSet(localState, newState)) scrutinize() // recur else { if (convergence(newState.latestGossip).isDefined) { - newState.memberMembershipChangeListeners map { _ notify newMembers } // FIXME should check for cluster convergence before triggering listeners + newState.memberMembershipChangeListeners map { _ notify newMembers } } } } diff --git a/akka-cluster/src/test/scala/akka/cluster/GossipingAccrualFailureDetectorSpec.scala b/akka-cluster/src/test/scala/akka/cluster/GossipingAccrualFailureDetectorSpec.scala index e92b21dbfb..f4deb77706 100644 --- a/akka-cluster/src/test/scala/akka/cluster/GossipingAccrualFailureDetectorSpec.scala +++ b/akka-cluster/src/test/scala/akka/cluster/GossipingAccrualFailureDetectorSpec.scala @@ -46,7 +46,7 @@ class GossipingAccrualFailureDetectorSpec extends AkkaSpec(""" .withFallback(system.settings.config)) .asInstanceOf[ActorSystemImpl] val remote1 = system1.provider.asInstanceOf[RemoteActorRefProvider] - node1 = Node(system1, remote1) + node1 = new Node(system1) val fd1 = node1.failureDetector val address1 = node1.self.address @@ -64,7 +64,7 @@ class GossipingAccrualFailureDetectorSpec extends AkkaSpec(""" .withFallback(system.settings.config)) .asInstanceOf[ActorSystemImpl] val remote2 = system2.provider.asInstanceOf[RemoteActorRefProvider] - node2 = Node(system2, remote2) + node2 = new Node(system2) val fd2 = node2.failureDetector val address2 = node2.self.address @@ -82,7 +82,7 @@ class GossipingAccrualFailureDetectorSpec extends AkkaSpec(""" .withFallback(system.settings.config)) .asInstanceOf[ActorSystemImpl] val remote3 = system3.provider.asInstanceOf[RemoteActorRefProvider] - node3 = Node(system3, remote3) + node3 = new Node(system3) val fd3 = node3.failureDetector val address3 = node3.self.address diff --git a/akka-cluster/src/test/scala/akka/cluster/LeaderElectionSpec.scala b/akka-cluster/src/test/scala/akka/cluster/LeaderElectionSpec.scala index dc0d8632a1..e3da64cfa0 100644 --- a/akka-cluster/src/test/scala/akka/cluster/LeaderElectionSpec.scala +++ b/akka-cluster/src/test/scala/akka/cluster/LeaderElectionSpec.scala @@ -46,7 +46,7 @@ class LeaderElectionSpec extends AkkaSpec(""" .withFallback(system.settings.config)) .asInstanceOf[ActorSystemImpl] val remote1 = system1.provider.asInstanceOf[RemoteActorRefProvider] - node1 = Node(system1, remote1) + node1 = new Node(system1) val fd1 = node1.failureDetector val address1 
= node1.self.address @@ -64,7 +64,7 @@ class LeaderElectionSpec extends AkkaSpec(""" .withFallback(system.settings.config)) .asInstanceOf[ActorSystemImpl] val remote2 = system2.provider.asInstanceOf[RemoteActorRefProvider] - node2 = Node(system2, remote2) + node2 = new Node(system2) val fd2 = node2.failureDetector val address2 = node2.self.address @@ -82,7 +82,7 @@ class LeaderElectionSpec extends AkkaSpec(""" .withFallback(system.settings.config)) .asInstanceOf[ActorSystemImpl] val remote3 = system3.provider.asInstanceOf[RemoteActorRefProvider] - node3 = Node(system3, remote3) + node3 = new Node(system3) val fd3 = node3.failureDetector val address3 = node3.self.address diff --git a/akka-cluster/src/test/scala/akka/cluster/MembershipChangeListenerSpec.scala b/akka-cluster/src/test/scala/akka/cluster/MembershipChangeListenerSpec.scala index 197fa22b71..e2487de3c8 100644 --- a/akka-cluster/src/test/scala/akka/cluster/MembershipChangeListenerSpec.scala +++ b/akka-cluster/src/test/scala/akka/cluster/MembershipChangeListenerSpec.scala @@ -45,7 +45,7 @@ class MembershipChangeListenerSpec extends AkkaSpec(""" .withFallback(system.settings.config)) .asInstanceOf[ActorSystemImpl] val remote0 = system0.provider.asInstanceOf[RemoteActorRefProvider] - node0 = Node(system0, remote0) + node0 = new Node(system0) system1 = ActorSystem("system1", ConfigFactory .parseString(""" @@ -60,7 +60,7 @@ class MembershipChangeListenerSpec extends AkkaSpec(""" .withFallback(system.settings.config)) .asInstanceOf[ActorSystemImpl] val remote1 = system1.provider.asInstanceOf[RemoteActorRefProvider] - node1 = Node(system1, remote1) + node1 = new Node(system1) val latch = new CountDownLatch(2) @@ -100,7 +100,7 @@ class MembershipChangeListenerSpec extends AkkaSpec(""" .withFallback(system.settings.config)) .asInstanceOf[ActorSystemImpl] val remote2 = system2.provider.asInstanceOf[RemoteActorRefProvider] - node2 = Node(system2, remote2) + node2 = new Node(system2) val latch = new CountDownLatch(3) node0.registerListener(new MembershipChangeListener { diff --git a/akka-cluster/src/test/scala/akka/cluster/NodeMembershipSpec.scala b/akka-cluster/src/test/scala/akka/cluster/NodeMembershipSpec.scala index dc24485507..56f053fc6c 100644 --- a/akka-cluster/src/test/scala/akka/cluster/NodeMembershipSpec.scala +++ b/akka-cluster/src/test/scala/akka/cluster/NodeMembershipSpec.scala @@ -44,7 +44,7 @@ class NodeMembershipSpec extends AkkaSpec(""" .withFallback(system.settings.config)) .asInstanceOf[ActorSystemImpl] val remote0 = system0.provider.asInstanceOf[RemoteActorRefProvider] - node0 = Node(system0, remote0) + node0 = new Node(system0) // ======= NODE 1 ======== system1 = ActorSystem("system1", ConfigFactory @@ -60,7 +60,7 @@ class NodeMembershipSpec extends AkkaSpec(""" .withFallback(system.settings.config)) .asInstanceOf[ActorSystemImpl] val remote1 = system1.provider.asInstanceOf[RemoteActorRefProvider] - node1 = Node(system1, remote1) + node1 = new Node(system1) Thread.sleep(10.seconds.dilated.toMillis) @@ -99,7 +99,7 @@ class NodeMembershipSpec extends AkkaSpec(""" .withFallback(system.settings.config)) .asInstanceOf[ActorSystemImpl] val remote2 = system2.provider.asInstanceOf[RemoteActorRefProvider] - node2 = Node(system2, remote2) + node2 = new Node(system2) Thread.sleep(10.seconds.dilated.toMillis) diff --git a/akka-cluster/src/test/scala/akka/cluster/NodeStartupSpec.scala b/akka-cluster/src/test/scala/akka/cluster/NodeStartupSpec.scala index 3d98260c4d..ed4b893619 100644 --- 
a/akka-cluster/src/test/scala/akka/cluster/NodeStartupSpec.scala +++ b/akka-cluster/src/test/scala/akka/cluster/NodeStartupSpec.scala @@ -38,7 +38,7 @@ class NodeStartupSpec extends AkkaSpec(""" .withFallback(system.settings.config)) .asInstanceOf[ActorSystemImpl] val remote0 = system0.provider.asInstanceOf[RemoteActorRefProvider] - node0 = Node(system0, remote0) + node0 = new Node(system0) "be a singleton cluster when started up" in { Thread.sleep(1.seconds.dilated.toMillis) @@ -68,7 +68,7 @@ class NodeStartupSpec extends AkkaSpec(""" .withFallback(system.settings.config)) .asInstanceOf[ActorSystemImpl] val remote1 = system1.provider.asInstanceOf[RemoteActorRefProvider] - node1 = Node(system1, remote1) + node1 = new Node(system1) Thread.sleep(1.seconds.dilated.toMillis) // give enough time for node1 to JOIN node0 val members = node0.latestGossip.members From 57264816017af44a567c43a438cfc2877fa5781a Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jonas=20Bone=CC=81r?= Date: Tue, 7 Feb 2012 14:20:51 +0100 Subject: [PATCH 30/72] Changes to cluster specification. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Added section on single-node cluster. - Changed seed nodes to deputy nodes. - Seed nodes are no longer used as contact points only to break logical partitions. Signed-off-by: Jonas Bonér --- akka-docs/cluster/cluster.rst | 114 +++++++++++++++++++++------------- 1 file changed, 72 insertions(+), 42 deletions(-) diff --git a/akka-docs/cluster/cluster.rst b/akka-docs/cluster/cluster.rst index 0ca983236a..4cd336304d 100644 --- a/akka-docs/cluster/cluster.rst +++ b/akka-docs/cluster/cluster.rst @@ -57,16 +57,32 @@ These terms are used throughout the documentation. A mapping from partition path to a set of instance nodes (where the nodes are referred to by the ordinal position given the nodes in sorted order). +**leader** + A single node in the cluster that acts as the leader. Managing cluster convergence, + partitions, fail-over, rebalancing etc. + +**deputy nodes** + A set of nodes responsible for breaking logical partitions. + Membership ========== A cluster is made up of a set of member nodes. The identifier for each node is a -`hostname:port` pair. An Akka application is distributed over a cluster with +``hostname:port`` pair. An Akka application is distributed over a cluster with each node hosting some part of the application. Cluster membership and partitioning of the application are decoupled. A node could be a member of a cluster without hosting any actors. +Single-node Cluster +------------------- + +If a node does not have a preconfigured contact point to join in the Akka +configuration, then it is considered a single-node cluster and will +automatically transition from ``joining`` to ``up``. Single-node clusters +can later explicitly send a ``Join`` message to another node to form a N-node +cluster. It is also possible to link multiple N-node clusters by ``joining`` them. + Gossip ------ @@ -75,8 +91,8 @@ The cluster membership used in Akka is based on Amazon's `Dynamo`_ system and particularly the approach taken in Basho's' `Riak`_ distributed database. Cluster membership is communicated using a `Gossip Protocol`_, where the current state of the cluster is gossiped randomly through the cluster. Joining a cluster -is initiated by specifying a set of ``seed`` nodes with which to begin -gossiping. +is initiated by issuing a ``Join`` command to one of the nodes in the cluster to +join. .. 
_Gossip Protocol: http://en.wikipedia.org/wiki/Gossip_protocol .. _Dynamo: http://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf @@ -102,7 +118,7 @@ the `pruning algorithm`_ in Riak. .. _pruning algorithm: http://wiki.basho.com/Vector-Clocks.html#Vector-Clock-Pruning -Gossip convergence +Gossip Convergence ^^^^^^^^^^^^^^^^^^ Information about the cluster converges at certain points of time. This is when @@ -146,31 +162,45 @@ order to account for network issues that sometimes occur on such platforms. Leader ^^^^^^ -After gossip convergence a leader for the cluster can be determined. There is no -leader election process, the leader can always be recognised deterministically -by any node whenever there is gossip convergence. The leader is simply the first +After gossip convergence a ``leader`` for the cluster can be determined. There is no +``leader`` election process, the ``leader`` can always be recognised deterministically +by any node whenever there is gossip convergence. The ``leader`` is simply the first node in sorted order that is able to take the leadership role, where the only -allowed member states for a leader are ``up`` or ``leaving`` (see below for more +allowed member states for a ``leader`` are ``up`` or ``leaving`` (see below for more information about member states). -The role of the leader is to shift members in and out of the cluster, changing +The role of the ``leader`` is to shift members in and out of the cluster, changing ``joining`` members to the ``up`` state or ``exiting`` members to the ``removed`` state, and to schedule rebalancing across the cluster. Currently -leader actions are only triggered by receiving a new cluster state with gossip +``leader`` actions are only triggered by receiving a new cluster state with gossip convergence but it may also be possible for the user to explicitly rebalance the cluster by specifying migrations, or to rebalance the cluster automatically based on metrics from member nodes. Metrics may be spread using the gossip protocol or possibly more efficiently using a *random chord* method, where the -leader contacts several random nodes around the cluster ring and each contacted +``leader`` contacts several random nodes around the cluster ring and each contacted node gathers information from their immediate neighbours, giving a random sampling of load information. -The leader also has the power, if configured so, to "auto-down" a node that +The ``leader`` also has the power, if configured so, to "auto-down" a node that according to the Failure Detector is considered unreachable. This means setting the unreachable node status to ``down`` automatically. -Gossip protocol +Deputy Nodes +^^^^^^^^^^^^ + +After gossip convergence a set of ``deputy`` nodes for the cluster can be +determined. As with the ``leader``, there is no ``deputy`` election process, +the deputies can always be recognised deterministically by any node whenever there +is gossip convergence. The list of ``deputy`` nodes is simply the N - 1 number +of nodes (e.g. starting with the first node after the ``leader``) in sorted order. + +The nodes defined as ``deputy`` nodes are just regular member nodes whose only +"special role" is to help breaking logical partitions as seen in the gossip +algorithm defined below. + + +Gossip Protocol ^^^^^^^^^^^^^^^ A variation of *push-pull gossip* is used to reduce the amount of gossip @@ -186,14 +216,14 @@ nodes involved in a gossip exchange. 
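As a concrete illustration of the push-pull exchange described above, the sketch below models the decision a node could make after receiving an overview. It is an editorial sketch, not the actual Akka cluster API; the names are made up and a plain counter stands in for the real vector clock::

    object PushPullSketch {
      // The overview carries only a version; the full state is shipped only when needed.
      final case class GossipOverview(stateVersion: Long, unreachable: Set[String])
      final case class GossipState(version: Long, members: Set[String])

      sealed trait Reply
      case object InSync                         extends Reply // same version, nothing to ship
      final case class PushState(s: GossipState) extends Reply // peer is behind, push full state
      case object RequestState                   extends Reply // peer is ahead, pull its state

      def onOverview(local: GossipState, remote: GossipOverview): Reply =
        if (remote.stateVersion == local.version) InSync
        else if (remote.stateVersion < local.version) PushState(local)
        else RequestState
    }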
Periodically, the default is every 1 second, each node chooses another random node to initiate a round of gossip with. The choice of node is random but can -also include extra gossiping for unreachable nodes, seed nodes, and nodes with +also include extra gossiping for unreachable nodes, ``deputy`` nodes, and nodes with either newer or older state versions. The gossip overview contains the current state version for all nodes and also a list of unreachable nodes. Whenever a node receives a gossip overview it updates the `Failure Detector`_ with the liveness information. -The nodes defined as ``seed`` nodes are just regular member nodes whose only +The nodes defined as ``deputy`` nodes are just regular member nodes whose only "special role" is to function as contact points in the cluster and to help breaking logical partitions as seen in the gossip algorithm defined below. @@ -204,9 +234,9 @@ During each round of gossip exchange the following process is used: 2. Gossip to random unreachable node with certain probability depending on the number of unreachable and live nodes -3. If the node gossiped to at (1) was not a ``seed`` node, or the number of live - nodes is less than number of seeds, gossip to random ``seed`` node with - certain probability depending on number of unreachable, seed, and live nodes. +3. If the node gossiped to at (1) was not a ``deputy`` node, or the number of live + nodes is less than number of ``deputy`` nodes, gossip to random ``deputy`` node with + certain probability depending on number of unreachable, ``deputy``, and live nodes. 4. Gossip to random node with newer or older state information, based on the current gossip overview, with some probability (?) @@ -260,18 +290,18 @@ Some of the other structures used are:: PartitionChangeStatus = Awaiting | Complete -Membership lifecycle +Membership Lifecycle -------------------- A node begins in the ``joining`` state. Once all nodes have seen that the new -node is joining (through gossip convergence) the leader will set the member +node is joining (through gossip convergence) the ``leader`` will set the member state to ``up`` and can start assigning partitions to the new node. If a node is leaving the cluster in a safe, expected manner then it switches to -the ``leaving`` state. The leader will reassign partitions across the cluster -(it is possible for a leaving node to itself be the leader). When all partition +the ``leaving`` state. The ``leader`` will reassign partitions across the cluster +(it is possible for a leaving node to itself be the ``leader``). When all partition handoff has completed then the node will change to the ``exiting`` state. Once -all nodes have seen the exiting state (convergence) the leader will remove the +all nodes have seen the exiting state (convergence) the ``leader`` will remove the node from the cluster, marking it as ``removed``. A node can also be removed forcefully by moving it directly to the ``removed`` @@ -279,7 +309,7 @@ state using the ``remove`` action. The cluster will rebalance based on the new cluster membership. If a node is unreachable then gossip convergence is not possible and therefore -any leader actions are also not possible (for instance, allowing a node to +any ``leader`` actions are also not possible (for instance, allowing a node to become a part of the cluster, or changing actor distribution). To be able to move forward the state of the unreachable nodes must be changed. 
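The membership lifecycle described above, together with the ``down`` and ``remove`` escape hatches discussed next, can be read as a small transition relation. The following is an editorial sketch of those rules, not code from the Akka cluster module::

    object MemberLifecycleSketch {
      sealed trait Status
      case object Joining extends Status
      case object Up      extends Status
      case object Leaving extends Status
      case object Exiting extends Status
      case object Down    extends Status
      case object Removed extends Status

      // Transitions performed by the leader or by explicit user actions, per the text above.
      def allowed(from: Status, to: Status): Boolean = (from, to) match {
        case (Joining, Up)             => true // leader, once all nodes have seen the join (convergence)
        case (Up, Leaving)             => true // user asks the node to leave gracefully
        case (Leaving, Exiting)        => true // leader, once partition handoff has completed
        case (Exiting, Removed)        => true // leader, once all nodes have seen the exiting state
        case (s, Down) if s != Removed => true // an unreachable node is marked down (manually or by auto-down)
        case (_, Removed)              => true // forceful removal via the remove user action
        case _                         => false
      }
    }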
If the unreachable node is experiencing only transient difficulties then it can be @@ -293,13 +323,13 @@ This means that nodes can join and leave the cluster at any point in time, e.g. provide cluster elasticity. -State diagram for the member states +State Diagram for the Member States ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. image:: images/member-states.png -Member states +Member States ^^^^^^^^^^^^^ - **joining** @@ -318,12 +348,12 @@ Member states marked as down/offline/unreachable -User actions +User Actions ^^^^^^^^^^^^ - **join** join a single node to a cluster - can be explicit or automatic on - startup if a list of seed nodes have been specified in the configuration + startup if a node to join have been specified in the configuration - **leave** tell a node to leave the cluster gracefully @@ -335,10 +365,10 @@ User actions remove a node from the cluster immediately -Leader actions +Leader Actions ^^^^^^^^^^^^^^ -The leader has the following duties: +The ``leader`` has the following duties: - shifting members in and out of the cluster @@ -364,7 +394,7 @@ set of nodes in the cluster. The actor at the head of the partition is referred to as the partition point. The mapping from partition path (actor address of the format "a/b/c") to instance nodes is stored in the partition table and is maintained as part of the cluster state through the gossip protocol. The -partition table is only updated by the leader node. Currently the only possible +partition table is only updated by the ``leader`` node. Currently the only possible partition points are *routed* actors. Routed actors can have an instance count greater than one. The instance count is @@ -375,7 +405,7 @@ Note that in the first implementation there may be a restriction such that only top-level partitions are possible (the highest possible partition points are used and sub-partitioning is not allowed). Still to be explored in more detail. -The cluster leader determines the current instance count for a partition based +The cluster ``leader`` determines the current instance count for a partition based on two axes: fault-tolerance and scaling. Fault-tolerance determines a minimum number of instances for a routed actor @@ -415,8 +445,8 @@ the following, with all instances on the same physical nodes as before:: B -> { 7, 9, 10 } C -> { 12, 14, 15, 1, 2 } -When rebalancing is required the leader will schedule handoffs, gossiping a set -of pending changes, and when each change is complete the leader will update the +When rebalancing is required the ``leader`` will schedule handoffs, gossiping a set +of pending changes, and when each change is complete the ``leader`` will update the partition table. @@ -436,7 +466,7 @@ the handoff), given a previous host node ``N1``, a new host node ``N2``, and an actor partition ``A`` to be migrated from ``N1`` to ``N2``, has this general structure: - 1. the leader sets a pending change for ``N1`` to handoff ``A`` to ``N2`` + 1. the ``leader`` sets a pending change for ``N1`` to handoff ``A`` to ``N2`` 2. ``N1`` notices the pending change and sends an initialization message to ``N2`` @@ -445,7 +475,7 @@ structure: 4. after receiving the ready message ``N1`` marks the change as complete and shuts down ``A`` - 5. the leader sees the migration is complete and updates the partition table + 5. the ``leader`` sees the migration is complete and updates the partition table 6. 
all nodes eventually see the new partitioning and use ``N2`` @@ -457,7 +487,7 @@ There are transition times in the handoff process where different approaches can be used to give different guarantees. -Migration transition +Migration Transition ~~~~~~~~~~~~~~~~~~~~ The first transition starts when ``N1`` initiates the moving of ``A`` and ends @@ -480,7 +510,7 @@ buffered until the actor is ready, or the messages are simply dropped by terminating the actor and allowing the normal dead letter process to be used. -Update transition +Update Transition ~~~~~~~~~~~~~~~~~ The second transition begins when the migration is marked as complete and ends @@ -514,12 +544,12 @@ messages sent directly to ``N2`` before the acknowledgement has been forwarded that will be buffered. -Graceful handoff +Graceful Handoff ^^^^^^^^^^^^^^^^ A more complete process for graceful handoff would be: - 1. the leader sets a pending change for ``N1`` to handoff ``A`` to ``N2`` + 1. the ``leader`` sets a pending change for ``N1`` to handoff ``A`` to ``N2`` 2. ``N1`` notices the pending change and sends an initialization message to @@ -550,7 +580,7 @@ A more complete process for graceful handoff would be: becoming dead letters) - 5. the leader sees the migration is complete and updates the partition table + 5. the ``leader`` sees the migration is complete and updates the partition table 6. all nodes eventually see the new partitioning and use ``N2`` @@ -594,7 +624,7 @@ distributed datastore. See the next section for a rough outline on how the distributed datastore could be implemented. -Implementing a Dynamo-style distributed database on top of Akka Cluster +Implementing a Dynamo-style Distributed Database on top of Akka Cluster ----------------------------------------------------------------------- The missing pieces to implement a full Dynamo-style eventually consistent data From bb0e5536bed33dbeb71c6d96a1d5434a101846f3 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jonas=20Bone=CC=81r?= Date: Tue, 7 Feb 2012 16:53:49 +0100 Subject: [PATCH 31/72] Removed cluster seed nodes, added 'join.contact-point', changed joining phase, added singleton cluster mode plus misc other changes. 
MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Jonas Bonér --- .../src/main/resources/reference.conf | 20 +- .../scala/akka/cluster/ClusterSettings.scala | 17 +- .../main/scala/akka/cluster/Gossiper.scala | 191 +++++++++--------- .../main/scala/akka/cluster/VectorClock.scala | 12 +- .../akka/cluster/ClusterConfigSpec.scala | 10 +- akka-docs/cluster/cluster.rst | 9 +- 6 files changed, 137 insertions(+), 122 deletions(-) diff --git a/akka-cluster/src/main/resources/reference.conf b/akka-cluster/src/main/resources/reference.conf index 3142d548b5..b3e90fac21 100644 --- a/akka-cluster/src/main/resources/reference.conf +++ b/akka-cluster/src/main/resources/reference.conf @@ -8,9 +8,18 @@ akka { cluster { - seed-nodes = [] - seed-node-connection-timeout = 30s - max-time-to-retry-joining-cluster = 30s + join { + # contact point on the form of "hostname:port" of a node to try to join + # leave as empty string if the node should be a singleton cluster + contact-point = "" + timeout = 30s + max-time-to-retry = 30s + } + + gossip { + initialDelay = 5s + frequency = 1s + } # accrual failure detection config failure-detector { @@ -24,10 +33,5 @@ akka { max-sample-size = 1000 } - - gossip { - initial-delay = 5s - frequency = 1s - } } } diff --git a/akka-cluster/src/main/scala/akka/cluster/ClusterSettings.scala b/akka-cluster/src/main/scala/akka/cluster/ClusterSettings.scala index dc081623bc..3e709cee49 100644 --- a/akka-cluster/src/main/scala/akka/cluster/ClusterSettings.scala +++ b/akka-cluster/src/main/scala/akka/cluster/ClusterSettings.scala @@ -16,11 +16,16 @@ class ClusterSettings(val config: Config, val systemName: String) { // cluster config section val FailureDetectorThreshold = getInt("akka.cluster.failure-detector.threshold") val FailureDetectorMaxSampleSize = getInt("akka.cluster.failure-detector.max-sample-size") - val SeedNodeConnectionTimeout = Duration(config.getMilliseconds("akka.cluster.seed-node-connection-timeout"), MILLISECONDS) - val MaxTimeToRetryJoiningCluster = Duration(config.getMilliseconds("akka.cluster.max-time-to-retry-joining-cluster"), MILLISECONDS) - val InitialDelayForGossip = Duration(getMilliseconds("akka.cluster.gossip.initial-delay"), MILLISECONDS) - val GossipFrequency = Duration(getMilliseconds("akka.cluster.gossip.frequency"), MILLISECONDS) - val SeedNodes = Set.empty[Address] ++ getStringList("akka.cluster.seed-nodes").asScala.collect { - case AddressFromURIString(addr) ⇒ addr + + // join config + val JoinContactPoint: Option[Address] = getString("akka.cluster.join.contact-point") match { + case "" ⇒ None + case AddressExtractor(addr) ⇒ Some(addr) } + val JoinTimeout = Duration(config.getMilliseconds("akka.cluster.join.timeout"), MILLISECONDS) + val JoinMaxTimeToRetry = Duration(config.getMilliseconds("akka.cluster.join.max-time-to-retry"), MILLISECONDS) + + // gossip config + val GossipInitialDelay = Duration(getMilliseconds("akka.cluster.gossip.initialDelay"), MILLISECONDS) + val GossipFrequency = Duration(getMilliseconds("akka.cluster.gossip.frequency"), MILLISECONDS) } diff --git a/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala b/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala index bb15223842..47536ff5d2 100644 --- a/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala +++ b/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala @@ -32,6 +32,8 @@ trait NodeMembershipChangeListener { def memberDisconnected(member: Member) } +// FIXME create Protobuf messages out of all the Gossip 
stuff - but wait until the prototol is fully stablized. + /** * Base trait for all cluster messages. All ClusterMessage's are serializable. */ @@ -40,14 +42,13 @@ sealed trait ClusterMessage extends Serializable /** * Command to join the cluster. */ -case object JoinCluster extends ClusterMessage +case class Join(node: Address) extends ClusterMessage /** * Represents the state of the cluster; cluster ring membership, ring convergence, meta data - all versioned by a vector clock. */ case class Gossip( - version: VectorClock = VectorClock(), - member: Address, + member: Member, // sorted set of members with their status, sorted by name members: SortedSet[Member] = SortedSet.empty[Member](Ordering.fromLessThan[Member](_.address.toString > _.address.toString)), unavailableMembers: Set[Member] = Set.empty[Member], @@ -55,7 +56,9 @@ case class Gossip( seen: Map[Member, VectorClock] = Map.empty[Member, VectorClock], // for handoff //pendingChanges: Option[Vector[PendingPartitioningChange]] = None, - meta: Option[Map[String, Array[Byte]]] = None) + meta: Option[Map[String, Array[Byte]]] = None, + // vector clock version + version: VectorClock = VectorClock()) extends ClusterMessage // is a serializable cluster message with Versioned // has a vector clock as version @@ -69,13 +72,13 @@ case class Member(address: Address, status: MemberStatus) extends ClusterMessage * * Can be one of: Joining, Up, Leaving, Exiting and Down. */ -sealed trait MemberStatus extends ClusterMessage with Versioned +sealed trait MemberStatus extends ClusterMessage object MemberStatus { - case class Joining(version: VectorClock = VectorClock()) extends MemberStatus - case class Up(version: VectorClock = VectorClock()) extends MemberStatus - case class Leaving(version: VectorClock = VectorClock()) extends MemberStatus - case class Exiting(version: VectorClock = VectorClock()) extends MemberStatus - case class Down(version: VectorClock = VectorClock()) extends MemberStatus + case object Joining extends MemberStatus + case object Up extends MemberStatus + case object Leaving extends MemberStatus + case object Exiting extends MemberStatus + case object Down extends MemberStatus } // sealed trait PendingPartitioningStatus @@ -94,11 +97,9 @@ final class ClusterDaemon(system: ActorSystem, gossiper: Gossiper) extends Actor val log = Logging(system, "ClusterDaemon") def receive = { - case JoinCluster ⇒ sender ! gossiper.latestGossip - case gossip: Gossip ⇒ - gossiper.tell(gossip) - - case unknown ⇒ log.error("Unknown message sent to cluster daemon [" + unknown + "]") + case Join(address) ⇒ sender ! gossiper.latestGossip // TODO use address in Join(address) ? + case gossip: Gossip ⇒ gossiper.tell(gossip) + case unknown ⇒ log.error("Unknown message sent to cluster daemon [" + unknown + "]") } } @@ -113,8 +114,8 @@ final class ClusterDaemon(system: ActorSystem, gossiper: Gossiper) extends Actor *

  *   1) Gossip to random live member (if any)
  *   2) Gossip to random unreachable member with certain probability depending on number of unreachable and live members
- *   3) If the member gossiped to at (1) was not seed, or the number of live members is less than number of seeds,
- *       gossip to random seed with certain probability depending on number of unreachable, seed and live members.
+ *   3) If the member gossiped to at (1) was not a deputy, or the number of live members is less than the number of deputy nodes,
+ *       gossip to random deputy with certain probability depending on number of unreachable, deputy and live members.
  * 
*/ case class Gossiper(remote: RemoteActorRefProvider, system: ActorSystemImpl) { @@ -132,22 +133,20 @@ case class Gossiper(remote: RemoteActorRefProvider, system: ActorSystemImpl) { val protocol = "akka" // TODO should this be hardcoded? val address = remote.transport.address - val memberFingerprint = address.## - val initialDelayForGossip = clusterSettings.InitialDelayForGossip + + val gossipInitialDelay = clusterSettings.GossipInitialDelay val gossipFrequency = clusterSettings.GossipFrequency - implicit val seedNodeConnectionTimeout = clusterSettings.SeedNodeConnectionTimeout + + implicit val joinTimeout = clusterSettings.JoinTimeout implicit val defaultTimeout = Timeout(remoteSettings.RemoteSystemDaemonAckTimeout) - // seed members - private val seeds: Set[Member] = { - if (clusterSettings.SeedNodes.isEmpty) throw new ConfigurationException( - "At least one seed member must be defined in the configuration [akka.cluster.seed-members]") - else clusterSettings.SeedNodes map (address ⇒ Member(address, MemberStatus.Up())) - } + private val contactPoint: Option[Member] = + clusterSettings.JoinContactPoint filter (_ != address) map (address ⇒ Member(address, MemberStatus.Up)) private val serialization = remote.serialization - private val failureDetector = new AccrualFailureDetector(system, clusterSettings.FailureDetectorThreshold, clusterSettings.FailureDetectorMaxSampleSize) + private val failureDetector = new AccrualFailureDetector( + system, clusterSettings.FailureDetectorThreshold, clusterSettings.FailureDetectorMaxSampleSize) private val isRunning = new AtomicBoolean(true) private val log = Logging(system, "Gossiper") @@ -162,12 +161,12 @@ case class Gossiper(remote: RemoteActorRefProvider, system: ActorSystemImpl) { log.info("Starting cluster Gossiper...") - // join the cluster by connecting to one of the seed members and retrieve current cluster state (Gossip) - joinCluster(clusterSettings.MaxTimeToRetryJoiningCluster fromNow) + // join the cluster by connecting to one of the deputy members and retrieve current cluster state (Gossip) + joinContactPoint(clusterSettings.JoinMaxTimeToRetry fromNow) // start periodic gossip and cluster scrutinization - val initateGossipCanceller = system.scheduler.schedule(initialDelayForGossip, gossipFrequency)(initateGossip()) - val scrutinizeCanceller = system.scheduler.schedule(initialDelayForGossip, gossipFrequency)(scrutinize()) + val initateGossipCanceller = system.scheduler.schedule(gossipInitialDelay, gossipFrequency)(initateGossip()) + val scrutinizeCanceller = system.scheduler.schedule(gossipInitialDelay, gossipFrequency)(scrutinize()) /** * Shuts down all connections to other members, the cluster daemon and the periodic gossip and cleanup tasks. 
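Both Node and Gossiper describe their state handling as optimistic lockless concurrency: one immutable ``State`` case class held in an ``AtomicReference`` and replaced through a compare-and-set retry loop. A minimal self-contained sketch of that pattern follows; the names are illustrative and it is not the actual Gossiper internals::

    import java.util.concurrent.atomic.AtomicReference
    import scala.annotation.tailrec

    object LocklessStateSketch {
      final case class State(listeners: Set[String] = Set.empty)

      private val state = new AtomicReference(State())

      // Copy-on-write update: retry until our compare-and-set wins the race.
      @tailrec
      def addListener(listener: String): Unit = {
        val local   = state.get
        val updated = local.copy(listeners = local.listeners + listener)
        if (!state.compareAndSet(local, updated)) addListener(listener) // lost the race, recur
      }
    }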
@@ -196,7 +195,7 @@ case class Gossiper(remote: RemoteActorRefProvider, system: ActorSystemImpl) { final def tell(newGossip: Gossip) { val gossipingNode = newGossip.member - failureDetector heartbeat gossipingNode // update heartbeat in failure detector + failureDetector heartbeat gossipingNode.address // update heartbeat in failure detector // FIXME all below here is WRONG - redesign with cluster convergence in mind @@ -224,7 +223,7 @@ case class Gossiper(remote: RemoteActorRefProvider, system: ActorSystemImpl) { // println("---------- WON RACE - setting state") // // create connections for all new members in the latest gossip // (latestAvailableNodes + gossipingNode) foreach { member ⇒ - // setUpConnectionToNode(member) + // setUpConnectionTo(member) // oldState.memberMembershipChangeListeners foreach (_ memberConnected member) // notify listeners about the new members // } // } @@ -267,69 +266,43 @@ case class Gossiper(remote: RemoteActorRefProvider, system: ActorSystemImpl) { } /** - * Sets up remote connections to all the members in the argument list. + * Joins the pre-configured contact point and retrieves current gossip state. */ - private def connectToNodes(members: Seq[Member]) { - members foreach { member ⇒ - setUpConnectionToNode(member) - state.get.memberMembershipChangeListeners foreach (_ memberConnected member) // notify listeners about the new members - } - } - - // FIXME should shuffle list randomly before start traversing to avoid connecting to some member on every member - @tailrec - final private def connectToRandomNodeOf(members: Seq[Member]): ActorRef = { - members match { - case member :: rest ⇒ - setUpConnectionToNode(member) match { - case Some(connection) ⇒ connection - case None ⇒ connectToRandomNodeOf(rest) // recur if - } - case Nil ⇒ - throw new RemoteConnectionException( - "Could not establish connection to any of the members in the argument list") - } - } - - /** - * Joins the cluster by connecting to one of the seed members and retrieve current cluster state (Gossip). - */ - private def joinCluster(deadline: Deadline) { - val seedNodes = seedNodesWithoutMyself // filter out myself - - if (!seedNodes.isEmpty) { // if we have seed members to contact - connectToNodes(seedNodes) - + private def joinContactPoint(deadline: Deadline) { + def tryJoinContactPoint(connection: ActorRef, deadline: Deadline) { try { - log.info("Trying to join cluster through one of the seed members [{}]", seedNodes.mkString(", ")) - - Await.result(connectToRandomNodeOf(seedNodes) ? JoinCluster, seedNodeConnectionTimeout) match { + Await.result(connection ? Join(address), joinTimeout) match { case initialGossip: Gossip ⇒ // just sets/overwrites the state/gossip regardless of what it was before // since it should be treated as the initial state state.set(state.get copy (currentGossip = initialGossip)) - log.debug("Received initial gossip [{}] from seed member", initialGossip) + log.debug("Received initial gossip [{}]", initialGossip) case unknown ⇒ - throw new IllegalStateException("Expected initial gossip from seed, received [" + unknown + "]") + throw new IllegalStateException("Expected initial gossip but received [" + unknown + "]") } } catch { case e: Exception ⇒ - log.error( - "Could not join cluster through any of the seed members - retrying for another {} seconds", - deadline.timeLeft.toSeconds) + log.error("Could not join contact point node - retrying for another {} seconds", deadline.timeLeft.toSeconds) // retry joining the cluster unless // 1. Gossiper is shut down // 2. 
The connection time window has expired - if (isRunning.get) { - if (deadline.timeLeft.toMillis > 0) joinCluster(deadline) // recur - else throw new RemoteConnectionException( - "Could not join cluster (any of the seed members) - giving up after trying for " + - deadline.time.toSeconds + " seconds") - } + if (isRunning.get && deadline.timeLeft.toMillis > 0) tryJoinContactPoint(connection, deadline) // recur + else throw new RemoteConnectionException( + "Could not join contact point node - giving up after trying for " + deadline.time.toSeconds + " seconds") } } + + contactPoint match { + case None ⇒ log.info("Booting up in singleton cluster mode") + case Some(member) ⇒ + log.info("Trying to join contact point node defined in the configuration [{}]", member) + setUpConnectionTo(member) match { + case None ⇒ log.error("Could not set up connection to join contact point node defined in the configuration [{}]", member) + case Some(connection) ⇒ tryJoinContactPoint(connection, deadline) + } + } } /** @@ -346,7 +319,7 @@ case class Gossiper(remote: RemoteActorRefProvider, system: ActorSystemImpl) { val oldUnavailableMembersSize = oldUnavailableMembers.size // 1. gossip to alive members - val gossipedToSeed = + val shouldGossipToDeputy = if (oldUnavailableMembersSize > 0) gossipToRandomNodeOf(oldMembers) else false @@ -356,12 +329,13 @@ case class Gossiper(remote: RemoteActorRefProvider, system: ActorSystemImpl) { if (random.nextDouble() < probability) gossipToRandomNodeOf(oldUnavailableMembers) } - // 3. gossip to a seed for facilitating partition healing - if ((!gossipedToSeed || oldMembersSize < 1) && (seeds.head != address)) { - if (oldMembersSize == 0) gossipToRandomNodeOf(seeds) + // 3. gossip to a deputy nodes for facilitating partition healing + val deputies = deputyNodesWithoutMyself + if ((!shouldGossipToDeputy || oldMembersSize < 1) && (deputies.head != address)) { + if (oldMembersSize == 0) gossipToRandomNodeOf(deputies) else { val probability = 1.0 / oldMembersSize + oldUnavailableMembersSize - if (random.nextDouble() <= probability) gossipToRandomNodeOf(seeds) + if (random.nextDouble() <= probability) gossipToRandomNodeOf(deputies) } } } @@ -369,18 +343,25 @@ case class Gossiper(remote: RemoteActorRefProvider, system: ActorSystemImpl) { /** * Gossips to a random member in the set of members passed in as argument. * - * @return 'true' if it gossiped to a "seed" member. + * @return 'true' if it gossiped to a "deputy" member. */ - private def gossipToRandomNodeOf(members: Set[Member]): Boolean = { + private def gossipToRandomNodeOf(members: Seq[Member]): Boolean = { val peers = members filter (_.address != address) // filter out myself val peer = selectRandomNode(peers) val oldState = state.get val oldGossip = oldState.currentGossip // if connection can't be established/found => ignore it since the failure detector will take care of the potential problem - setUpConnectionToNode(peer) foreach { _ ! newGossip } - seeds exists (peer == _) + setUpConnectionTo(peer) foreach { _ ! newGossip } + deputyNodesWithoutMyself exists (peer == _) } + /** + * Gossips to a random member in the set of members passed in as argument. + * + * @return 'true' if it gossiped to a "deputy" member. + */ + private def gossipToRandomNodeOf(members: Set[Member]): Boolean = gossipToRandomNodeOf(members.toList) + /** * Scrutinizes the cluster; marks members detected by the failure detector as unavailable, and notifies all listeners * of the change in the cluster membership. 
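One subtlety in the gossip round above is operator precedence: as written, ``1.0 / oldMembersSize + oldUnavailableMembersSize`` parses as ``(1.0 / oldMembersSize) + oldUnavailableMembersSize``, which is at least 1 whenever any member is unreachable. The sketch below assumes the intended probability is ``1.0 / (live + unreachable)``; the names are illustrative, not the Gossiper's actual fields::

    import scala.util.Random

    object DeputyGossipSketch {
      // Step 3 of the gossip round: should this node also gossip to a deputy this time?
      def gossipToDeputyThisRound(live: Int, unreachable: Int,
                                  alreadyHitDeputy: Boolean, random: Random): Boolean =
        if (alreadyHitDeputy && live >= 1) false // step 1 already reached a deputy node
        else if (live == 0) true                 // no live peers, a deputy is the only option
        else random.nextDouble() <= 1.0 / (live + unreachable)
    }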
@@ -413,7 +394,30 @@ case class Gossiper(remote: RemoteActorRefProvider, system: ActorSystemImpl) { } } - private def setUpConnectionToNode(member: Member): Option[ActorRef] = { + // FIXME should shuffle list randomly before start traversing to avoid connecting to some member on every member + @tailrec + final private def connectToRandomNodeOf(members: Seq[Member]): ActorRef = { + members match { + case member :: rest ⇒ + setUpConnectionTo(member) match { + case Some(connection) ⇒ connection + case None ⇒ connectToRandomNodeOf(rest) // recur if + } + case Nil ⇒ + throw new RemoteConnectionException( + "Could not establish connection to any of the members in the argument list") + } + } + + /** + * Sets up remote connections to all the members in the argument list. + */ + private def setUpConnectionsTo(members: Seq[Member]): Seq[Option[ActorRef]] = members map { setUpConnectionTo(_) } + + /** + * Sets up remote connection. + */ + private def setUpConnectionTo(member: Member): Option[ActorRef] = { val address = member.address try { Some( @@ -425,14 +429,13 @@ case class Gossiper(remote: RemoteActorRefProvider, system: ActorSystemImpl) { } } - private def newGossip(): Gossip = Gossip(member = address) + private def newGossip(): Gossip = Gossip(Member(address, MemberStatus.Joining)) // starts in Joining mode private def incrementVersionForGossip(from: Gossip): Gossip = { - val newVersion = from.version.increment(memberFingerprint, newTimestamp) - from copy (version = newVersion) + from copy (version = from.version.increment(memberFingerprint, newTimestamp)) } - private def seedNodesWithoutMyself: List[Member] = seeds.filter(_.address != address).toList + private def deputyNodesWithoutMyself: Seq[Member] = Seq.empty[Member] filter (_.address != address) // FIXME read in deputy nodes from gossip data - now empty seq - private def selectRandomNode(members: Set[Member]): Member = members.toList(random.nextInt(members.size)) + private def selectRandomNode(members: Seq[Member]): Member = members(random.nextInt(members.size)) } diff --git a/akka-cluster/src/main/scala/akka/cluster/VectorClock.scala b/akka-cluster/src/main/scala/akka/cluster/VectorClock.scala index ef1f1be490..d8d87db75b 100644 --- a/akka-cluster/src/main/scala/akka/cluster/VectorClock.scala +++ b/akka-cluster/src/main/scala/akka/cluster/VectorClock.scala @@ -30,11 +30,11 @@ object Versioned { /** * Representation of a Vector-based clock (counting clock), inspired by Lamport logical clocks. - * {{ + * {{{ * Reference: * 1) Leslie Lamport (1978). "Time, clocks, and the ordering of events in a distributed system". Communications of the ACM 21 (7): 558-565. * 2) Friedemann Mattern (1988). "Virtual Time and Global States of Distributed Systems". Workshop on Parallel and Distributed Algorithms: pp. 215-226 - * }} + * }}} */ case class VectorClock( versions: Vector[VectorClock.Entry] = Vector.empty[VectorClock.Entry], @@ -76,11 +76,11 @@ object VectorClock { /** * The result of comparing two vector clocks. * Either: - * {{ + * {{{ * 1) v1 is BEFORE v2 * 2) v1 is AFTER t2 * 3) v1 happens CONCURRENTLY to v2 - * }} + * }}} */ sealed trait Ordering case object Before extends Ordering @@ -97,11 +97,11 @@ object VectorClock { /** * Compare two vector clocks. The outcomes will be one of the following: *

- * {{ + * {{{ * 1. Clock 1 is BEFORE clock 2 if there exists an i such that c1(i) <= c(2) and there does not exist a j such that c1(j) > c2(j). * 2. Clock 1 is CONCURRENT to clock 2 if there exists an i, j such that c1(i) < c2(i) and c1(j) > c2(j). * 3. Clock 1 is AFTER clock 2 otherwise. - * }} + * }}} * * @param v1 The first VectorClock * @param v2 The second VectorClock diff --git a/akka-cluster/src/test/scala/akka/cluster/ClusterConfigSpec.scala b/akka-cluster/src/test/scala/akka/cluster/ClusterConfigSpec.scala index 240d1ad3ff..7f1b26e553 100644 --- a/akka-cluster/src/test/scala/akka/cluster/ClusterConfigSpec.scala +++ b/akka-cluster/src/test/scala/akka/cluster/ClusterConfigSpec.scala @@ -25,11 +25,13 @@ class ClusterConfigSpec extends AkkaSpec( import settings._ FailureDetectorThreshold must be(8) FailureDetectorMaxSampleSize must be(1000) - SeedNodeConnectionTimeout must be(30 seconds) - MaxTimeToRetryJoiningCluster must be(30 seconds) - InitialDelayForGossip must be(5 seconds) + + JoinContactPoint must be(None) + JoinTimeout must be(30 seconds) + JoinMaxTimeToRetry must be(30 seconds) + + GossipInitialDelay must be(5 seconds) GossipFrequency must be(1 second) - SeedNodes must be(Set()) } } } diff --git a/akka-docs/cluster/cluster.rst b/akka-docs/cluster/cluster.rst index 4cd336304d..5047d8c16c 100644 --- a/akka-docs/cluster/cluster.rst +++ b/akka-docs/cluster/cluster.rst @@ -74,12 +74,13 @@ each node hosting some part of the application. Cluster membership and partitioning of the application are decoupled. A node could be a member of a cluster without hosting any actors. -Single-node Cluster -------------------- + +Singleton Cluster +----------------- If a node does not have a preconfigured contact point to join in the Akka -configuration, then it is considered a single-node cluster and will -automatically transition from ``joining`` to ``up``. Single-node clusters +configuration, then it is considered a singleton cluster (single node cluster) +and will automatically transition from ``joining`` to ``up``. Singleton clusters can later explicitly send a ``Join`` message to another node to form a N-node cluster. It is also possible to link multiple N-node clusters by ``joining`` them. From 75c1b5717c20af79e205099d67c4ca3f0b3949da Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jonas=20Bone=CC=81r?= Date: Wed, 8 Feb 2012 14:14:01 +0100 Subject: [PATCH 32/72] Completed singleton and N-node cluster boot up and joining phase. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit * Simplified node join phase. * Added tests for cluster node startup and joining, both for singleton cluster and 2-node cluster. * Fixed bug in cluster node address and cluster daemon lookup. * Changed some APIs. * Renamed 'contact-point' to 'node-to-join'. * Minor refactorings. 
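As a worked example of the vector clock comparison rules documented in ``VectorClock.scala`` above, the sketch below uses plain maps from node id to counter instead of the real ``Entry`` vector; the names are illustrative only::

    object VectorClockCompareSketch {
      type Clock = Map[String, Int]

      def compare(c1: Clock, c2: Clock): String = {
        val nodes      = c1.keySet ++ c2.keySet
        val anyGreater = nodes.exists(n => c1.getOrElse(n, 0) > c2.getOrElse(n, 0))
        val anyLess    = nodes.exists(n => c1.getOrElse(n, 0) < c2.getOrElse(n, 0))
        if (anyGreater && anyLess) "Concurrent" // each clock has seen events the other has not
        else if (anyGreater) "After"
        else if (anyLess) "Before"
        else "Same"                             // identical clocks
      }

      // Only B's counter is higher in the second clock => the first clock is Before the second.
      val ex1 = compare(Map("A" -> 1, "B" -> 2), Map("A" -> 1, "B" -> 3))
      // Each clock is ahead for some node => Concurrent.
      val ex2 = compare(Map("A" -> 2, "B" -> 2), Map("A" -> 1, "B" -> 3))
    }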
Signed-off-by: Jonas Bonér --- .../src/main/resources/reference.conf | 10 +- .../scala/akka/cluster/ClusterSettings.scala | 9 +- .../main/scala/akka/cluster/Gossiper.scala | 201 ++++++++++-------- .../akka/cluster/ClusterConfigSpec.scala | 6 +- .../scala/akka/cluster/NodeStartupSpec.scala | 90 ++++++++ .../scala/akka/remote/RemoteAddress.scala | 5 - 6 files changed, 202 insertions(+), 119 deletions(-) create mode 100644 akka-cluster/src/test/scala/akka/cluster/NodeStartupSpec.scala delete mode 100644 akka-remote/src/main/scala/akka/remote/RemoteAddress.scala diff --git a/akka-cluster/src/main/resources/reference.conf b/akka-cluster/src/main/resources/reference.conf index b3e90fac21..3df6dd3774 100644 --- a/akka-cluster/src/main/resources/reference.conf +++ b/akka-cluster/src/main/resources/reference.conf @@ -8,13 +8,9 @@ akka { cluster { - join { - # contact point on the form of "hostname:port" of a node to try to join - # leave as empty string if the node should be a singleton cluster - contact-point = "" - timeout = 30s - max-time-to-retry = 30s - } + # node to join - the full URI defined by a string on the form of "akka://system@hostname:port" + # leave as empty string if the node should be a singleton cluster + node-to-join = "" gossip { initialDelay = 5s diff --git a/akka-cluster/src/main/scala/akka/cluster/ClusterSettings.scala b/akka-cluster/src/main/scala/akka/cluster/ClusterSettings.scala index 3e709cee49..f4d57bf1f6 100644 --- a/akka-cluster/src/main/scala/akka/cluster/ClusterSettings.scala +++ b/akka-cluster/src/main/scala/akka/cluster/ClusterSettings.scala @@ -13,19 +13,12 @@ import akka.actor.AddressFromURIString class ClusterSettings(val config: Config, val systemName: String) { import config._ - // cluster config section val FailureDetectorThreshold = getInt("akka.cluster.failure-detector.threshold") val FailureDetectorMaxSampleSize = getInt("akka.cluster.failure-detector.max-sample-size") - - // join config - val JoinContactPoint: Option[Address] = getString("akka.cluster.join.contact-point") match { + val NodeToJoin: Option[Address] = getString("akka.cluster.node-to-join") match { case "" ⇒ None case AddressExtractor(addr) ⇒ Some(addr) } - val JoinTimeout = Duration(config.getMilliseconds("akka.cluster.join.timeout"), MILLISECONDS) - val JoinMaxTimeToRetry = Duration(config.getMilliseconds("akka.cluster.join.max-time-to-retry"), MILLISECONDS) - - // gossip config val GossipInitialDelay = Duration(getMilliseconds("akka.cluster.gossip.initialDelay"), MILLISECONDS) val GossipFrequency = Duration(getMilliseconds("akka.cluster.gossip.frequency"), MILLISECONDS) } diff --git a/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala b/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala index 47536ff5d2..b134a9c54c 100644 --- a/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala +++ b/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala @@ -48,9 +48,9 @@ case class Join(node: Address) extends ClusterMessage * Represents the state of the cluster; cluster ring membership, ring convergence, meta data - all versioned by a vector clock. 
*/ case class Gossip( - member: Member, + self: Member, // sorted set of members with their status, sorted by name - members: SortedSet[Member] = SortedSet.empty[Member](Ordering.fromLessThan[Member](_.address.toString > _.address.toString)), + members: SortedSet[Member], unavailableMembers: Set[Member] = Set.empty[Member], // for ring convergence seen: Map[Member, VectorClock] = Map.empty[Member, VectorClock], @@ -97,8 +97,8 @@ final class ClusterDaemon(system: ActorSystem, gossiper: Gossiper) extends Actor val log = Logging(system, "ClusterDaemon") def receive = { - case Join(address) ⇒ sender ! gossiper.latestGossip // TODO use address in Join(address) ? - case gossip: Gossip ⇒ gossiper.tell(gossip) + case Join(address) ⇒ gossiper.joining(address) + case gossip: Gossip ⇒ gossiper.receive(gossip) case unknown ⇒ log.error("Unknown message sent to cluster daemon [" + unknown + "]") } } @@ -118,31 +118,30 @@ final class ClusterDaemon(system: ActorSystem, gossiper: Gossiper) extends Actor * gossip to random deputy with certain probability depending on number of unreachable, deputy and live members. * */ -case class Gossiper(remote: RemoteActorRefProvider, system: ActorSystemImpl) { +case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { /** * Represents the state for this Gossiper. Implemented using optimistic lockless concurrency, * all state is represented by this immutable case class and managed by an AtomicReference. */ private case class State( - currentGossip: Gossip, + latestGossip: Gossip, + isSingletonCluster: Boolean = true, // starts as singleton cluster memberMembershipChangeListeners: Set[NodeMembershipChangeListener] = Set.empty[NodeMembershipChangeListener]) val remoteSettings = new RemoteSettings(system.settings.config, system.name) val clusterSettings = new ClusterSettings(system.settings.config, system.name) - val protocol = "akka" // TODO should this be hardcoded? - val address = remote.transport.address - val memberFingerprint = address.## + val remoteAddress = remote.transport.address + val memberFingerprint = remoteAddress.## val gossipInitialDelay = clusterSettings.GossipInitialDelay val gossipFrequency = clusterSettings.GossipFrequency - implicit val joinTimeout = clusterSettings.JoinTimeout implicit val defaultTimeout = Timeout(remoteSettings.RemoteSystemDaemonAckTimeout) - private val contactPoint: Option[Member] = - clusterSettings.JoinContactPoint filter (_ != address) map (address ⇒ Member(address, MemberStatus.Up)) + private val nodeToJoin: Option[Member] = + clusterSettings.NodeToJoin filter (_ != remoteAddress) map (address ⇒ Member(address, MemberStatus.Joining)) private val serialization = remote.serialization private val failureDetector = new AccrualFailureDetector( @@ -154,31 +153,42 @@ case class Gossiper(remote: RemoteActorRefProvider, system: ActorSystemImpl) { // Is it right to put this guy under the /system path or should we have a top-level /cluster or something else...? 
private val clusterDaemon = system.systemActorOf(Props(new ClusterDaemon(system, this)), "cluster") - private val state = new AtomicReference[State](State(currentGossip = newGossip())) + + private val state = { + val member = Member(remoteAddress, MemberStatus.Joining) + val gossip = Gossip( + self = member, + members = SortedSet.empty[Member](Ordering.fromLessThan[Member](_.address.toString > _.address.toString)) + member) // add joining node as Joining + new AtomicReference[State](State(gossip)) + } // FIXME manage connections in some other way so we can delete the RemoteConnectionManager (SINCE IT SUCKS!!!) private val connectionManager = new RemoteConnectionManager(system, remote, failureDetector, Map.empty[Address, ActorRef]) - log.info("Starting cluster Gossiper...") + log.info("Node [{}] - Starting cluster Gossiper...", remoteAddress) - // join the cluster by connecting to one of the deputy members and retrieve current cluster state (Gossip) - joinContactPoint(clusterSettings.JoinMaxTimeToRetry fromNow) + // try to join the node defined in the 'akka.cluster.node-to-join' option + join() // start periodic gossip and cluster scrutinization - val initateGossipCanceller = system.scheduler.schedule(gossipInitialDelay, gossipFrequency)(initateGossip()) - val scrutinizeCanceller = system.scheduler.schedule(gossipInitialDelay, gossipFrequency)(scrutinize()) + val gossipCanceller = system.scheduler.schedule(gossipInitialDelay, gossipFrequency) { + gossip() + } + val scrutinizeCanceller = system.scheduler.schedule(gossipInitialDelay, gossipFrequency) { + scrutinize() + } /** * Shuts down all connections to other members, the cluster daemon and the periodic gossip and cleanup tasks. */ def shutdown() { if (isRunning.compareAndSet(true, false)) { - log.info("Shutting down Gossiper for [{}]...", address) + log.info("Node [{}] - Shutting down Gossiper", remoteAddress) try connectionManager.shutdown() finally { try system.stop(clusterDaemon) finally { - try initateGossipCanceller.cancel() finally { + try gossipCanceller.cancel() finally { try scrutinizeCanceller.cancel() finally { - log.info("Gossiper for [{}] is shut down", address) + log.info("Node [{}] - Gossiper is shut down", remoteAddress) } } } @@ -186,60 +196,90 @@ case class Gossiper(remote: RemoteActorRefProvider, system: ActorSystemImpl) { } } - def latestGossip: Gossip = state.get.currentGossip + /** + * Latest gossip. + */ + def latestGossip: Gossip = state.get.latestGossip /** - * Tell the gossiper some gossip. + * Member status for this node. + */ + def self: Member = latestGossip.self + + /** + * Is this node a singleton cluster? + */ + def isSingletonCluster: Boolean = state.get.isSingletonCluster + + /** + * New node joining. + */ + @tailrec + final def joining(node: Address) { + log.debug("Node [{}] - Node [{}] is joining", remoteAddress, node) + val oldState = state.get + val oldGossip = oldState.latestGossip + val oldMembers = oldGossip.members + val newGossip = oldGossip copy (members = oldMembers + Member(node, MemberStatus.Joining)) // add joining node as Joining + val newState = oldState copy (latestGossip = incrementVersionForGossip(newGossip)) + if (!state.compareAndSet(oldState, newState)) joining(node) // recur if we failed update + } + + /** + * Receive new gossip. 
*/ //@tailrec - final def tell(newGossip: Gossip) { - val gossipingNode = newGossip.member + final def receive(newGossip: Gossip) { + val from = newGossip.self + log.debug("Node [{}] - Receiving gossip from [{}]", remoteAddress, from.address) - failureDetector heartbeat gossipingNode.address // update heartbeat in failure detector + failureDetector heartbeat from.address // update heartbeat in failure detector + + // FIXME set flag state.isSingletonCluster = false (if true) // FIXME all below here is WRONG - redesign with cluster convergence in mind // val oldState = state.get // println("-------- NEW VERSION " + newGossip) - // println("-------- OLD VERSION " + oldState.currentGossip) - // val latestGossip = VectorClock.latestVersionOf(newGossip, oldState.currentGossip) - // println("-------- WINNING VERSION " + latestGossip) + // println("-------- OLD VERSION " + oldState.latestGossip) + // val gossip = VectorClock.latestVersionOf(newGossip, oldState.latestGossip) + // println("-------- WINNING VERSION " + gossip) - // val latestAvailableNodes = latestGossip.members - // val latestUnavailableNodes = latestGossip.unavailableMembers - // println("=======>>> gossipingNode: " + gossipingNode) + // val latestAvailableNodes = gossip.members + // val latestUnavailableNodes = gossip.unavailableMembers + // println("=======>>> myself: " + myself) // println("=======>>> latestAvailableNodes: " + latestAvailableNodes) - // if (!(latestAvailableNodes contains gossipingNode) && !(latestUnavailableNodes contains gossipingNode)) { + // if (!(latestAvailableNodes contains myself) && !(latestUnavailableNodes contains myself)) { // println("-------- NEW NODE") // // we have a new member - // val newGossip = latestGossip copy (availableNodes = latestAvailableNodes + gossipingNode) - // val newState = oldState copy (currentGossip = incrementVersionForGossip(newGossip)) + // val newGossip = gossip copy (availableNodes = latestAvailableNodes + myself) + // val newState = oldState copy (latestGossip = incrementVersionForGossip(newGossip)) // println("--------- new GOSSIP " + newGossip.members) // println("--------- new STATE " + newState) // // if we won the race then update else try again - // if (!state.compareAndSet(oldState, newState)) tell(newGossip) // recur + // if (!state.compareAndSet(oldState, newState)) receive(newGossip) // recur // else { // println("---------- WON RACE - setting state") // // create connections for all new members in the latest gossip - // (latestAvailableNodes + gossipingNode) foreach { member ⇒ + // (latestAvailableNodes + myself) foreach { member ⇒ // setUpConnectionTo(member) // oldState.memberMembershipChangeListeners foreach (_ memberConnected member) // notify listeners about the new members // } // } - // } else if (latestUnavailableNodes contains gossipingNode) { + // } else if (latestUnavailableNodes contains myself) { // // gossip from an old former dead member - // val newUnavailableMembers = latestUnavailableNodes - gossipingNode - // val newMembers = latestAvailableNodes + gossipingNode + // val newUnavailableMembers = latestUnavailableNodes - myself + // val newMembers = latestAvailableNodes + myself - // val newGossip = latestGossip copy (availableNodes = newMembers, unavailableNodes = newUnavailableMembers) - // val newState = oldState copy (currentGossip = incrementVersionForGossip(newGossip)) + // val newGossip = gossip copy (availableNodes = newMembers, unavailableNodes = newUnavailableMembers) + // val newState = oldState copy (latestGossip = 
incrementVersionForGossip(newGossip)) // // if we won the race then update else try again - // if (!state.compareAndSet(oldState, newState)) tell(newGossip) // recur - // else oldState.memberMembershipChangeListeners foreach (_ memberConnected gossipingNode) // notify listeners on successful update of state + // if (!state.compareAndSet(oldState, newState)) receive(newGossip) // recur + // else oldState.memberMembershipChangeListeners foreach (_ memberConnected myself) // notify listeners on successful update of state // } } @@ -268,49 +308,20 @@ case class Gossiper(remote: RemoteActorRefProvider, system: ActorSystemImpl) { /** * Joins the pre-configured contact point and retrieves current gossip state. */ - private def joinContactPoint(deadline: Deadline) { - def tryJoinContactPoint(connection: ActorRef, deadline: Deadline) { - try { - Await.result(connection ? Join(address), joinTimeout) match { - case initialGossip: Gossip ⇒ - // just sets/overwrites the state/gossip regardless of what it was before - // since it should be treated as the initial state - state.set(state.get copy (currentGossip = initialGossip)) - log.debug("Received initial gossip [{}]", initialGossip) - - case unknown ⇒ - throw new IllegalStateException("Expected initial gossip but received [" + unknown + "]") - } - } catch { - case e: Exception ⇒ - log.error("Could not join contact point node - retrying for another {} seconds", deadline.timeLeft.toSeconds) - - // retry joining the cluster unless - // 1. Gossiper is shut down - // 2. The connection time window has expired - if (isRunning.get && deadline.timeLeft.toMillis > 0) tryJoinContactPoint(connection, deadline) // recur - else throw new RemoteConnectionException( - "Could not join contact point node - giving up after trying for " + deadline.time.toSeconds + " seconds") - } - } - - contactPoint match { - case None ⇒ log.info("Booting up in singleton cluster mode") - case Some(member) ⇒ - log.info("Trying to join contact point node defined in the configuration [{}]", member) - setUpConnectionTo(member) match { - case None ⇒ log.error("Could not set up connection to join contact point node defined in the configuration [{}]", member) - case Some(connection) ⇒ tryJoinContactPoint(connection, deadline) - } + private def join() = nodeToJoin foreach { member ⇒ + setUpConnectionTo(member) foreach { connection ⇒ + val command = Join(remoteAddress) + log.info("Node [{}] - Sending [{}] to [{}] through connection [{}]", remoteAddress, command, member.address, connection) + connection ! command } } /** * Initates a new round of gossip. */ - private def initateGossip() { + private def gossip() { val oldState = state.get - val oldGossip = oldState.currentGossip + val oldGossip = oldState.latestGossip val oldMembers = oldGossip.members val oldMembersSize = oldMembers.size @@ -331,7 +342,7 @@ case class Gossiper(remote: RemoteActorRefProvider, system: ActorSystemImpl) { // 3. gossip to a deputy nodes for facilitating partition healing val deputies = deputyNodesWithoutMyself - if ((!shouldGossipToDeputy || oldMembersSize < 1) && (deputies.head != address)) { + if ((!shouldGossipToDeputy || oldMembersSize < 1) && !deputies.isEmpty) { if (oldMembersSize == 0) gossipToRandomNodeOf(deputies) else { val probability = 1.0 / oldMembersSize + oldUnavailableMembersSize @@ -341,17 +352,24 @@ case class Gossiper(remote: RemoteActorRefProvider, system: ActorSystemImpl) { } /** - * Gossips to a random member in the set of members passed in as argument. + * Gossips latest gossip to a member. 
+ */ + private def gossipTo(member: Member) { + setUpConnectionTo(member) foreach { _ ! latestGossip } + } + + /** + * Gossips latest gossip to a random member in the set of members passed in as argument. * * @return 'true' if it gossiped to a "deputy" member. */ private def gossipToRandomNodeOf(members: Seq[Member]): Boolean = { - val peers = members filter (_.address != address) // filter out myself + val peers = members filter (_.address != remoteAddress) // filter out myself val peer = selectRandomNode(peers) val oldState = state.get - val oldGossip = oldState.currentGossip + val oldGossip = oldState.latestGossip // if connection can't be established/found => ignore it since the failure detector will take care of the potential problem - setUpConnectionTo(peer) foreach { _ ! newGossip } + gossipTo(peer) deputyNodesWithoutMyself exists (peer == _) } @@ -369,7 +387,7 @@ case class Gossiper(remote: RemoteActorRefProvider, system: ActorSystemImpl) { @tailrec final private def scrutinize() { val oldState = state.get - val oldGossip = oldState.currentGossip + val oldGossip = oldState.latestGossip val oldMembers = oldGossip.members val oldUnavailableMembers = oldGossip.unavailableMembers @@ -380,7 +398,7 @@ case class Gossiper(remote: RemoteActorRefProvider, system: ActorSystemImpl) { val newUnavailableMembers = oldUnavailableMembers ++ newlyDetectedUnavailableMembers val newGossip = oldGossip copy (members = newMembers, unavailableMembers = newUnavailableMembers) - val newState = oldState copy (currentGossip = incrementVersionForGossip(newGossip)) + val newState = oldState copy (latestGossip = incrementVersionForGossip(newGossip)) // if we won the race then update else try again if (!state.compareAndSet(oldState, newState)) scrutinize() // recur @@ -420,22 +438,17 @@ case class Gossiper(remote: RemoteActorRefProvider, system: ActorSystemImpl) { private def setUpConnectionTo(member: Member): Option[ActorRef] = { val address = member.address try { - Some( - connectionManager.putIfAbsent( - address, - () ⇒ system.actorFor(RootActorPath(Address(protocol, system.name)) / "system" / "cluster"))) + Some(connectionManager.putIfAbsent(address, () ⇒ system.actorFor(RootActorPath(address) / "system" / "cluster"))) } catch { case e: Exception ⇒ None } } - private def newGossip(): Gossip = Gossip(Member(address, MemberStatus.Joining)) // starts in Joining mode - private def incrementVersionForGossip(from: Gossip): Gossip = { from copy (version = from.version.increment(memberFingerprint, newTimestamp)) } - private def deputyNodesWithoutMyself: Seq[Member] = Seq.empty[Member] filter (_.address != address) // FIXME read in deputy nodes from gossip data - now empty seq + private def deputyNodesWithoutMyself: Seq[Member] = Seq.empty[Member] filter (_.address != remoteAddress) // FIXME read in deputy nodes from gossip data - now empty seq private def selectRandomNode(members: Seq[Member]): Member = members(random.nextInt(members.size)) } diff --git a/akka-cluster/src/test/scala/akka/cluster/ClusterConfigSpec.scala b/akka-cluster/src/test/scala/akka/cluster/ClusterConfigSpec.scala index 7f1b26e553..78c836f0b5 100644 --- a/akka-cluster/src/test/scala/akka/cluster/ClusterConfigSpec.scala +++ b/akka-cluster/src/test/scala/akka/cluster/ClusterConfigSpec.scala @@ -25,11 +25,7 @@ class ClusterConfigSpec extends AkkaSpec( import settings._ FailureDetectorThreshold must be(8) FailureDetectorMaxSampleSize must be(1000) - - JoinContactPoint must be(None) - JoinTimeout must be(30 seconds) - JoinMaxTimeToRetry must be(30 
seconds) - + NodeToJoin must be(None) GossipInitialDelay must be(5 seconds) GossipFrequency must be(1 second) } diff --git a/akka-cluster/src/test/scala/akka/cluster/NodeStartupSpec.scala b/akka-cluster/src/test/scala/akka/cluster/NodeStartupSpec.scala new file mode 100644 index 0000000000..4f07650f62 --- /dev/null +++ b/akka-cluster/src/test/scala/akka/cluster/NodeStartupSpec.scala @@ -0,0 +1,90 @@ +/** + * Copyright (C) 2009-2011 Typesafe Inc. + */ +package akka.cluster + +import java.net.InetSocketAddress + +import akka.testkit._ +import akka.dispatch._ +import akka.actor._ +import akka.remote._ + +import com.typesafe.config._ + +class NodeStartupSpec extends AkkaSpec(""" + akka { + loglevel = "DEBUG" + } + """) with ImplicitSender { + + var gossiper0: Gossiper = _ + var gossiper1: Gossiper = _ + var node0: ActorSystemImpl = _ + var node1: ActorSystemImpl = _ + + try { + node0 = ActorSystem("NodeStartupSpec", ConfigFactory + .parseString(""" + akka { + actor.provider = "akka.remote.RemoteActorRefProvider" + remote.netty { + hostname = localhost + port=5550 + } + }""") + .withFallback(system.settings.config)) + .asInstanceOf[ActorSystemImpl] + val remote0 = node0.provider.asInstanceOf[RemoteActorRefProvider] + gossiper0 = Gossiper(node0, remote0) + + "A first cluster node with a 'node-to-join' config set to empty string" must { + "be in 'Joining' phase when started up" in { + val members = gossiper0.latestGossip.members + val joiningMember = members find (_.address.port.get == 5550) + joiningMember must be('defined) + joiningMember.get.status must be(MemberStatus.Joining) + } + + "be a singleton cluster when started up" in { + gossiper0.isSingletonCluster must be(true) + } + } + + node1 = ActorSystem("NodeStartupSpec", ConfigFactory + .parseString(""" + akka { + actor.provider = "akka.remote.RemoteActorRefProvider" + remote.netty { + hostname = localhost + port=5551 + } + cluster.node-to-join = "akka://NodeStartupSpec@localhost:5550" + }""") + .withFallback(system.settings.config)) + .asInstanceOf[ActorSystemImpl] + val remote1 = node1.provider.asInstanceOf[RemoteActorRefProvider] + gossiper1 = Gossiper(node1, remote1) + + "A second cluster node with a 'node-to-join' config defined" must { + "join the other node cluster as 'Joining' when sending a Join command" in { + Thread.sleep(1000) // give enough time for node1 to JOIN node0 + val members = gossiper0.latestGossip.members + val joiningMember = members find (_.address.port.get == 5551) + joiningMember must be('defined) + joiningMember.get.status must be(MemberStatus.Joining) + } + } + } catch { + case e: Exception ⇒ + e.printStackTrace + fail(e.toString) + } + + override def atTermination() { + gossiper0.shutdown() + node0.shutdown() + gossiper1.shutdown() + node1.shutdown() + } +} diff --git a/akka-remote/src/main/scala/akka/remote/RemoteAddress.scala b/akka-remote/src/main/scala/akka/remote/RemoteAddress.scala deleted file mode 100644 index f7274c2356..0000000000 --- a/akka-remote/src/main/scala/akka/remote/RemoteAddress.scala +++ /dev/null @@ -1,5 +0,0 @@ -/** - * Copyright (C) 2009-2012 Typesafe Inc. - */ -package akka.remote - From bf7c30742463f4ac9003af83983550a8cafe86df Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jonas=20Bone=CC=81r?= Date: Tue, 7 Feb 2012 14:20:51 +0100 Subject: [PATCH 33/72] Changes to cluster specification. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Added section on single-node cluster. - Changed seed nodes to deputy nodes. 
- Seed nodes are no longer used as contact points only to break logical partitions. Signed-off-by: Jonas Bonér --- akka-docs/cluster/cluster.rst | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/akka-docs/cluster/cluster.rst b/akka-docs/cluster/cluster.rst index 5047d8c16c..8d22faeae5 100644 --- a/akka-docs/cluster/cluster.rst +++ b/akka-docs/cluster/cluster.rst @@ -74,6 +74,15 @@ each node hosting some part of the application. Cluster membership and partitioning of the application are decoupled. A node could be a member of a cluster without hosting any actors. +Single-node Cluster +------------------- + +If a node does not have a preconfigured contact point to join in the Akka +configuration, then it is considered a single-node cluster and will +automatically transition from ``joining`` to ``up``. Single-node clusters +can later explicitly send a ``Join`` message to another node to form a N-node +cluster. It is also possible to link multiple N-node clusters by ``joining`` them. + Singleton Cluster ----------------- From 3b5c5e5f0f1e92a8ec676741971d37344e6ed18f Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jonas=20Bone=CC=81r?= Date: Tue, 7 Feb 2012 16:53:49 +0100 Subject: [PATCH 34/72] Removed cluster seed nodes, added 'join.contact-point', changed joining phase, added singleton cluster mode plus misc other changes. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Jonas Bonér --- akka-cluster/src/main/resources/reference.conf | 10 ++++++++++ .../src/main/scala/akka/cluster/Gossiper.scala | 10 ++++++++++ .../test/scala/akka/cluster/ClusterConfigSpec.scala | 6 +++++- akka-docs/cluster/cluster.rst | 9 +++++---- 4 files changed, 30 insertions(+), 5 deletions(-) diff --git a/akka-cluster/src/main/resources/reference.conf b/akka-cluster/src/main/resources/reference.conf index 3df6dd3774..ee2f2b23e8 100644 --- a/akka-cluster/src/main/resources/reference.conf +++ b/akka-cluster/src/main/resources/reference.conf @@ -8,9 +8,19 @@ akka { cluster { +<<<<<<< HEAD # node to join - the full URI defined by a string on the form of "akka://system@hostname:port" # leave as empty string if the node should be a singleton cluster node-to-join = "" +======= + join { + # contact point on the form of "hostname:port" of a node to try to join + # leave as empty string if the node should be a singleton cluster + contact-point = "" + timeout = 30s + max-time-to-retry = 30s + } +>>>>>>> Removed cluster seed nodes, added 'join.contact-point', changed joining phase, added singleton cluster mode plus misc other changes. gossip { initialDelay = 5s diff --git a/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala b/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala index b134a9c54c..a082f29d7c 100644 --- a/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala +++ b/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala @@ -314,6 +314,16 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { log.info("Node [{}] - Sending [{}] to [{}] through connection [{}]", remoteAddress, command, member.address, connection) connection ! 
command } + + contactPoint match { + case None ⇒ log.info("Booting up in singleton cluster mode") + case Some(member) ⇒ + log.info("Trying to join contact point node defined in the configuration [{}]", member) + setUpConnectionTo(member) match { + case None ⇒ log.error("Could not set up connection to join contact point node defined in the configuration [{}]", member) + case Some(connection) ⇒ tryJoinContactPoint(connection, deadline) + } + } } /** diff --git a/akka-cluster/src/test/scala/akka/cluster/ClusterConfigSpec.scala b/akka-cluster/src/test/scala/akka/cluster/ClusterConfigSpec.scala index 78c836f0b5..7f1b26e553 100644 --- a/akka-cluster/src/test/scala/akka/cluster/ClusterConfigSpec.scala +++ b/akka-cluster/src/test/scala/akka/cluster/ClusterConfigSpec.scala @@ -25,7 +25,11 @@ class ClusterConfigSpec extends AkkaSpec( import settings._ FailureDetectorThreshold must be(8) FailureDetectorMaxSampleSize must be(1000) - NodeToJoin must be(None) + + JoinContactPoint must be(None) + JoinTimeout must be(30 seconds) + JoinMaxTimeToRetry must be(30 seconds) + GossipInitialDelay must be(5 seconds) GossipFrequency must be(1 second) } diff --git a/akka-docs/cluster/cluster.rst b/akka-docs/cluster/cluster.rst index 8d22faeae5..05994c99ee 100644 --- a/akka-docs/cluster/cluster.rst +++ b/akka-docs/cluster/cluster.rst @@ -74,12 +74,13 @@ each node hosting some part of the application. Cluster membership and partitioning of the application are decoupled. A node could be a member of a cluster without hosting any actors. -Single-node Cluster -------------------- + +Singleton Cluster +----------------- If a node does not have a preconfigured contact point to join in the Akka -configuration, then it is considered a single-node cluster and will -automatically transition from ``joining`` to ``up``. Single-node clusters +configuration, then it is considered a singleton cluster (single node cluster) +and will automatically transition from ``joining`` to ``up``. Singleton clusters can later explicitly send a ``Join`` message to another node to form a N-node cluster. It is also possible to link multiple N-node clusters by ``joining`` them. From 0413b44c983f8f533aa5238fd3be6d528122afdf Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jonas=20Bone=CC=81r?= Date: Wed, 8 Feb 2012 14:14:01 +0100 Subject: [PATCH 35/72] Completed singleton and N-node cluster boot up and joining phase. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit * Simplified node join phase. * Added tests for cluster node startup and joining, both for singleton cluster and 2-node cluster. * Fixed bug in cluster node address and cluster daemon lookup. * Changed some APIs. * Renamed 'contact-point' to 'node-to-join'. * Minor refactorings. 
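
For illustration only (not part of this patch set): with the renamed setting, a node left with the
default empty string boots as a singleton cluster and moves itself to Up, while a node pointed at an
existing member sends that member a Join command on startup. The system and host names below are
made up:

    # first node - singleton cluster until another node joins it
    akka.cluster.node-to-join = ""

    # second node - sends a Join command to the first node on startup
    akka.cluster.node-to-join = "akka://MySystem@host1:5550"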
Signed-off-by: Jonas Bonér --- akka-cluster/src/main/resources/reference.conf | 10 ---------- .../test/scala/akka/cluster/ClusterConfigSpec.scala | 6 +----- 2 files changed, 1 insertion(+), 15 deletions(-) diff --git a/akka-cluster/src/main/resources/reference.conf b/akka-cluster/src/main/resources/reference.conf index ee2f2b23e8..3df6dd3774 100644 --- a/akka-cluster/src/main/resources/reference.conf +++ b/akka-cluster/src/main/resources/reference.conf @@ -8,19 +8,9 @@ akka { cluster { -<<<<<<< HEAD # node to join - the full URI defined by a string on the form of "akka://system@hostname:port" # leave as empty string if the node should be a singleton cluster node-to-join = "" -======= - join { - # contact point on the form of "hostname:port" of a node to try to join - # leave as empty string if the node should be a singleton cluster - contact-point = "" - timeout = 30s - max-time-to-retry = 30s - } ->>>>>>> Removed cluster seed nodes, added 'join.contact-point', changed joining phase, added singleton cluster mode plus misc other changes. gossip { initialDelay = 5s diff --git a/akka-cluster/src/test/scala/akka/cluster/ClusterConfigSpec.scala b/akka-cluster/src/test/scala/akka/cluster/ClusterConfigSpec.scala index 7f1b26e553..78c836f0b5 100644 --- a/akka-cluster/src/test/scala/akka/cluster/ClusterConfigSpec.scala +++ b/akka-cluster/src/test/scala/akka/cluster/ClusterConfigSpec.scala @@ -25,11 +25,7 @@ class ClusterConfigSpec extends AkkaSpec( import settings._ FailureDetectorThreshold must be(8) FailureDetectorMaxSampleSize must be(1000) - - JoinContactPoint must be(None) - JoinTimeout must be(30 seconds) - JoinMaxTimeToRetry must be(30 seconds) - + NodeToJoin must be(None) GossipInitialDelay must be(5 seconds) GossipFrequency must be(1 second) } From 379e9b9219aea2f643abe4432975746b56817607 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jonas=20Bone=CC=81r?= Date: Wed, 8 Feb 2012 15:11:06 +0100 Subject: [PATCH 36/72] Switching node status to Up if singleton cluster. Added 'switchStatusTo' method. Updated the test. Profit. 
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Signed-off-by: Jonas Bonér
---
 .../main/scala/akka/cluster/Gossiper.scala         | 38 ++++++++++++++++---
 .../scala/akka/cluster/NodeStartupSpec.scala       | 14 +++----
 2 files changed, 39 insertions(+), 13 deletions(-)

diff --git a/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala b/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala
index a082f29d7c..d848347736 100644
--- a/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala
+++ b/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala
@@ -156,9 +156,7 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) {
 
   private val state = {
     val member = Member(remoteAddress, MemberStatus.Joining)
-    val gossip = Gossip(
-      self = member,
-      members = SortedSet.empty[Member](Ordering.fromLessThan[Member](_.address.toString > _.address.toString)) + member) // add joining node as Joining
+    val gossip = Gossip(self = member, members = SortedSet.empty[Member](memberOrdering) + member)
     new AtomicReference[State](State(gossip))
   }
 
@@ -168,7 +166,10 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) {
   log.info("Node [{}] - Starting cluster Gossiper...", remoteAddress)
 
   // try to join the node defined in the 'akka.cluster.node-to-join' option
-  join()
+  nodeToJoin match {
+    case None         ⇒ switchStatusTo(MemberStatus.Up) // if we are singleton cluster then we are already considered to be UP
+    case Some(member) ⇒ join(member)
+  }
 
   // start periodic gossip and cluster scrutinization
   val gossipCanceller = system.scheduler.schedule(gossipInitialDelay, gossipFrequency) {
@@ -308,10 +309,10 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) {
   /**
    * Joins the pre-configured contact point and retrieves current gossip state.
    */
-  private def join() = nodeToJoin foreach { member ⇒
+  private def join(member: Member) {
     setUpConnectionTo(member) foreach { connection ⇒
       val command = Join(remoteAddress)
-      log.info("Node [{}] - Sending [{}] to [{}] through connection [{}]", remoteAddress, command, member.address, connection)
+      log.info("Node [{}] - Sending [{}] to [{}]", remoteAddress, command, member.address)
       connection ! command
     }
@@ -361,6 +362,29 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) {
     }
   }
 
+  @tailrec
+  final private def switchStatusTo(newStatus: MemberStatus) {
+    log.info("Node [{}] - Switching membership status to [{}]", remoteAddress, newStatus)
+    val oldState = state.get
+    val oldGossip = oldState.latestGossip
+
+    val oldSelf = oldGossip.self
+    val oldMembers = oldGossip.members
+
+    val newSelf = oldSelf copy (status = newStatus)
+
+    val newMembersSet = oldMembers map { member ⇒
+      if (member.address == remoteAddress) newSelf
+      else member
+    }
+    // ugly crap to work around bug in scala collections ('val ss: SortedSet[Member] = SortedSet.empty[Member] ++ aSet' does not compile)
+    val newMembersSortedSet = SortedSet[Member](newMembersSet.toList: _*)(memberOrdering)
+
+    val newGossip = oldGossip copy (self = newSelf, members = newMembersSortedSet)
+    val newState = oldState copy (latestGossip = incrementVersionForGossip(newGossip))
+    if (!state.compareAndSet(oldState, newState)) switchStatusTo(newStatus) // recur if we failed update
+  }
+
   /**
    * Gossips latest gossip to a member.
*/ @@ -461,4 +485,6 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { private def deputyNodesWithoutMyself: Seq[Member] = Seq.empty[Member] filter (_.address != remoteAddress) // FIXME read in deputy nodes from gossip data - now empty seq private def selectRandomNode(members: Seq[Member]): Member = members(random.nextInt(members.size)) + + private def memberOrdering = Ordering.fromLessThan[Member](_.address.toString > _.address.toString) } diff --git a/akka-cluster/src/test/scala/akka/cluster/NodeStartupSpec.scala b/akka-cluster/src/test/scala/akka/cluster/NodeStartupSpec.scala index 4f07650f62..32bf0bc6b5 100644 --- a/akka-cluster/src/test/scala/akka/cluster/NodeStartupSpec.scala +++ b/akka-cluster/src/test/scala/akka/cluster/NodeStartupSpec.scala @@ -38,16 +38,16 @@ class NodeStartupSpec extends AkkaSpec(""" val remote0 = node0.provider.asInstanceOf[RemoteActorRefProvider] gossiper0 = Gossiper(node0, remote0) - "A first cluster node with a 'node-to-join' config set to empty string" must { - "be in 'Joining' phase when started up" in { + "A first cluster node with a 'node-to-join' config set to empty string (singleton cluster)" must { + "be a singleton cluster when started up" in { + gossiper0.isSingletonCluster must be(true) + } + + "be in 'Up' phase when started up" in { val members = gossiper0.latestGossip.members val joiningMember = members find (_.address.port.get == 5550) joiningMember must be('defined) - joiningMember.get.status must be(MemberStatus.Joining) - } - - "be a singleton cluster when started up" in { - gossiper0.isSingletonCluster must be(true) + joiningMember.get.status must be(MemberStatus.Up) } } From 84f63db0ae3e58b2c6e483fda285d1f35bb0c69f Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jonas=20Bone=CC=81r?= Date: Wed, 8 Feb 2012 16:15:31 +0100 Subject: [PATCH 37/72] Skips gossipping and cluster scrutinization if singleton cluster. 
MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Jonas Bonér --- .../main/scala/akka/cluster/Gossiper.scala | 93 ++++++++++--------- 1 file changed, 49 insertions(+), 44 deletions(-) diff --git a/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala b/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala index d848347736..a74debcffe 100644 --- a/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala +++ b/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala @@ -138,6 +138,8 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { val gossipInitialDelay = clusterSettings.GossipInitialDelay val gossipFrequency = clusterSettings.GossipFrequency + implicit val memberOrdering = Ordering.fromLessThan[Member](_.address.toString > _.address.toString) + implicit val defaultTimeout = Timeout(remoteSettings.RemoteSystemDaemonAckTimeout) private val nodeToJoin: Option[Member] = @@ -156,7 +158,7 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { private val state = { val member = Member(remoteAddress, MemberStatus.Joining) - val gossip = Gossip(self = member, members = SortedSet.empty[Member](memberOrdering) + member) + val gossip = Gossip(self = member, members = SortedSet.empty[Member] + member) new AtomicReference[State](State(gossip)) } @@ -223,6 +225,9 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { val oldMembers = oldGossip.members val newGossip = oldGossip copy (members = oldMembers + Member(node, MemberStatus.Joining)) // add joining node as Joining val newState = oldState copy (latestGossip = incrementVersionForGossip(newGossip)) + + // FIXME set flag state.isSingletonCluster = false (if true) + if (!state.compareAndSet(oldState, newState)) joining(node) // recur if we failed update } @@ -332,32 +337,33 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { */ private def gossip() { val oldState = state.get - val oldGossip = oldState.latestGossip + if (!oldState.isSingletonCluster) { // do not gossip if we are a singleton cluster + val oldGossip = oldState.latestGossip + val oldMembers = oldGossip.members + val oldMembersSize = oldMembers.size - val oldMembers = oldGossip.members - val oldMembersSize = oldMembers.size + val oldUnavailableMembers = oldGossip.unavailableMembers + val oldUnavailableMembersSize = oldUnavailableMembers.size - val oldUnavailableMembers = oldGossip.unavailableMembers - val oldUnavailableMembersSize = oldUnavailableMembers.size + // 1. gossip to alive members + val shouldGossipToDeputy = + if (oldUnavailableMembersSize > 0) gossipToRandomNodeOf(oldMembers) + else false - // 1. gossip to alive members - val shouldGossipToDeputy = - if (oldUnavailableMembersSize > 0) gossipToRandomNodeOf(oldMembers) - else false + // 2. gossip to dead members + if (oldUnavailableMembersSize > 0) { + val probability: Double = oldUnavailableMembersSize / (oldMembersSize + 1) + if (random.nextDouble() < probability) gossipToRandomNodeOf(oldUnavailableMembers) + } - // 2. gossip to dead members - if (oldUnavailableMembersSize > 0) { - val probability: Double = oldUnavailableMembersSize / (oldMembersSize + 1) - if (random.nextDouble() < probability) gossipToRandomNodeOf(oldUnavailableMembers) - } - - // 3. 
gossip to a deputy nodes for facilitating partition healing - val deputies = deputyNodesWithoutMyself - if ((!shouldGossipToDeputy || oldMembersSize < 1) && !deputies.isEmpty) { - if (oldMembersSize == 0) gossipToRandomNodeOf(deputies) - else { - val probability = 1.0 / oldMembersSize + oldUnavailableMembersSize - if (random.nextDouble() <= probability) gossipToRandomNodeOf(deputies) + // 3. gossip to a deputy nodes for facilitating partition healing + val deputies = deputyNodesWithoutMyself + if ((!shouldGossipToDeputy || oldMembersSize < 1) && !deputies.isEmpty) { + if (oldMembersSize == 0) gossipToRandomNodeOf(deputies) + else { + val probability = 1.0 / oldMembersSize + oldUnavailableMembersSize + if (random.nextDouble() <= probability) gossipToRandomNodeOf(deputies) + } } } } @@ -378,7 +384,7 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { else member } // ugly crap to work around bug in scala colletions ('val ss: SortedSet[Member] = SortedSet.empty[Member] ++ aSet' does not compile) - val newMembersSortedSet = SortedSet[Member](newMembersSet.toList: _*)(memberOrdering) + val newMembersSortedSet = SortedSet[Member](newMembersSet.toList: _*) val newGossip = oldGossip copy (self = newSelf, members = newMembersSortedSet) val newState = oldState copy (latestGossip = incrementVersionForGossip(newGossip)) @@ -421,27 +427,28 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { @tailrec final private def scrutinize() { val oldState = state.get - val oldGossip = oldState.latestGossip + if (!oldState.isSingletonCluster) { // do not scrutinize if we are a singleton cluster + val oldGossip = oldState.latestGossip + val oldMembers = oldGossip.members + val oldUnavailableMembers = oldGossip.unavailableMembers + val newlyDetectedUnavailableMembers = oldMembers filterNot (member ⇒ failureDetector.isAvailable(member.address)) - val oldMembers = oldGossip.members - val oldUnavailableMembers = oldGossip.unavailableMembers - val newlyDetectedUnavailableMembers = oldMembers filterNot (member ⇒ failureDetector.isAvailable(member.address)) + if (!newlyDetectedUnavailableMembers.isEmpty) { // we have newly detected members marked as unavailable + val newMembers = oldMembers diff newlyDetectedUnavailableMembers + val newUnavailableMembers = oldUnavailableMembers ++ newlyDetectedUnavailableMembers - if (!newlyDetectedUnavailableMembers.isEmpty) { // we have newly detected members marked as unavailable - val newMembers = oldMembers diff newlyDetectedUnavailableMembers - val newUnavailableMembers = oldUnavailableMembers ++ newlyDetectedUnavailableMembers + val newGossip = oldGossip copy (members = newMembers, unavailableMembers = newUnavailableMembers) + val newState = oldState copy (latestGossip = incrementVersionForGossip(newGossip)) - val newGossip = oldGossip copy (members = newMembers, unavailableMembers = newUnavailableMembers) - val newState = oldState copy (latestGossip = incrementVersionForGossip(newGossip)) - - // if we won the race then update else try again - if (!state.compareAndSet(oldState, newState)) scrutinize() // recur - else { - // notify listeners on successful update of state - for { - deadNode ← newUnavailableMembers - listener ← oldState.memberMembershipChangeListeners - } listener memberDisconnected deadNode + // if we won the race then update else try again + if (!state.compareAndSet(oldState, newState)) scrutinize() // recur + else { + // notify listeners on successful update of state + for { + deadNode ← 
newUnavailableMembers
+            listener ← oldState.memberMembershipChangeListeners
+          } listener memberDisconnected deadNode
+        }
+      }
     }
   }
@@ -485,6 +492,4 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) {
   private def deputyNodesWithoutMyself: Seq[Member] = Seq.empty[Member] filter (_.address != remoteAddress) // FIXME read in deputy nodes from gossip data - now empty seq
 
   private def selectRandomNode(members: Seq[Member]): Member = members(random.nextInt(members.size))
-
-  private def memberOrdering = Ordering.fromLessThan[Member](_.address.toString > _.address.toString)
 }

From ccba27a829d896ece9867143af4ed817f8eae212 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Jonas=20Bone=CC=81r?=
Date: Thu, 9 Feb 2012 13:36:39 +0100
Subject: [PATCH 38/72] Refactored Gossip state and management. Introduced GossipOverview with convergence info, renamed some fields, added some new cluster commands.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Signed-off-by: Jonas Bonér
---
 .../main/scala/akka/cluster/Gossiper.scala         | 130 +++++++++++-------
 1 file changed, 78 insertions(+), 52 deletions(-)

diff --git a/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala b/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala
index a74debcffe..249546b0ad 100644
--- a/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala
+++ b/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala
@@ -45,22 +45,19 @@ sealed trait ClusterMessage extends Serializable
 case class Join(node: Address) extends ClusterMessage
 
 /**
- * Represents the state of the cluster; cluster ring membership, ring convergence, meta data - all versioned by a vector clock.
+ * Command to leave the cluster.
  */
-case class Gossip(
-  self: Member,
-  // sorted set of members with their status, sorted by name
-  members: SortedSet[Member],
-  unavailableMembers: Set[Member] = Set.empty[Member],
-  // for ring convergence
-  seen: Map[Member, VectorClock] = Map.empty[Member, VectorClock],
-  // for handoff
-  //pendingChanges: Option[Vector[PendingPartitioningChange]] = None,
-  meta: Option[Map[String, Array[Byte]]] = None,
-  // vector clock version
-  version: VectorClock = VectorClock())
-  extends ClusterMessage // is a serializable cluster message
-  with Versioned // has a vector clock as version
+case class Leave(node: Address) extends ClusterMessage
+
+/**
+ * Command to mark a node as temporarily down.
+ */
+case class Down(node: Address) extends ClusterMessage
+
+/**
+ * Command to remove a node from the cluster immediately.
+ */
+case class Remove(node: Address) extends ClusterMessage
 
 /**
  * Represents the address and the current status of a cluster member node.
  */
 case class Member(address: Address, status: MemberStatus) extends ClusterMessage
 
 /**
  * Defines the current status of a cluster member node
  *
@@ -81,25 +78,49 @@ object MemberStatus {
   case object Down extends MemberStatus
 }
 
-// sealed trait PendingPartitioningStatus
-// object PendingPartitioningStatus {
-//   case object Complete extends PendingPartitioningStatus
-//   case object Awaiting extends PendingPartitioningStatus
-// }
-// case class PendingPartitioningChange(
-//   owner: Address,
-//   nextOwner: Address,
-//   changes: Vector[VNodeMod],
-//   status: PendingPartitioningStatus)
+// sealed trait PartitioningStatus
+// object PartitioningStatus {
+//   case object Complete extends PartitioningStatus
+//   case object Awaiting extends PartitioningStatus
+// }
+// case class PartitioningChange(
+//   from: Address,
+//   to: Address,
+//   path: PartitionPath,
+//   status: PartitioningStatus)
+
+/**
+ * Represents the overview of the cluster, holds the cluster convergence table and unreachable nodes.
+ */ +case class GossipOverview( + seen: Map[Member, VectorClock] = Map.empty[Member, VectorClock], + unreachable: Set[Member] = Set.empty[Member]) + +/** + * Represents the state of the cluster; cluster ring membership, ring convergence, meta data - all versioned by a vector clock. + */ +case class Gossip( + overview: GossipOverview = GossipOverview(), + self: Member, + members: SortedSet[Member], // sorted set of members with their status, sorted by name + //partitions: Tree[PartitionPath, Node] = Tree.empty[PartitionPath, Node], + //pending: Set[PartitioningChange] = Set.empty[PartitioningChange], + meta: Map[String, Array[Byte]] = Map.empty[String, Array[Byte]], + version: VectorClock = VectorClock()) // vector clock version + extends ClusterMessage // is a serializable cluster message + with Versioned // has a vector clock as version final class ClusterDaemon(system: ActorSystem, gossiper: Gossiper) extends Actor { val log = Logging(system, "ClusterDaemon") def receive = { - case Join(address) ⇒ gossiper.joining(address) - case gossip: Gossip ⇒ gossiper.receive(gossip) - case unknown ⇒ log.error("Unknown message sent to cluster daemon [" + unknown + "]") + case gossip: Gossip ⇒ gossiper.receive(gossip) + case Join(address) ⇒ gossiper.joining(address) + case Leave(address) ⇒ //gossiper.leaving(address) + case Down(address) ⇒ //gossiper.downing(address) + case Remove(address) ⇒ //gossiper.removing(address) + case unknown ⇒ log.error("Unknown message sent to cluster daemon [" + unknown + "]") } } @@ -185,8 +206,11 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { * Shuts down all connections to other members, the cluster daemon and the periodic gossip and cleanup tasks. */ def shutdown() { + + // FIXME Cheating. Can't just shut down. 
Node must first gossip an Leave command, wait for Leader to do proper Handoff and then await an Exit command before switching to Removed + if (isRunning.compareAndSet(true, false)) { - log.info("Node [{}] - Shutting down Gossiper", remoteAddress) + log.info("Node [{}] - Shutting down Gossiper and ClusterDaemon", remoteAddress) try connectionManager.shutdown() finally { try system.stop(clusterDaemon) finally { try gossipCanceller.cancel() finally { @@ -251,14 +275,14 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { // val gossip = VectorClock.latestVersionOf(newGossip, oldState.latestGossip) // println("-------- WINNING VERSION " + gossip) - // val latestAvailableNodes = gossip.members - // val latestUnavailableNodes = gossip.unavailableMembers + // val latestMembers = gossip.members + // val latestUnreachableMembers = gossip.overview.unreachable // println("=======>>> myself: " + myself) - // println("=======>>> latestAvailableNodes: " + latestAvailableNodes) - // if (!(latestAvailableNodes contains myself) && !(latestUnavailableNodes contains myself)) { + // println("=======>>> latestMembers: " + latestMembers) + // if (!(latestMembers contains myself) && !(latestUnreachableMembers contains myself)) { // println("-------- NEW NODE") // // we have a new member - // val newGossip = gossip copy (availableNodes = latestAvailableNodes + myself) + // val newGossip = gossip copy (availableNodes = latestMembers + myself) // val newState = oldState copy (latestGossip = incrementVersionForGossip(newGossip)) // println("--------- new GOSSIP " + newGossip.members) @@ -268,19 +292,19 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { // else { // println("---------- WON RACE - setting state") // // create connections for all new members in the latest gossip - // (latestAvailableNodes + myself) foreach { member ⇒ + // (latestMembers + myself) foreach { member ⇒ // setUpConnectionTo(member) // oldState.memberMembershipChangeListeners foreach (_ memberConnected member) // notify listeners about the new members // } // } - // } else if (latestUnavailableNodes contains myself) { + // } else if (latestUnreachableMembers contains myself) { // // gossip from an old former dead member - // val newUnavailableMembers = latestUnavailableNodes - myself - // val newMembers = latestAvailableNodes + myself + // val newUnreachableMembers = latestUnreachableMembers - myself + // val newMembers = latestMembers + myself - // val newGossip = gossip copy (availableNodes = newMembers, unavailableNodes = newUnavailableMembers) + // val newGossip = gossip copy (availableNodes = newMembers, unavailableNodes = newUnreachableMembers) // val newState = oldState copy (latestGossip = incrementVersionForGossip(newGossip)) // // if we won the race then update else try again @@ -342,18 +366,18 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { val oldMembers = oldGossip.members val oldMembersSize = oldMembers.size - val oldUnavailableMembers = oldGossip.unavailableMembers - val oldUnavailableMembersSize = oldUnavailableMembers.size + val oldUnreachableMembers = oldGossip.overview.unreachable + val oldUnreachableSize = oldUnreachableMembers.size // 1. gossip to alive members val shouldGossipToDeputy = - if (oldUnavailableMembersSize > 0) gossipToRandomNodeOf(oldMembers) + if (oldUnreachableSize > 0) gossipToRandomNodeOf(oldMembers) else false // 2. 
gossip to dead members - if (oldUnavailableMembersSize > 0) { - val probability: Double = oldUnavailableMembersSize / (oldMembersSize + 1) - if (random.nextDouble() < probability) gossipToRandomNodeOf(oldUnavailableMembers) + if (oldUnreachableSize > 0) { + val probability: Double = oldUnreachableSize / (oldMembersSize + 1) + if (random.nextDouble() < probability) gossipToRandomNodeOf(oldUnreachableMembers) } // 3. gossip to a deputy nodes for facilitating partition healing @@ -361,7 +385,7 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { if ((!shouldGossipToDeputy || oldMembersSize < 1) && !deputies.isEmpty) { if (oldMembersSize == 0) gossipToRandomNodeOf(deputies) else { - val probability = 1.0 / oldMembersSize + oldUnavailableMembersSize + val probability = 1.0 / oldMembersSize + oldUnreachableSize if (random.nextDouble() <= probability) gossipToRandomNodeOf(deputies) } } @@ -429,15 +453,17 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { val oldState = state.get if (!oldState.isSingletonCluster) { // do not scrutinize if we are a singleton cluster val oldGossip = oldState.latestGossip + val oldOverview = oldGossip.overview val oldMembers = oldGossip.members - val oldUnavailableMembers = oldGossip.unavailableMembers - val newlyDetectedUnavailableMembers = oldMembers filterNot (member ⇒ failureDetector.isAvailable(member.address)) + val oldUnreachableMembers = oldGossip.overview.unreachable + val newlyDetectedUnreachableMembers = oldMembers filterNot (member ⇒ failureDetector.isAvailable(member.address)) - if (!newlyDetectedUnavailableMembers.isEmpty) { // we have newly detected members marked as unavailable - val newMembers = oldMembers diff newlyDetectedUnavailableMembers - val newUnavailableMembers = oldUnavailableMembers ++ newlyDetectedUnavailableMembers + if (!newlyDetectedUnreachableMembers.isEmpty) { // we have newly detected members marked as unavailable + val newMembers = oldMembers diff newlyDetectedUnreachableMembers + val newUnreachableMembers = oldUnreachableMembers ++ newlyDetectedUnreachableMembers - val newGossip = oldGossip copy (members = newMembers, unavailableMembers = newUnavailableMembers) + val newOverview = oldOverview copy (unreachable = newUnreachableMembers) + val newGossip = oldGossip copy (overview = newOverview, members = newMembers) val newState = oldState copy (latestGossip = incrementVersionForGossip(newGossip)) // if we won the race then update else try again @@ -445,7 +471,7 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { else { // notify listeners on successful update of state for { - deadNode ← newUnavailableMembers + deadNode ← newUnreachableMembers listener ← oldState.memberMembershipChangeListeners } listener memberDisconnected deadNode } From 9aa5b08f38ba3e434d7b9be878ce8dcf5928d1f4 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jonas=20Bone=CC=81r?= Date: Thu, 9 Feb 2012 15:55:29 +0100 Subject: [PATCH 39/72] Added 'or' method to Versioned. 
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Signed-off-by: Jonas Bonér
---
 .../src/main/scala/akka/cluster/VectorClock.scala  | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/akka-cluster/src/main/scala/akka/cluster/VectorClock.scala b/akka-cluster/src/main/scala/akka/cluster/VectorClock.scala
index d8d87db75b..13583cc120 100644
--- a/akka-cluster/src/main/scala/akka/cluster/VectorClock.scala
+++ b/akka-cluster/src/main/scala/akka/cluster/VectorClock.scala
@@ -13,12 +13,21 @@ class VectorClockException(message: String) extends AkkaException(message)
  */
 trait Versioned {
   def version: VectorClock
+
+  /**
+   * Returns the Versioned that has the latest version.
+   */
+  def or(other: Versioned): Versioned = Versioned.latestVersionOf(this, other)
 }
 
 /**
  * Utility methods for comparing Versioned instances.
  */
 object Versioned {
+
+  /**
+   * Returns the Versioned that has the latest version.
+   */
   def latestVersionOf[T <: Versioned](versioned1: T, versioned2: T): T = {
     (versioned1.version compare versioned2.version) match {
       case VectorClock.Before ⇒ versioned2 // version 1 is BEFORE (older), use version 2

From cb9ce7b6639565e461d59e408cbe452eb0455b35 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Jonas=20Bone=CC=81r?=
Date: Thu, 9 Feb 2012 15:59:10 +0100
Subject: [PATCH 40/72] Implemented 'receive(newGossip)' plus misc other changes and fixes.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* Implemented 'receive(newGossip)'
* Added GossipEnvelope
* Added MetaDataChangeListener
* Changed MembershipChangeListener API
* Changed most internal API to work with Address rather than Member
* Added builder style API to Gossip for changing it in an immutable way
* Moved 'self: Member' from Gossip to State

Signed-off-by: Jonas Bonér
---
 .../main/scala/akka/cluster/Gossiper.scala         | 241 +++++++++---------
 .../scala/akka/cluster/NodeStartupSpec.scala       |   2 +-
 2 files changed, 118 insertions(+), 125 deletions(-)

diff --git a/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala b/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala
index 249546b0ad..783690a249 100644
--- a/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala
+++ b/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala
@@ -25,11 +25,19 @@ import scala.annotation.tailrec
 import com.google.protobuf.ByteString
 
 /**
- * Interface for member membership change listener.
+ * Interface for membership change listener.
  */
-trait NodeMembershipChangeListener {
-  def memberConnected(member: Member)
-  def memberDisconnected(member: Member)
+trait MembershipChangeListener { // FIXME add notification of MembershipChangeListener
+  def notify(members: SortedSet[Member]): Unit
+  // def memberConnected(member: Member): Unit
+  // def memberDisconnected(member: Member): Unit
+}
+
+/**
+ * Interface for meta data change listener.
+ */
+trait MetaDataChangeListener { // FIXME add management and notification for MetaDataChangeListener
+  def notify(meta: Map[String, Array[Byte]]): Unit
 }
 
 // FIXME create Protobuf messages out of all the Gossip stuff - but wait until the prototol is fully stablized.
@@ -64,6 +72,11 @@ case class Remove(node: Address) extends ClusterMessage
  */
 case class Member(address: Address, status: MemberStatus) extends ClusterMessage
 
+/**
+ * Envelope adding a sender address to the gossip.
+ */ +case class GossipEnvelope(sender: Member, gossip: Gossip) extends ClusterMessage + /** * Defines the current status of a cluster member node * @@ -94,33 +107,49 @@ object MemberStatus { * Represents the overview of the cluster, holds the cluster convergence table and unreachable nodes. */ case class GossipOverview( - seen: Map[Member, VectorClock] = Map.empty[Member, VectorClock], - unreachable: Set[Member] = Set.empty[Member]) + seen: Map[Address, VectorClock] = Map.empty[Address, VectorClock], + unreachable: Set[Address] = Set.empty[Address]) /** * Represents the state of the cluster; cluster ring membership, ring convergence, meta data - all versioned by a vector clock. */ case class Gossip( overview: GossipOverview = GossipOverview(), - self: Member, members: SortedSet[Member], // sorted set of members with their status, sorted by name //partitions: Tree[PartitionPath, Node] = Tree.empty[PartitionPath, Node], //pending: Set[PartitioningChange] = Set.empty[PartitioningChange], meta: Map[String, Array[Byte]] = Map.empty[String, Array[Byte]], version: VectorClock = VectorClock()) // vector clock version extends ClusterMessage // is a serializable cluster message - with Versioned // has a vector clock as version + with Versioned { + + def addMember(member: Member): Gossip = { + if (members contains member) this + else this copy (members = members + member) + } + + /** + * Marks the gossip as seen by this node (remoteAddress) by updating the address entry in the 'gossip.overview.seen' + * Map with the VectorClock for the new gossip. + */ + def markAsSeenByThisNode(address: Address): Gossip = + this copy (overview = overview copy (seen = overview.seen + (address -> version))) + + def incrementVersion(memberFingerprint: Int): Gossip = { + this copy (version = version.increment(memberFingerprint, newTimestamp)) + } +} final class ClusterDaemon(system: ActorSystem, gossiper: Gossiper) extends Actor { val log = Logging(system, "ClusterDaemon") def receive = { - case gossip: Gossip ⇒ gossiper.receive(gossip) - case Join(address) ⇒ gossiper.joining(address) - case Leave(address) ⇒ //gossiper.leaving(address) - case Down(address) ⇒ //gossiper.downing(address) - case Remove(address) ⇒ //gossiper.removing(address) - case unknown ⇒ log.error("Unknown message sent to cluster daemon [" + unknown + "]") + case GossipEnvelope(sender, gossip) ⇒ gossiper.receive(sender, gossip) + case Join(address) ⇒ gossiper.joining(address) + case Leave(address) ⇒ //gossiper.leaving(address) + case Down(address) ⇒ //gossiper.downing(address) + case Remove(address) ⇒ //gossiper.removing(address) + case unknown ⇒ log.error("Unknown message sent to cluster daemon [" + unknown + "]") } } @@ -146,9 +175,10 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { * all state is represented by this immutable case class and managed by an AtomicReference. 
*/ private case class State( + self: Member, latestGossip: Gossip, isSingletonCluster: Boolean = true, // starts as singleton cluster - memberMembershipChangeListeners: Set[NodeMembershipChangeListener] = Set.empty[NodeMembershipChangeListener]) + memberMembershipChangeListeners: Set[MembershipChangeListener] = Set.empty[MembershipChangeListener]) val remoteSettings = new RemoteSettings(system.settings.config, system.name) val clusterSettings = new ClusterSettings(system.settings.config, system.name) @@ -163,8 +193,7 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { implicit val defaultTimeout = Timeout(remoteSettings.RemoteSystemDaemonAckTimeout) - private val nodeToJoin: Option[Member] = - clusterSettings.NodeToJoin filter (_ != remoteAddress) map (address ⇒ Member(address, MemberStatus.Joining)) + private val nodeToJoin: Option[Address] = clusterSettings.NodeToJoin filter (_ != remoteAddress) private val serialization = remote.serialization private val failureDetector = new AccrualFailureDetector( @@ -179,8 +208,8 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { private val state = { val member = Member(remoteAddress, MemberStatus.Joining) - val gossip = Gossip(self = member, members = SortedSet.empty[Member] + member) - new AtomicReference[State](State(gossip)) + val gossip = Gossip(members = SortedSet.empty[Member] + member) + new AtomicReference[State](State(member, gossip)) } // FIXME manage connections in some other way so we can delete the RemoteConnectionManager (SINCE IT SUCKS!!!) @@ -190,8 +219,8 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { // try to join the node defined in the 'akka.cluster.node-to-join' option nodeToJoin match { - case None ⇒ switchStatusTo(MemberStatus.Up) // if we are singleton cluster then we are already considered to be UP - case Some(member) ⇒ join(member) + case None ⇒ switchStatusTo(MemberStatus.Up) // if we are singleton cluster then we are already considered to be UP + case Some(address) ⇒ join(address) } // start periodic gossip and cluster scrutinization @@ -231,7 +260,7 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { /** * Member status for this node. */ - def self: Member = latestGossip.self + def self: Member = state.get.self /** * Is this node a singleton cluster? @@ -248,7 +277,7 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { val oldGossip = oldState.latestGossip val oldMembers = oldGossip.members val newGossip = oldGossip copy (members = oldMembers + Member(node, MemberStatus.Joining)) // add joining node as Joining - val newState = oldState copy (latestGossip = incrementVersionForGossip(newGossip)) + val newState = oldState copy (latestGossip = newGossip.incrementVersion(memberFingerprint)) // FIXME set flag state.isSingletonCluster = false (if true) @@ -258,66 +287,37 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { /** * Receive new gossip. 
*/ - //@tailrec - final def receive(newGossip: Gossip) { - val from = newGossip.self - log.debug("Node [{}] - Receiving gossip from [{}]", remoteAddress, from.address) + @tailrec + final def receive(sender: Member, newGossip: Gossip) { + log.debug("Node [{}] - Receiving gossip from [{}]", remoteAddress, sender.address) - failureDetector heartbeat from.address // update heartbeat in failure detector + failureDetector heartbeat sender.address // update heartbeat in failure detector // FIXME set flag state.isSingletonCluster = false (if true) - // FIXME all below here is WRONG - redesign with cluster convergence in mind + // FIXME check for convergence - if we have convergence then trigger the listeners - // val oldState = state.get - // println("-------- NEW VERSION " + newGossip) - // println("-------- OLD VERSION " + oldState.latestGossip) - // val gossip = VectorClock.latestVersionOf(newGossip, oldState.latestGossip) - // println("-------- WINNING VERSION " + gossip) + val oldState = state.get + val oldGossip = oldState.latestGossip - // val latestMembers = gossip.members - // val latestUnreachableMembers = gossip.overview.unreachable - // println("=======>>> myself: " + myself) - // println("=======>>> latestMembers: " + latestMembers) - // if (!(latestMembers contains myself) && !(latestUnreachableMembers contains myself)) { - // println("-------- NEW NODE") - // // we have a new member - // val newGossip = gossip copy (availableNodes = latestMembers + myself) - // val newState = oldState copy (latestGossip = incrementVersionForGossip(newGossip)) + val gossip = Versioned + .latestVersionOf(newGossip, oldGossip) + .addMember(self) // needed if newGossip won + .addMember(sender) // needed if oldGossip won + .markAsSeenByThisNode(remoteAddress) + .incrementVersion(memberFingerprint) - // println("--------- new GOSSIP " + newGossip.members) - // println("--------- new STATE " + newState) - // // if we won the race then update else try again - // if (!state.compareAndSet(oldState, newState)) receive(newGossip) // recur - // else { - // println("---------- WON RACE - setting state") - // // create connections for all new members in the latest gossip - // (latestMembers + myself) foreach { member ⇒ - // setUpConnectionTo(member) - // oldState.memberMembershipChangeListeners foreach (_ memberConnected member) // notify listeners about the new members - // } - // } + val newState = oldState copy (latestGossip = gossip) - // } else if (latestUnreachableMembers contains myself) { - // // gossip from an old former dead member - - // val newUnreachableMembers = latestUnreachableMembers - myself - // val newMembers = latestMembers + myself - - // val newGossip = gossip copy (availableNodes = newMembers, unavailableNodes = newUnreachableMembers) - // val newState = oldState copy (latestGossip = incrementVersionForGossip(newGossip)) - - // // if we won the race then update else try again - // if (!state.compareAndSet(oldState, newState)) receive(newGossip) // recur - // else oldState.memberMembershipChangeListeners foreach (_ memberConnected myself) // notify listeners on successful update of state - // } + // if we won the race then update else try again + if (!state.compareAndSet(oldState, newState)) receive(sender, newGossip) // recur if we fail the update } /** * Registers a listener to subscribe to cluster membership changes. 
*/ @tailrec - final def registerListener(listener: NodeMembershipChangeListener) { + final def registerListener(listener: MembershipChangeListener) { val oldState = state.get val newListeners = oldState.memberMembershipChangeListeners + listener val newState = oldState copy (memberMembershipChangeListeners = newListeners) @@ -328,7 +328,7 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { * Unsubscribes to cluster membership changes. */ @tailrec - final def unregisterListener(listener: NodeMembershipChangeListener) { + final def unregisterListener(listener: MembershipChangeListener) { val oldState = state.get val newListeners = oldState.memberMembershipChangeListeners - listener val newState = oldState copy (memberMembershipChangeListeners = newListeners) @@ -338,10 +338,10 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { /** * Joins the pre-configured contact point and retrieves current gossip state. */ - private def join(member: Member) { - setUpConnectionTo(member) foreach { connection ⇒ + private def join(address: Address) { + setUpConnectionTo(address) foreach { connection ⇒ val command = Join(remoteAddress) - log.info("Node [{}] - Sending [{}] to [{}]", remoteAddress, command, member.address) + log.info("Node [{}] - Sending [{}] to [{}]", remoteAddress, command, address) connection ! command } @@ -366,23 +366,23 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { val oldMembers = oldGossip.members val oldMembersSize = oldMembers.size - val oldUnreachableMembers = oldGossip.overview.unreachable - val oldUnreachableSize = oldUnreachableMembers.size + val oldUnreachableAddresses = oldGossip.overview.unreachable + val oldUnreachableSize = oldUnreachableAddresses.size // 1. gossip to alive members - val shouldGossipToDeputy = - if (oldUnreachableSize > 0) gossipToRandomNodeOf(oldMembers) + val gossipedToDeputy = + if (oldUnreachableSize > 0) gossipToRandomNodeOf(oldMembers.toList map { _.address }) else false - // 2. gossip to dead members + // 2. gossip to unreachable members if (oldUnreachableSize > 0) { val probability: Double = oldUnreachableSize / (oldMembersSize + 1) - if (random.nextDouble() < probability) gossipToRandomNodeOf(oldUnreachableMembers) + if (random.nextDouble() < probability) gossipToRandomNodeOf(oldUnreachableAddresses.toList) } // 3. gossip to a deputy nodes for facilitating partition healing val deputies = deputyNodesWithoutMyself - if ((!shouldGossipToDeputy || oldMembersSize < 1) && !deputies.isEmpty) { + if ((!gossipedToDeputy || oldMembersSize < 1) && !deputies.isEmpty) { if (oldMembersSize == 0) gossipToRandomNodeOf(deputies) else { val probability = 1.0 / oldMembersSize + oldUnreachableSize @@ -392,13 +392,16 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { } } + /** + * Switches the state in the FSM. 
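The gossip() round above chooses its extra targets probabilistically: the more unreachable nodes there are relative to live members, the more likely a round also gossips to an unreachable address, and deputy nodes get similar treatment. A small sketch of the unreachable-node case, assuming the ratio is meant as a floating-point probability (the helper name is hypothetical, not part of the patch):

import java.security.SecureRandom

object GossipTargetSketch {
  private val random = SecureRandom.getInstance("SHA1PRNG")

  // True when this round should additionally gossip to an unreachable node;
  // the bias grows with the fraction of unreachable nodes in the cluster.
  def alsoGossipToUnreachable(liveMembers: Int, unreachable: Int): Boolean =
    unreachable > 0 && random.nextDouble() < unreachable.toDouble / (liveMembers + 1)
}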
+ */ @tailrec final private def switchStatusTo(newStatus: MemberStatus) { log.info("Node [{}] - Switching membership status to [{}]", remoteAddress, newStatus) val oldState = state.get - val oldGossip = oldState.latestGossip + val oldSelf = oldState.self - val oldSelf = oldGossip.self + val oldGossip = oldState.latestGossip val oldMembers = oldGossip.members val newSelf = oldSelf copy (status = newStatus) @@ -410,16 +413,16 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { // ugly crap to work around bug in scala colletions ('val ss: SortedSet[Member] = SortedSet.empty[Member] ++ aSet' does not compile) val newMembersSortedSet = SortedSet[Member](newMembersSet.toList: _*) - val newGossip = oldGossip copy (self = newSelf, members = newMembersSortedSet) - val newState = oldState copy (latestGossip = incrementVersionForGossip(newGossip)) + val newGossip = oldGossip copy (members = newMembersSortedSet) incrementVersion memberFingerprint + val newState = oldState copy (self = newSelf, latestGossip = newGossip) if (!state.compareAndSet(oldState, newState)) switchStatusTo(newStatus) // recur if we failed update } /** - * Gossips latest gossip to a member. + * Gossips latest gossip to an address. */ - private def gossipTo(member: Member) { - setUpConnectionTo(member) foreach { _ ! latestGossip } + private def gossipTo(address: Address) { + setUpConnectionTo(address) foreach { _ ! GossipEnvelope(self, latestGossip) } } /** @@ -427,8 +430,8 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { * * @return 'true' if it gossiped to a "deputy" member. */ - private def gossipToRandomNodeOf(members: Seq[Member]): Boolean = { - val peers = members filter (_.address != remoteAddress) // filter out myself + private def gossipToRandomNodeOf(addresses: Seq[Address]): Boolean = { + val peers = addresses filter (_ != remoteAddress) // filter out myself val peer = selectRandomNode(peers) val oldState = state.get val oldGossip = oldState.latestGossip @@ -438,15 +441,7 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { } /** - * Gossips to a random member in the set of members passed in as argument. - * - * @return 'true' if it gossiped to a "deputy" member. - */ - private def gossipToRandomNodeOf(members: Set[Member]): Boolean = gossipToRandomNodeOf(members.toList) - - /** - * Scrutinizes the cluster; marks members detected by the failure detector as unavailable, and notifies all listeners - * of the change in the cluster membership. + * Scrutinizes the cluster; marks members detected by the failure detector as unavailable. 
*/ @tailrec final private def scrutinize() { @@ -455,25 +450,28 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { val oldGossip = oldState.latestGossip val oldOverview = oldGossip.overview val oldMembers = oldGossip.members - val oldUnreachableMembers = oldGossip.overview.unreachable - val newlyDetectedUnreachableMembers = oldMembers filterNot (member ⇒ failureDetector.isAvailable(member.address)) + val oldUnreachableAddresses = oldGossip.overview.unreachable - if (!newlyDetectedUnreachableMembers.isEmpty) { // we have newly detected members marked as unavailable + val newlyDetectedUnreachableMembers = oldMembers filterNot { member ⇒ failureDetector.isAvailable(member.address) } + val newlyDetectedUnreachableAddresses = newlyDetectedUnreachableMembers map { _.address } + + if (!newlyDetectedUnreachableAddresses.isEmpty) { // we have newly detected members marked as unavailable val newMembers = oldMembers diff newlyDetectedUnreachableMembers - val newUnreachableMembers = oldUnreachableMembers ++ newlyDetectedUnreachableMembers + val newUnreachableAddresses: Set[Address] = (oldUnreachableAddresses ++ newlyDetectedUnreachableAddresses) - val newOverview = oldOverview copy (unreachable = newUnreachableMembers) - val newGossip = oldGossip copy (overview = newOverview, members = newMembers) - val newState = oldState copy (latestGossip = incrementVersionForGossip(newGossip)) + val newOverview = oldOverview copy (unreachable = newUnreachableAddresses) + val newGossip = oldGossip copy (overview = newOverview, members = newMembers) incrementVersion memberFingerprint + val newState = oldState copy (latestGossip = newGossip) // if we won the race then update else try again if (!state.compareAndSet(oldState, newState)) scrutinize() // recur else { + // FIXME should only notify when there is a cluster convergence // notify listeners on successful update of state - for { - deadNode ← newUnreachableMembers - listener ← oldState.memberMembershipChangeListeners - } listener memberDisconnected deadNode + // for { + // deadNode ← newUnreachableAddresses + // listener ← oldState.memberMembershipChangeListeners + // } listener memberDisconnected deadNode } } } @@ -481,29 +479,28 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { // FIXME should shuffle list randomly before start traversing to avoid connecting to some member on every member @tailrec - final private def connectToRandomNodeOf(members: Seq[Member]): ActorRef = { - members match { - case member :: rest ⇒ - setUpConnectionTo(member) match { + final private def connectToRandomNodeOf(addresses: Seq[Address]): ActorRef = { + addresses match { + case address :: rest ⇒ + setUpConnectionTo(address) match { case Some(connection) ⇒ connection case None ⇒ connectToRandomNodeOf(rest) // recur if } case Nil ⇒ throw new RemoteConnectionException( - "Could not establish connection to any of the members in the argument list") + "Could not establish connection to any of the addresses in the argument list") } } /** - * Sets up remote connections to all the members in the argument list. + * Sets up remote connections to all the addresses in the argument list. */ - private def setUpConnectionsTo(members: Seq[Member]): Seq[Option[ActorRef]] = members map { setUpConnectionTo(_) } + private def setUpConnectionsTo(addresses: Seq[Address]): Seq[Option[ActorRef]] = addresses map setUpConnectionTo /** * Sets up remote connection. 
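Stripped of the CAS bookkeeping, the scrutinize() step above partitions the member set by the failure detector's verdict and shifts newly failing members into the unreachable set. An illustrative sketch using stand-in types (Addr and the function names are not part of the patch):

object ScrutinizeSketch {
  final case class Addr(host: String, port: Int)

  // Returns the remaining reachable members and the grown unreachable set.
  def scrutinize(
    members: Set[Addr],
    unreachable: Set[Addr],
    isAvailable: Addr => Boolean): (Set[Addr], Set[Addr]) = {
    val (stillReachable, newlyUnreachable) = members.partition(isAvailable)
    (stillReachable, unreachable ++ newlyUnreachable)
  }
}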
*/ - private def setUpConnectionTo(member: Member): Option[ActorRef] = { - val address = member.address + private def setUpConnectionTo(address: Address): Option[ActorRef] = { try { Some(connectionManager.putIfAbsent(address, () ⇒ system.actorFor(RootActorPath(address) / "system" / "cluster"))) } catch { @@ -511,11 +508,7 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { } } - private def incrementVersionForGossip(from: Gossip): Gossip = { - from copy (version = from.version.increment(memberFingerprint, newTimestamp)) - } + private def deputyNodesWithoutMyself: Seq[Address] = Seq.empty[Address] filter (_ != remoteAddress) // FIXME read in deputy nodes from gossip data - now empty seq - private def deputyNodesWithoutMyself: Seq[Member] = Seq.empty[Member] filter (_.address != remoteAddress) // FIXME read in deputy nodes from gossip data - now empty seq - - private def selectRandomNode(members: Seq[Member]): Member = members(random.nextInt(members.size)) + private def selectRandomNode(addresses: Seq[Address]): Address = addresses(random nextInt addresses.size) } diff --git a/akka-cluster/src/test/scala/akka/cluster/NodeStartupSpec.scala b/akka-cluster/src/test/scala/akka/cluster/NodeStartupSpec.scala index 32bf0bc6b5..de59541dfa 100644 --- a/akka-cluster/src/test/scala/akka/cluster/NodeStartupSpec.scala +++ b/akka-cluster/src/test/scala/akka/cluster/NodeStartupSpec.scala @@ -14,7 +14,7 @@ import com.typesafe.config._ class NodeStartupSpec extends AkkaSpec(""" akka { - loglevel = "DEBUG" + loglevel = "INFO" } """) with ImplicitSender { From 9de7a2daae1c96aeb9d77df0acee3ab5098d43ee Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jonas=20Bone=CC=81r?= Date: Tue, 14 Feb 2012 20:40:32 +0100 Subject: [PATCH 41/72] Added NodeGossipingSpec for testing gossiping and cluster membership. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Jonas Bonér --- .../akka/cluster/NodeGossipingSpec.scala | 141 ++++++++++++++++++ 1 file changed, 141 insertions(+) create mode 100644 akka-cluster/src/test/scala/akka/cluster/NodeGossipingSpec.scala diff --git a/akka-cluster/src/test/scala/akka/cluster/NodeGossipingSpec.scala b/akka-cluster/src/test/scala/akka/cluster/NodeGossipingSpec.scala new file mode 100644 index 0000000000..a3cc492a23 --- /dev/null +++ b/akka-cluster/src/test/scala/akka/cluster/NodeGossipingSpec.scala @@ -0,0 +1,141 @@ +/** + * Copyright (C) 2009-2011 Typesafe Inc. 
+ */ +package akka.cluster + +import java.net.InetSocketAddress + +import akka.testkit._ +import akka.dispatch._ +import akka.actor._ +import akka.remote._ + +import com.typesafe.config._ + +class NodeGossipingSpec extends AkkaSpec(""" + akka { + loglevel = "DEBUG" + } + """) with ImplicitSender { + + var gossiper0: Gossiper = _ + var gossiper1: Gossiper = _ + var gossiper2: Gossiper = _ + + var node0: ActorSystemImpl = _ + var node1: ActorSystemImpl = _ + var node2: ActorSystemImpl = _ + + try { + "A set of connected cluster nodes" must { + "(when two nodes) start gossiping to each other so that both nodes gets the same gossip info" in { + node0 = ActorSystem("NodeGossipingSpec", ConfigFactory + .parseString(""" + akka { + actor.provider = "akka.remote.RemoteActorRefProvider" + remote.netty { + hostname = localhost + port=5550 + } + }""") + .withFallback(system.settings.config)) + .asInstanceOf[ActorSystemImpl] + val remote0 = node0.provider.asInstanceOf[RemoteActorRefProvider] + gossiper0 = Gossiper(node0, remote0) + + node1 = ActorSystem("NodeGossipingSpec", ConfigFactory + .parseString(""" + akka { + actor.provider = "akka.remote.RemoteActorRefProvider" + remote.netty { + hostname = localhost + port=5551 + } + cluster.node-to-join = "akka://NodeGossipingSpec@localhost:5550" + }""") + .withFallback(system.settings.config)) + .asInstanceOf[ActorSystemImpl] + val remote1 = node1.provider.asInstanceOf[RemoteActorRefProvider] + gossiper1 = Gossiper(node1, remote1) + + Thread.sleep(5000) + + val members0 = gossiper0.latestGossip.members.toArray + members0.size must be(2) + members0(0).address.port.get must be(5550) + members0(0).status must be(MemberStatus.Joining) + members0(1).address.port.get must be(5551) + members0(1).status must be(MemberStatus.Joining) + + val members1 = gossiper1.latestGossip.members.toArray + members1.size must be(2) + members1(0).address.port.get must be(5550) + members1(0).status must be(MemberStatus.Joining) + members1(1).address.port.get must be(5551) + members1(1).status must be(MemberStatus.Joining) + } + + "(when three nodes) start gossiping to each other so that both nodes gets the same gossip info" in { + node2 = ActorSystem("NodeGossipingSpec", ConfigFactory + .parseString(""" + akka { + actor.provider = "akka.remote.RemoteActorRefProvider" + remote.netty { + hostname = localhost + port=5552 + } + cluster.node-to-join = "akka://NodeGossipingSpec@localhost:5550" + }""") + .withFallback(system.settings.config)) + .asInstanceOf[ActorSystemImpl] + val remote2 = node2.provider.asInstanceOf[RemoteActorRefProvider] + gossiper2 = Gossiper(node2, remote2) + + Thread.sleep(10000) + + val members0 = gossiper0.latestGossip.members.toArray + val version = gossiper0.latestGossip.version + members0.size must be(3) + members0(0).address.port.get must be(5550) + members0(0).status must be(MemberStatus.Joining) + members0(1).address.port.get must be(5551) + members0(1).status must be(MemberStatus.Joining) + members0(2).address.port.get must be(5552) + members0(2).status must be(MemberStatus.Joining) + + val members1 = gossiper1.latestGossip.members.toArray + members1.size must be(3) + members1(0).address.port.get must be(5550) + members1(0).status must be(MemberStatus.Joining) + members1(1).address.port.get must be(5551) + members1(1).status must be(MemberStatus.Joining) + members1(2).address.port.get must be(5552) + members1(2).status must be(MemberStatus.Joining) + + val members2 = gossiper2.latestGossip.members.toArray + members2.size must be(3) + 
members2(0).address.port.get must be(5550) + members2(0).status must be(MemberStatus.Joining) + members2(1).address.port.get must be(5551) + members2(1).status must be(MemberStatus.Joining) + members2(2).address.port.get must be(5552) + members2(2).status must be(MemberStatus.Joining) + } + } + } catch { + case e: Exception ⇒ + e.printStackTrace + fail(e.toString) + } + + override def atTermination() { + gossiper0.shutdown() + node0.shutdown() + + gossiper1.shutdown() + node1.shutdown() + + gossiper2.shutdown() + node2.shutdown() + } +} From 5f537a7d4c6989f60ecb581c28114f758eb5eb2f Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jonas=20Bone=CC=81r?= Date: Tue, 14 Feb 2012 20:42:03 +0100 Subject: [PATCH 42/72] Rewrite of the VectorClock impl. Now with 'merge' support and slicker API. Also added more tests. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Jonas Bonér --- .../main/scala/akka/cluster/VectorClock.scala | 246 +++++++++------ .../scala/akka/cluster/VectorClockSpec.scala | 290 +++++++++++------- 2 files changed, 331 insertions(+), 205 deletions(-) diff --git a/akka-cluster/src/main/scala/akka/cluster/VectorClock.scala b/akka-cluster/src/main/scala/akka/cluster/VectorClock.scala index 13583cc120..e27215b23c 100644 --- a/akka-cluster/src/main/scala/akka/cluster/VectorClock.scala +++ b/akka-cluster/src/main/scala/akka/cluster/VectorClock.scala @@ -5,19 +5,21 @@ package akka.cluster import akka.AkkaException +import akka.event.Logging +import akka.actor.ActorSystem + +import System.{ currentTimeMillis ⇒ newTimestamp } +import java.security.MessageDigest +import java.util.concurrent.atomic.AtomicLong class VectorClockException(message: String) extends AkkaException(message) /** * Trait to be extended by classes that wants to be versioned using a VectorClock. */ -trait Versioned { +trait Versioned[T] { def version: VectorClock - - /** - * Returns the Versioned that have the latest version. - */ - def or(other: Versioned): Versioned = Versioned.latestVersionOf(this, other) + def +(node: VectorClock.Node): T } /** @@ -25,14 +27,104 @@ trait Versioned { */ object Versioned { + /** + * The result of comparing two Versioned objects. + * Either: + * {{{ + * 1) v1 is BEFORE v2 => Before + * 2) v1 is AFTER t2 => After + * 3) v1 happens CONCURRENTLY to v2 => Concurrent + * }}} + */ + sealed trait Ordering + case object Before extends Ordering + case object After extends Ordering + case object Concurrent extends Ordering + + /** + * Returns or 'Ordering' for the two 'Versioned' instances. + */ + def compare[T <: Versioned[T]](versioned1: Versioned[T], versioned2: Versioned[T]): Ordering = { + if (versioned1.version <> versioned2.version) Concurrent + else if (versioned1.version < versioned2.version) Before + else After + } + /** * Returns the Versioned that have the latest version. 
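The Before/After/Concurrent ordering that compare and latestVersionOf rely on is the usual vector clock partial order: one clock precedes another only if none of its entries is ahead. A self-contained illustration over plain Maps, using stand-in names rather than the patch's own API:

object VectorClockOrderingSketch {
  type Clock = Map[String, Long]

  // True when every entry of 'a' is at or below the matching entry of 'b'
  // (missing entries count as zero).
  def dominatedByOrEqual(a: Clock, b: Clock): Boolean =
    a.forall { case (node, t) => t <= b.getOrElse(node, 0L) }

  def compare(a: Clock, b: Clock): String =
    (dominatedByOrEqual(a, b), dominatedByOrEqual(b, a)) match {
      case (true, true)   => "Same"
      case (true, false)  => "Before"
      case (false, true)  => "After"
      case (false, false) => "Concurrent"
    }

  // Example: Map("A" -> 1L) is Before Map("A" -> 2L, "B" -> 1L), while
  // Map("A" -> 2L) and Map("B" -> 1L) are Concurrent.
}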
*/ - def latestVersionOf[T <: Versioned](versioned1: T, versioned2: T): T = { - (versioned1.version compare versioned2.version) match { - case VectorClock.Before ⇒ versioned2 // version 1 is BEFORE (older), use version 2 - case VectorClock.After ⇒ versioned1 // version 1 is AFTER (newer), use version 1 - case VectorClock.Concurrent ⇒ versioned1 // can't establish a causal relationship between versions => conflict - keeping version 1 + def latestVersionOf[T <: Versioned[T]](versioned1: T, versioned2: T): T = { + compare(versioned1, versioned2) match { + case Concurrent ⇒ versioned2 + case Before ⇒ versioned2 + case After ⇒ versioned1 + } + } +} + +/** + * VectorClock module with helper classes and methods. + * + * Based on code from the 'vlock' VectorClock library by Coda Hale. + */ +object VectorClock { + + /** + * Hash representation of a versioned node name. + */ + class Node private (val name: String) extends Serializable { + override def hashCode = 0 + name.## + + override def equals(other: Any) = Node.unapply(this) == Node.unapply(other) + + override def toString = name.mkString("Node(", "", ")") + } + + object Node { + def apply(name: String): Node = new Node(hash(name)) + + def unapply(other: Any) = other match { + case x: Node ⇒ import x._; Some(name) + case _ ⇒ None + } + + private def hash(name: String): String = { + val digester = MessageDigest.getInstance("MD5") + digester update name.getBytes + digester.digest.map { h ⇒ "%02x".format(0xFF & h) }.mkString + } + } + + /** + * Timestamp representation a unique 'Ordered' timestamp. + */ + case class Timestamp private (time: Long) extends Ordered[Timestamp] { + def max(other: Timestamp) = { + if (this < other) other + else this + } + + def compare(other: Timestamp) = time compare other.time + + override def toString = "%016x" format time + } + + object Timestamp { + private val counter = new AtomicLong(newTimestamp) + + def zero(): Timestamp = Timestamp(0L) + + def apply(): Timestamp = { + var newTime: Long = 0L + while (newTime == 0) { + val last = counter.get + val current = newTimestamp + val next = if (current > last) current else last + 1 + if (counter.compareAndSet(last, next)) { + newTime = next + } + } + new Timestamp(newTime) } } } @@ -44,108 +136,68 @@ object Versioned { * 1) Leslie Lamport (1978). "Time, clocks, and the ordering of events in a distributed system". Communications of the ACM 21 (7): 558-565. * 2) Friedemann Mattern (1988). "Virtual Time and Global States of Distributed Systems". Workshop on Parallel and Distributed Algorithms: pp. 215-226 * }}} + * + * Based on code from the 'vlock' VectorClock library by Coda Hale. 
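The Timestamp companion above produces strictly increasing, duplicate-free values by combining the wall clock with a CAS loop. A compact sketch of the same technique, under a hypothetical MonotonicTime name:

import java.util.concurrent.atomic.AtomicLong
import scala.annotation.tailrec

object MonotonicTime {
  private val last = new AtomicLong(System.currentTimeMillis)

  @tailrec
  def next(): Long = {
    val previous = last.get
    // Use the wall clock if it moved forward, otherwise bump the last value by one.
    val candidate = math.max(System.currentTimeMillis, previous + 1)
    if (last.compareAndSet(previous, candidate)) candidate else next()
  }
}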
*/ case class VectorClock( - versions: Vector[VectorClock.Entry] = Vector.empty[VectorClock.Entry], - timestamp: Long = System.currentTimeMillis) { + timestamp: VectorClock.Timestamp = VectorClock.Timestamp(), + versions: Map[VectorClock.Node, VectorClock.Timestamp] = Map.empty[VectorClock.Node, VectorClock.Timestamp]) + extends PartiallyOrdered[VectorClock] { + + // FIXME pruning of VectorClock history + import VectorClock._ - def compare(other: VectorClock): Ordering = VectorClock.compare(this, other) - - def increment(fingerprint: Int, timestamp: Long): VectorClock = { - val newVersions = - if (versions exists (entry ⇒ entry.fingerprint == fingerprint)) { - // update existing node entry - versions map { entry ⇒ - if (entry.fingerprint == fingerprint) entry.increment() - else entry - } - } else { - // create and append a new node entry - versions :+ Entry(fingerprint = fingerprint) - } - if (newVersions.size > MaxNrOfVersions) throw new VectorClockException("Max number of versions reached") - copy(versions = newVersions, timestamp = timestamp) - } - - def maxVersion: Long = versions.foldLeft(1L)((max, entry) ⇒ math.max(max, entry.version)) - - // FIXME Do we need to implement VectorClock.merge? - def merge(other: VectorClock): VectorClock = { - sys.error("Not implemented") - } -} - -/** - * Module with helper classes and methods. - */ -object VectorClock { - final val MaxNrOfVersions = Short.MaxValue - /** - * The result of comparing two vector clocks. - * Either: - * {{{ - * 1) v1 is BEFORE v2 - * 2) v1 is AFTER t2 - * 3) v1 happens CONCURRENTLY to v2 - * }}} + * Increment the version for the node passed as argument. Returns a new VectorClock. */ - sealed trait Ordering - case object Before extends Ordering - case object After extends Ordering - case object Concurrent extends Ordering + def +(node: Node): VectorClock = copy(versions = versions + (node -> Timestamp())) /** - * Versioned entry in a vector clock. + * Returns true if this and that are concurrent else false. */ - case class Entry(fingerprint: Int, version: Long = 1L) { - def increment(): Entry = copy(version = version + 1L) - } + def <>(that: VectorClock): Boolean = tryCompareTo(that) == None /** + * Returns true if this VectorClock has the same history as the 'that' VectorClock else false. + */ + def ==(that: VectorClock): Boolean = versions == that.versions + + /** + * For the 'PartiallyOrdered' trait, to allow natural comparisons using <, > and ==. + *
* Compare two vector clocks. The outcomes will be one of the following: *
* {{{ - * 1. Clock 1 is BEFORE clock 2 if there exists an i such that c1(i) <= c(2) and there does not exist a j such that c1(j) > c2(j). - * 2. Clock 1 is CONCURRENT to clock 2 if there exists an i, j such that c1(i) < c2(i) and c1(j) > c2(j). - * 3. Clock 1 is AFTER clock 2 otherwise. + * 1. Clock 1 is BEFORE (>) Clock 2 if there exists an i such that c1(i) <= c(2) and there does not exist a j such that c1(j) > c2(j). + * 2. Clock 1 is CONCURRENT (<>) to Clock 2 if there exists an i, j such that c1(i) < c2(i) and c1(j) > c2(j). + * 3. Clock 1 is AFTER (<) Clock 2 otherwise. * }}} - * - * @param v1 The first VectorClock - * @param v2 The second VectorClock */ - def compare(v1: VectorClock, v2: VectorClock): Ordering = { - if ((v1 eq null) || (v2 eq null)) throw new IllegalArgumentException("Can't compare null VectorClocks") - - // FIXME rewrite to functional style, now uses ugly imperative algorithm - - var v1Bigger, v2Bigger = false // We do two checks: v1 <= v2 and v2 <= v1 if both are true then - var p1, p2 = 0 - - while (p1 < v1.versions.size && p2 < v2.versions.size) { - val ver1 = v1.versions(p1) - val ver2 = v2.versions(p2) - if (ver1.fingerprint == ver2.fingerprint) { - if (ver1.version > ver2.version) v1Bigger = true - else if (ver2.version > ver1.version) v2Bigger = true - p1 += 1 - p2 += 1 - } else if (ver1.fingerprint > ver2.fingerprint) { - v2Bigger = true // Since ver1 is bigger that means it is missing a version that ver2 has - p2 += 1 - } else { - v1Bigger = true // This means ver2 is bigger which means it is missing a version ver1 has - p1 += 1 - } + def tryCompareTo[V >: VectorClock <% PartiallyOrdered[V]](vclock: V): Option[Int] = { + def compare(versions1: Map[Node, Timestamp], versions2: Map[Node, Timestamp]): Boolean = { + versions1.forall { case ((n, t)) ⇒ t <= versions2.getOrElse(n, Timestamp.zero) } && + (versions1.exists { case ((n, t)) ⇒ t < versions2.getOrElse(n, Timestamp.zero) } || + (versions1.size < versions2.size)) + } + vclock match { + case VectorClock(_, otherVersions) ⇒ + if (compare(versions, otherVersions)) Some(-1) + else if (compare(otherVersions, versions)) Some(1) + else if (versions == otherVersions) Some(0) + else None + case _ ⇒ None } - - if (p1 < v1.versions.size) v1Bigger = true - else if (p2 < v2.versions.size) v2Bigger = true - - if (!v1Bigger && !v2Bigger) Before // This is the case where they are equal, return BEFORE arbitrarily - else if (v1Bigger && !v2Bigger) After // This is the case where v1 is a successor clock to v2 - else if (!v1Bigger && v2Bigger) Before // This is the case where v2 is a successor clock to v1 - else Concurrent // This is the case where both clocks are parallel to one another } + + /** + * Merges this VectorClock with another VectorClock. E.g. merges its versioned history. 
+ */ + def merge(that: VectorClock): VectorClock = { + val mergedVersions = scala.collection.mutable.Map.empty[Node, Timestamp] ++ that.versions + for ((node, time) ← versions) mergedVersions(node) = time max mergedVersions.getOrElse(node, time) + VectorClock(timestamp, Map.empty[Node, Timestamp] ++ mergedVersions) + } + + override def toString = versions.map { case ((n, t)) ⇒ n + " -> " + t }.mkString("VectorClock(", ", ", ")") } diff --git a/akka-cluster/src/test/scala/akka/cluster/VectorClockSpec.scala b/akka-cluster/src/test/scala/akka/cluster/VectorClockSpec.scala index df9cead7f8..65f2aa1d75 100644 --- a/akka-cluster/src/test/scala/akka/cluster/VectorClockSpec.scala +++ b/akka-cluster/src/test/scala/akka/cluster/VectorClockSpec.scala @@ -2,6 +2,7 @@ package akka.cluster import java.net.InetSocketAddress import akka.testkit.AkkaSpec +import akka.actor.ActorSystem class VectorClockSpec extends AkkaSpec { import VectorClock._ @@ -10,193 +11,266 @@ class VectorClockSpec extends AkkaSpec { "have zero versions when created" in { val clock = VectorClock() - clock.versions must be(Vector()) + clock.versions must be(Map()) } - "be able to add Entry if non-existing" in { - val clock1 = VectorClock() - clock1.versions must be(Vector()) - val clock2 = clock1.increment(1, System.currentTimeMillis) - val clock3 = clock2.increment(2, System.currentTimeMillis) - - clock3.versions must be(Vector(Entry(1, 1), Entry(2, 1))) - } - - "be able to increment version of existing Entry" in { - val clock1 = VectorClock() - val clock2 = clock1.increment(1, System.currentTimeMillis) - val clock3 = clock2.increment(2, System.currentTimeMillis) - val clock4 = clock3.increment(1, System.currentTimeMillis) - val clock5 = clock4.increment(2, System.currentTimeMillis) - val clock6 = clock5.increment(2, System.currentTimeMillis) - - clock6.versions must be(Vector(Entry(1, 2), Entry(2, 3))) - } - - "The empty clock should not happen before itself" in { + "not happen before itself" in { val clock1 = VectorClock() val clock2 = VectorClock() - clock1.compare(clock2) must not be (Concurrent) + clock1 <> clock2 must be(false) } - "not happen before an identical clock" in { + "pass misc comparison test 1" in { val clock1_1 = VectorClock() - val clock2_1 = clock1_1.increment(1, System.currentTimeMillis) - val clock3_1 = clock2_1.increment(2, System.currentTimeMillis) - val clock4_1 = clock3_1.increment(1, System.currentTimeMillis) + val clock2_1 = clock1_1 + Node("1") + val clock3_1 = clock2_1 + Node("2") + val clock4_1 = clock3_1 + Node("1") val clock1_2 = VectorClock() - val clock2_2 = clock1_2.increment(1, System.currentTimeMillis) - val clock3_2 = clock2_2.increment(2, System.currentTimeMillis) - val clock4_2 = clock3_2.increment(1, System.currentTimeMillis) + val clock2_2 = clock1_2 + Node("1") + val clock3_2 = clock2_2 + Node("2") + val clock4_2 = clock3_2 + Node("1") - clock4_1.compare(clock4_2) must not be (Concurrent) + clock4_1 <> clock4_2 must be(false) } - "happen before an identical clock with a single additional event" in { + "pass misc comparison test 2" in { val clock1_1 = VectorClock() - val clock2_1 = clock1_1.increment(1, System.currentTimeMillis) - val clock3_1 = clock2_1.increment(2, System.currentTimeMillis) - val clock4_1 = clock3_1.increment(1, System.currentTimeMillis) + val clock2_1 = clock1_1 + Node("1") + val clock3_1 = clock2_1 + Node("2") + val clock4_1 = clock3_1 + Node("1") val clock1_2 = VectorClock() - val clock2_2 = clock1_2.increment(1, System.currentTimeMillis) - val clock3_2 = 
clock2_2.increment(2, System.currentTimeMillis) - val clock4_2 = clock3_2.increment(1, System.currentTimeMillis) - val clock5_2 = clock4_2.increment(3, System.currentTimeMillis) + val clock2_2 = clock1_2 + Node("1") + val clock3_2 = clock2_2 + Node("2") + val clock4_2 = clock3_2 + Node("1") + val clock5_2 = clock4_2 + Node("3") - clock4_1.compare(clock5_2) must be(Before) + clock4_1 < clock5_2 must be(true) } - "Two clocks with different events should be concurrent: 1" in { + "pass misc comparison test 3" in { var clock1_1 = VectorClock() - val clock2_1 = clock1_1.increment(1, System.currentTimeMillis) + val clock2_1 = clock1_1 + Node("1") val clock1_2 = VectorClock() - val clock2_2 = clock1_2.increment(2, System.currentTimeMillis) + val clock2_2 = clock1_2 + Node("2") - clock2_1.compare(clock2_2) must be(Concurrent) + clock2_1 <> clock2_2 must be(true) } - "Two clocks with different events should be concurrent: 2" in { + "pass misc comparison test 4" in { val clock1_3 = VectorClock() - val clock2_3 = clock1_3.increment(1, System.currentTimeMillis) - val clock3_3 = clock2_3.increment(2, System.currentTimeMillis) - val clock4_3 = clock3_3.increment(1, System.currentTimeMillis) + val clock2_3 = clock1_3 + Node("1") + val clock3_3 = clock2_3 + Node("2") + val clock4_3 = clock3_3 + Node("1") val clock1_4 = VectorClock() - val clock2_4 = clock1_4.increment(1, System.currentTimeMillis) - val clock3_4 = clock2_4.increment(1, System.currentTimeMillis) - val clock4_4 = clock3_4.increment(3, System.currentTimeMillis) + val clock2_4 = clock1_4 + Node("1") + val clock3_4 = clock2_4 + Node("1") + val clock4_4 = clock3_4 + Node("3") - clock4_3.compare(clock4_4) must be(Concurrent) + clock4_3 <> clock4_4 must be(true) } - ".." in { + "pass misc comparison test 5" in { val clock1_1 = VectorClock() - val clock2_1 = clock1_1.increment(2, System.currentTimeMillis) - val clock3_1 = clock2_1.increment(2, System.currentTimeMillis) + val clock2_1 = clock1_1 + Node("2") + val clock3_1 = clock2_1 + Node("2") val clock1_2 = VectorClock() - val clock2_2 = clock1_2.increment(1, System.currentTimeMillis) - val clock3_2 = clock2_2.increment(2, System.currentTimeMillis) - val clock4_2 = clock3_2.increment(2, System.currentTimeMillis) - val clock5_2 = clock4_2.increment(3, System.currentTimeMillis) + val clock2_2 = clock1_2 + Node("1") + val clock3_2 = clock2_2 + Node("2") + val clock4_2 = clock3_2 + Node("2") + val clock5_2 = clock4_2 + Node("3") - clock3_1.compare(clock5_2) must be(Before) + clock3_1 < clock5_2 must be(true) + clock5_2 > clock3_1 must be(true) } - "..." 
in { + "pass misc comparison test 6" in { val clock1_1 = VectorClock() - val clock2_1 = clock1_1.increment(1, System.currentTimeMillis) - val clock3_1 = clock2_1.increment(2, System.currentTimeMillis) - val clock4_1 = clock3_1.increment(2, System.currentTimeMillis) - val clock5_1 = clock4_1.increment(3, System.currentTimeMillis) + val clock2_1 = clock1_1 + Node("1") + val clock3_1 = clock2_1 + Node("2") val clock1_2 = VectorClock() - val clock2_2 = clock1_2.increment(2, System.currentTimeMillis) - val clock3_2 = clock2_2.increment(2, System.currentTimeMillis) + val clock2_2 = clock1_2 + Node("1") + val clock3_2 = clock2_2 + Node("1") - clock5_1.compare(clock3_2) must be(After) + clock3_1 <> clock3_2 must be(true) + clock3_2 <> clock3_1 must be(true) + } + + "pass misc comparison test 7" in { + val clock1_1 = VectorClock() + val clock2_1 = clock1_1 + Node("1") + val clock3_1 = clock2_1 + Node("2") + val clock4_1 = clock3_1 + Node("2") + val clock5_1 = clock4_1 + Node("3") + + val clock1_2 = VectorClock() + val clock2_2 = clock1_2 + Node("2") + val clock3_2 = clock2_2 + Node("2") + + clock5_1 <> clock3_2 must be(true) + clock3_2 <> clock5_1 must be(true) + } + + "correctly merge two clocks" in { + val node1 = Node("1") + val node2 = Node("2") + val node3 = Node("3") + + val clock1_1 = VectorClock() + val clock2_1 = clock1_1 + node1 + val clock3_1 = clock2_1 + node2 + val clock4_1 = clock3_1 + node2 + val clock5_1 = clock4_1 + node3 + + val clock1_2 = VectorClock() + val clock2_2 = clock1_2 + node2 + val clock3_2 = clock2_2 + node2 + + val merged1 = clock3_2 merge clock5_1 + merged1.versions.size must be(3) + merged1.versions.contains(node1) must be(true) + merged1.versions.contains(node2) must be(true) + merged1.versions.contains(node3) must be(true) + + val merged2 = clock5_1 merge clock3_2 + merged2.versions.size must be(3) + merged2.versions.contains(node1) must be(true) + merged2.versions.contains(node2) must be(true) + merged2.versions.contains(node3) must be(true) + + clock3_2 < merged1 must be(true) + clock5_1 < merged1 must be(true) + + clock3_2 < merged2 must be(true) + clock5_1 < merged2 must be(true) + + merged1 == merged2 must be(true) + } + + "pass blank clock incrementing" in { + val node1 = Node("1") + val node2 = Node("2") + val node3 = Node("3") + + val v1 = VectorClock() + val v2 = VectorClock() + + val vv1 = v1 + node1 + val vv2 = v2 + node2 + + (vv1 > v1) must equal(true) + (vv2 > v2) must equal(true) + + (vv1 > v2) must equal(true) + (vv2 > v1) must equal(true) + + (vv2 > vv1) must equal(false) + (vv1 > vv2) must equal(false) + } + + "pass merging behavior" in { + val node1 = Node("1") + val node2 = Node("2") + val node3 = Node("3") + + val a = VectorClock() + val b = VectorClock() + + val a1 = a + node1 + val b1 = b + node2 + + var a2 = a1 + node1 + var c = a2.merge(b1) + var c1 = c + node3 + + (c1 > a2) must equal(true) + (c1 > b1) must equal(true) } } - "A Versioned" must { - class TestVersioned(val version: VectorClock = VectorClock()) extends Versioned { - def increment(v: Int, time: Long) = new TestVersioned(version.increment(v, time)) + "An instance of Versioned" must { + class TestVersioned(val version: VectorClock = VectorClock()) extends Versioned[TestVersioned] { + def +(node: Node): TestVersioned = new TestVersioned(version + node) } + import Versioned.latestVersionOf + "have zero versions when created" in { val versioned = new TestVersioned() - versioned.version.versions must be(Vector()) + versioned.version.versions must be(Map()) } "happen before an 
identical versioned with a single additional event" in { val versioned1_1 = new TestVersioned() - val versioned2_1 = versioned1_1.increment(1, System.currentTimeMillis) - val versioned3_1 = versioned2_1.increment(2, System.currentTimeMillis) - val versioned4_1 = versioned3_1.increment(1, System.currentTimeMillis) + val versioned2_1 = versioned1_1 + Node("1") + val versioned3_1 = versioned2_1 + Node("2") + val versioned4_1 = versioned3_1 + Node("1") val versioned1_2 = new TestVersioned() - val versioned2_2 = versioned1_2.increment(1, System.currentTimeMillis) - val versioned3_2 = versioned2_2.increment(2, System.currentTimeMillis) - val versioned4_2 = versioned3_2.increment(1, System.currentTimeMillis) - val versioned5_2 = versioned4_2.increment(3, System.currentTimeMillis) + val versioned2_2 = versioned1_2 + Node("1") + val versioned3_2 = versioned2_2 + Node("2") + val versioned4_2 = versioned3_2 + Node("1") + val versioned5_2 = versioned4_2 + Node("3") - Versioned.latestVersionOf[TestVersioned](versioned4_1, versioned5_2) must be(versioned5_2) + latestVersionOf[TestVersioned](versioned4_1, versioned5_2) must be(versioned5_2) } - "Two versioneds with different events should be concurrent: 1" in { + "pass misc comparison test 1" in { var versioned1_1 = new TestVersioned() - val versioned2_1 = versioned1_1.increment(1, System.currentTimeMillis) + val versioned2_1 = versioned1_1 + Node("1") val versioned1_2 = new TestVersioned() - val versioned2_2 = versioned1_2.increment(2, System.currentTimeMillis) + val versioned2_2 = versioned1_2 + Node("2") - Versioned.latestVersionOf[TestVersioned](versioned2_1, versioned2_2) must be(versioned2_1) + latestVersionOf[TestVersioned](versioned2_1, versioned2_2) must be(versioned2_2) } - "Two versioneds with different events should be concurrent: 2" in { + "pass misc comparison test 2" in { val versioned1_3 = new TestVersioned() - val versioned2_3 = versioned1_3.increment(1, System.currentTimeMillis) - val versioned3_3 = versioned2_3.increment(2, System.currentTimeMillis) - val versioned4_3 = versioned3_3.increment(1, System.currentTimeMillis) + val versioned2_3 = versioned1_3 + Node("1") + val versioned3_3 = versioned2_3 + Node("2") + val versioned4_3 = versioned3_3 + Node("1") val versioned1_4 = new TestVersioned() - val versioned2_4 = versioned1_4.increment(1, System.currentTimeMillis) - val versioned3_4 = versioned2_4.increment(1, System.currentTimeMillis) - val versioned4_4 = versioned3_4.increment(3, System.currentTimeMillis) + val versioned2_4 = versioned1_4 + Node("1") + val versioned3_4 = versioned2_4 + Node("1") + val versioned4_4 = versioned3_4 + Node("3") - Versioned.latestVersionOf[TestVersioned](versioned4_3, versioned4_4) must be(versioned4_3) + latestVersionOf[TestVersioned](versioned4_3, versioned4_4) must be(versioned4_4) } - "be earlier than another versioned if it has an older version" in { + "pass misc comparison test 3" in { val versioned1_1 = new TestVersioned() - val versioned2_1 = versioned1_1.increment(2, System.currentTimeMillis) - val versioned3_1 = versioned2_1.increment(2, System.currentTimeMillis) + val versioned2_1 = versioned1_1 + Node("2") + val versioned3_1 = versioned2_1 + Node("2") val versioned1_2 = new TestVersioned() - val versioned2_2 = versioned1_2.increment(1, System.currentTimeMillis) - val versioned3_2 = versioned2_2.increment(2, System.currentTimeMillis) - val versioned4_2 = versioned3_2.increment(2, System.currentTimeMillis) - val versioned5_2 = versioned4_2.increment(3, System.currentTimeMillis) + val 
versioned2_2 = versioned1_2 + Node("1") + val versioned3_2 = versioned2_2 + Node("2") + val versioned4_2 = versioned3_2 + Node("2") + val versioned5_2 = versioned4_2 + Node("3") - Versioned.latestVersionOf[TestVersioned](versioned3_1, versioned5_2) must be(versioned5_2) + latestVersionOf[TestVersioned](versioned3_1, versioned5_2) must be(versioned5_2) } - "be later than another versioned if it has an newer version" in { + "pass misc comparison test 4" in { val versioned1_1 = new TestVersioned() - val versioned2_1 = versioned1_1.increment(1, System.currentTimeMillis) - val versioned3_1 = versioned2_1.increment(2, System.currentTimeMillis) - val versioned4_1 = versioned3_1.increment(2, System.currentTimeMillis) - val versioned5_1 = versioned4_1.increment(3, System.currentTimeMillis) + val versioned2_1 = versioned1_1 + Node("1") + val versioned3_1 = versioned2_1 + Node("2") + val versioned4_1 = versioned3_1 + Node("2") + val versioned5_1 = versioned4_1 + Node("3") val versioned1_2 = new TestVersioned() - val versioned2_2 = versioned1_2.increment(2, System.currentTimeMillis) - val versioned3_2 = versioned2_2.increment(2, System.currentTimeMillis) + val versioned2_2 = versioned1_2 + Node("2") + val versioned3_2 = versioned2_2 + Node("2") - Versioned.latestVersionOf[TestVersioned](versioned5_1, versioned3_2) must be(versioned5_1) + latestVersionOf[TestVersioned](versioned5_1, versioned3_2) must be(versioned3_2) } } } From a2785bc89e54de56fe0bee75834e4a22e2aa61de Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jonas=20Bone=CC=81r?= Date: Tue, 14 Feb 2012 20:50:12 +0100 Subject: [PATCH 43/72] Finalized initial cluster membership and merging of vector clocks and gossips in case of concurrent cluster updates. Plus misc other fixes. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit * Finalized initial cluster membership. * Added merging of vector clocks and gossips in case of concurrent cluster updates. 
* Added toString methods to all cluster protocol classes * Fixed bugs in incrementation of vector clocks * Added updates of 'seen' table for cluster convergence * Revamped to use new VectorClock impl * Refactored Gossip.State Signed-off-by: Jonas Bonér --- .../akka/cluster/AccrualFailureDetector.scala | 4 +- .../main/scala/akka/cluster/Gossiper.scala | 309 +++++++++++------- 2 files changed, 195 insertions(+), 118 deletions(-) diff --git a/akka-cluster/src/main/scala/akka/cluster/AccrualFailureDetector.scala b/akka-cluster/src/main/scala/akka/cluster/AccrualFailureDetector.scala index 379bf98a6b..cebc518bcf 100644 --- a/akka-cluster/src/main/scala/akka/cluster/AccrualFailureDetector.scala +++ b/akka-cluster/src/main/scala/akka/cluster/AccrualFailureDetector.scala @@ -145,7 +145,9 @@ class AccrualFailureDetector(system: ActorSystem, val threshold: Int = 8, val ma val mean = oldState.failureStats.get(connection).getOrElse(FailureStats()).mean PhiFactor * timestampDiff / mean } - log.debug("Phi value [{}] and threshold [{}] for connection [{}] ", phi, threshold, connection) + + // FIXME sometimes we get "Phi value [Infinity]" fix it + if (phi > 0.0) log.debug("Phi value [{}] and threshold [{}] for connection [{}] ", phi, threshold, connection) // only log if PHI value is starting to get interesting phi } diff --git a/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala b/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala index 783690a249..480d6c7461 100644 --- a/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala +++ b/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala @@ -17,7 +17,6 @@ import java.util.concurrent.atomic.{ AtomicReference, AtomicBoolean } import java.util.concurrent.TimeUnit._ import java.util.concurrent.TimeoutException import java.security.SecureRandom -import System.{ currentTimeMillis ⇒ newTimestamp } import scala.collection.immutable.{ Map, SortedSet } import scala.annotation.tailrec @@ -104,11 +103,17 @@ object MemberStatus { // status: PartitioningStatus) /** - * Represents the overview of the cluster, holds the cluster convergence table and unreachable nodes. + * Represents the overview of the cluster, holds the cluster convergence table and set with unreachable nodes. */ case class GossipOverview( seen: Map[Address, VectorClock] = Map.empty[Address, VectorClock], - unreachable: Set[Address] = Set.empty[Address]) + unreachable: Set[Address] = Set.empty[Address]) { + + override def toString = + "GossipOverview(seen = [" + seen.mkString(", ") + + "], unreachable = [" + unreachable.mkString(", ") + + "])" +} /** * Represents the state of the cluster; cluster ring membership, ring convergence, meta data - all versioned by a vector clock. @@ -121,9 +126,14 @@ case class Gossip( meta: Map[String, Array[Byte]] = Map.empty[String, Array[Byte]], version: VectorClock = VectorClock()) // vector clock version extends ClusterMessage // is a serializable cluster message - with Versioned { + with Versioned[Gossip] { - def addMember(member: Member): Gossip = { + /** + * Increments the version for this 'Node'. + */ + def +(node: VectorClock.Node): Gossip = copy(version = version + node) + + def +(member: Member): Gossip = { if (members contains member) this else this copy (members = members + member) } @@ -132,14 +142,19 @@ case class Gossip( * Marks the gossip as seen by this node (remoteAddress) by updating the address entry in the 'gossip.overview.seen' * Map with the VectorClock for the new gossip. 
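The seen table described above supports gossip convergence: roughly, the cluster has converged on a gossip version once every live member address has registered that same version as seen. A sketch of such a check, with plain stand-in types rather than the real Gossip and VectorClock classes:

object ConvergenceSketch {
  type Address = String
  type Version = Map[String, Long] // stand-in for a vector clock

  // True when every live member has marked the current gossip version as seen.
  def hasConverged(
    members: Set[Address],
    seen: Map[Address, Version],
    current: Version): Boolean =
    members.forall(address => seen.get(address).exists(_ == current))
}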
*/ - def markAsSeenByThisNode(address: Address): Gossip = + def seen(address: Address): Gossip = this copy (overview = overview copy (seen = overview.seen + (address -> version))) - def incrementVersion(memberFingerprint: Int): Gossip = { - this copy (version = version.increment(memberFingerprint, newTimestamp)) - } + override def toString = + "Gossip(" + + "overview = " + overview + + ", members = [" + members.mkString(", ") + + "], meta = [" + meta.mkString(", ") + + "], version = " + version + + ")" } +// FIXME add FSM trait? final class ClusterDaemon(system: ActorSystem, gossiper: Gossiper) extends Actor { val log = Logging(system, "ClusterDaemon") @@ -153,6 +168,9 @@ final class ClusterDaemon(system: ActorSystem, gossiper: Gossiper) extends Actor } } +// FIXME Cluster public API should be an Extension +// FIXME Add cluster Node class and refactor out all non-gossip related stuff out of Gossiper + /** * This module is responsible for Gossiping cluster information. The abstraction maintains the list of live * and dead members. Periodically i.e. every 1 second this module chooses a random member and initiates a round @@ -177,19 +195,18 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { private case class State( self: Member, latestGossip: Gossip, - isSingletonCluster: Boolean = true, // starts as singleton cluster memberMembershipChangeListeners: Set[MembershipChangeListener] = Set.empty[MembershipChangeListener]) val remoteSettings = new RemoteSettings(system.settings.config, system.name) val clusterSettings = new ClusterSettings(system.settings.config, system.name) val remoteAddress = remote.transport.address - val memberFingerprint = remoteAddress.## + val selfNode = VectorClock.Node(remoteAddress.toString) val gossipInitialDelay = clusterSettings.GossipInitialDelay val gossipFrequency = clusterSettings.GossipFrequency - implicit val memberOrdering = Ordering.fromLessThan[Member](_.address.toString > _.address.toString) + implicit val memberOrdering = Ordering.fromLessThan[Member](_.address.toString < _.address.toString) implicit val defaultTimeout = Timeout(remoteSettings.RemoteSystemDaemonAckTimeout) @@ -204,53 +221,38 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { private val random = SecureRandom.getInstance("SHA1PRNG") // Is it right to put this guy under the /system path or should we have a top-level /cluster or something else...? + // FIXME should be defined as a router so we get concurrency here private val clusterDaemon = system.systemActorOf(Props(new ClusterDaemon(system, this)), "cluster") private val state = { val member = Member(remoteAddress, MemberStatus.Joining) - val gossip = Gossip(members = SortedSet.empty[Member] + member) + val gossip = Gossip(members = SortedSet.empty[Member] + member) + selfNode // add me as member and update my vector clock new AtomicReference[State](State(member, gossip)) } // FIXME manage connections in some other way so we can delete the RemoteConnectionManager (SINCE IT SUCKS!!!) 
private val connectionManager = new RemoteConnectionManager(system, remote, failureDetector, Map.empty[Address, ActorRef]) + import Versioned.latestVersionOf + log.info("Node [{}] - Starting cluster Gossiper...", remoteAddress) // try to join the node defined in the 'akka.cluster.node-to-join' option - nodeToJoin match { - case None ⇒ switchStatusTo(MemberStatus.Up) // if we are singleton cluster then we are already considered to be UP - case Some(address) ⇒ join(address) - } + nodeToJoin foreach join - // start periodic gossip and cluster scrutinization + // start periodic gossip to random nodes in cluster val gossipCanceller = system.scheduler.schedule(gossipInitialDelay, gossipFrequency) { gossip() } + + // start periodic cluster scrutinization (moving nodes condemned by the failure detector to unreachable list) val scrutinizeCanceller = system.scheduler.schedule(gossipInitialDelay, gossipFrequency) { scrutinize() } - /** - * Shuts down all connections to other members, the cluster daemon and the periodic gossip and cleanup tasks. - */ - def shutdown() { - - // FIXME Cheating. Can't just shut down. Node must first gossip an Leave command, wait for Leader to do proper Handoff and then await an Exit command before switching to Removed - - if (isRunning.compareAndSet(true, false)) { - log.info("Node [{}] - Shutting down Gossiper and ClusterDaemon", remoteAddress) - try connectionManager.shutdown() finally { - try system.stop(clusterDaemon) finally { - try gossipCanceller.cancel() finally { - try scrutinizeCanceller.cancel() finally { - log.info("Node [{}] - Gossiper is shut down", remoteAddress) - } - } - } - } - } - } + // ====================================================== + // ===================== PUBLIC API ===================== + // ====================================================== /** * Latest gossip. @@ -265,52 +267,90 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { /** * Is this node a singleton cluster? */ - def isSingletonCluster: Boolean = state.get.isSingletonCluster + def isSingletonCluster: Boolean = isSingletonCluster(state.get) + + /** + * Shuts down all connections to other members, the cluster daemon and the periodic gossip and cleanup tasks. + */ + def shutdown() { + + // FIXME Cheating for now. Can't just shut down. Node must first gossip an Leave command, wait for Leader to do proper Handoff and then await an Exit command before switching to Removed + + if (isRunning.compareAndSet(true, false)) { + log.info("Node [{}] - Shutting down Gossiper and ClusterDaemon...", remoteAddress) + + try connectionManager.shutdown() finally { + try system.stop(clusterDaemon) finally { + try gossipCanceller.cancel() finally { + try scrutinizeCanceller.cancel() finally { + log.info("Node [{}] - Gossiper and ClusterDaemon shut down successfully", remoteAddress) + } + } + } + } + } + } /** * New node joining. 
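The shutdown method above chains its cleanup steps with nested try/finally blocks so that a failure in one step does not prevent the remaining steps from running. The same idea expressed as a small generic helper, with illustrative names only:

object ShutdownSketch {
  // Runs every cleanup step, even when an earlier step throws.
  def shutdownAll(steps: List[() => Unit]): Unit = steps match {
    case Nil          => ()
    case step :: rest => try step() finally shutdownAll(rest)
  }

  // Usage with hypothetical resources:
  // shutdownAll(List(() => connections.shutdown(), () => timer.cancel()))
}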
*/ @tailrec final def joining(node: Address) { - log.debug("Node [{}] - Node [{}] is joining", remoteAddress, node) - val oldState = state.get - val oldGossip = oldState.latestGossip - val oldMembers = oldGossip.members - val newGossip = oldGossip copy (members = oldMembers + Member(node, MemberStatus.Joining)) // add joining node as Joining - val newState = oldState copy (latestGossip = newGossip.incrementVersion(memberFingerprint)) + log.info("Node [{}] - Node [{}] is joining", remoteAddress, node) - // FIXME set flag state.isSingletonCluster = false (if true) + val localState = state.get + val localGossip = localState.latestGossip + val localMembers = localGossip.members - if (!state.compareAndSet(oldState, newState)) joining(node) // recur if we failed update + val newMembers = localMembers + Member(node, MemberStatus.Joining) // add joining node as Joining + val newGossip = localGossip copy (members = newMembers) + + val versionedGossip = newGossip + selfNode + val seenVersionedGossip = versionedGossip seen remoteAddress + + val newState = localState copy (latestGossip = seenVersionedGossip) + + if (!state.compareAndSet(localState, newState)) joining(node) // recur if we failed update } /** * Receive new gossip. */ @tailrec - final def receive(sender: Member, newGossip: Gossip) { + final def receive(sender: Member, remoteGossip: Gossip) { log.debug("Node [{}] - Receiving gossip from [{}]", remoteAddress, sender.address) failureDetector heartbeat sender.address // update heartbeat in failure detector - // FIXME set flag state.isSingletonCluster = false (if true) - // FIXME check for convergence - if we have convergence then trigger the listeners - val oldState = state.get - val oldGossip = oldState.latestGossip + val localState = state.get + val localGossip = localState.latestGossip - val gossip = Versioned - .latestVersionOf(newGossip, oldGossip) - .addMember(self) // needed if newGossip won - .addMember(sender) // needed if oldGossip won - .markAsSeenByThisNode(remoteAddress) - .incrementVersion(memberFingerprint) + val winningGossip = + if (remoteGossip.version <> localGossip.version) { + // concurrent + val mergedGossip = merge(remoteGossip, localGossip) + val versionedMergedGossip = mergedGossip + selfNode - val newState = oldState copy (latestGossip = gossip) + log.debug("Can't establish a causal relationship between \"remote\" gossip [{}] and \"local\" gossip [{}] - merging them into [{}]", + remoteGossip, localGossip, versionedMergedGossip) + + versionedMergedGossip + + } else if (remoteGossip.version < localGossip.version) { + // local gossip is newer + localGossip + + } else { + // remote gossip is newer + remoteGossip + } + + val newState = localState copy (latestGossip = winningGossip seen remoteAddress) // if we won the race then update else try again - if (!state.compareAndSet(oldState, newState)) receive(sender, newGossip) // recur if we fail the update + if (!state.compareAndSet(localState, newState)) receive(sender, remoteGossip) // recur if we fail the update } /** @@ -318,10 +358,10 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { */ @tailrec final def registerListener(listener: MembershipChangeListener) { - val oldState = state.get - val newListeners = oldState.memberMembershipChangeListeners + listener - val newState = oldState copy (memberMembershipChangeListeners = newListeners) - if (!state.compareAndSet(oldState, newState)) registerListener(listener) // recur + val localState = state.get + val newListeners = 
localState.memberMembershipChangeListeners + listener + val newState = localState copy (memberMembershipChangeListeners = newListeners) + if (!state.compareAndSet(localState, newState)) registerListener(listener) // recur } /** @@ -329,12 +369,16 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { */ @tailrec final def unregisterListener(listener: MembershipChangeListener) { - val oldState = state.get - val newListeners = oldState.memberMembershipChangeListeners - listener - val newState = oldState copy (memberMembershipChangeListeners = newListeners) - if (!state.compareAndSet(oldState, newState)) unregisterListener(listener) // recur + val localState = state.get + val newListeners = localState.memberMembershipChangeListeners - listener + val newState = localState copy (memberMembershipChangeListeners = newListeners) + if (!state.compareAndSet(localState, newState)) unregisterListener(listener) // recur } + // ======================================================== + // ===================== INTERNAL API ===================== + // ======================================================== + /** * Joins the pre-configured contact point and retrieves current gossip state. */ @@ -360,69 +404,90 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { * Initates a new round of gossip. */ private def gossip() { - val oldState = state.get - if (!oldState.isSingletonCluster) { // do not gossip if we are a singleton cluster - val oldGossip = oldState.latestGossip - val oldMembers = oldGossip.members - val oldMembersSize = oldMembers.size + val localState = state.get + val localGossip = localState.latestGossip + val localMembers = localGossip.members - val oldUnreachableAddresses = oldGossip.overview.unreachable - val oldUnreachableSize = oldUnreachableAddresses.size + if (!isSingletonCluster(localState)) { // do not gossip if we are a singleton cluster + log.debug("Node [{}] - Initiating new round of gossip", remoteAddress) + + val localGossip = localState.latestGossip + val localMembers = localGossip.members + val localMembersSize = localMembers.size + + val localUnreachableAddresses = localGossip.overview.unreachable + val localUnreachableSize = localUnreachableAddresses.size // 1. gossip to alive members - val gossipedToDeputy = - if (oldUnreachableSize > 0) gossipToRandomNodeOf(oldMembers.toList map { _.address }) - else false + val gossipedToDeputy = gossipToRandomNodeOf(localMembers.toList map { _.address }) // 2. gossip to unreachable members - if (oldUnreachableSize > 0) { - val probability: Double = oldUnreachableSize / (oldMembersSize + 1) - if (random.nextDouble() < probability) gossipToRandomNodeOf(oldUnreachableAddresses.toList) + if (localUnreachableSize > 0) { + val probability: Double = localUnreachableSize / (localMembersSize + 1) + if (random.nextDouble() < probability) gossipToRandomNodeOf(localUnreachableAddresses.toList) } // 3. 
gossip to a deputy nodes for facilitating partition healing val deputies = deputyNodesWithoutMyself - if ((!gossipedToDeputy || oldMembersSize < 1) && !deputies.isEmpty) { - if (oldMembersSize == 0) gossipToRandomNodeOf(deputies) + if ((!gossipedToDeputy || localMembersSize < 1) && !deputies.isEmpty) { + if (localMembersSize == 0) gossipToRandomNodeOf(deputies) else { - val probability = 1.0 / oldMembersSize + oldUnreachableSize + val probability = 1.0 / localMembersSize + localUnreachableSize if (random.nextDouble() <= probability) gossipToRandomNodeOf(deputies) } } } } + /** + * Merges two Gossip instances including membership tables, meta-data tables and the VectorClock histories. + */ + private def merge(gossip1: Gossip, gossip2: Gossip): Gossip = { + val mergedVClock = gossip1.version merge gossip2.version + val mergedMembers = gossip1.members union gossip2.members + val mergedMeta = gossip1.meta ++ gossip2.meta + Gossip(gossip2.overview, mergedMembers, mergedMeta, mergedVClock) + } + /** * Switches the state in the FSM. */ @tailrec final private def switchStatusTo(newStatus: MemberStatus) { log.info("Node [{}] - Switching membership status to [{}]", remoteAddress, newStatus) - val oldState = state.get - val oldSelf = oldState.self - val oldGossip = oldState.latestGossip - val oldMembers = oldGossip.members + val localState = state.get + val localSelf = localState.self - val newSelf = oldSelf copy (status = newStatus) + val localGossip = localState.latestGossip + val localMembers = localGossip.members - val newMembersSet = oldMembers map { member ⇒ + val newSelf = localSelf copy (status = newStatus) + val newMembersSet = localMembers map { member ⇒ if (member.address == remoteAddress) newSelf else member } + // ugly crap to work around bug in scala colletions ('val ss: SortedSet[Member] = SortedSet.empty[Member] ++ aSet' does not compile) val newMembersSortedSet = SortedSet[Member](newMembersSet.toList: _*) + val newGossip = localGossip copy (members = newMembersSortedSet) - val newGossip = oldGossip copy (members = newMembersSortedSet) incrementVersion memberFingerprint - val newState = oldState copy (self = newSelf, latestGossip = newGossip) - if (!state.compareAndSet(oldState, newState)) switchStatusTo(newStatus) // recur if we failed update + val versionedGossip = newGossip + selfNode + val seenVersionedGossip = versionedGossip seen remoteAddress + + val newState = localState copy (self = newSelf, latestGossip = seenVersionedGossip) + + if (!state.compareAndSet(localState, newState)) switchStatusTo(newStatus) // recur if we failed update } /** * Gossips latest gossip to an address. */ private def gossipTo(address: Address) { - setUpConnectionTo(address) foreach { _ ! GossipEnvelope(self, latestGossip) } + setUpConnectionTo(address) foreach { connection ⇒ + log.debug("Node [{}] - Gossiping to [{}]", remoteAddress, address) + connection ! 
GossipEnvelope(self, latestGossip) + } } /** @@ -433,8 +498,8 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { private def gossipToRandomNodeOf(addresses: Seq[Address]): Boolean = { val peers = addresses filter (_ != remoteAddress) // filter out myself val peer = selectRandomNode(peers) - val oldState = state.get - val oldGossip = oldState.latestGossip + val localState = state.get + val localGossip = localState.latestGossip // if connection can't be established/found => ignore it since the failure detector will take care of the potential problem gossipTo(peer) deputyNodesWithoutMyself exists (peer == _) @@ -445,32 +510,39 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { */ @tailrec final private def scrutinize() { - val oldState = state.get - if (!oldState.isSingletonCluster) { // do not scrutinize if we are a singleton cluster - val oldGossip = oldState.latestGossip - val oldOverview = oldGossip.overview - val oldMembers = oldGossip.members - val oldUnreachableAddresses = oldGossip.overview.unreachable + val localState = state.get - val newlyDetectedUnreachableMembers = oldMembers filterNot { member ⇒ failureDetector.isAvailable(member.address) } + if (!isSingletonCluster(localState)) { // do not scrutinize if we are a singleton cluster + + val localGossip = localState.latestGossip + val localOverview = localGossip.overview + val localMembers = localGossip.members + val localUnreachableAddresses = localGossip.overview.unreachable + + val newlyDetectedUnreachableMembers = localMembers filterNot { member ⇒ failureDetector.isAvailable(member.address) } val newlyDetectedUnreachableAddresses = newlyDetectedUnreachableMembers map { _.address } if (!newlyDetectedUnreachableAddresses.isEmpty) { // we have newly detected members marked as unavailable - val newMembers = oldMembers diff newlyDetectedUnreachableMembers - val newUnreachableAddresses: Set[Address] = (oldUnreachableAddresses ++ newlyDetectedUnreachableAddresses) - val newOverview = oldOverview copy (unreachable = newUnreachableAddresses) - val newGossip = oldGossip copy (overview = newOverview, members = newMembers) incrementVersion memberFingerprint - val newState = oldState copy (latestGossip = newGossip) + val newMembers = localMembers diff newlyDetectedUnreachableMembers + val newUnreachableAddresses: Set[Address] = (localUnreachableAddresses ++ newlyDetectedUnreachableAddresses) + + val newOverview = localOverview copy (unreachable = newUnreachableAddresses) + val newGossip = localGossip copy (overview = newOverview, members = newMembers) + + val versionedGossip = newGossip + selfNode + val seenVersionedGossip = versionedGossip seen remoteAddress + + val newState = localState copy (latestGossip = seenVersionedGossip) // if we won the race then update else try again - if (!state.compareAndSet(oldState, newState)) scrutinize() // recur + if (!state.compareAndSet(localState, newState)) scrutinize() // recur else { // FIXME should only notify when there is a cluster convergence // notify listeners on successful update of state // for { // deadNode ← newUnreachableAddresses - // listener ← oldState.memberMembershipChangeListeners + // listener ← localState.memberMembershipChangeListeners // } listener memberDisconnected deadNode } } @@ -481,14 +553,16 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { @tailrec final private def connectToRandomNodeOf(addresses: Seq[Address]): ActorRef = { addresses match { + case address :: rest ⇒ 
setUpConnectionTo(address) match { case Some(connection) ⇒ connection - case None ⇒ connectToRandomNodeOf(rest) // recur if + case None ⇒ connectToRandomNodeOf(rest) // recur - if we could not set up a connection - try next address } + case Nil ⇒ throw new RemoteConnectionException( - "Could not establish connection to any of the addresses in the argument list") + "Could not establish connection to any of the addresses in the argument list [" + addresses.mkString(", ") + "]") } } @@ -500,15 +574,16 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { /** * Sets up remote connection. */ - private def setUpConnectionTo(address: Address): Option[ActorRef] = { - try { - Some(connectionManager.putIfAbsent(address, () ⇒ system.actorFor(RootActorPath(address) / "system" / "cluster"))) - } catch { - case e: Exception ⇒ None + private def setUpConnectionTo(address: Address): Option[ActorRef] = Option { + // FIXME no need for using a factory here - remove connectionManager + try connectionManager.putIfAbsent(address, () ⇒ system.actorFor(RootActorPath(address) / "system" / "cluster")) catch { + case e: Exception ⇒ null } } private def deputyNodesWithoutMyself: Seq[Address] = Seq.empty[Address] filter (_ != remoteAddress) // FIXME read in deputy nodes from gossip data - now empty seq private def selectRandomNode(addresses: Seq[Address]): Address = addresses(random nextInt addresses.size) + + private def isSingletonCluster(currentState: State): Boolean = currentState.latestGossip.members.size == 1 } From b36a6987f6e148431c25bb1a6e5f9499e2fdafc2 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jonas=20Bone=CC=81r?= Date: Tue, 14 Feb 2012 20:53:46 +0100 Subject: [PATCH 44/72] Renamed NodeGossipingSpec to NodeMembershipSpec since it is testing consistency of the cluster node membership table MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Jonas Bonér --- ...eGossipingSpec.scala => NodeMembershipSpec.scala} | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) rename akka-cluster/src/test/scala/akka/cluster/{NodeGossipingSpec.scala => NodeMembershipSpec.scala} (91%) diff --git a/akka-cluster/src/test/scala/akka/cluster/NodeGossipingSpec.scala b/akka-cluster/src/test/scala/akka/cluster/NodeMembershipSpec.scala similarity index 91% rename from akka-cluster/src/test/scala/akka/cluster/NodeGossipingSpec.scala rename to akka-cluster/src/test/scala/akka/cluster/NodeMembershipSpec.scala index a3cc492a23..f25a7d60a7 100644 --- a/akka-cluster/src/test/scala/akka/cluster/NodeGossipingSpec.scala +++ b/akka-cluster/src/test/scala/akka/cluster/NodeMembershipSpec.scala @@ -12,7 +12,7 @@ import akka.remote._ import com.typesafe.config._ -class NodeGossipingSpec extends AkkaSpec(""" +class NodeMembershipSpec extends AkkaSpec(""" akka { loglevel = "DEBUG" } @@ -29,7 +29,7 @@ class NodeGossipingSpec extends AkkaSpec(""" try { "A set of connected cluster nodes" must { "(when two nodes) start gossiping to each other so that both nodes gets the same gossip info" in { - node0 = ActorSystem("NodeGossipingSpec", ConfigFactory + node0 = ActorSystem("NodeMembershipSpec", ConfigFactory .parseString(""" akka { actor.provider = "akka.remote.RemoteActorRefProvider" @@ -43,7 +43,7 @@ class NodeGossipingSpec extends AkkaSpec(""" val remote0 = node0.provider.asInstanceOf[RemoteActorRefProvider] gossiper0 = Gossiper(node0, remote0) - node1 = ActorSystem("NodeGossipingSpec", ConfigFactory + node1 = ActorSystem("NodeMembershipSpec", ConfigFactory 
.parseString(""" akka { actor.provider = "akka.remote.RemoteActorRefProvider" @@ -51,7 +51,7 @@ class NodeGossipingSpec extends AkkaSpec(""" hostname = localhost port=5551 } - cluster.node-to-join = "akka://NodeGossipingSpec@localhost:5550" + cluster.node-to-join = "akka://NodeMembershipSpec@localhost:5550" }""") .withFallback(system.settings.config)) .asInstanceOf[ActorSystemImpl] @@ -76,7 +76,7 @@ class NodeGossipingSpec extends AkkaSpec(""" } "(when three nodes) start gossiping to each other so that both nodes gets the same gossip info" in { - node2 = ActorSystem("NodeGossipingSpec", ConfigFactory + node2 = ActorSystem("NodeMembershipSpec", ConfigFactory .parseString(""" akka { actor.provider = "akka.remote.RemoteActorRefProvider" @@ -84,7 +84,7 @@ class NodeGossipingSpec extends AkkaSpec(""" hostname = localhost port=5552 } - cluster.node-to-join = "akka://NodeGossipingSpec@localhost:5550" + cluster.node-to-join = "akka://NodeMembershipSpec@localhost:5550" }""") .withFallback(system.settings.config)) .asInstanceOf[ActorSystemImpl] From 709c86b48d0a8f1c6ea10448a9947e6a6a0f9068 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jonas=20Bone=CC=81r?= Date: Tue, 14 Feb 2012 22:01:59 +0100 Subject: [PATCH 45/72] Disabling out erroneous cluster 'scrutinize' service until fixed and proper tests are written. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Jonas Bonér --- akka-cluster/src/main/scala/akka/cluster/Gossiper.scala | 4 +++- .../src/test/scala/akka/cluster/NodeStartupSpec.scala | 3 ++- 2 files changed, 5 insertions(+), 2 deletions(-) diff --git a/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala b/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala index 480d6c7461..7dfb65b193 100644 --- a/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala +++ b/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala @@ -247,7 +247,9 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { // start periodic cluster scrutinization (moving nodes condemned by the failure detector to unreachable list) val scrutinizeCanceller = system.scheduler.schedule(gossipInitialDelay, gossipFrequency) { - scrutinize() + + // FIXME fix problems with 'scrutinize' + //scrutinize() } // ====================================================== diff --git a/akka-cluster/src/test/scala/akka/cluster/NodeStartupSpec.scala b/akka-cluster/src/test/scala/akka/cluster/NodeStartupSpec.scala index de59541dfa..1f5ff985db 100644 --- a/akka-cluster/src/test/scala/akka/cluster/NodeStartupSpec.scala +++ b/akka-cluster/src/test/scala/akka/cluster/NodeStartupSpec.scala @@ -47,7 +47,7 @@ class NodeStartupSpec extends AkkaSpec(""" val members = gossiper0.latestGossip.members val joiningMember = members find (_.address.port.get == 5550) joiningMember must be('defined) - joiningMember.get.status must be(MemberStatus.Up) + joiningMember.get.status must be(MemberStatus.Joining) } } @@ -84,6 +84,7 @@ class NodeStartupSpec extends AkkaSpec(""" override def atTermination() { gossiper0.shutdown() node0.shutdown() + gossiper1.shutdown() node1.shutdown() } From 07defa71a419872ab72f7c3e832cbaa51279422b Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jonas=20Bone=CC=81r?= Date: Wed, 15 Feb 2012 15:51:27 +0100 Subject: [PATCH 46/72] Fixed bug in failure detector which also fixes bug in cluster scrutinize service. Also added test case for the bug. 
MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Jonas Bonér --- .../main/scala/akka/cluster/AccrualFailureDetector.scala | 7 ++++--- akka-cluster/src/main/scala/akka/cluster/Gossiper.scala | 6 ++---- .../scala/akka/cluster/AccrualFailureDetectorSpec.scala | 9 ++++++++- 3 files changed, 14 insertions(+), 8 deletions(-) diff --git a/akka-cluster/src/main/scala/akka/cluster/AccrualFailureDetector.scala b/akka-cluster/src/main/scala/akka/cluster/AccrualFailureDetector.scala index cebc518bcf..8ee9f857a0 100644 --- a/akka-cluster/src/main/scala/akka/cluster/AccrualFailureDetector.scala +++ b/akka-cluster/src/main/scala/akka/cluster/AccrualFailureDetector.scala @@ -143,11 +143,12 @@ class AccrualFailureDetector(system: ActorSystem, val threshold: Int = 8, val ma else { val timestampDiff = newTimestamp - oldTimestamp.get val mean = oldState.failureStats.get(connection).getOrElse(FailureStats()).mean - PhiFactor * timestampDiff / mean + if (mean == 0.0D) 0.0D + else PhiFactor * timestampDiff / mean } - // FIXME sometimes we get "Phi value [Infinity]" fix it - if (phi > 0.0) log.debug("Phi value [{}] and threshold [{}] for connection [{}] ", phi, threshold, connection) // only log if PHI value is starting to get interesting + // only log if PHI value is starting to get interesting + if (phi > 0.0D) log.debug("Phi value [{}] and threshold [{}] for connection [{}] ", phi, threshold, connection) phi } diff --git a/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala b/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala index 7dfb65b193..f37c9294de 100644 --- a/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala +++ b/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala @@ -247,9 +247,7 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { // start periodic cluster scrutinization (moving nodes condemned by the failure detector to unreachable list) val scrutinizeCanceller = system.scheduler.schedule(gossipInitialDelay, gossipFrequency) { - - // FIXME fix problems with 'scrutinize' - //scrutinize() + scrutinize() } // ====================================================== @@ -527,7 +525,7 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { if (!newlyDetectedUnreachableAddresses.isEmpty) { // we have newly detected members marked as unavailable val newMembers = localMembers diff newlyDetectedUnreachableMembers - val newUnreachableAddresses: Set[Address] = (localUnreachableAddresses ++ newlyDetectedUnreachableAddresses) + val newUnreachableAddresses: Set[Address] = localUnreachableAddresses ++ newlyDetectedUnreachableAddresses val newOverview = localOverview copy (unreachable = newUnreachableAddresses) val newGossip = localGossip copy (overview = newOverview, members = newMembers) diff --git a/akka-cluster/src/test/scala/akka/cluster/AccrualFailureDetectorSpec.scala b/akka-cluster/src/test/scala/akka/cluster/AccrualFailureDetectorSpec.scala index 4aab105273..5f93a8ddf2 100644 --- a/akka-cluster/src/test/scala/akka/cluster/AccrualFailureDetectorSpec.scala +++ b/akka-cluster/src/test/scala/akka/cluster/AccrualFailureDetectorSpec.scala @@ -9,7 +9,14 @@ class AccrualFailureDetectorSpec extends AkkaSpec(""" """) { "An AccrualFailureDetector" must { - val conn = Address("akka", "", "localhost", 2552) + val conn = Address("akka", "", Some("localhost"), Some(2552)) + val conn2 = Address("akka", "", Some("localhost"), Some(2553)) + + "return phi value of 0.0D on startup for each address" 
in { + val fd = new AccrualFailureDetector(system) + fd.phi(conn) must be(0.0D) + fd.phi(conn2) must be(0.0D) + } "mark node as available after a series of successful heartbeats" in { val fd = new AccrualFailureDetector(system) From 84f886def1f1b6bb5af12e4f6feada2d1e3ea75c Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jonas=20Bone=CC=81r?= Date: Thu, 16 Feb 2012 11:20:51 +0100 Subject: [PATCH 47/72] Merged with master --- .../src/main/scala/akka/cluster/Gossiper.scala | 16 +++------------- .../cluster/AccrualFailureDetectorSpec.scala | 4 ++-- 2 files changed, 5 insertions(+), 15 deletions(-) diff --git a/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala b/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala index f37c9294de..bb1e19e746 100644 --- a/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala +++ b/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala @@ -238,7 +238,7 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { log.info("Node [{}] - Starting cluster Gossiper...", remoteAddress) // try to join the node defined in the 'akka.cluster.node-to-join' option - nodeToJoin foreach join + join() // start periodic gossip to random nodes in cluster val gossipCanceller = system.scheduler.schedule(gossipInitialDelay, gossipFrequency) { @@ -382,22 +382,12 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { /** * Joins the pre-configured contact point and retrieves current gossip state. */ - private def join(address: Address) { + private def join() = nodeToJoin foreach { address ⇒ setUpConnectionTo(address) foreach { connection ⇒ val command = Join(remoteAddress) - log.info("Node [{}] - Sending [{}] to [{}]", remoteAddress, command, address) + log.info("Node [{}] - Sending [{}] to [{}] through connection [{}]", remoteAddress, command, address, connection) connection ! command } - - contactPoint match { - case None ⇒ log.info("Booting up in singleton cluster mode") - case Some(member) ⇒ - log.info("Trying to join contact point node defined in the configuration [{}]", member) - setUpConnectionTo(member) match { - case None ⇒ log.error("Could not set up connection to join contact point node defined in the configuration [{}]", member) - case Some(connection) ⇒ tryJoinContactPoint(connection, deadline) - } - } } /** diff --git a/akka-cluster/src/test/scala/akka/cluster/AccrualFailureDetectorSpec.scala b/akka-cluster/src/test/scala/akka/cluster/AccrualFailureDetectorSpec.scala index 5f93a8ddf2..034f582e0d 100644 --- a/akka-cluster/src/test/scala/akka/cluster/AccrualFailureDetectorSpec.scala +++ b/akka-cluster/src/test/scala/akka/cluster/AccrualFailureDetectorSpec.scala @@ -9,8 +9,8 @@ class AccrualFailureDetectorSpec extends AkkaSpec(""" """) { "An AccrualFailureDetector" must { - val conn = Address("akka", "", Some("localhost"), Some(2552)) - val conn2 = Address("akka", "", Some("localhost"), Some(2553)) + val conn = Address("akka", "", "localhost", 2552) + val conn2 = Address("akka", "", "localhost", 2553) "return phi value of 0.0D on startup for each address" in { val fd = new AccrualFailureDetector(system) From bc70db4bb0fe282a2d42d623f2057f25333e3409 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jonas=20Bone=CC=81r?= Date: Thu, 16 Feb 2012 14:48:40 +0100 Subject: [PATCH 48/72] Fixed error in merge. 
MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Jonas Bonér --- .../scala/akka/cluster/NodeStartupSpec.scala | 59 ++++++++++--------- 1 file changed, 30 insertions(+), 29 deletions(-) diff --git a/akka-cluster/src/test/scala/akka/cluster/NodeStartupSpec.scala b/akka-cluster/src/test/scala/akka/cluster/NodeStartupSpec.scala index 1f5ff985db..6ccb8491c1 100644 --- a/akka-cluster/src/test/scala/akka/cluster/NodeStartupSpec.scala +++ b/akka-cluster/src/test/scala/akka/cluster/NodeStartupSpec.scala @@ -24,22 +24,23 @@ class NodeStartupSpec extends AkkaSpec(""" var node1: ActorSystemImpl = _ try { - node0 = ActorSystem("NodeStartupSpec", ConfigFactory - .parseString(""" - akka { - actor.provider = "akka.remote.RemoteActorRefProvider" - remote.netty { - hostname = localhost - port=5550 - } - }""") - .withFallback(system.settings.config)) - .asInstanceOf[ActorSystemImpl] - val remote0 = node0.provider.asInstanceOf[RemoteActorRefProvider] - gossiper0 = Gossiper(node0, remote0) - "A first cluster node with a 'node-to-join' config set to empty string (singleton cluster)" must { + node0 = ActorSystem("NodeStartupSpec", ConfigFactory + .parseString(""" + akka { + actor.provider = "akka.remote.RemoteActorRefProvider" + remote.netty { + hostname = localhost + port=5550 + } + }""") + .withFallback(system.settings.config)) + .asInstanceOf[ActorSystemImpl] + val remote0 = node0.provider.asInstanceOf[RemoteActorRefProvider] + gossiper0 = Gossiper(node0, remote0) + "be a singleton cluster when started up" in { + Thread.sleep(1000) gossiper0.isSingletonCluster must be(true) } @@ -51,23 +52,23 @@ class NodeStartupSpec extends AkkaSpec(""" } } - node1 = ActorSystem("NodeStartupSpec", ConfigFactory - .parseString(""" - akka { - actor.provider = "akka.remote.RemoteActorRefProvider" - remote.netty { - hostname = localhost - port=5551 - } - cluster.node-to-join = "akka://NodeStartupSpec@localhost:5550" - }""") - .withFallback(system.settings.config)) - .asInstanceOf[ActorSystemImpl] - val remote1 = node1.provider.asInstanceOf[RemoteActorRefProvider] - gossiper1 = Gossiper(node1, remote1) - "A second cluster node with a 'node-to-join' config defined" must { "join the other node cluster as 'Joining' when sending a Join command" in { + node1 = ActorSystem("NodeStartupSpec", ConfigFactory + .parseString(""" + akka { + actor.provider = "akka.remote.RemoteActorRefProvider" + remote.netty { + hostname = localhost + port=5551 + } + cluster.node-to-join = "akka://NodeStartupSpec@localhost:5550" + }""") + .withFallback(system.settings.config)) + .asInstanceOf[ActorSystemImpl] + val remote1 = node1.provider.asInstanceOf[RemoteActorRefProvider] + gossiper1 = Gossiper(node1, remote1) + Thread.sleep(1000) // give enough time for node1 to JOIN node0 val members = gossiper0.latestGossip.members val joiningMember = members find (_.address.port.get == 5551) From c24485040d1215082e8ce0ede2ac14f381a2dd29 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jonas=20Bone=CC=81r?= Date: Sat, 18 Feb 2012 17:40:58 +0100 Subject: [PATCH 49/72] Removed printed stack trace from remote client/server errors. Just annoying when client hangs retrying and does not provide any real value since they are the same every time. 
MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Jonas Bonér --- .../src/main/scala/akka/actor/ActorCell.scala | 2 +- .../scala/akka/actor/ActorRefProvider.scala | 10 ++++----- .../scala/akka/remote/RemoteTransport.scala | 21 +++++++++++++++---- 3 files changed, 23 insertions(+), 10 deletions(-) diff --git a/akka-actor/src/main/scala/akka/actor/ActorCell.scala b/akka-actor/src/main/scala/akka/actor/ActorCell.scala index e659928743..22c82bfb6c 100644 --- a/akka-actor/src/main/scala/akka/actor/ActorCell.scala +++ b/akka-actor/src/main/scala/akka/actor/ActorCell.scala @@ -294,7 +294,7 @@ private[akka] class ActorCell( final def start(): Unit = { /* - * Create the mailbox and enqueue the Create() message to ensure that + * Create the mailbox and enqueue the Create() message to ensure that * this is processed before anything else. */ mailbox = dispatcher.createMailbox(this) diff --git a/akka-actor/src/main/scala/akka/actor/ActorRefProvider.scala b/akka-actor/src/main/scala/akka/actor/ActorRefProvider.scala index 255a42d87c..5fb8936d4d 100644 --- a/akka-actor/src/main/scala/akka/actor/ActorRefProvider.scala +++ b/akka-actor/src/main/scala/akka/actor/ActorRefProvider.scala @@ -521,30 +521,30 @@ class LocalActorRefProvider( def actorFor(ref: InternalActorRef, path: String): InternalActorRef = path match { case RelativeActorPath(elems) ⇒ if (elems.isEmpty) { - log.debug("look-up of empty path string '{}' fails (per definition)", path) + log.debug("look-up of empty path string [{}] fails (per definition)", path) deadLetters } else if (elems.head.isEmpty) actorFor(rootGuardian, elems.tail) else actorFor(ref, elems) case ActorPathExtractor(address, elems) if address == rootPath.address ⇒ actorFor(rootGuardian, elems) case _ ⇒ - log.debug("look-up of unknown path '{}' failed", path) + log.warning("look-up of unknown path [{}] failed", path) deadLetters } def actorFor(path: ActorPath): InternalActorRef = if (path.root == rootPath) actorFor(rootGuardian, path.elements) else { - log.debug("look-up of foreign ActorPath '{}' failed", path) + log.warning("look-up of foreign ActorPath [{}] failed", path) deadLetters } def actorFor(ref: InternalActorRef, path: Iterable[String]): InternalActorRef = if (path.isEmpty) { - log.debug("look-up of empty path sequence fails (per definition)") + log.warning("look-up of empty path sequence fails (per definition)") deadLetters } else ref.getChild(path.iterator) match { case Nobody ⇒ - log.debug("look-up of path sequence '{}' failed", path) + log.warning("look-up of path sequence [{}] failed", path) new EmptyLocalActorRef(system.provider, ref.path / path, eventStream) case x ⇒ x } diff --git a/akka-remote/src/main/scala/akka/remote/RemoteTransport.scala b/akka-remote/src/main/scala/akka/remote/RemoteTransport.scala index 0a5088adcd..07a910388b 100644 --- a/akka-remote/src/main/scala/akka/remote/RemoteTransport.scala +++ b/akka-remote/src/main/scala/akka/remote/RemoteTransport.scala @@ -32,7 +32,7 @@ case class RemoteClientError( @BeanProperty remoteAddress: Address) extends RemoteClientLifeCycleEvent { override def logLevel = Logging.ErrorLevel override def toString = - "RemoteClientError@" + remoteAddress + ": Error[" + AkkaException.toStringWithStackTrace(cause) + "]" + "RemoteClientError@" + remoteAddress + ": Error[" + cause + "]" } case class RemoteClientDisconnected( @@ -76,7 +76,7 @@ case class RemoteClientWriteFailed( override def toString = "RemoteClientWriteFailed@" + remoteAddress + ": MessageClass[" + (if 
(request ne null) request.getClass.getName else "no message") + - "] Error[" + AkkaException.toStringWithStackTrace(cause) + "]" + "] Error[" + cause + "]" } /** @@ -103,7 +103,7 @@ case class RemoteServerError( @BeanProperty remote: RemoteTransport) extends RemoteServerLifeCycleEvent { override def logLevel = Logging.ErrorLevel override def toString = - "RemoteServerError@" + remote + "] Error[" + AkkaException.toStringWithStackTrace(cause) + "]" + "RemoteServerError@" + remote + "] Error[" + cause + "]" } case class RemoteServerClientConnected( @@ -133,6 +133,19 @@ case class RemoteServerClientClosed( ": Client[" + clientAddress.getOrElse("no address") + "]" } +case class RemoteServerWriteFailed( + @BeanProperty request: AnyRef, + @BeanProperty cause: Throwable, + @BeanProperty remote: RemoteTransport, + @BeanProperty remoteAddress: Option[Address]) extends RemoteServerLifeCycleEvent { + override def logLevel = Logging.WarningLevel + override def toString = + "RemoteServerWriteFailed@" + remote + + ": ClientAddress[" + remoteAddress + + "] MessageClass[" + (if (request ne null) request.getClass.getName else "no message") + + "] Error[" + cause + "]" +} + /** * Thrown for example when trying to send a message using a RemoteClient that is either not started or shut down. */ @@ -190,7 +203,7 @@ abstract class RemoteTransport { protected[akka] def notifyListeners(message: RemoteLifeCycleEvent): Unit = { system.eventStream.publish(message) - system.log.log(message.logLevel, "REMOTE: {}", message) + system.log.log(message.logLevel, "{}", message) } override def toString = address.toString From db1e1da7e7c7cd1e9f89b674772b25217788b442 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jonas=20Bone=CC=81r?= Date: Sat, 18 Feb 2012 17:45:21 +0100 Subject: [PATCH 50/72] Added testkit time ratio sensitive durations. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Jonas Bonér --- .../src/test/scala/akka/cluster/NodeMembershipSpec.scala | 9 +++++---- .../src/test/scala/akka/cluster/NodeStartupSpec.scala | 5 +++-- 2 files changed, 8 insertions(+), 6 deletions(-) diff --git a/akka-cluster/src/test/scala/akka/cluster/NodeMembershipSpec.scala b/akka-cluster/src/test/scala/akka/cluster/NodeMembershipSpec.scala index f25a7d60a7..a2106fc6da 100644 --- a/akka-cluster/src/test/scala/akka/cluster/NodeMembershipSpec.scala +++ b/akka-cluster/src/test/scala/akka/cluster/NodeMembershipSpec.scala @@ -1,5 +1,5 @@ /** - * Copyright (C) 2009-2011 Typesafe Inc. + * Copyright (C) 2009-2012 Typesafe Inc. 
*/ package akka.cluster @@ -9,12 +9,13 @@ import akka.testkit._ import akka.dispatch._ import akka.actor._ import akka.remote._ +import akka.util.duration._ import com.typesafe.config._ class NodeMembershipSpec extends AkkaSpec(""" akka { - loglevel = "DEBUG" + loglevel = "INFO" } """) with ImplicitSender { @@ -58,7 +59,7 @@ class NodeMembershipSpec extends AkkaSpec(""" val remote1 = node1.provider.asInstanceOf[RemoteActorRefProvider] gossiper1 = Gossiper(node1, remote1) - Thread.sleep(5000) + Thread.sleep(10.seconds.dilated.toMillis) val members0 = gossiper0.latestGossip.members.toArray members0.size must be(2) @@ -91,7 +92,7 @@ class NodeMembershipSpec extends AkkaSpec(""" val remote2 = node2.provider.asInstanceOf[RemoteActorRefProvider] gossiper2 = Gossiper(node2, remote2) - Thread.sleep(10000) + Thread.sleep(10.seconds.dilated.toMillis) val members0 = gossiper0.latestGossip.members.toArray val version = gossiper0.latestGossip.version diff --git a/akka-cluster/src/test/scala/akka/cluster/NodeStartupSpec.scala b/akka-cluster/src/test/scala/akka/cluster/NodeStartupSpec.scala index 6ccb8491c1..9066f6eaae 100644 --- a/akka-cluster/src/test/scala/akka/cluster/NodeStartupSpec.scala +++ b/akka-cluster/src/test/scala/akka/cluster/NodeStartupSpec.scala @@ -9,6 +9,7 @@ import akka.testkit._ import akka.dispatch._ import akka.actor._ import akka.remote._ +import akka.util.duration._ import com.typesafe.config._ @@ -40,7 +41,7 @@ class NodeStartupSpec extends AkkaSpec(""" gossiper0 = Gossiper(node0, remote0) "be a singleton cluster when started up" in { - Thread.sleep(1000) + Thread.sleep(1.seconds.dilated.toMillis) gossiper0.isSingletonCluster must be(true) } @@ -69,7 +70,7 @@ class NodeStartupSpec extends AkkaSpec(""" val remote1 = node1.provider.asInstanceOf[RemoteActorRefProvider] gossiper1 = Gossiper(node1, remote1) - Thread.sleep(1000) // give enough time for node1 to JOIN node0 + Thread.sleep(1.seconds.dilated.toMillis) // give enough time for node1 to JOIN node0 val members = gossiper0.latestGossip.members val joiningMember = members find (_.address.port.get == 5551) joiningMember must be('defined) From cfd04bba3dd158ca26705ad0dfc297baf4fae262 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jonas=20Bone=CC=81r?= Date: Sat, 18 Feb 2012 17:48:07 +0100 Subject: [PATCH 51/72] Fixed remaining issues with gossip based failure detection and removal of unreachable nodes. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit * Completed gossip based failure detection. * Completed removal of unreachable nodes according to failure detector. * Added passing tests. * Misc other fixes, more logging, more comments. Signed-off-by: Jonas Bonér --- .../akka/cluster/AccrualFailureDetector.scala | 70 +++--- .../main/scala/akka/cluster/Gossiper.scala | 23 +- .../cluster/AccrualFailureDetectorSpec.scala | 10 +- .../GossipingAccrualFailureDetectorSpec.scala | 199 ++++++++++-------- 4 files changed, 173 insertions(+), 129 deletions(-) diff --git a/akka-cluster/src/main/scala/akka/cluster/AccrualFailureDetector.scala b/akka-cluster/src/main/scala/akka/cluster/AccrualFailureDetector.scala index 8ee9f857a0..e0d7cae052 100644 --- a/akka-cluster/src/main/scala/akka/cluster/AccrualFailureDetector.scala +++ b/akka-cluster/src/main/scala/akka/cluster/AccrualFailureDetector.scala @@ -23,14 +23,17 @@ import System.{ currentTimeMillis ⇒ newTimestamp } *

* Default threshold is 8, but can be configured in the Akka config. */ -class AccrualFailureDetector(system: ActorSystem, val threshold: Int = 8, val maxSampleSize: Int = 1000) { +class AccrualFailureDetector(system: ActorSystem, address: Address, val threshold: Int = 8, val maxSampleSize: Int = 1000) { private final val PhiFactor = 1.0 / math.log(10.0) - private case class FailureStats(mean: Double = 0.0D, variance: Double = 0.0D, deviation: Double = 0.0D) - private val log = Logging(system, "FailureDetector") + /** + * Holds the failure statistics for a specific node Address. + */ + private case class FailureStats(mean: Double = 0.0D, variance: Double = 0.0D, deviation: Double = 0.0D) + /** * Implement using optimistic lockless concurrency, all state is represented * by this immutable case class and managed by an AtomicReference. @@ -54,22 +57,26 @@ class AccrualFailureDetector(system: ActorSystem, val threshold: Int = 8, val ma */ @tailrec final def heartbeat(connection: Address) { - log.debug("Heartbeat from connection [{}] ", connection) - val oldState = state.get + log.debug("Node [{}] - Heartbeat from connection [{}] ", address, connection) + val oldState = state.get + val oldFailureStats = oldState.failureStats + val oldTimestamps = oldState.timestamps val latestTimestamp = oldState.timestamps.get(connection) + if (latestTimestamp.isEmpty) { // this is heartbeat from a new connection // add starter records for this new connection - val failureStats = oldState.failureStats + (connection -> FailureStats()) - val intervalHistory = oldState.intervalHistory + (connection -> Vector.empty[Long]) - val timestamps = oldState.timestamps + (connection -> newTimestamp) + val newFailureStats = oldFailureStats + (connection -> FailureStats()) + val newIntervalHistory = oldState.intervalHistory + (connection -> Vector.empty[Long]) + val newTimestamps = oldTimestamps + (connection -> newTimestamp) - val newState = oldState copy (version = oldState.version + 1, - failureStats = failureStats, - intervalHistory = intervalHistory, - timestamps = timestamps) + val newState = oldState copy ( + version = oldState.version + 1, + failureStats = newFailureStats, + intervalHistory = newIntervalHistory, + timestamps = newTimestamps) // if we won the race then update else try again if (!state.compareAndSet(oldState, newState)) heartbeat(connection) // recur @@ -79,7 +86,7 @@ class AccrualFailureDetector(system: ActorSystem, val threshold: Int = 8, val ma val timestamp = newTimestamp val interval = timestamp - latestTimestamp.get - val timestamps = oldState.timestamps + (connection -> timestamp) // record new timestamp + val newTimestamps = oldTimestamps + (connection -> timestamp) // record new timestamp var newIntervalsForConnection = oldState.intervalHistory.get(connection).getOrElse(Vector.empty[Long]) :+ interval // append the new interval to history @@ -89,36 +96,33 @@ class AccrualFailureDetector(system: ActorSystem, val threshold: Int = 8, val ma newIntervalsForConnection = newIntervalsForConnection drop 0 } - val failureStats = + val newFailureStats = if (newIntervalsForConnection.size > 1) { - val mean: Double = newIntervalsForConnection.sum / newIntervalsForConnection.size.toDouble - - val oldFailureStats = oldState.failureStats.get(connection).getOrElse(FailureStats()) + val newMean: Double = newIntervalsForConnection.sum / newIntervalsForConnection.size.toDouble + val oldConnectionFailureStats = oldFailureStats.get(connection).getOrElse(throw new IllegalStateException("Can't calculate new failure 
statistics due to missing heartbeat history")) val deviationSum = newIntervalsForConnection .map(_.toDouble) - .foldLeft(0.0D)((x, y) ⇒ x + (y - mean)) + .foldLeft(0.0D)((x, y) ⇒ x + (y - newMean)) - val variance: Double = deviationSum / newIntervalsForConnection.size.toDouble - val deviation: Double = math.sqrt(variance) + val newVariance: Double = deviationSum / newIntervalsForConnection.size.toDouble + val newDeviation: Double = math.sqrt(newVariance) - val newFailureStats = oldFailureStats copy (mean = mean, - deviation = deviation, - variance = variance) + val newFailureStats = oldConnectionFailureStats copy (mean = newMean, deviation = newDeviation, variance = newVariance) + oldFailureStats + (connection -> newFailureStats) - oldState.failureStats + (connection -> newFailureStats) } else { - oldState.failureStats + oldFailureStats } - val intervalHistory = oldState.intervalHistory + (connection -> newIntervalsForConnection) + val newIntervalHistory = oldState.intervalHistory + (connection -> newIntervalsForConnection) val newState = oldState copy (version = oldState.version + 1, - failureStats = failureStats, - intervalHistory = intervalHistory, - timestamps = timestamps) + failureStats = newFailureStats, + intervalHistory = newIntervalHistory, + timestamps = newTimestamps) // if we won the race then update else try again if (!state.compareAndSet(oldState, newState)) heartbeat(connection) // recur @@ -138,17 +142,21 @@ class AccrualFailureDetector(system: ActorSystem, val threshold: Int = 8, val ma def phi(connection: Address): Double = { val oldState = state.get val oldTimestamp = oldState.timestamps.get(connection) + val phi = if (oldTimestamp.isEmpty) 0.0D // treat unmanaged connections, e.g. with zero heartbeats, as healthy connections else { val timestampDiff = newTimestamp - oldTimestamp.get - val mean = oldState.failureStats.get(connection).getOrElse(FailureStats()).mean + + val stats = oldState.failureStats.get(connection) + val mean = stats.getOrElse(throw new IllegalStateException("Can't calculate Failure Detector Phi value for a node that have no heartbeat history")).mean + if (mean == 0.0D) 0.0D else PhiFactor * timestampDiff / mean } // only log if PHI value is starting to get interesting - if (phi > 0.0D) log.debug("Phi value [{}] and threshold [{}] for connection [{}] ", phi, threshold, connection) + if (phi > 0.0D) log.debug("Node [{}] - Phi value [{}] and threshold [{}] for connection [{}] ", address, phi, threshold, connection) phi } diff --git a/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala b/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala index bb1e19e746..73575efec7 100644 --- a/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala +++ b/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala @@ -210,11 +210,12 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { implicit val defaultTimeout = Timeout(remoteSettings.RemoteSystemDaemonAckTimeout) + val failureDetector = new AccrualFailureDetector( + system, remoteAddress, clusterSettings.FailureDetectorThreshold, clusterSettings.FailureDetectorMaxSampleSize) + private val nodeToJoin: Option[Address] = clusterSettings.NodeToJoin filter (_ != remoteAddress) private val serialization = remote.serialization - private val failureDetector = new AccrualFailureDetector( - system, clusterSettings.FailureDetectorThreshold, clusterSettings.FailureDetectorMaxSampleSize) private val isRunning = new AtomicBoolean(true) private val log = Logging(system, "Gossiper") @@ -279,12 
+280,10 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { if (isRunning.compareAndSet(true, false)) { log.info("Node [{}] - Shutting down Gossiper and ClusterDaemon...", remoteAddress) - try connectionManager.shutdown() finally { - try system.stop(clusterDaemon) finally { - try gossipCanceller.cancel() finally { - try scrutinizeCanceller.cancel() finally { - log.info("Node [{}] - Gossiper and ClusterDaemon shut down successfully", remoteAddress) - } + try system.stop(clusterDaemon) finally { + try gossipCanceller.cancel() finally { + try scrutinizeCanceller.cancel() finally { + log.info("Node [{}] - Gossiper and ClusterDaemon shut down successfully", remoteAddress) } } } @@ -298,6 +297,8 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { final def joining(node: Address) { log.info("Node [{}] - Node [{}] is joining", remoteAddress, node) + failureDetector heartbeat node // update heartbeat in failure detector + val localState = state.get val localGossip = localState.latestGossip val localMembers = localGossip.members @@ -475,7 +476,7 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { */ private def gossipTo(address: Address) { setUpConnectionTo(address) foreach { connection ⇒ - log.debug("Node [{}] - Gossiping to [{}]", remoteAddress, address) + log.debug("Node [{}] - Gossiping to [{}]", remoteAddress, connection) connection ! GossipEnvelope(self, latestGossip) } } @@ -496,7 +497,7 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { } /** - * Scrutinizes the cluster; marks members detected by the failure detector as unavailable. + * Scrutinizes the cluster; marks members detected by the failure detector as unreachable. */ @tailrec final private def scrutinize() { @@ -517,6 +518,8 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { val newMembers = localMembers diff newlyDetectedUnreachableMembers val newUnreachableAddresses: Set[Address] = localUnreachableAddresses ++ newlyDetectedUnreachableAddresses + log.info("Node [{}] - Marking node(s) an unreachable [{}]", remoteAddress, newlyDetectedUnreachableAddresses.mkString(", ")) + val newOverview = localOverview copy (unreachable = newUnreachableAddresses) val newGossip = localGossip copy (overview = newOverview, members = newMembers) diff --git a/akka-cluster/src/test/scala/akka/cluster/AccrualFailureDetectorSpec.scala b/akka-cluster/src/test/scala/akka/cluster/AccrualFailureDetectorSpec.scala index 034f582e0d..2e00c72ad1 100644 --- a/akka-cluster/src/test/scala/akka/cluster/AccrualFailureDetectorSpec.scala +++ b/akka-cluster/src/test/scala/akka/cluster/AccrualFailureDetectorSpec.scala @@ -13,13 +13,13 @@ class AccrualFailureDetectorSpec extends AkkaSpec(""" val conn2 = Address("akka", "", "localhost", 2553) "return phi value of 0.0D on startup for each address" in { - val fd = new AccrualFailureDetector(system) + val fd = new AccrualFailureDetector(system, conn) fd.phi(conn) must be(0.0D) fd.phi(conn2) must be(0.0D) } "mark node as available after a series of successful heartbeats" in { - val fd = new AccrualFailureDetector(system) + val fd = new AccrualFailureDetector(system, conn) fd.heartbeat(conn) @@ -34,7 +34,7 @@ class AccrualFailureDetectorSpec extends AkkaSpec(""" // FIXME how should we deal with explicit removal of connection? 
- if triggered as failure then we have a problem in boostrap - see line 142 in AccrualFailureDetector "mark node as dead after explicit removal of connection" ignore { - val fd = new AccrualFailureDetector(system) + val fd = new AccrualFailureDetector(system, conn) fd.heartbeat(conn) @@ -52,7 +52,7 @@ class AccrualFailureDetectorSpec extends AkkaSpec(""" } "mark node as dead if heartbeat are missed" in { - val fd = new AccrualFailureDetector(system, threshold = 3) + val fd = new AccrualFailureDetector(system, conn, threshold = 3) fd.heartbeat(conn) @@ -70,7 +70,7 @@ class AccrualFailureDetectorSpec extends AkkaSpec(""" } "mark node as available if it starts heartbeat again after being marked dead due to detection of failure" in { - val fd = new AccrualFailureDetector(system, threshold = 3) + val fd = new AccrualFailureDetector(system, conn, threshold = 3) fd.heartbeat(conn) diff --git a/akka-cluster/src/test/scala/akka/cluster/GossipingAccrualFailureDetectorSpec.scala b/akka-cluster/src/test/scala/akka/cluster/GossipingAccrualFailureDetectorSpec.scala index 6366a9f65e..413ab7e537 100644 --- a/akka-cluster/src/test/scala/akka/cluster/GossipingAccrualFailureDetectorSpec.scala +++ b/akka-cluster/src/test/scala/akka/cluster/GossipingAccrualFailureDetectorSpec.scala @@ -1,95 +1,128 @@ -// /** -// * Copyright (C) 2009-2011 Typesafe Inc. -// */ -// package akka.cluster +/** + * Copyright (C) 2009-2012 Typesafe Inc. + */ +package akka.cluster -// import java.net.InetSocketAddress +import akka.testkit._ +import akka.dispatch._ +import akka.actor._ +import akka.remote._ +import akka.util.duration._ -// import akka.testkit._ -// import akka.dispatch._ -// import akka.actor._ -// import com.typesafe.config._ +import com.typesafe.config._ -// class GossipingAccrualFailureDetectorSpec extends AkkaSpec(""" -// akka { -// loglevel = "INFO" -// actor.provider = "akka.remote.RemoteActorRefProvider" +import java.net.InetSocketAddress -// remote.server.hostname = localhost -// remote.server.port = 5550 -// remote.failure-detector.threshold = 3 -// cluster.seed-nodes = ["akka://localhost:5551"] -// } -// """) with ImplicitSender { +class GossipingAccrualFailureDetectorSpec extends AkkaSpec(""" + akka { + loglevel = "INFO" + cluster.failure-detector.threshold = 3 + actor.debug.lifecycle = on + actor.debug.autoreceive = on + } + """) with ImplicitSender { -// val conn1 = Address("akka", system.systemName, Some("localhost"), Some(5551)) -// val node1 = ActorSystem("GossiperSpec", ConfigFactory -// .parseString("akka { remote.server.port=5551, cluster.use-cluster = on }") -// .withFallback(system.settings.config)) -// val remote1 = -// node1.asInstanceOf[ActorSystemImpl] -// .provider.asInstanceOf[RemoteActorRefProvider] -// .remote -// val gossiper1 = remote1.gossiper -// val fd1 = remote1.failureDetector -// gossiper1 must be('defined) + var gossiper1: Gossiper = _ + var gossiper2: Gossiper = _ + var gossiper3: Gossiper = _ -// val conn2 = RemoteNettyAddress("localhost", 5552) -// val node2 = ActorSystem("GossiperSpec", ConfigFactory -// .parseString("akka { remote.server.port=5552, cluster.use-cluster = on }") -// .withFallback(system.settings.config)) -// val remote2 = -// node2.asInstanceOf[ActorSystemImpl] -// .provider.asInstanceOf[RemoteActorRefProvider] -// .remote -// val gossiper2 = remote2.gossiper -// val fd2 = remote2.failureDetector -// gossiper2 must be('defined) + var node1: ActorSystemImpl = _ + var node2: ActorSystemImpl = _ + var node3: ActorSystemImpl = _ -// val conn3 = 
RemoteNettyAddress("localhost", 5553) -// val node3 = ActorSystem("GossiperSpec", ConfigFactory -// .parseString("akka { remote.server.port=5553, cluster.use-cluster = on }") -// .withFallback(system.settings.config)) -// val remote3 = -// node3.asInstanceOf[ActorSystemImpl] -// .provider.asInstanceOf[RemoteActorRefProvider] -// .remote -// val gossiper3 = remote3.gossiper -// val fd3 = remote3.failureDetector -// gossiper3 must be('defined) + try { + "A Gossip-driven Failure Detector" must { -// "A Gossip-driven Failure Detector" must { + // ======= NODE 1 ======== + node1 = ActorSystem("node1", ConfigFactory + .parseString(""" + akka { + actor.provider = "akka.remote.RemoteActorRefProvider" + remote.netty { + hostname = localhost + port=5550 + } + }""") + .withFallback(system.settings.config)) + .asInstanceOf[ActorSystemImpl] + val remote1 = node1.provider.asInstanceOf[RemoteActorRefProvider] + gossiper1 = Gossiper(node1, remote1) + val fd1 = gossiper1.failureDetector + val address1 = gossiper1.self.address -// "receive gossip heartbeats so that all healthy nodes in the cluster are marked 'available'" ignore { -// Thread.sleep(5000) // let them gossip for 10 seconds -// fd1.isAvailable(conn2) must be(true) -// fd1.isAvailable(conn3) must be(true) -// fd2.isAvailable(conn1) must be(true) -// fd2.isAvailable(conn3) must be(true) -// fd3.isAvailable(conn1) must be(true) -// fd3.isAvailable(conn2) must be(true) -// } + // ======= NODE 2 ======== + node2 = ActorSystem("node2", ConfigFactory + .parseString(""" + akka { + actor.provider = "akka.remote.RemoteActorRefProvider" + remote.netty { + hostname = localhost + port = 5551 + } + cluster.node-to-join = "akka://node1@localhost:5550" + }""") + .withFallback(system.settings.config)) + .asInstanceOf[ActorSystemImpl] + val remote2 = node2.provider.asInstanceOf[RemoteActorRefProvider] + gossiper2 = Gossiper(node2, remote2) + val fd2 = gossiper2.failureDetector + val address2 = gossiper2.self.address -// "mark node as 'unavailable' if a node in the cluster is shut down and its heartbeats stops" ignore { -// // kill node 3 -// gossiper3.get.shutdown() -// node3.shutdown() -// Thread.sleep(5000) // let them gossip for 10 seconds + // ======= NODE 3 ======== + node3 = ActorSystem("node3", ConfigFactory + .parseString(""" + akka { + actor.provider = "akka.remote.RemoteActorRefProvider" + remote.netty { + hostname = localhost + port=5552 + } + cluster.node-to-join = "akka://node1@localhost:5550" + }""") + .withFallback(system.settings.config)) + .asInstanceOf[ActorSystemImpl] + val remote3 = node3.provider.asInstanceOf[RemoteActorRefProvider] + gossiper3 = Gossiper(node3, remote3) + val fd3 = gossiper3.failureDetector + val address3 = gossiper3.self.address -// fd1.isAvailable(conn2) must be(true) -// fd1.isAvailable(conn3) must be(false) -// fd2.isAvailable(conn1) must be(true) -// fd2.isAvailable(conn3) must be(false) -// } -// } + "receive gossip heartbeats so that all healthy nodes in the cluster are marked 'available'" in { + println("Let the nodes gossip for a while...") + Thread.sleep(30.seconds.dilated.toMillis) // let them gossip for 30 seconds + fd1.isAvailable(address2) must be(true) + fd1.isAvailable(address3) must be(true) + fd2.isAvailable(address1) must be(true) + fd2.isAvailable(address3) must be(true) + fd3.isAvailable(address1) must be(true) + fd3.isAvailable(address2) must be(true) + } -// override def atTermination() { -// gossiper1.get.shutdown() -// gossiper2.get.shutdown() -// gossiper3.get.shutdown() -// node1.shutdown() -// 
node2.shutdown() -// node3.shutdown() -// // FIXME Ordering problem - If we shut down the ActorSystem before the Gossiper then we get an IllegalStateException -// } -// } + "mark node as 'unavailable' if a node in the cluster is shut down (and its heartbeats stops)" in { + // shut down node3 + gossiper3.shutdown() + node3.shutdown() + println("Give the remaning nodes time to detect failure...") + Thread.sleep(30.seconds.dilated.toMillis) // give them 30 seconds to detect failure of node3 + fd1.isAvailable(address2) must be(true) + fd1.isAvailable(address3) must be(false) + fd2.isAvailable(address1) must be(true) + fd2.isAvailable(address3) must be(false) + } + } + } catch { + case e: Exception ⇒ + e.printStackTrace + fail(e.toString) + } + + override def atTermination() { + gossiper1.shutdown() + node1.shutdown() + + gossiper2.shutdown() + node2.shutdown() + + gossiper3.shutdown() + node3.shutdown() + } +} From 0e6d272a8d2989617a25ba65752916d2b66fb2ec Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jonas=20Bone=CC=81r?= Date: Sat, 18 Feb 2012 22:14:53 +0100 Subject: [PATCH 52/72] Added support for checking for Cluster Convergence and completed support for MembershipChangeListener (including tests). MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Jonas Bonér --- .../main/scala/akka/cluster/Gossiper.scala | 49 ++++-- .../MembershipChangeListenerSpec.scala | 144 ++++++++++++++++++ .../akka/cluster/NodeMembershipSpec.scala | 24 ++- 3 files changed, 200 insertions(+), 17 deletions(-) create mode 100644 akka-cluster/src/test/scala/akka/cluster/MembershipChangeListenerSpec.scala diff --git a/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala b/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala index 73575efec7..313f052e2a 100644 --- a/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala +++ b/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala @@ -26,10 +26,8 @@ import com.google.protobuf.ByteString /** * Interface for membership change listener. 
*/ -trait MembershipChangeListener { // FIXME add notification of MembershipChangeListener +trait MembershipChangeListener { def notify(members: SortedSet[Member]): Unit - // def memberConnected(member: Member): Unit - // def memberDisconnected(member: Member): Unit } /** @@ -312,6 +310,11 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { val newState = localState copy (latestGossip = seenVersionedGossip) if (!state.compareAndSet(localState, newState)) joining(node) // recur if we failed update + else { + if (convergence(newState.latestGossip).isDefined) { + newState.memberMembershipChangeListeners map { _ notify newMembers } // FIXME should check for cluster convergence before triggering listeners + } + } } /** @@ -323,8 +326,6 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { failureDetector heartbeat sender.address // update heartbeat in failure detector - // FIXME check for convergence - if we have convergence then trigger the listeners - val localState = state.get val localGossip = localState.latestGossip @@ -334,7 +335,8 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { val mergedGossip = merge(remoteGossip, localGossip) val versionedMergedGossip = mergedGossip + selfNode - log.debug("Can't establish a causal relationship between \"remote\" gossip [{}] and \"local\" gossip [{}] - merging them into [{}]", + log.debug( + "Can't establish a causal relationship between \"remote\" gossip [{}] and \"local\" gossip [{}] - merging them into [{}]", remoteGossip, localGossip, versionedMergedGossip) versionedMergedGossip @@ -352,6 +354,11 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { // if we won the race then update else try again if (!state.compareAndSet(localState, newState)) receive(sender, remoteGossip) // recur if we fail the update + else { + if (convergence(newState.latestGossip).isDefined) { + newState.memberMembershipChangeListeners map { _ notify newState.latestGossip.members } // FIXME should check for cluster convergence before triggering listeners + } + } } /** @@ -376,6 +383,13 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { if (!state.compareAndSet(localState, newState)) unregisterListener(listener) // recur } + /** + * Checks if we have a cluster convergence. + * + * @returns Some(convergedGossip) if convergence have been reached and None if not + */ + def convergence: Option[Gossip] = convergence(latestGossip) + // ======================================================== // ===================== INTERNAL API ===================== // ======================================================== @@ -531,17 +545,28 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { // if we won the race then update else try again if (!state.compareAndSet(localState, newState)) scrutinize() // recur else { - // FIXME should only notify when there is a cluster convergence - // notify listeners on successful update of state - // for { - // deadNode ← newUnreachableAddresses - // listener ← localState.memberMembershipChangeListeners - // } listener memberDisconnected deadNode + if (convergence(newState.latestGossip).isDefined) { + newState.memberMembershipChangeListeners map { _ notify newMembers } // FIXME should check for cluster convergence before triggering listeners + } } } } } + /** + * Checks if we have a cluster convergence. 
+ * + * @returns Some(convergedGossip) if convergence have been reached and None if not + */ + private def convergence(gossip: Gossip): Option[Gossip] = { + val seen = gossip.overview.seen + val views = Set.empty[VectorClock] ++ seen.values + if (views.size == 1) { + log.debug("Node [{}] - Cluster convergence reached", remoteAddress) + Some(gossip) + } else None + } + // FIXME should shuffle list randomly before start traversing to avoid connecting to some member on every member @tailrec final private def connectToRandomNodeOf(addresses: Seq[Address]): ActorRef = { diff --git a/akka-cluster/src/test/scala/akka/cluster/MembershipChangeListenerSpec.scala b/akka-cluster/src/test/scala/akka/cluster/MembershipChangeListenerSpec.scala new file mode 100644 index 0000000000..74de663697 --- /dev/null +++ b/akka-cluster/src/test/scala/akka/cluster/MembershipChangeListenerSpec.scala @@ -0,0 +1,144 @@ +/** + * Copyright (C) 2009-2012 Typesafe Inc. + */ +package akka.cluster + +import akka.testkit._ +import akka.dispatch._ +import akka.actor._ +import akka.remote._ +import akka.util.duration._ + +import java.net.InetSocketAddress +import java.util.concurrent.{ CountDownLatch, TimeUnit } + +import scala.collection.immutable.SortedSet + +import com.typesafe.config._ + +class MembershipChangeListenerSpec extends AkkaSpec(""" + akka { + loglevel = "INFO" + } + """) with ImplicitSender { + + var gossiper0: Gossiper = _ + var gossiper1: Gossiper = _ + var gossiper2: Gossiper = _ + + var node0: ActorSystemImpl = _ + var node1: ActorSystemImpl = _ + var node2: ActorSystemImpl = _ + + try { + "A set of connected cluster nodes" must { + "(when two nodes) after cluster convergence updates the membership table then all MembershipChangeListeners should be triggered" in { + node0 = ActorSystem("node0", ConfigFactory + .parseString(""" + akka { + actor.provider = "akka.remote.RemoteActorRefProvider" + remote.netty { + hostname = localhost + port=5550 + } + }""") + .withFallback(system.settings.config)) + .asInstanceOf[ActorSystemImpl] + val remote0 = node0.provider.asInstanceOf[RemoteActorRefProvider] + gossiper0 = Gossiper(node0, remote0) + + node1 = ActorSystem("node1", ConfigFactory + .parseString(""" + akka { + actor.provider = "akka.remote.RemoteActorRefProvider" + remote.netty { + hostname = localhost + port=5551 + } + cluster.node-to-join = "akka://node0@localhost:5550" + }""") + .withFallback(system.settings.config)) + .asInstanceOf[ActorSystemImpl] + val remote1 = node1.provider.asInstanceOf[RemoteActorRefProvider] + gossiper1 = Gossiper(node1, remote1) + + val latch = new CountDownLatch(2) + + gossiper0.registerListener(new MembershipChangeListener { + def notify(members: SortedSet[Member]) { + latch.countDown() + } + }) + gossiper1.registerListener(new MembershipChangeListener { + def notify(members: SortedSet[Member]) { + latch.countDown() + } + }) + + latch.await(10.seconds.dilated.toMillis, TimeUnit.MILLISECONDS) + + // check cluster convergence + gossiper0.convergence must be('defined) + gossiper1.convergence must be('defined) + } + + "(when three nodes) after cluster convergence updates the membership table then all MembershipChangeListeners should be triggered" in { + + // ======= NODE 2 ======== + node2 = ActorSystem("node2", ConfigFactory + .parseString(""" + akka { + actor.provider = "akka.remote.RemoteActorRefProvider" + remote.netty { + hostname = localhost + port=5552 + } + cluster.node-to-join = "akka://node0@localhost:5550" + }""") + .withFallback(system.settings.config)) + 
.asInstanceOf[ActorSystemImpl] + val remote2 = node2.provider.asInstanceOf[RemoteActorRefProvider] + gossiper2 = Gossiper(node2, remote2) + + val latch = new CountDownLatch(3) + gossiper0.registerListener(new MembershipChangeListener { + def notify(members: SortedSet[Member]) { + latch.countDown() + } + }) + gossiper1.registerListener(new MembershipChangeListener { + def notify(members: SortedSet[Member]) { + latch.countDown() + } + }) + gossiper2.registerListener(new MembershipChangeListener { + def notify(members: SortedSet[Member]) { + latch.countDown() + } + }) + + latch.await(10.seconds.dilated.toMillis, TimeUnit.MILLISECONDS) + + // check cluster convergence + gossiper0.convergence must be('defined) + gossiper1.convergence must be('defined) + gossiper2.convergence must be('defined) + } + } + } catch { + case e: Exception ⇒ + e.printStackTrace + fail(e.toString) + } + + override def atTermination() { + gossiper0.shutdown() + node0.shutdown() + + gossiper1.shutdown() + node1.shutdown() + + gossiper2.shutdown() + node2.shutdown() + } +} diff --git a/akka-cluster/src/test/scala/akka/cluster/NodeMembershipSpec.scala b/akka-cluster/src/test/scala/akka/cluster/NodeMembershipSpec.scala index a2106fc6da..5fc062f517 100644 --- a/akka-cluster/src/test/scala/akka/cluster/NodeMembershipSpec.scala +++ b/akka-cluster/src/test/scala/akka/cluster/NodeMembershipSpec.scala @@ -30,7 +30,9 @@ class NodeMembershipSpec extends AkkaSpec(""" try { "A set of connected cluster nodes" must { "(when two nodes) start gossiping to each other so that both nodes gets the same gossip info" in { - node0 = ActorSystem("NodeMembershipSpec", ConfigFactory + + // ======= NODE 0 ======== + node0 = ActorSystem("node0", ConfigFactory .parseString(""" akka { actor.provider = "akka.remote.RemoteActorRefProvider" @@ -44,7 +46,8 @@ class NodeMembershipSpec extends AkkaSpec(""" val remote0 = node0.provider.asInstanceOf[RemoteActorRefProvider] gossiper0 = Gossiper(node0, remote0) - node1 = ActorSystem("NodeMembershipSpec", ConfigFactory + // ======= NODE 1 ======== + node1 = ActorSystem("node1", ConfigFactory .parseString(""" akka { actor.provider = "akka.remote.RemoteActorRefProvider" @@ -52,7 +55,7 @@ class NodeMembershipSpec extends AkkaSpec(""" hostname = localhost port=5551 } - cluster.node-to-join = "akka://NodeMembershipSpec@localhost:5550" + cluster.node-to-join = "akka://node0@localhost:5550" }""") .withFallback(system.settings.config)) .asInstanceOf[ActorSystemImpl] @@ -61,6 +64,10 @@ class NodeMembershipSpec extends AkkaSpec(""" Thread.sleep(10.seconds.dilated.toMillis) + // check cluster convergence + gossiper0.convergence must be('defined) + gossiper1.convergence must be('defined) + val members0 = gossiper0.latestGossip.members.toArray members0.size must be(2) members0(0).address.port.get must be(5550) @@ -77,7 +84,9 @@ class NodeMembershipSpec extends AkkaSpec(""" } "(when three nodes) start gossiping to each other so that both nodes gets the same gossip info" in { - node2 = ActorSystem("NodeMembershipSpec", ConfigFactory + + // ======= NODE 2 ======== + node2 = ActorSystem("node2", ConfigFactory .parseString(""" akka { actor.provider = "akka.remote.RemoteActorRefProvider" @@ -85,7 +94,7 @@ class NodeMembershipSpec extends AkkaSpec(""" hostname = localhost port=5552 } - cluster.node-to-join = "akka://NodeMembershipSpec@localhost:5550" + cluster.node-to-join = "akka://node0@localhost:5550" }""") .withFallback(system.settings.config)) .asInstanceOf[ActorSystemImpl] @@ -94,6 +103,11 @@ class NodeMembershipSpec extends 
AkkaSpec(""" Thread.sleep(10.seconds.dilated.toMillis) + // check cluster convergence + gossiper0.convergence must be('defined) + gossiper1.convergence must be('defined) + gossiper2.convergence must be('defined) + val members0 = gossiper0.latestGossip.members.toArray val version = gossiper0.latestGossip.version members0.size must be(3) From 0d022afa5e8d0da80685a91f8198ab6e8f393cf5 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jonas=20Bone=CC=81r?= Date: Sun, 19 Feb 2012 21:18:16 +0100 Subject: [PATCH 53/72] Created test tag LongRunningTest ("long-running") for excluding long running (cluster) tests from standard suite. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Jonas Bonér --- .../GossipingAccrualFailureDetectorSpec.scala | 16 ++++++++-------- .../cluster/MembershipChangeListenerSpec.scala | 16 ++++++++-------- .../scala/akka/cluster/NodeMembershipSpec.scala | 16 ++++++++-------- .../scala/akka/cluster/NodeStartupSpec.scala | 8 ++++---- .../src/test/scala/akka/testkit/AkkaSpec.scala | 1 + project/AkkaBuild.scala | 2 +- 6 files changed, 30 insertions(+), 29 deletions(-) diff --git a/akka-cluster/src/test/scala/akka/cluster/GossipingAccrualFailureDetectorSpec.scala b/akka-cluster/src/test/scala/akka/cluster/GossipingAccrualFailureDetectorSpec.scala index 413ab7e537..8939c4d728 100644 --- a/akka-cluster/src/test/scala/akka/cluster/GossipingAccrualFailureDetectorSpec.scala +++ b/akka-cluster/src/test/scala/akka/cluster/GossipingAccrualFailureDetectorSpec.scala @@ -86,7 +86,7 @@ class GossipingAccrualFailureDetectorSpec extends AkkaSpec(""" val fd3 = gossiper3.failureDetector val address3 = gossiper3.self.address - "receive gossip heartbeats so that all healthy nodes in the cluster are marked 'available'" in { + "receive gossip heartbeats so that all healthy nodes in the cluster are marked 'available'" taggedAs LongRunningTest in { println("Let the nodes gossip for a while...") Thread.sleep(30.seconds.dilated.toMillis) // let them gossip for 30 seconds fd1.isAvailable(address2) must be(true) @@ -97,7 +97,7 @@ class GossipingAccrualFailureDetectorSpec extends AkkaSpec(""" fd3.isAvailable(address2) must be(true) } - "mark node as 'unavailable' if a node in the cluster is shut down (and its heartbeats stops)" in { + "mark node as 'unavailable' if a node in the cluster is shut down (and its heartbeats stops)" taggedAs LongRunningTest in { // shut down node3 gossiper3.shutdown() node3.shutdown() @@ -116,13 +116,13 @@ class GossipingAccrualFailureDetectorSpec extends AkkaSpec(""" } override def atTermination() { - gossiper1.shutdown() - node1.shutdown() + if (gossiper1 ne null) gossiper1.shutdown() + if (node1 ne null) node1.shutdown() - gossiper2.shutdown() - node2.shutdown() + if (gossiper2 ne null) gossiper2.shutdown() + if (node2 ne null) node2.shutdown() - gossiper3.shutdown() - node3.shutdown() + if (gossiper3 ne null) gossiper3.shutdown() + if (node3 ne null) node3.shutdown() } } diff --git a/akka-cluster/src/test/scala/akka/cluster/MembershipChangeListenerSpec.scala b/akka-cluster/src/test/scala/akka/cluster/MembershipChangeListenerSpec.scala index 74de663697..e168b7caee 100644 --- a/akka-cluster/src/test/scala/akka/cluster/MembershipChangeListenerSpec.scala +++ b/akka-cluster/src/test/scala/akka/cluster/MembershipChangeListenerSpec.scala @@ -32,7 +32,7 @@ class MembershipChangeListenerSpec extends AkkaSpec(""" try { "A set of connected cluster nodes" must { - "(when two nodes) after cluster convergence updates the membership table then 
all MembershipChangeListeners should be triggered" in { + "(when two nodes) after cluster convergence updates the membership table then all MembershipChangeListeners should be triggered" taggedAs LongRunningTest in { node0 = ActorSystem("node0", ConfigFactory .parseString(""" akka { @@ -82,7 +82,7 @@ class MembershipChangeListenerSpec extends AkkaSpec(""" gossiper1.convergence must be('defined) } - "(when three nodes) after cluster convergence updates the membership table then all MembershipChangeListeners should be triggered" in { + "(when three nodes) after cluster convergence updates the membership table then all MembershipChangeListeners should be triggered" taggedAs LongRunningTest in { // ======= NODE 2 ======== node2 = ActorSystem("node2", ConfigFactory @@ -132,13 +132,13 @@ class MembershipChangeListenerSpec extends AkkaSpec(""" } override def atTermination() { - gossiper0.shutdown() - node0.shutdown() + if (gossiper0 ne null) gossiper0.shutdown() + if (node0 ne null) node0.shutdown() - gossiper1.shutdown() - node1.shutdown() + if (gossiper1 ne null) gossiper1.shutdown() + if (node1 ne null) node1.shutdown() - gossiper2.shutdown() - node2.shutdown() + if (gossiper2 ne null) gossiper2.shutdown() + if (node2 ne null) node2.shutdown() } } diff --git a/akka-cluster/src/test/scala/akka/cluster/NodeMembershipSpec.scala b/akka-cluster/src/test/scala/akka/cluster/NodeMembershipSpec.scala index 5fc062f517..2ce0a1d449 100644 --- a/akka-cluster/src/test/scala/akka/cluster/NodeMembershipSpec.scala +++ b/akka-cluster/src/test/scala/akka/cluster/NodeMembershipSpec.scala @@ -29,7 +29,7 @@ class NodeMembershipSpec extends AkkaSpec(""" try { "A set of connected cluster nodes" must { - "(when two nodes) start gossiping to each other so that both nodes gets the same gossip info" in { + "(when two nodes) start gossiping to each other so that both nodes gets the same gossip info" taggedAs LongRunningTest in { // ======= NODE 0 ======== node0 = ActorSystem("node0", ConfigFactory @@ -83,7 +83,7 @@ class NodeMembershipSpec extends AkkaSpec(""" members1(1).status must be(MemberStatus.Joining) } - "(when three nodes) start gossiping to each other so that both nodes gets the same gossip info" in { + "(when three nodes) start gossiping to each other so that both nodes gets the same gossip info" taggedAs LongRunningTest in { // ======= NODE 2 ======== node2 = ActorSystem("node2", ConfigFactory @@ -144,13 +144,13 @@ class NodeMembershipSpec extends AkkaSpec(""" } override def atTermination() { - gossiper0.shutdown() - node0.shutdown() + if (gossiper0 ne null) gossiper0.shutdown() + if (node0 ne null) node0.shutdown() - gossiper1.shutdown() - node1.shutdown() + if (gossiper1 ne null) gossiper1.shutdown() + if (node1 ne null) node1.shutdown() - gossiper2.shutdown() - node2.shutdown() + if (gossiper2 ne null) gossiper2.shutdown() + if (node2 ne null) node2.shutdown() } } diff --git a/akka-cluster/src/test/scala/akka/cluster/NodeStartupSpec.scala b/akka-cluster/src/test/scala/akka/cluster/NodeStartupSpec.scala index 9066f6eaae..b3805b7946 100644 --- a/akka-cluster/src/test/scala/akka/cluster/NodeStartupSpec.scala +++ b/akka-cluster/src/test/scala/akka/cluster/NodeStartupSpec.scala @@ -84,10 +84,10 @@ class NodeStartupSpec extends AkkaSpec(""" } override def atTermination() { - gossiper0.shutdown() - node0.shutdown() + if (gossiper0 ne null) gossiper0.shutdown() + if (node0 ne null) node0.shutdown() - gossiper1.shutdown() - node1.shutdown() + if (gossiper1 ne null) gossiper1.shutdown() + if (node1 ne null) 
node1.shutdown() } } diff --git a/akka-testkit/src/test/scala/akka/testkit/AkkaSpec.scala b/akka-testkit/src/test/scala/akka/testkit/AkkaSpec.scala index 95ce267320..3a0f02c79a 100644 --- a/akka-testkit/src/test/scala/akka/testkit/AkkaSpec.scala +++ b/akka-testkit/src/test/scala/akka/testkit/AkkaSpec.scala @@ -20,6 +20,7 @@ import akka.dispatch.Dispatchers import akka.pattern.ask object TimingTest extends Tag("timing") +object LongRunningTest extends Tag("long-running") object AkkaSpec { val testConf: Config = ConfigFactory.parseString(""" diff --git a/project/AkkaBuild.scala b/project/AkkaBuild.scala index 43a34e640c..a883eb7093 100644 --- a/project/AkkaBuild.scala +++ b/project/AkkaBuild.scala @@ -341,7 +341,7 @@ object AkkaBuild extends Build { val excludeTestTags = SettingKey[Seq[String]]("exclude-test-tags") val includeTestTags = SettingKey[Seq[String]]("include-test-tags") - val defaultExcludedTags = Seq("timing") + val defaultExcludedTags = Seq("timing", "long-running") lazy val defaultSettings = baseSettings ++ formatSettings ++ Seq( resolvers += "Typesafe Repo" at "http://repo.typesafe.com/typesafe/releases/", From 2c67a6d50d4246f5852c15ca6e0eec67dfd4f9f7 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jonas=20Bone=CC=81r?= Date: Mon, 20 Feb 2012 15:26:12 +0100 Subject: [PATCH 54/72] Split up ClusterDaemon into ClusterGossipDaemon (routed with configurable N instances) and ClusterCommandDaemon (shortly to be an FSM). Removed ConnectionManager crap. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Jonas Bonér --- .../src/main/resources/reference.conf | 3 + .../scala/akka/cluster/ClusterSettings.scala | 1 + .../main/scala/akka/cluster/Gossiper.scala | 103 +++++++++--------- .../akka/cluster/ClusterConfigSpec.scala | 1 + .../MembershipChangeListenerSpec.scala | 4 + 5 files changed, 59 insertions(+), 53 deletions(-) diff --git a/akka-cluster/src/main/resources/reference.conf b/akka-cluster/src/main/resources/reference.conf index 3df6dd3774..feada91c01 100644 --- a/akka-cluster/src/main/resources/reference.conf +++ b/akka-cluster/src/main/resources/reference.conf @@ -12,6 +12,9 @@ akka { # leave as empty string if the node should be a singleton cluster node-to-join = "" + # the number of gossip daemon actors + nr-of-gossip-daemons = 4 + gossip { initialDelay = 5s frequency = 1s diff --git a/akka-cluster/src/main/scala/akka/cluster/ClusterSettings.scala b/akka-cluster/src/main/scala/akka/cluster/ClusterSettings.scala index f4d57bf1f6..9872f3e233 100644 --- a/akka-cluster/src/main/scala/akka/cluster/ClusterSettings.scala +++ b/akka-cluster/src/main/scala/akka/cluster/ClusterSettings.scala @@ -21,4 +21,5 @@ class ClusterSettings(val config: Config, val systemName: String) { } val GossipInitialDelay = Duration(getMilliseconds("akka.cluster.gossip.initialDelay"), MILLISECONDS) val GossipFrequency = Duration(getMilliseconds("akka.cluster.gossip.frequency"), MILLISECONDS) + val NrOfGossipDaemons = getInt("akka.cluster.nr-of-gossip-daemons") } diff --git a/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala b/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala index 313f052e2a..b3e7df27bf 100644 --- a/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala +++ b/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala @@ -7,6 +7,7 @@ package akka.cluster import akka.actor._ import akka.actor.Status._ import akka.remote._ +import akka.routing._ import akka.event.Logging import akka.dispatch.Await import akka.pattern.ask @@ -119,7 +120,7 @@ 
case class GossipOverview( case class Gossip( overview: GossipOverview = GossipOverview(), members: SortedSet[Member], // sorted set of members with their status, sorted by name - //partitions: Tree[PartitionPath, Node] = Tree.empty[PartitionPath, Node], + //partitions: Tree[PartitionPath, Node] = Tree.empty[PartitionPath, Node], // name/partition service //pending: Set[PartitioningChange] = Set.empty[PartitioningChange], meta: Map[String, Array[Byte]] = Map.empty[String, Array[Byte]], version: VectorClock = VectorClock()) // vector clock version @@ -152,16 +153,32 @@ case class Gossip( ")" } -// FIXME add FSM trait? -final class ClusterDaemon(system: ActorSystem, gossiper: Gossiper) extends Actor { - val log = Logging(system, "ClusterDaemon") +// FIXME ClusterCommandDaemon with FSM trait +/** + * Single instance. FSM managing the different cluster nodes states. + * Serialized access to Gossiper. + */ +final class ClusterCommandDaemon(system: ActorSystem, gossiper: Gossiper) extends Actor { + val log = Logging(system, "ClusterCommandDaemon") + + def receive = { + case Join(address) ⇒ gossiper.joining(address) + case Leave(address) ⇒ //gossiper.leaving(address) + case Down(address) ⇒ //gossiper.downing(address) + case Remove(address) ⇒ //gossiper.removing(address) + case unknown ⇒ log.error("Unknown message sent to cluster daemon [" + unknown + "]") + } +} + +/** + * Pooled and routed wit N number of configurable instances. + * Concurrent access to Gossiper. + */ +final class ClusterGossipDaemon(system: ActorSystem, gossiper: Gossiper) extends Actor { + val log = Logging(system, "ClusterGossipDaemon") def receive = { case GossipEnvelope(sender, gossip) ⇒ gossiper.receive(sender, gossip) - case Join(address) ⇒ gossiper.joining(address) - case Leave(address) ⇒ //gossiper.leaving(address) - case Down(address) ⇒ //gossiper.downing(address) - case Remove(address) ⇒ //gossiper.removing(address) case unknown ⇒ log.error("Unknown message sent to cluster daemon [" + unknown + "]") } } @@ -211,6 +228,7 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { val failureDetector = new AccrualFailureDetector( system, remoteAddress, clusterSettings.FailureDetectorThreshold, clusterSettings.FailureDetectorMaxSampleSize) + private val nrOfGossipDaemons = clusterSettings.NrOfGossipDaemons private val nodeToJoin: Option[Address] = clusterSettings.NodeToJoin filter (_ != remoteAddress) private val serialization = remote.serialization @@ -221,7 +239,11 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { // Is it right to put this guy under the /system path or should we have a top-level /cluster or something else...? // FIXME should be defined as a router so we get concurrency here - private val clusterDaemon = system.systemActorOf(Props(new ClusterDaemon(system, this)), "cluster") + private val clusterCommandDaemon = system.systemActorOf( + Props(new ClusterCommandDaemon(system, this)), "clusterCommand") + + private val clusterGossipDaemon = system.systemActorOf( + Props(new ClusterGossipDaemon(system, this)).withRouter(RoundRobinRouter(nrOfGossipDaemons)), "clusterGossip") private val state = { val member = Member(remoteAddress, MemberStatus.Joining) @@ -229,9 +251,6 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { new AtomicReference[State](State(member, gossip)) } - // FIXME manage connections in some other way so we can delete the RemoteConnectionManager (SINCE IT SUCKS!!!) 
- private val connectionManager = new RemoteConnectionManager(system, remote, failureDetector, Map.empty[Address, ActorRef]) - import Versioned.latestVersionOf log.info("Node [{}] - Starting cluster Gossiper...", remoteAddress) @@ -278,10 +297,12 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { if (isRunning.compareAndSet(true, false)) { log.info("Node [{}] - Shutting down Gossiper and ClusterDaemon...", remoteAddress) - try system.stop(clusterDaemon) finally { - try gossipCanceller.cancel() finally { - try scrutinizeCanceller.cancel() finally { - log.info("Node [{}] - Gossiper and ClusterDaemon shut down successfully", remoteAddress) + try system.stop(clusterCommandDaemon) finally { + try system.stop(clusterGossipDaemon) finally { + try gossipCanceller.cancel() finally { + try scrutinizeCanceller.cancel() finally { + log.info("Node [{}] - Gossiper and ClusterDaemon shut down successfully", remoteAddress) + } } } } @@ -398,11 +419,10 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { * Joins the pre-configured contact point and retrieves current gossip state. */ private def join() = nodeToJoin foreach { address ⇒ - setUpConnectionTo(address) foreach { connection ⇒ - val command = Join(remoteAddress) - log.info("Node [{}] - Sending [{}] to [{}] through connection [{}]", remoteAddress, command, address, connection) - connection ! command - } + val connection = clusterCommandConnectionFor(address) + val command = Join(remoteAddress) + log.info("Node [{}] - Sending [{}] to [{}] through connection [{}]", remoteAddress, command, address, connection) + connection ! command } /** @@ -489,10 +509,9 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { * Gossips latest gossip to an address. */ private def gossipTo(address: Address) { - setUpConnectionTo(address) foreach { connection ⇒ - log.debug("Node [{}] - Gossiping to [{}]", remoteAddress, connection) - connection ! GossipEnvelope(self, latestGossip) - } + val connection = clusterGossipConnectionFor(address) + log.debug("Node [{}] - Gossiping to [{}]", remoteAddress, connection) + connection ! GossipEnvelope(self, latestGossip) } /** @@ -567,37 +586,15 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { } else None } - // FIXME should shuffle list randomly before start traversing to avoid connecting to some member on every member - @tailrec - final private def connectToRandomNodeOf(addresses: Seq[Address]): ActorRef = { - addresses match { - - case address :: rest ⇒ - setUpConnectionTo(address) match { - case Some(connection) ⇒ connection - case None ⇒ connectToRandomNodeOf(rest) // recur - if we could not set up a connection - try next address - } - - case Nil ⇒ - throw new RemoteConnectionException( - "Could not establish connection to any of the addresses in the argument list [" + addresses.mkString(", ") + "]") - } - } + /** + * Sets up cluster command connection. + */ + private def clusterCommandConnectionFor(address: Address): ActorRef = system.actorFor(RootActorPath(address) / "system" / "clusterCommand") /** - * Sets up remote connections to all the addresses in the argument list. + * Sets up cluster gossip connection. */ - private def setUpConnectionsTo(addresses: Seq[Address]): Seq[Option[ActorRef]] = addresses map setUpConnectionTo - - /** - * Sets up remote connection. 
- */ - private def setUpConnectionTo(address: Address): Option[ActorRef] = Option { - // FIXME no need for using a factory here - remove connectionManager - try connectionManager.putIfAbsent(address, () ⇒ system.actorFor(RootActorPath(address) / "system" / "cluster")) catch { - case e: Exception ⇒ null - } - } + private def clusterGossipConnectionFor(address: Address): ActorRef = system.actorFor(RootActorPath(address) / "system" / "clusterGossip") private def deputyNodesWithoutMyself: Seq[Address] = Seq.empty[Address] filter (_ != remoteAddress) // FIXME read in deputy nodes from gossip data - now empty seq diff --git a/akka-cluster/src/test/scala/akka/cluster/ClusterConfigSpec.scala b/akka-cluster/src/test/scala/akka/cluster/ClusterConfigSpec.scala index 78c836f0b5..2afbc7efc0 100644 --- a/akka-cluster/src/test/scala/akka/cluster/ClusterConfigSpec.scala +++ b/akka-cluster/src/test/scala/akka/cluster/ClusterConfigSpec.scala @@ -28,6 +28,7 @@ class ClusterConfigSpec extends AkkaSpec( NodeToJoin must be(None) GossipInitialDelay must be(5 seconds) GossipFrequency must be(1 second) + NrOfGossipDaemons must be(4) } } } diff --git a/akka-cluster/src/test/scala/akka/cluster/MembershipChangeListenerSpec.scala b/akka-cluster/src/test/scala/akka/cluster/MembershipChangeListenerSpec.scala index e168b7caee..a82bbe4d5e 100644 --- a/akka-cluster/src/test/scala/akka/cluster/MembershipChangeListenerSpec.scala +++ b/akka-cluster/src/test/scala/akka/cluster/MembershipChangeListenerSpec.scala @@ -77,6 +77,8 @@ class MembershipChangeListenerSpec extends AkkaSpec(""" latch.await(10.seconds.dilated.toMillis, TimeUnit.MILLISECONDS) + Thread.sleep(10.seconds.dilated.toMillis) + // check cluster convergence gossiper0.convergence must be('defined) gossiper1.convergence must be('defined) @@ -119,6 +121,8 @@ class MembershipChangeListenerSpec extends AkkaSpec(""" latch.await(10.seconds.dilated.toMillis, TimeUnit.MILLISECONDS) + Thread.sleep(10.seconds.dilated.toMillis) + // check cluster convergence gossiper0.convergence must be('defined) gossiper1.convergence must be('defined) From 3c2f5ab93c8ba4d1376860803536656a603679d5 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jonas=20Bone=CC=81r?= Date: Mon, 20 Feb 2012 15:45:50 +0100 Subject: [PATCH 55/72] Renamed Gossiper to Node (and selfNode to vclockNode). MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Jonas Bonér --- .../cluster/{Gossiper.scala => Node.scala} | 44 ++++++------ .../GossipingAccrualFailureDetectorSpec.scala | 66 +++++++++--------- .../MembershipChangeListenerSpec.scala | 66 +++++++++--------- .../akka/cluster/NodeMembershipSpec.scala | 68 +++++++++---------- .../scala/akka/cluster/NodeStartupSpec.scala | 30 ++++---- 5 files changed, 137 insertions(+), 137 deletions(-) rename akka-cluster/src/main/scala/akka/cluster/{Gossiper.scala => Node.scala} (93%) diff --git a/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala b/akka-cluster/src/main/scala/akka/cluster/Node.scala similarity index 93% rename from akka-cluster/src/main/scala/akka/cluster/Gossiper.scala rename to akka-cluster/src/main/scala/akka/cluster/Node.scala index b3e7df27bf..0eaa6b1d16 100644 --- a/akka-cluster/src/main/scala/akka/cluster/Gossiper.scala +++ b/akka-cluster/src/main/scala/akka/cluster/Node.scala @@ -156,35 +156,35 @@ case class Gossip( // FIXME ClusterCommandDaemon with FSM trait /** * Single instance. FSM managing the different cluster nodes states. - * Serialized access to Gossiper. + * Serialized access to Node. 
*/ -final class ClusterCommandDaemon(system: ActorSystem, gossiper: Gossiper) extends Actor { +final class ClusterCommandDaemon(system: ActorSystem, node: Node) extends Actor { val log = Logging(system, "ClusterCommandDaemon") def receive = { - case Join(address) ⇒ gossiper.joining(address) - case Leave(address) ⇒ //gossiper.leaving(address) - case Down(address) ⇒ //gossiper.downing(address) - case Remove(address) ⇒ //gossiper.removing(address) + case Join(address) ⇒ node.joining(address) + case Leave(address) ⇒ //node.leaving(address) + case Down(address) ⇒ //node.downing(address) + case Remove(address) ⇒ //node.removing(address) case unknown ⇒ log.error("Unknown message sent to cluster daemon [" + unknown + "]") } } /** * Pooled and routed wit N number of configurable instances. - * Concurrent access to Gossiper. + * Concurrent access to Node. */ -final class ClusterGossipDaemon(system: ActorSystem, gossiper: Gossiper) extends Actor { +final class ClusterGossipDaemon(system: ActorSystem, node: Node) extends Actor { val log = Logging(system, "ClusterGossipDaemon") def receive = { - case GossipEnvelope(sender, gossip) ⇒ gossiper.receive(sender, gossip) + case GossipEnvelope(sender, gossip) ⇒ node.receive(sender, gossip) case unknown ⇒ log.error("Unknown message sent to cluster daemon [" + unknown + "]") } } // FIXME Cluster public API should be an Extension -// FIXME Add cluster Node class and refactor out all non-gossip related stuff out of Gossiper +// FIXME Add cluster Node class and refactor out all non-gossip related stuff out of Node /** * This module is responsible for Gossiping cluster information. The abstraction maintains the list of live @@ -201,10 +201,10 @@ final class ClusterGossipDaemon(system: ActorSystem, gossiper: Gossiper) extends * gossip to random deputy with certain probability depending on number of unreachable, deputy and live members. * */ -case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { +case class Node(system: ActorSystemImpl, remote: RemoteActorRefProvider) { /** - * Represents the state for this Gossiper. Implemented using optimistic lockless concurrency, + * Represents the state for this Node. Implemented using optimistic lockless concurrency, * all state is represented by this immutable case class and managed by an AtomicReference. */ private case class State( @@ -216,7 +216,7 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { val clusterSettings = new ClusterSettings(system.settings.config, system.name) val remoteAddress = remote.transport.address - val selfNode = VectorClock.Node(remoteAddress.toString) + val vclockNode = VectorClock.Node(remoteAddress.toString) val gossipInitialDelay = clusterSettings.GossipInitialDelay val gossipFrequency = clusterSettings.GossipFrequency @@ -234,7 +234,7 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { private val serialization = remote.serialization private val isRunning = new AtomicBoolean(true) - private val log = Logging(system, "Gossiper") + private val log = Logging(system, "Node") private val random = SecureRandom.getInstance("SHA1PRNG") // Is it right to put this guy under the /system path or should we have a top-level /cluster or something else...? 
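// ---------------------------------------------------------------------------
// Illustrative sketch (an editor's aside, assuming the daemon actor names used
// in this patch series): how one node reaches the two system actors shown
// above on a remote node. Cluster commands such as Join/Leave/Down/Remove go
// to the single, serialized ClusterCommandDaemon, while GossipEnvelope
// messages go to the round-robin routed ClusterGossipDaemon pool.
import akka.actor.{ ActorRef, ActorSystem, Address, RootActorPath }

object ClusterDaemonPathsSketch {
  // single command daemon, serialized access
  def commandConnectionFor(system: ActorSystem, address: Address): ActorRef =
    system.actorFor(RootActorPath(address) / "system" / "clusterCommand")

  // routed gossip daemon pool, concurrent access
  def gossipConnectionFor(system: ActorSystem, address: Address): ActorRef =
    system.actorFor(RootActorPath(address) / "system" / "clusterGossip")

  // e.g. joining a pre-configured contact point:
  //   commandConnectionFor(system, contactPoint) ! Join(selfAddress)
}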
@@ -247,13 +247,13 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { private val state = { val member = Member(remoteAddress, MemberStatus.Joining) - val gossip = Gossip(members = SortedSet.empty[Member] + member) + selfNode // add me as member and update my vector clock + val gossip = Gossip(members = SortedSet.empty[Member] + member) + vclockNode // add me as member and update my vector clock new AtomicReference[State](State(member, gossip)) } import Versioned.latestVersionOf - log.info("Node [{}] - Starting cluster Gossiper...", remoteAddress) + log.info("Node [{}] - Starting cluster Node...", remoteAddress) // try to join the node defined in the 'akka.cluster.node-to-join' option join() @@ -295,13 +295,13 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { // FIXME Cheating for now. Can't just shut down. Node must first gossip an Leave command, wait for Leader to do proper Handoff and then await an Exit command before switching to Removed if (isRunning.compareAndSet(true, false)) { - log.info("Node [{}] - Shutting down Gossiper and ClusterDaemon...", remoteAddress) + log.info("Node [{}] - Shutting down Node and ClusterDaemon...", remoteAddress) try system.stop(clusterCommandDaemon) finally { try system.stop(clusterGossipDaemon) finally { try gossipCanceller.cancel() finally { try scrutinizeCanceller.cancel() finally { - log.info("Node [{}] - Gossiper and ClusterDaemon shut down successfully", remoteAddress) + log.info("Node [{}] - Node and ClusterDaemon shut down successfully", remoteAddress) } } } @@ -325,7 +325,7 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { val newMembers = localMembers + Member(node, MemberStatus.Joining) // add joining node as Joining val newGossip = localGossip copy (members = newMembers) - val versionedGossip = newGossip + selfNode + val versionedGossip = newGossip + vclockNode val seenVersionedGossip = versionedGossip seen remoteAddress val newState = localState copy (latestGossip = seenVersionedGossip) @@ -354,7 +354,7 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { if (remoteGossip.version <> localGossip.version) { // concurrent val mergedGossip = merge(remoteGossip, localGossip) - val versionedMergedGossip = mergedGossip + selfNode + val versionedMergedGossip = mergedGossip + vclockNode log.debug( "Can't establish a causal relationship between \"remote\" gossip [{}] and \"local\" gossip [{}] - merging them into [{}]", @@ -497,7 +497,7 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { val newMembersSortedSet = SortedSet[Member](newMembersSet.toList: _*) val newGossip = localGossip copy (members = newMembersSortedSet) - val versionedGossip = newGossip + selfNode + val versionedGossip = newGossip + vclockNode val seenVersionedGossip = versionedGossip seen remoteAddress val newState = localState copy (self = newSelf, latestGossip = seenVersionedGossip) @@ -556,7 +556,7 @@ case class Gossiper(system: ActorSystemImpl, remote: RemoteActorRefProvider) { val newOverview = localOverview copy (unreachable = newUnreachableAddresses) val newGossip = localGossip copy (overview = newOverview, members = newMembers) - val versionedGossip = newGossip + selfNode + val versionedGossip = newGossip + vclockNode val seenVersionedGossip = versionedGossip seen remoteAddress val newState = localState copy (latestGossip = seenVersionedGossip) diff --git 
a/akka-cluster/src/test/scala/akka/cluster/GossipingAccrualFailureDetectorSpec.scala b/akka-cluster/src/test/scala/akka/cluster/GossipingAccrualFailureDetectorSpec.scala index 8939c4d728..e92b21dbfb 100644 --- a/akka-cluster/src/test/scala/akka/cluster/GossipingAccrualFailureDetectorSpec.scala +++ b/akka-cluster/src/test/scala/akka/cluster/GossipingAccrualFailureDetectorSpec.scala @@ -22,19 +22,19 @@ class GossipingAccrualFailureDetectorSpec extends AkkaSpec(""" } """) with ImplicitSender { - var gossiper1: Gossiper = _ - var gossiper2: Gossiper = _ - var gossiper3: Gossiper = _ + var node1: Node = _ + var node2: Node = _ + var node3: Node = _ - var node1: ActorSystemImpl = _ - var node2: ActorSystemImpl = _ - var node3: ActorSystemImpl = _ + var system1: ActorSystemImpl = _ + var system2: ActorSystemImpl = _ + var system3: ActorSystemImpl = _ try { "A Gossip-driven Failure Detector" must { // ======= NODE 1 ======== - node1 = ActorSystem("node1", ConfigFactory + system1 = ActorSystem("system1", ConfigFactory .parseString(""" akka { actor.provider = "akka.remote.RemoteActorRefProvider" @@ -45,13 +45,13 @@ class GossipingAccrualFailureDetectorSpec extends AkkaSpec(""" }""") .withFallback(system.settings.config)) .asInstanceOf[ActorSystemImpl] - val remote1 = node1.provider.asInstanceOf[RemoteActorRefProvider] - gossiper1 = Gossiper(node1, remote1) - val fd1 = gossiper1.failureDetector - val address1 = gossiper1.self.address + val remote1 = system1.provider.asInstanceOf[RemoteActorRefProvider] + node1 = Node(system1, remote1) + val fd1 = node1.failureDetector + val address1 = node1.self.address // ======= NODE 2 ======== - node2 = ActorSystem("node2", ConfigFactory + system2 = ActorSystem("system2", ConfigFactory .parseString(""" akka { actor.provider = "akka.remote.RemoteActorRefProvider" @@ -59,17 +59,17 @@ class GossipingAccrualFailureDetectorSpec extends AkkaSpec(""" hostname = localhost port = 5551 } - cluster.node-to-join = "akka://node1@localhost:5550" + cluster.node-to-join = "akka://system1@localhost:5550" }""") .withFallback(system.settings.config)) .asInstanceOf[ActorSystemImpl] - val remote2 = node2.provider.asInstanceOf[RemoteActorRefProvider] - gossiper2 = Gossiper(node2, remote2) - val fd2 = gossiper2.failureDetector - val address2 = gossiper2.self.address + val remote2 = system2.provider.asInstanceOf[RemoteActorRefProvider] + node2 = Node(system2, remote2) + val fd2 = node2.failureDetector + val address2 = node2.self.address // ======= NODE 3 ======== - node3 = ActorSystem("node3", ConfigFactory + system3 = ActorSystem("system3", ConfigFactory .parseString(""" akka { actor.provider = "akka.remote.RemoteActorRefProvider" @@ -77,17 +77,17 @@ class GossipingAccrualFailureDetectorSpec extends AkkaSpec(""" hostname = localhost port=5552 } - cluster.node-to-join = "akka://node1@localhost:5550" + cluster.node-to-join = "akka://system1@localhost:5550" }""") .withFallback(system.settings.config)) .asInstanceOf[ActorSystemImpl] - val remote3 = node3.provider.asInstanceOf[RemoteActorRefProvider] - gossiper3 = Gossiper(node3, remote3) - val fd3 = gossiper3.failureDetector - val address3 = gossiper3.self.address + val remote3 = system3.provider.asInstanceOf[RemoteActorRefProvider] + node3 = Node(system3, remote3) + val fd3 = node3.failureDetector + val address3 = node3.self.address - "receive gossip heartbeats so that all healthy nodes in the cluster are marked 'available'" taggedAs LongRunningTest in { - println("Let the nodes gossip for a while...") + "receive gossip heartbeats so that 
all healthy systems in the cluster are marked 'available'" taggedAs LongRunningTest in { + println("Let the systems gossip for a while...") Thread.sleep(30.seconds.dilated.toMillis) // let them gossip for 30 seconds fd1.isAvailable(address2) must be(true) fd1.isAvailable(address3) must be(true) @@ -97,12 +97,12 @@ class GossipingAccrualFailureDetectorSpec extends AkkaSpec(""" fd3.isAvailable(address2) must be(true) } - "mark node as 'unavailable' if a node in the cluster is shut down (and its heartbeats stops)" taggedAs LongRunningTest in { - // shut down node3 - gossiper3.shutdown() + "mark system as 'unavailable' if a system in the cluster is shut down (and its heartbeats stops)" taggedAs LongRunningTest in { + // shut down system3 node3.shutdown() - println("Give the remaning nodes time to detect failure...") - Thread.sleep(30.seconds.dilated.toMillis) // give them 30 seconds to detect failure of node3 + system3.shutdown() + println("Give the remaning systems time to detect failure...") + Thread.sleep(30.seconds.dilated.toMillis) // give them 30 seconds to detect failure of system3 fd1.isAvailable(address2) must be(true) fd1.isAvailable(address3) must be(false) fd2.isAvailable(address1) must be(true) @@ -116,13 +116,13 @@ class GossipingAccrualFailureDetectorSpec extends AkkaSpec(""" } override def atTermination() { - if (gossiper1 ne null) gossiper1.shutdown() if (node1 ne null) node1.shutdown() + if (system1 ne null) system1.shutdown() - if (gossiper2 ne null) gossiper2.shutdown() if (node2 ne null) node2.shutdown() + if (system2 ne null) system2.shutdown() - if (gossiper3 ne null) gossiper3.shutdown() if (node3 ne null) node3.shutdown() + if (system3 ne null) system3.shutdown() } } diff --git a/akka-cluster/src/test/scala/akka/cluster/MembershipChangeListenerSpec.scala b/akka-cluster/src/test/scala/akka/cluster/MembershipChangeListenerSpec.scala index a82bbe4d5e..197fa22b71 100644 --- a/akka-cluster/src/test/scala/akka/cluster/MembershipChangeListenerSpec.scala +++ b/akka-cluster/src/test/scala/akka/cluster/MembershipChangeListenerSpec.scala @@ -22,18 +22,18 @@ class MembershipChangeListenerSpec extends AkkaSpec(""" } """) with ImplicitSender { - var gossiper0: Gossiper = _ - var gossiper1: Gossiper = _ - var gossiper2: Gossiper = _ + var node0: Node = _ + var node1: Node = _ + var node2: Node = _ - var node0: ActorSystemImpl = _ - var node1: ActorSystemImpl = _ - var node2: ActorSystemImpl = _ + var system0: ActorSystemImpl = _ + var system1: ActorSystemImpl = _ + var system2: ActorSystemImpl = _ try { - "A set of connected cluster nodes" must { - "(when two nodes) after cluster convergence updates the membership table then all MembershipChangeListeners should be triggered" taggedAs LongRunningTest in { - node0 = ActorSystem("node0", ConfigFactory + "A set of connected cluster systems" must { + "(when two systems) after cluster convergence updates the membership table then all MembershipChangeListeners should be triggered" taggedAs LongRunningTest in { + system0 = ActorSystem("system0", ConfigFactory .parseString(""" akka { actor.provider = "akka.remote.RemoteActorRefProvider" @@ -44,10 +44,10 @@ class MembershipChangeListenerSpec extends AkkaSpec(""" }""") .withFallback(system.settings.config)) .asInstanceOf[ActorSystemImpl] - val remote0 = node0.provider.asInstanceOf[RemoteActorRefProvider] - gossiper0 = Gossiper(node0, remote0) + val remote0 = system0.provider.asInstanceOf[RemoteActorRefProvider] + node0 = Node(system0, remote0) - node1 = ActorSystem("node1", ConfigFactory + 
system1 = ActorSystem("system1", ConfigFactory .parseString(""" akka { actor.provider = "akka.remote.RemoteActorRefProvider" @@ -55,21 +55,21 @@ class MembershipChangeListenerSpec extends AkkaSpec(""" hostname = localhost port=5551 } - cluster.node-to-join = "akka://node0@localhost:5550" + cluster.node-to-join = "akka://system0@localhost:5550" }""") .withFallback(system.settings.config)) .asInstanceOf[ActorSystemImpl] - val remote1 = node1.provider.asInstanceOf[RemoteActorRefProvider] - gossiper1 = Gossiper(node1, remote1) + val remote1 = system1.provider.asInstanceOf[RemoteActorRefProvider] + node1 = Node(system1, remote1) val latch = new CountDownLatch(2) - gossiper0.registerListener(new MembershipChangeListener { + node0.registerListener(new MembershipChangeListener { def notify(members: SortedSet[Member]) { latch.countDown() } }) - gossiper1.registerListener(new MembershipChangeListener { + node1.registerListener(new MembershipChangeListener { def notify(members: SortedSet[Member]) { latch.countDown() } @@ -80,14 +80,14 @@ class MembershipChangeListenerSpec extends AkkaSpec(""" Thread.sleep(10.seconds.dilated.toMillis) // check cluster convergence - gossiper0.convergence must be('defined) - gossiper1.convergence must be('defined) + node0.convergence must be('defined) + node1.convergence must be('defined) } - "(when three nodes) after cluster convergence updates the membership table then all MembershipChangeListeners should be triggered" taggedAs LongRunningTest in { + "(when three systems) after cluster convergence updates the membership table then all MembershipChangeListeners should be triggered" taggedAs LongRunningTest in { // ======= NODE 2 ======== - node2 = ActorSystem("node2", ConfigFactory + system2 = ActorSystem("system2", ConfigFactory .parseString(""" akka { actor.provider = "akka.remote.RemoteActorRefProvider" @@ -95,25 +95,25 @@ class MembershipChangeListenerSpec extends AkkaSpec(""" hostname = localhost port=5552 } - cluster.node-to-join = "akka://node0@localhost:5550" + cluster.node-to-join = "akka://system0@localhost:5550" }""") .withFallback(system.settings.config)) .asInstanceOf[ActorSystemImpl] - val remote2 = node2.provider.asInstanceOf[RemoteActorRefProvider] - gossiper2 = Gossiper(node2, remote2) + val remote2 = system2.provider.asInstanceOf[RemoteActorRefProvider] + node2 = Node(system2, remote2) val latch = new CountDownLatch(3) - gossiper0.registerListener(new MembershipChangeListener { + node0.registerListener(new MembershipChangeListener { def notify(members: SortedSet[Member]) { latch.countDown() } }) - gossiper1.registerListener(new MembershipChangeListener { + node1.registerListener(new MembershipChangeListener { def notify(members: SortedSet[Member]) { latch.countDown() } }) - gossiper2.registerListener(new MembershipChangeListener { + node2.registerListener(new MembershipChangeListener { def notify(members: SortedSet[Member]) { latch.countDown() } @@ -124,9 +124,9 @@ class MembershipChangeListenerSpec extends AkkaSpec(""" Thread.sleep(10.seconds.dilated.toMillis) // check cluster convergence - gossiper0.convergence must be('defined) - gossiper1.convergence must be('defined) - gossiper2.convergence must be('defined) + node0.convergence must be('defined) + node1.convergence must be('defined) + node2.convergence must be('defined) } } } catch { @@ -136,13 +136,13 @@ class MembershipChangeListenerSpec extends AkkaSpec(""" } override def atTermination() { - if (gossiper0 ne null) gossiper0.shutdown() if (node0 ne null) node0.shutdown() + if (system0 ne null) 
system0.shutdown() - if (gossiper1 ne null) gossiper1.shutdown() if (node1 ne null) node1.shutdown() + if (system1 ne null) system1.shutdown() - if (gossiper2 ne null) gossiper2.shutdown() if (node2 ne null) node2.shutdown() + if (system2 ne null) system2.shutdown() } } diff --git a/akka-cluster/src/test/scala/akka/cluster/NodeMembershipSpec.scala b/akka-cluster/src/test/scala/akka/cluster/NodeMembershipSpec.scala index 2ce0a1d449..dc24485507 100644 --- a/akka-cluster/src/test/scala/akka/cluster/NodeMembershipSpec.scala +++ b/akka-cluster/src/test/scala/akka/cluster/NodeMembershipSpec.scala @@ -19,20 +19,20 @@ class NodeMembershipSpec extends AkkaSpec(""" } """) with ImplicitSender { - var gossiper0: Gossiper = _ - var gossiper1: Gossiper = _ - var gossiper2: Gossiper = _ + var node0: Node = _ + var node1: Node = _ + var node2: Node = _ - var node0: ActorSystemImpl = _ - var node1: ActorSystemImpl = _ - var node2: ActorSystemImpl = _ + var system0: ActorSystemImpl = _ + var system1: ActorSystemImpl = _ + var system2: ActorSystemImpl = _ try { - "A set of connected cluster nodes" must { - "(when two nodes) start gossiping to each other so that both nodes gets the same gossip info" taggedAs LongRunningTest in { + "A set of connected cluster systems" must { + "(when two systems) start gossiping to each other so that both systems gets the same gossip info" taggedAs LongRunningTest in { // ======= NODE 0 ======== - node0 = ActorSystem("node0", ConfigFactory + system0 = ActorSystem("system0", ConfigFactory .parseString(""" akka { actor.provider = "akka.remote.RemoteActorRefProvider" @@ -43,11 +43,11 @@ class NodeMembershipSpec extends AkkaSpec(""" }""") .withFallback(system.settings.config)) .asInstanceOf[ActorSystemImpl] - val remote0 = node0.provider.asInstanceOf[RemoteActorRefProvider] - gossiper0 = Gossiper(node0, remote0) + val remote0 = system0.provider.asInstanceOf[RemoteActorRefProvider] + node0 = Node(system0, remote0) // ======= NODE 1 ======== - node1 = ActorSystem("node1", ConfigFactory + system1 = ActorSystem("system1", ConfigFactory .parseString(""" akka { actor.provider = "akka.remote.RemoteActorRefProvider" @@ -55,27 +55,27 @@ class NodeMembershipSpec extends AkkaSpec(""" hostname = localhost port=5551 } - cluster.node-to-join = "akka://node0@localhost:5550" + cluster.node-to-join = "akka://system0@localhost:5550" }""") .withFallback(system.settings.config)) .asInstanceOf[ActorSystemImpl] - val remote1 = node1.provider.asInstanceOf[RemoteActorRefProvider] - gossiper1 = Gossiper(node1, remote1) + val remote1 = system1.provider.asInstanceOf[RemoteActorRefProvider] + node1 = Node(system1, remote1) Thread.sleep(10.seconds.dilated.toMillis) // check cluster convergence - gossiper0.convergence must be('defined) - gossiper1.convergence must be('defined) + node0.convergence must be('defined) + node1.convergence must be('defined) - val members0 = gossiper0.latestGossip.members.toArray + val members0 = node0.latestGossip.members.toArray members0.size must be(2) members0(0).address.port.get must be(5550) members0(0).status must be(MemberStatus.Joining) members0(1).address.port.get must be(5551) members0(1).status must be(MemberStatus.Joining) - val members1 = gossiper1.latestGossip.members.toArray + val members1 = node1.latestGossip.members.toArray members1.size must be(2) members1(0).address.port.get must be(5550) members1(0).status must be(MemberStatus.Joining) @@ -83,10 +83,10 @@ class NodeMembershipSpec extends AkkaSpec(""" members1(1).status must be(MemberStatus.Joining) } - "(when 
three nodes) start gossiping to each other so that both nodes gets the same gossip info" taggedAs LongRunningTest in { + "(when three systems) start gossiping to each other so that both systems gets the same gossip info" taggedAs LongRunningTest in { // ======= NODE 2 ======== - node2 = ActorSystem("node2", ConfigFactory + system2 = ActorSystem("system2", ConfigFactory .parseString(""" akka { actor.provider = "akka.remote.RemoteActorRefProvider" @@ -94,22 +94,22 @@ class NodeMembershipSpec extends AkkaSpec(""" hostname = localhost port=5552 } - cluster.node-to-join = "akka://node0@localhost:5550" + cluster.node-to-join = "akka://system0@localhost:5550" }""") .withFallback(system.settings.config)) .asInstanceOf[ActorSystemImpl] - val remote2 = node2.provider.asInstanceOf[RemoteActorRefProvider] - gossiper2 = Gossiper(node2, remote2) + val remote2 = system2.provider.asInstanceOf[RemoteActorRefProvider] + node2 = Node(system2, remote2) Thread.sleep(10.seconds.dilated.toMillis) // check cluster convergence - gossiper0.convergence must be('defined) - gossiper1.convergence must be('defined) - gossiper2.convergence must be('defined) + node0.convergence must be('defined) + node1.convergence must be('defined) + node2.convergence must be('defined) - val members0 = gossiper0.latestGossip.members.toArray - val version = gossiper0.latestGossip.version + val members0 = node0.latestGossip.members.toArray + val version = node0.latestGossip.version members0.size must be(3) members0(0).address.port.get must be(5550) members0(0).status must be(MemberStatus.Joining) @@ -118,7 +118,7 @@ class NodeMembershipSpec extends AkkaSpec(""" members0(2).address.port.get must be(5552) members0(2).status must be(MemberStatus.Joining) - val members1 = gossiper1.latestGossip.members.toArray + val members1 = node1.latestGossip.members.toArray members1.size must be(3) members1(0).address.port.get must be(5550) members1(0).status must be(MemberStatus.Joining) @@ -127,7 +127,7 @@ class NodeMembershipSpec extends AkkaSpec(""" members1(2).address.port.get must be(5552) members1(2).status must be(MemberStatus.Joining) - val members2 = gossiper2.latestGossip.members.toArray + val members2 = node2.latestGossip.members.toArray members2.size must be(3) members2(0).address.port.get must be(5550) members2(0).status must be(MemberStatus.Joining) @@ -144,13 +144,13 @@ class NodeMembershipSpec extends AkkaSpec(""" } override def atTermination() { - if (gossiper0 ne null) gossiper0.shutdown() if (node0 ne null) node0.shutdown() + if (system0 ne null) system0.shutdown() - if (gossiper1 ne null) gossiper1.shutdown() if (node1 ne null) node1.shutdown() + if (system1 ne null) system1.shutdown() - if (gossiper2 ne null) gossiper2.shutdown() if (node2 ne null) node2.shutdown() + if (system2 ne null) system2.shutdown() } } diff --git a/akka-cluster/src/test/scala/akka/cluster/NodeStartupSpec.scala b/akka-cluster/src/test/scala/akka/cluster/NodeStartupSpec.scala index b3805b7946..3d98260c4d 100644 --- a/akka-cluster/src/test/scala/akka/cluster/NodeStartupSpec.scala +++ b/akka-cluster/src/test/scala/akka/cluster/NodeStartupSpec.scala @@ -19,14 +19,14 @@ class NodeStartupSpec extends AkkaSpec(""" } """) with ImplicitSender { - var gossiper0: Gossiper = _ - var gossiper1: Gossiper = _ - var node0: ActorSystemImpl = _ - var node1: ActorSystemImpl = _ + var node0: Node = _ + var node1: Node = _ + var system0: ActorSystemImpl = _ + var system1: ActorSystemImpl = _ try { "A first cluster node with a 'node-to-join' config set to empty string (singleton 
cluster)" must { - node0 = ActorSystem("NodeStartupSpec", ConfigFactory + system0 = ActorSystem("NodeStartupSpec", ConfigFactory .parseString(""" akka { actor.provider = "akka.remote.RemoteActorRefProvider" @@ -37,16 +37,16 @@ class NodeStartupSpec extends AkkaSpec(""" }""") .withFallback(system.settings.config)) .asInstanceOf[ActorSystemImpl] - val remote0 = node0.provider.asInstanceOf[RemoteActorRefProvider] - gossiper0 = Gossiper(node0, remote0) + val remote0 = system0.provider.asInstanceOf[RemoteActorRefProvider] + node0 = Node(system0, remote0) "be a singleton cluster when started up" in { Thread.sleep(1.seconds.dilated.toMillis) - gossiper0.isSingletonCluster must be(true) + node0.isSingletonCluster must be(true) } "be in 'Up' phase when started up" in { - val members = gossiper0.latestGossip.members + val members = node0.latestGossip.members val joiningMember = members find (_.address.port.get == 5550) joiningMember must be('defined) joiningMember.get.status must be(MemberStatus.Joining) @@ -55,7 +55,7 @@ class NodeStartupSpec extends AkkaSpec(""" "A second cluster node with a 'node-to-join' config defined" must { "join the other node cluster as 'Joining' when sending a Join command" in { - node1 = ActorSystem("NodeStartupSpec", ConfigFactory + system1 = ActorSystem("NodeStartupSpec", ConfigFactory .parseString(""" akka { actor.provider = "akka.remote.RemoteActorRefProvider" @@ -67,11 +67,11 @@ class NodeStartupSpec extends AkkaSpec(""" }""") .withFallback(system.settings.config)) .asInstanceOf[ActorSystemImpl] - val remote1 = node1.provider.asInstanceOf[RemoteActorRefProvider] - gossiper1 = Gossiper(node1, remote1) + val remote1 = system1.provider.asInstanceOf[RemoteActorRefProvider] + node1 = Node(system1, remote1) Thread.sleep(1.seconds.dilated.toMillis) // give enough time for node1 to JOIN node0 - val members = gossiper0.latestGossip.members + val members = node0.latestGossip.members val joiningMember = members find (_.address.port.get == 5551) joiningMember must be('defined) joiningMember.get.status must be(MemberStatus.Joining) @@ -84,10 +84,10 @@ class NodeStartupSpec extends AkkaSpec(""" } override def atTermination() { - if (gossiper0 ne null) gossiper0.shutdown() if (node0 ne null) node0.shutdown() + if (system0 ne null) system0.shutdown() - if (gossiper1 ne null) gossiper1.shutdown() if (node1 ne null) node1.shutdown() + if (system1 ne null) system1.shutdown() } } From 83c97d08da714718eb33a2f959fdded6b8879b12 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jonas=20Bone=CC=81r?= Date: Mon, 20 Feb 2012 17:22:07 +0100 Subject: [PATCH 56/72] Added support for "leader election", the isLeader method and leader election tests. Also fixed bug in scrutinizer not maintaining the 'seen' map. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Jonas Bonér --- .../src/main/scala/akka/cluster/Node.scala | 13 +- .../akka/cluster/LeaderElectionSpec.scala | 155 ++++++++++++++++++ 2 files changed, 167 insertions(+), 1 deletion(-) create mode 100644 akka-cluster/src/test/scala/akka/cluster/LeaderElectionSpec.scala diff --git a/akka-cluster/src/main/scala/akka/cluster/Node.scala b/akka-cluster/src/main/scala/akka/cluster/Node.scala index 0eaa6b1d16..bcb9d1ecbc 100644 --- a/akka-cluster/src/main/scala/akka/cluster/Node.scala +++ b/akka-cluster/src/main/scala/akka/cluster/Node.scala @@ -282,6 +282,14 @@ case class Node(system: ActorSystemImpl, remote: RemoteActorRefProvider) { */ def self: Member = state.get.self + /** + * Is this node the leader? 
+ */ + def isLeader: Boolean = { + val currentState = state.get + remoteAddress == currentState.latestGossip.members.head.address + } + /** * Is this node a singleton cluster? */ @@ -540,6 +548,7 @@ case class Node(system: ActorSystemImpl, remote: RemoteActorRefProvider) { val localGossip = localState.latestGossip val localOverview = localGossip.overview + val localSeen = localOverview.seen val localMembers = localGossip.members val localUnreachableAddresses = localGossip.overview.unreachable @@ -553,7 +562,9 @@ case class Node(system: ActorSystemImpl, remote: RemoteActorRefProvider) { log.info("Node [{}] - Marking node(s) an unreachable [{}]", remoteAddress, newlyDetectedUnreachableAddresses.mkString(", ")) - val newOverview = localOverview copy (unreachable = newUnreachableAddresses) + val newSeen = newUnreachableAddresses.foldLeft(localSeen)((currentSeen, address) ⇒ currentSeen - address) + + val newOverview = localOverview copy (seen = newSeen, unreachable = newUnreachableAddresses) val newGossip = localGossip copy (overview = newOverview, members = newMembers) val versionedGossip = newGossip + vclockNode diff --git a/akka-cluster/src/test/scala/akka/cluster/LeaderElectionSpec.scala b/akka-cluster/src/test/scala/akka/cluster/LeaderElectionSpec.scala new file mode 100644 index 0000000000..dc0d8632a1 --- /dev/null +++ b/akka-cluster/src/test/scala/akka/cluster/LeaderElectionSpec.scala @@ -0,0 +1,155 @@ +/** + * Copyright (C) 2009-2012 Typesafe Inc. + */ +package akka.cluster + +import akka.testkit._ +import akka.dispatch._ +import akka.actor._ +import akka.remote._ +import akka.util.duration._ + +import com.typesafe.config._ + +import java.net.InetSocketAddress + +class LeaderElectionSpec extends AkkaSpec(""" + akka { + loglevel = "DEBUG" + actor.debug.lifecycle = on + actor.debug.autoreceive = on + cluster.failure-detector.threshold = 3 + } + """) with ImplicitSender { + + var node1: Node = _ + var node2: Node = _ + var node3: Node = _ + + var system1: ActorSystemImpl = _ + var system2: ActorSystemImpl = _ + var system3: ActorSystemImpl = _ + + try { + "A cluster of three nodes" must { + + // ======= NODE 1 ======== + system1 = ActorSystem("system1", ConfigFactory + .parseString(""" + akka { + actor.provider = "akka.remote.RemoteActorRefProvider" + remote.netty { + hostname = localhost + port=5550 + } + }""") + .withFallback(system.settings.config)) + .asInstanceOf[ActorSystemImpl] + val remote1 = system1.provider.asInstanceOf[RemoteActorRefProvider] + node1 = Node(system1, remote1) + val fd1 = node1.failureDetector + val address1 = node1.self.address + + // ======= NODE 2 ======== + system2 = ActorSystem("system2", ConfigFactory + .parseString(""" + akka { + actor.provider = "akka.remote.RemoteActorRefProvider" + remote.netty { + hostname = localhost + port = 5551 + } + cluster.node-to-join = "akka://system1@localhost:5550" + }""") + .withFallback(system.settings.config)) + .asInstanceOf[ActorSystemImpl] + val remote2 = system2.provider.asInstanceOf[RemoteActorRefProvider] + node2 = Node(system2, remote2) + val fd2 = node2.failureDetector + val address2 = node2.self.address + + // ======= NODE 3 ======== + system3 = ActorSystem("system3", ConfigFactory + .parseString(""" + akka { + actor.provider = "akka.remote.RemoteActorRefProvider" + remote.netty { + hostname = localhost + port=5552 + } + cluster.node-to-join = "akka://system1@localhost:5550" + }""") + .withFallback(system.settings.config)) + .asInstanceOf[ActorSystemImpl] + val remote3 = 
system3.provider.asInstanceOf[RemoteActorRefProvider] + node3 = Node(system3, remote3) + val fd3 = node3.failureDetector + val address3 = node3.self.address + + "be able to 'elect' a single leader" taggedAs LongRunningTest in { + + println("Give the system time to converge...") + Thread.sleep(30.seconds.dilated.toMillis) // let them gossip for 30 seconds + + // check cluster convergence + node1.convergence must be('defined) + node2.convergence must be('defined) + node3.convergence must be('defined) + + // check leader + node1.isLeader must be(true) + node2.isLeader must be(false) + node3.isLeader must be(false) + } + + "be able to 're-elect' a single leader after leader has left" taggedAs LongRunningTest in { + + // shut down system1 - the leader + node1.shutdown() + system1.shutdown() + + println("Give the system time to converge...") + Thread.sleep(30.seconds.dilated.toMillis) // give them 30 seconds to detect failure of system3 + + // check cluster convergence + node2.convergence must be('defined) + node3.convergence must be('defined) + + // check leader + node2.isLeader must be(true) + node3.isLeader must be(false) + } + + "be able to 're-elect' a single leader after leader has left (again, leaving a single node)" taggedAs LongRunningTest in { + + // shut down system1 - the leader + node2.shutdown() + system2.shutdown() + + println("Give the system time to converge...") + Thread.sleep(30.seconds.dilated.toMillis) // give them 30 seconds to detect failure of system3 + + // check cluster convergence + node3.convergence must be('defined) + + // check leader + node3.isLeader must be(true) + } + } + } catch { + case e: Exception ⇒ + e.printStackTrace + fail(e.toString) + } + + override def atTermination() { + if (node1 ne null) node1.shutdown() + if (system1 ne null) system1.shutdown() + + if (node2 ne null) node2.shutdown() + if (system2 ne null) system2.shutdown() + + if (node3 ne null) node3.shutdown() + if (system3 ne null) system3.shutdown() + } +} From e4b1d8609ff164a27d5af7e4b41befab8d18f409 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jonas=20Bone=CC=81r?= Date: Sat, 1 Jan 2011 01:50:33 +0100 Subject: [PATCH 57/72] Added support for 'deputy-nodes'. 
MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit * Added 'nr-of-deputy-nodes' config option * Added fetching of current deputy node addresses * Minor refactorings Signed-off-by: Jonas Bonér --- .../src/main/resources/reference.conf | 1 + .../scala/akka/cluster/ClusterSettings.scala | 1 + .../src/main/scala/akka/cluster/Node.scala | 87 +++++++++---------- .../akka/cluster/ClusterConfigSpec.scala | 1 + 4 files changed, 44 insertions(+), 46 deletions(-) diff --git a/akka-cluster/src/main/resources/reference.conf b/akka-cluster/src/main/resources/reference.conf index feada91c01..0917909504 100644 --- a/akka-cluster/src/main/resources/reference.conf +++ b/akka-cluster/src/main/resources/reference.conf @@ -14,6 +14,7 @@ akka { # the number of gossip daemon actors nr-of-gossip-daemons = 4 + nr-of-deputy-nodes = 3 gossip { initialDelay = 5s diff --git a/akka-cluster/src/main/scala/akka/cluster/ClusterSettings.scala b/akka-cluster/src/main/scala/akka/cluster/ClusterSettings.scala index 9872f3e233..10f0316476 100644 --- a/akka-cluster/src/main/scala/akka/cluster/ClusterSettings.scala +++ b/akka-cluster/src/main/scala/akka/cluster/ClusterSettings.scala @@ -22,4 +22,5 @@ class ClusterSettings(val config: Config, val systemName: String) { val GossipInitialDelay = Duration(getMilliseconds("akka.cluster.gossip.initialDelay"), MILLISECONDS) val GossipFrequency = Duration(getMilliseconds("akka.cluster.gossip.frequency"), MILLISECONDS) val NrOfGossipDaemons = getInt("akka.cluster.nr-of-gossip-daemons") + val NrOfDeputyNodes = getInt("akka.cluster.nr-of-deputy-nodes") } diff --git a/akka-cluster/src/main/scala/akka/cluster/Node.scala b/akka-cluster/src/main/scala/akka/cluster/Node.scala index bcb9d1ecbc..bb8bec4d31 100644 --- a/akka-cluster/src/main/scala/akka/cluster/Node.scala +++ b/akka-cluster/src/main/scala/akka/cluster/Node.scala @@ -184,7 +184,6 @@ final class ClusterGossipDaemon(system: ActorSystem, node: Node) extends Actor { } // FIXME Cluster public API should be an Extension -// FIXME Add cluster Node class and refactor out all non-gossip related stuff out of Node /** * This module is responsible for Gossiping cluster information. The abstraction maintains the list of live @@ -228,6 +227,7 @@ case class Node(system: ActorSystemImpl, remote: RemoteActorRefProvider) { val failureDetector = new AccrualFailureDetector( system, remoteAddress, clusterSettings.FailureDetectorThreshold, clusterSettings.FailureDetectorMaxSampleSize) + private val nrOfDeputyNodes = clusterSettings.NrOfDeputyNodes private val nrOfGossipDaemons = clusterSettings.NrOfGossipDaemons private val nodeToJoin: Option[Address] = clusterSettings.NodeToJoin filter (_ != remoteAddress) @@ -237,8 +237,6 @@ case class Node(system: ActorSystemImpl, remote: RemoteActorRefProvider) { private val log = Logging(system, "Node") private val random = SecureRandom.getInstance("SHA1PRNG") - // Is it right to put this guy under the /system path or should we have a top-level /cluster or something else...? 
- // FIXME should be defined as a router so we get concurrency here private val clusterCommandDaemon = system.systemActorOf( Props(new ClusterCommandDaemon(system, this)), "clusterCommand") @@ -259,12 +257,12 @@ case class Node(system: ActorSystemImpl, remote: RemoteActorRefProvider) { join() // start periodic gossip to random nodes in cluster - val gossipCanceller = system.scheduler.schedule(gossipInitialDelay, gossipFrequency) { + private val gossipCanceller = system.scheduler.schedule(gossipInitialDelay, gossipFrequency) { gossip() } // start periodic cluster scrutinization (moving nodes condemned by the failure detector to unreachable list) - val scrutinizeCanceller = system.scheduler.schedule(gossipInitialDelay, gossipFrequency) { + private val scrutinizeCanceller = system.scheduler.schedule(gossipInitialDelay, gossipFrequency) { scrutinize() } @@ -295,6 +293,13 @@ case class Node(system: ActorSystemImpl, remote: RemoteActorRefProvider) { */ def isSingletonCluster: Boolean = isSingletonCluster(state.get) + /** + * Checks if we have a cluster convergence. + * + * @returns Some(convergedGossip) if convergence have been reached and None if not + */ + def convergence: Option[Gossip] = convergence(latestGossip) + /** * Shuts down all connections to other members, the cluster daemon and the periodic gossip and cleanup tasks. */ @@ -317,11 +322,37 @@ case class Node(system: ActorSystemImpl, remote: RemoteActorRefProvider) { } } + /** + * Registers a listener to subscribe to cluster membership changes. + */ + @tailrec + final def registerListener(listener: MembershipChangeListener) { + val localState = state.get + val newListeners = localState.memberMembershipChangeListeners + listener + val newState = localState copy (memberMembershipChangeListeners = newListeners) + if (!state.compareAndSet(localState, newState)) registerListener(listener) // recur + } + + /** + * Unsubscribes to cluster membership changes. + */ + @tailrec + final def unregisterListener(listener: MembershipChangeListener) { + val localState = state.get + val newListeners = localState.memberMembershipChangeListeners - listener + val newState = localState copy (memberMembershipChangeListeners = newListeners) + if (!state.compareAndSet(localState, newState)) unregisterListener(listener) // recur + } + + // ======================================================== + // ===================== INTERNAL API ===================== + // ======================================================== + /** * New node joining. */ @tailrec - final def joining(node: Address) { + private[cluster] final def joining(node: Address) { log.info("Node [{}] - Node [{}] is joining", remoteAddress, node) failureDetector heartbeat node // update heartbeat in failure detector @@ -350,7 +381,7 @@ case class Node(system: ActorSystemImpl, remote: RemoteActorRefProvider) { * Receive new gossip. */ @tailrec - final def receive(sender: Member, remoteGossip: Gossip) { + private[cluster] final def receive(sender: Member, remoteGossip: Gossip) { log.debug("Node [{}] - Receiving gossip from [{}]", remoteAddress, sender.address) failureDetector heartbeat sender.address // update heartbeat in failure detector @@ -390,39 +421,6 @@ case class Node(system: ActorSystemImpl, remote: RemoteActorRefProvider) { } } - /** - * Registers a listener to subscribe to cluster membership changes. 
- */ - @tailrec - final def registerListener(listener: MembershipChangeListener) { - val localState = state.get - val newListeners = localState.memberMembershipChangeListeners + listener - val newState = localState copy (memberMembershipChangeListeners = newListeners) - if (!state.compareAndSet(localState, newState)) registerListener(listener) // recur - } - - /** - * Unsubscribes to cluster membership changes. - */ - @tailrec - final def unregisterListener(listener: MembershipChangeListener) { - val localState = state.get - val newListeners = localState.memberMembershipChangeListeners - listener - val newState = localState copy (memberMembershipChangeListeners = newListeners) - if (!state.compareAndSet(localState, newState)) unregisterListener(listener) // recur - } - - /** - * Checks if we have a cluster convergence. - * - * @returns Some(convergedGossip) if convergence have been reached and None if not - */ - def convergence: Option[Gossip] = convergence(latestGossip) - - // ======================================================== - // ===================== INTERNAL API ===================== - // ======================================================== - /** * Joins the pre-configured contact point and retrieves current gossip state. */ @@ -461,7 +459,7 @@ case class Node(system: ActorSystemImpl, remote: RemoteActorRefProvider) { } // 3. gossip to a deputy nodes for facilitating partition healing - val deputies = deputyNodesWithoutMyself + val deputies = deputyNodes if ((!gossipedToDeputy || localMembersSize < 1) && !deputies.isEmpty) { if (localMembersSize == 0) gossipToRandomNodeOf(deputies) else { @@ -530,11 +528,8 @@ case class Node(system: ActorSystemImpl, remote: RemoteActorRefProvider) { private def gossipToRandomNodeOf(addresses: Seq[Address]): Boolean = { val peers = addresses filter (_ != remoteAddress) // filter out myself val peer = selectRandomNode(peers) - val localState = state.get - val localGossip = localState.latestGossip - // if connection can't be established/found => ignore it since the failure detector will take care of the potential problem gossipTo(peer) - deputyNodesWithoutMyself exists (peer == _) + deputyNodes exists (peer == _) } /** @@ -607,7 +602,7 @@ case class Node(system: ActorSystemImpl, remote: RemoteActorRefProvider) { */ private def clusterGossipConnectionFor(address: Address): ActorRef = system.actorFor(RootActorPath(address) / "system" / "clusterGossip") - private def deputyNodesWithoutMyself: Seq[Address] = Seq.empty[Address] filter (_ != remoteAddress) // FIXME read in deputy nodes from gossip data - now empty seq + private def deputyNodes: Seq[Address] = state.get.latestGossip.members.toSeq map (_.address) drop 1 take nrOfDeputyNodes filter (_ != remoteAddress) private def selectRandomNode(addresses: Seq[Address]): Address = addresses(random nextInt addresses.size) diff --git a/akka-cluster/src/test/scala/akka/cluster/ClusterConfigSpec.scala b/akka-cluster/src/test/scala/akka/cluster/ClusterConfigSpec.scala index 2afbc7efc0..6668044f33 100644 --- a/akka-cluster/src/test/scala/akka/cluster/ClusterConfigSpec.scala +++ b/akka-cluster/src/test/scala/akka/cluster/ClusterConfigSpec.scala @@ -29,6 +29,7 @@ class ClusterConfigSpec extends AkkaSpec( GossipInitialDelay must be(5 seconds) GossipFrequency must be(1 second) NrOfGossipDaemons must be(4) + NrOfDeputyNodes must be(3) } } } From a62755c5daefaad3838ff1e552ba0f72877b218e Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jonas=20Bone=CC=81r?= Date: Wed, 22 Feb 2012 18:40:16 +0100 Subject: [PATCH 58/72] 
Turned cluster Node into an Extension. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Jonas Bonér --- .../src/main/scala/akka/cluster/Node.scala | 32 ++++++++++++++++--- .../GossipingAccrualFailureDetectorSpec.scala | 6 ++-- .../akka/cluster/LeaderElectionSpec.scala | 6 ++-- .../MembershipChangeListenerSpec.scala | 6 ++-- .../akka/cluster/NodeMembershipSpec.scala | 6 ++-- .../scala/akka/cluster/NodeStartupSpec.scala | 4 +-- 6 files changed, 41 insertions(+), 19 deletions(-) diff --git a/akka-cluster/src/main/scala/akka/cluster/Node.scala b/akka-cluster/src/main/scala/akka/cluster/Node.scala index bb8bec4d31..33c1ad840e 100644 --- a/akka-cluster/src/main/scala/akka/cluster/Node.scala +++ b/akka-cluster/src/main/scala/akka/cluster/Node.scala @@ -170,6 +170,8 @@ final class ClusterCommandDaemon(system: ActorSystem, node: Node) extends Actor } } +// FIXME create package object with implicit conversion that enables: system.node + /** * Pooled and routed wit N number of configurable instances. * Concurrent access to Node. @@ -183,7 +185,22 @@ final class ClusterGossipDaemon(system: ActorSystem, node: Node) extends Actor { } } -// FIXME Cluster public API should be an Extension +/** + * Node Extension Id and factory for creating Node extension. + * Example: + * {{{ + * val node = NodeExtension(system) + * + * if (node.isLeader) { ... } + * }}} + */ +object NodeExtension extends ExtensionId[Node] with ExtensionIdProvider { + override def get(system: ActorSystem): Node = super.get(system) + + override def lookup = NodeExtension + + override def createExtension(system: ExtendedActorSystem): Node = new Node(system.asInstanceOf[ActorSystemImpl]) // not nice but need API in ActorSystemImpl inside Node +} /** * This module is responsible for Gossiping cluster information. The abstraction maintains the list of live @@ -200,7 +217,12 @@ final class ClusterGossipDaemon(system: ActorSystem, node: Node) extends Actor { * gossip to random deputy with certain probability depending on number of unreachable, deputy and live members. * */ -case class Node(system: ActorSystemImpl, remote: RemoteActorRefProvider) { +class Node(system: ActorSystemImpl) extends Extension { + + if (!system.provider.isInstanceOf[RemoteActorRefProvider]) + throw new ConfigurationException("ActorSystem[" + system + "] needs to have a 'RemoteActorRefProvider' enabled in the configuration") + + val remote: RemoteActorRefProvider = system.provider.asInstanceOf[RemoteActorRefProvider] /** * Represents the state for this Node. 
Implemented using optimistic lockless concurrency, @@ -372,7 +394,7 @@ case class Node(system: ActorSystemImpl, remote: RemoteActorRefProvider) { if (!state.compareAndSet(localState, newState)) joining(node) // recur if we failed update else { if (convergence(newState.latestGossip).isDefined) { - newState.memberMembershipChangeListeners map { _ notify newMembers } // FIXME should check for cluster convergence before triggering listeners + newState.memberMembershipChangeListeners map { _ notify newMembers } } } } @@ -416,7 +438,7 @@ case class Node(system: ActorSystemImpl, remote: RemoteActorRefProvider) { if (!state.compareAndSet(localState, newState)) receive(sender, remoteGossip) // recur if we fail the update else { if (convergence(newState.latestGossip).isDefined) { - newState.memberMembershipChangeListeners map { _ notify newState.latestGossip.members } // FIXME should check for cluster convergence before triggering listeners + newState.memberMembershipChangeListeners map { _ notify newState.latestGossip.members } } } } @@ -571,7 +593,7 @@ case class Node(system: ActorSystemImpl, remote: RemoteActorRefProvider) { if (!state.compareAndSet(localState, newState)) scrutinize() // recur else { if (convergence(newState.latestGossip).isDefined) { - newState.memberMembershipChangeListeners map { _ notify newMembers } // FIXME should check for cluster convergence before triggering listeners + newState.memberMembershipChangeListeners map { _ notify newMembers } } } } diff --git a/akka-cluster/src/test/scala/akka/cluster/GossipingAccrualFailureDetectorSpec.scala b/akka-cluster/src/test/scala/akka/cluster/GossipingAccrualFailureDetectorSpec.scala index e92b21dbfb..f4deb77706 100644 --- a/akka-cluster/src/test/scala/akka/cluster/GossipingAccrualFailureDetectorSpec.scala +++ b/akka-cluster/src/test/scala/akka/cluster/GossipingAccrualFailureDetectorSpec.scala @@ -46,7 +46,7 @@ class GossipingAccrualFailureDetectorSpec extends AkkaSpec(""" .withFallback(system.settings.config)) .asInstanceOf[ActorSystemImpl] val remote1 = system1.provider.asInstanceOf[RemoteActorRefProvider] - node1 = Node(system1, remote1) + node1 = new Node(system1) val fd1 = node1.failureDetector val address1 = node1.self.address @@ -64,7 +64,7 @@ class GossipingAccrualFailureDetectorSpec extends AkkaSpec(""" .withFallback(system.settings.config)) .asInstanceOf[ActorSystemImpl] val remote2 = system2.provider.asInstanceOf[RemoteActorRefProvider] - node2 = Node(system2, remote2) + node2 = new Node(system2) val fd2 = node2.failureDetector val address2 = node2.self.address @@ -82,7 +82,7 @@ class GossipingAccrualFailureDetectorSpec extends AkkaSpec(""" .withFallback(system.settings.config)) .asInstanceOf[ActorSystemImpl] val remote3 = system3.provider.asInstanceOf[RemoteActorRefProvider] - node3 = Node(system3, remote3) + node3 = new Node(system3) val fd3 = node3.failureDetector val address3 = node3.self.address diff --git a/akka-cluster/src/test/scala/akka/cluster/LeaderElectionSpec.scala b/akka-cluster/src/test/scala/akka/cluster/LeaderElectionSpec.scala index dc0d8632a1..e3da64cfa0 100644 --- a/akka-cluster/src/test/scala/akka/cluster/LeaderElectionSpec.scala +++ b/akka-cluster/src/test/scala/akka/cluster/LeaderElectionSpec.scala @@ -46,7 +46,7 @@ class LeaderElectionSpec extends AkkaSpec(""" .withFallback(system.settings.config)) .asInstanceOf[ActorSystemImpl] val remote1 = system1.provider.asInstanceOf[RemoteActorRefProvider] - node1 = Node(system1, remote1) + node1 = new Node(system1) val fd1 = node1.failureDetector val address1 
= node1.self.address @@ -64,7 +64,7 @@ class LeaderElectionSpec extends AkkaSpec(""" .withFallback(system.settings.config)) .asInstanceOf[ActorSystemImpl] val remote2 = system2.provider.asInstanceOf[RemoteActorRefProvider] - node2 = Node(system2, remote2) + node2 = new Node(system2) val fd2 = node2.failureDetector val address2 = node2.self.address @@ -82,7 +82,7 @@ class LeaderElectionSpec extends AkkaSpec(""" .withFallback(system.settings.config)) .asInstanceOf[ActorSystemImpl] val remote3 = system3.provider.asInstanceOf[RemoteActorRefProvider] - node3 = Node(system3, remote3) + node3 = new Node(system3) val fd3 = node3.failureDetector val address3 = node3.self.address diff --git a/akka-cluster/src/test/scala/akka/cluster/MembershipChangeListenerSpec.scala b/akka-cluster/src/test/scala/akka/cluster/MembershipChangeListenerSpec.scala index 197fa22b71..e2487de3c8 100644 --- a/akka-cluster/src/test/scala/akka/cluster/MembershipChangeListenerSpec.scala +++ b/akka-cluster/src/test/scala/akka/cluster/MembershipChangeListenerSpec.scala @@ -45,7 +45,7 @@ class MembershipChangeListenerSpec extends AkkaSpec(""" .withFallback(system.settings.config)) .asInstanceOf[ActorSystemImpl] val remote0 = system0.provider.asInstanceOf[RemoteActorRefProvider] - node0 = Node(system0, remote0) + node0 = new Node(system0) system1 = ActorSystem("system1", ConfigFactory .parseString(""" @@ -60,7 +60,7 @@ class MembershipChangeListenerSpec extends AkkaSpec(""" .withFallback(system.settings.config)) .asInstanceOf[ActorSystemImpl] val remote1 = system1.provider.asInstanceOf[RemoteActorRefProvider] - node1 = Node(system1, remote1) + node1 = new Node(system1) val latch = new CountDownLatch(2) @@ -100,7 +100,7 @@ class MembershipChangeListenerSpec extends AkkaSpec(""" .withFallback(system.settings.config)) .asInstanceOf[ActorSystemImpl] val remote2 = system2.provider.asInstanceOf[RemoteActorRefProvider] - node2 = Node(system2, remote2) + node2 = new Node(system2) val latch = new CountDownLatch(3) node0.registerListener(new MembershipChangeListener { diff --git a/akka-cluster/src/test/scala/akka/cluster/NodeMembershipSpec.scala b/akka-cluster/src/test/scala/akka/cluster/NodeMembershipSpec.scala index dc24485507..56f053fc6c 100644 --- a/akka-cluster/src/test/scala/akka/cluster/NodeMembershipSpec.scala +++ b/akka-cluster/src/test/scala/akka/cluster/NodeMembershipSpec.scala @@ -44,7 +44,7 @@ class NodeMembershipSpec extends AkkaSpec(""" .withFallback(system.settings.config)) .asInstanceOf[ActorSystemImpl] val remote0 = system0.provider.asInstanceOf[RemoteActorRefProvider] - node0 = Node(system0, remote0) + node0 = new Node(system0) // ======= NODE 1 ======== system1 = ActorSystem("system1", ConfigFactory @@ -60,7 +60,7 @@ class NodeMembershipSpec extends AkkaSpec(""" .withFallback(system.settings.config)) .asInstanceOf[ActorSystemImpl] val remote1 = system1.provider.asInstanceOf[RemoteActorRefProvider] - node1 = Node(system1, remote1) + node1 = new Node(system1) Thread.sleep(10.seconds.dilated.toMillis) @@ -99,7 +99,7 @@ class NodeMembershipSpec extends AkkaSpec(""" .withFallback(system.settings.config)) .asInstanceOf[ActorSystemImpl] val remote2 = system2.provider.asInstanceOf[RemoteActorRefProvider] - node2 = Node(system2, remote2) + node2 = new Node(system2) Thread.sleep(10.seconds.dilated.toMillis) diff --git a/akka-cluster/src/test/scala/akka/cluster/NodeStartupSpec.scala b/akka-cluster/src/test/scala/akka/cluster/NodeStartupSpec.scala index 3d98260c4d..ed4b893619 100644 --- 
a/akka-cluster/src/test/scala/akka/cluster/NodeStartupSpec.scala +++ b/akka-cluster/src/test/scala/akka/cluster/NodeStartupSpec.scala @@ -38,7 +38,7 @@ class NodeStartupSpec extends AkkaSpec(""" .withFallback(system.settings.config)) .asInstanceOf[ActorSystemImpl] val remote0 = system0.provider.asInstanceOf[RemoteActorRefProvider] - node0 = Node(system0, remote0) + node0 = new Node(system0) "be a singleton cluster when started up" in { Thread.sleep(1.seconds.dilated.toMillis) @@ -68,7 +68,7 @@ class NodeStartupSpec extends AkkaSpec(""" .withFallback(system.settings.config)) .asInstanceOf[ActorSystemImpl] val remote1 = system1.provider.asInstanceOf[RemoteActorRefProvider] - node1 = Node(system1, remote1) + node1 = new Node(system1) Thread.sleep(1.seconds.dilated.toMillis) // give enough time for node1 to JOIN node0 val members = node0.latestGossip.members From da5a5d1316aa5ed7072340593f32413ba9dfc177 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jonas=20Bone=CC=81r?= Date: Tue, 28 Feb 2012 10:53:28 +0100 Subject: [PATCH 59/72] Added ensime files to .gitignore. Plus fixed error from merge. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Jonas Bonér --- .gitignore | 3 +-- .../src/main/scala/akka/cluster/ClusterSettings.scala | 4 ++-- 2 files changed, 3 insertions(+), 4 deletions(-) diff --git a/.gitignore b/.gitignore index 25d3fa2323..7d8723dcb9 100755 --- a/.gitignore +++ b/.gitignore @@ -24,8 +24,7 @@ logs .#* .codefellow storage -.codefellow -.ensime +.ensime* _dump .manager manifest.mf diff --git a/akka-cluster/src/main/scala/akka/cluster/ClusterSettings.scala b/akka-cluster/src/main/scala/akka/cluster/ClusterSettings.scala index 10f0316476..e05c04b9d7 100644 --- a/akka-cluster/src/main/scala/akka/cluster/ClusterSettings.scala +++ b/akka-cluster/src/main/scala/akka/cluster/ClusterSettings.scala @@ -16,8 +16,8 @@ class ClusterSettings(val config: Config, val systemName: String) { val FailureDetectorThreshold = getInt("akka.cluster.failure-detector.threshold") val FailureDetectorMaxSampleSize = getInt("akka.cluster.failure-detector.max-sample-size") val NodeToJoin: Option[Address] = getString("akka.cluster.node-to-join") match { - case "" ⇒ None - case AddressExtractor(addr) ⇒ Some(addr) + case "" ⇒ None + case AddressFromURIString(addr) ⇒ Some(addr) } val GossipInitialDelay = Duration(getMilliseconds("akka.cluster.gossip.initialDelay"), MILLISECONDS) val GossipFrequency = Duration(getMilliseconds("akka.cluster.gossip.frequency"), MILLISECONDS) From 517981101f23075fcf2349d185957e16fd106bb5 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jonas=20Bone=CC=81r?= Date: Tue, 28 Feb 2012 11:46:37 +0100 Subject: [PATCH 60/72] Added docs about how to enable 'long-running' and 'timing' tests. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Jonas Bonér --- akka-docs/dev/building-akka.rst | 21 +++++++++++++++++---- 1 file changed, 17 insertions(+), 4 deletions(-) diff --git a/akka-docs/dev/building-akka.rst b/akka-docs/dev/building-akka.rst index 495d8ba5ed..c399f47856 100644 --- a/akka-docs/dev/building-akka.rst +++ b/akka-docs/dev/building-akka.rst @@ -12,7 +12,7 @@ This page describes how to build and run Akka from the latest source code. .. contents:: :local: -Get the source code +Get the Source Code =================== Akka uses `Git`_ and is hosted at `Github`_. 
@@ -84,7 +84,20 @@ launch script to activate parallel execution:: -Dakka.parallelExecution=true -Publish to local Ivy repository +Long Running and Time Sensitive Tests +------------------------------------- + +By default are the long running tests (mainly cluster tests) and time sensitive tests (dependent on the +performance of the machine it is running on) disabled. You can enable them by adding one of the flags:: + + -Dakka.test.tags.include=long-running + -Dakka.test.tags.include=timing + +Or if you need to enable them both:: + + -Dakka.test.tags.include=long-running,timing + +Publish to Local Ivy Repository ------------------------------- If you want to deploy the artifacts to your local Ivy repository (for example, @@ -93,7 +106,7 @@ to use from an sbt project) use the ``publish-local`` command:: sbt publish-local -sbt interactive mode +sbt Interactive Mode -------------------- Note that in the examples above we are calling ``sbt compile`` and ``sbt test`` @@ -113,7 +126,7 @@ For example, building Akka as above is more commonly done like this:: ... -sbt batch mode +sbt Batch Mode -------------- It's also possible to combine commands in a single call. For example, testing, From 96ed8bdccf8fa14645cac8f3a69b191fc01c6e8e Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jonas=20Bone=CC=81r?= Date: Tue, 28 Feb 2012 12:57:14 +0100 Subject: [PATCH 61/72] Added 'akka.cluster' package object with implicit conversion which creates an augmented 'ActorSystem' with a method 'def node: Node'. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Jonas Bonér --- .../src/main/scala/akka/cluster/Node.scala | 2 -- .../src/main/scala/akka/package.scala | 20 +++++++++++++++++++ .../cluster/AccrualFailureDetectorSpec.scala | 2 +- .../akka/cluster/LeaderElectionSpec.scala | 2 +- .../akka/cluster/NodeMembershipSpec.scala | 6 +++--- 5 files changed, 25 insertions(+), 7 deletions(-) create mode 100644 akka-cluster/src/main/scala/akka/package.scala diff --git a/akka-cluster/src/main/scala/akka/cluster/Node.scala b/akka-cluster/src/main/scala/akka/cluster/Node.scala index 33c1ad840e..0f324936f3 100644 --- a/akka-cluster/src/main/scala/akka/cluster/Node.scala +++ b/akka-cluster/src/main/scala/akka/cluster/Node.scala @@ -170,8 +170,6 @@ final class ClusterCommandDaemon(system: ActorSystem, node: Node) extends Actor } } -// FIXME create package object with implicit conversion that enables: system.node - /** * Pooled and routed wit N number of configurable instances. * Concurrent access to Node. diff --git a/akka-cluster/src/main/scala/akka/package.scala b/akka-cluster/src/main/scala/akka/package.scala new file mode 100644 index 0000000000..993a12c64f --- /dev/null +++ b/akka-cluster/src/main/scala/akka/package.scala @@ -0,0 +1,20 @@ +/** + * Copyright (C) 2009-2012 Typesafe Inc. + */ + +package akka + +import akka.actor.ActorSystem + +package object cluster { + + /** + * Implicitly creates an augmented [[akka.actor.ActorSystem]] with a method {{{def node: Node}}}. + * + * @param system + * @return An augmented [[akka.actor.ActorSystem]] with a method {{{def node: Node}}}. 
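+ *
+ * Usage sketch (assumes a cluster-enabled ActorSystem is in scope as ``system``):
+ * {{{
+ * import akka.cluster._
+ *
+ * val node = system.node // provided by this implicit conversion
+ * if (node.isLeader) { /* leader-only work */ }
+ * }}}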
+ */ + implicit def actorSystemWithNodeAccessor(system: ActorSystem) = new { + val node = NodeExtension(system) + } +} diff --git a/akka-cluster/src/test/scala/akka/cluster/AccrualFailureDetectorSpec.scala b/akka-cluster/src/test/scala/akka/cluster/AccrualFailureDetectorSpec.scala index 2e00c72ad1..e867bc834b 100644 --- a/akka-cluster/src/test/scala/akka/cluster/AccrualFailureDetectorSpec.scala +++ b/akka-cluster/src/test/scala/akka/cluster/AccrualFailureDetectorSpec.scala @@ -5,7 +5,7 @@ import akka.testkit.AkkaSpec import akka.actor.Address class AccrualFailureDetectorSpec extends AkkaSpec(""" - akka.loglevel = "DEBUG" + akka.loglevel = "INFO" """) { "An AccrualFailureDetector" must { diff --git a/akka-cluster/src/test/scala/akka/cluster/LeaderElectionSpec.scala b/akka-cluster/src/test/scala/akka/cluster/LeaderElectionSpec.scala index e3da64cfa0..85587f8780 100644 --- a/akka-cluster/src/test/scala/akka/cluster/LeaderElectionSpec.scala +++ b/akka-cluster/src/test/scala/akka/cluster/LeaderElectionSpec.scala @@ -15,7 +15,7 @@ import java.net.InetSocketAddress class LeaderElectionSpec extends AkkaSpec(""" akka { - loglevel = "DEBUG" + loglevel = "INFO" actor.debug.lifecycle = on actor.debug.autoreceive = on cluster.failure-detector.threshold = 3 diff --git a/akka-cluster/src/test/scala/akka/cluster/NodeMembershipSpec.scala b/akka-cluster/src/test/scala/akka/cluster/NodeMembershipSpec.scala index 56f053fc6c..ddd773dce8 100644 --- a/akka-cluster/src/test/scala/akka/cluster/NodeMembershipSpec.scala +++ b/akka-cluster/src/test/scala/akka/cluster/NodeMembershipSpec.scala @@ -44,7 +44,7 @@ class NodeMembershipSpec extends AkkaSpec(""" .withFallback(system.settings.config)) .asInstanceOf[ActorSystemImpl] val remote0 = system0.provider.asInstanceOf[RemoteActorRefProvider] - node0 = new Node(system0) + node0 = system0.node // ======= NODE 1 ======== system1 = ActorSystem("system1", ConfigFactory @@ -60,7 +60,7 @@ class NodeMembershipSpec extends AkkaSpec(""" .withFallback(system.settings.config)) .asInstanceOf[ActorSystemImpl] val remote1 = system1.provider.asInstanceOf[RemoteActorRefProvider] - node1 = new Node(system1) + node1 = system1.node Thread.sleep(10.seconds.dilated.toMillis) @@ -99,7 +99,7 @@ class NodeMembershipSpec extends AkkaSpec(""" .withFallback(system.settings.config)) .asInstanceOf[ActorSystemImpl] val remote2 = system2.provider.asInstanceOf[RemoteActorRefProvider] - node2 = new Node(system2) + node2 = system2.node Thread.sleep(10.seconds.dilated.toMillis) From e91af31fb99c184e84becd364bb36a57a07c4f37 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jonas=20Bone=CC=81r?= Date: Tue, 28 Feb 2012 17:04:48 +0100 Subject: [PATCH 62/72] Added FSM to the Node's ClusterCommandDaemon to manage the cluster command state as an FSM. Also added tests for all the FSM state changes. 
MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Jonas Bonér --- .../src/main/scala/akka/cluster/Node.scala | 193 ++++++++++++++---- .../cluster/AccrualFailureDetectorSpec.scala | 4 + .../cluster/ClusterCommandDaemonFSMSpec.scala | 156 ++++++++++++++ .../akka/cluster/LeaderElectionSpec.scala | 1 + .../scala/akka/cluster/VectorClockSpec.scala | 4 + akka-docs/scala/fsm.rst | 4 +- 6 files changed, 325 insertions(+), 37 deletions(-) create mode 100644 akka-cluster/src/test/scala/akka/cluster/ClusterCommandDaemonFSMSpec.scala diff --git a/akka-cluster/src/main/scala/akka/cluster/Node.scala b/akka-cluster/src/main/scala/akka/cluster/Node.scala index 0f324936f3..cea128d027 100644 --- a/akka-cluster/src/main/scala/akka/cluster/Node.scala +++ b/akka-cluster/src/main/scala/akka/cluster/Node.scala @@ -46,24 +46,48 @@ trait MetaDataChangeListener { // FIXME add management and notification for Meta sealed trait ClusterMessage extends Serializable /** - * Command to join the cluster. + * Cluster commands sent by the USER. */ -case class Join(node: Address) extends ClusterMessage +object UserAction { + + /** + * Command to join the cluster. Sent when a node (reprsesented by 'address') + * wants to join another node (the receiver). + */ + case class Join(address: Address) extends ClusterMessage + + /** + * Command to leave the cluster. + */ + case object Leave extends ClusterMessage + + /** + * Command to mark node as temporary down. + */ + case object Down extends ClusterMessage + + /** + * Command to mark a node to be removed from the cluster immediately. + */ + case object Exit extends ClusterMessage +} /** - * Command to leave the cluster. + * Cluster commands sent by the LEADER. + * Node: Leader can also send UserActions but not vice versa. */ -case class Leave(node: Address) extends ClusterMessage +object LeaderAction { -/** - * Command to mark node as temporay down. - */ -case class Down(node: Address) extends ClusterMessage + /** + * Command to set a node to Up (from Joining). + */ + case object Up extends ClusterMessage -/** - * Command to remove a node from the cluster immediately. - */ -case class Remove(node: Address) extends ClusterMessage + /** + * Command to remove a node from the cluster immediately. + */ + case object Remove extends ClusterMessage +} /** * Represents the address and the current status of a cluster member node. @@ -87,6 +111,7 @@ object MemberStatus { case object Leaving extends MemberStatus case object Exiting extends MemberStatus case object Down extends MemberStatus + case object Removed extends MemberStatus } // sealed trait PartitioningStatus @@ -153,20 +178,92 @@ case class Gossip( ")" } -// FIXME ClusterCommandDaemon with FSM trait /** - * Single instance. FSM managing the different cluster nodes states. - * Serialized access to Node. + * FSM actor managing the different cluster nodes states. + * Single instance - e.g. serialized access to Node - message after message. 
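+ *
+ * Transition summary (of the handlers defined below):
+ * Joining -> Up; Up -> Leaving | Exiting | Down | Removed;
+ * Leaving -> Down | Removed; Exiting -> Removed; Down -> Removed.
+ * A Join command is handled in any state without changing the FSM state.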
*/ -final class ClusterCommandDaemon(system: ActorSystem, node: Node) extends Actor { - val log = Logging(system, "ClusterCommandDaemon") +final class ClusterCommandDaemon(system: ActorSystem, node: Node) extends Actor with FSM[MemberStatus, Unit] { - def receive = { - case Join(address) ⇒ node.joining(address) - case Leave(address) ⇒ //node.leaving(address) - case Down(address) ⇒ //node.downing(address) - case Remove(address) ⇒ //node.removing(address) - case unknown ⇒ log.error("Unknown message sent to cluster daemon [" + unknown + "]") + // start in JOINING + startWith(MemberStatus.Joining, Unit) + + // ======================== + // === IN JOINING === + when(MemberStatus.Joining) { + case Event(LeaderAction.Up, _) ⇒ + node.up() + goto(MemberStatus.Up) + } + + // ======================== + // === IN UP === + when(MemberStatus.Up) { + case Event(UserAction.Down, _) ⇒ + node.downing() + goto(MemberStatus.Down) + + case Event(UserAction.Leave, _) ⇒ + node.leaving() + goto(MemberStatus.Leaving) + + case Event(UserAction.Exit, _) ⇒ + node.exiting() + goto(MemberStatus.Exiting) + + case Event(LeaderAction.Remove, _) ⇒ + node.removing() + goto(MemberStatus.Removed) + } + + // ======================== + // === IN LEAVING === + when(MemberStatus.Leaving) { + case Event(UserAction.Down, _) ⇒ + node.downing() + goto(MemberStatus.Down) + + case Event(LeaderAction.Remove, _) ⇒ + node.removing() + goto(MemberStatus.Removed) + } + + // ======================== + // === IN EXITING === + when(MemberStatus.Exiting) { + case Event(LeaderAction.Remove, _) ⇒ + node.removing() + goto(MemberStatus.Removed) + } + + // ======================== + // === IN DOWN === + when(MemberStatus.Down) { + // FIXME How to transition from DOWN => JOINING when node comes back online. Can't just listen to Gossip message since it is received be another actor. How to fix this? + case Event(LeaderAction.Remove, _) ⇒ + node.removing() + goto(MemberStatus.Removed) + } + + // ======================== + // === IN REMOVED === + when(MemberStatus.Removed) { + case command ⇒ + log.warning("Removed node [{}] received cluster command [{}]", system.name, command) + stay + } + + // ======================== + // === GENERIC AND UNHANDLED COMMANDS === + whenUnhandled { + // should be able to handle Join in any state + case Event(UserAction.Join(address), _) ⇒ + node.joining(address) + stay + + case Event(command, _) ⇒ { + log.warning("Unhandled command [{}] in state [{}]", command, stateName) + stay + } } } @@ -274,7 +371,7 @@ class Node(system: ActorSystemImpl) extends Extension { log.info("Node [{}] - Starting cluster Node...", remoteAddress) // try to join the node defined in the 'akka.cluster.node-to-join' option - join() + autoJoin() // start periodic gossip to random nodes in cluster private val gossipCanceller = system.scheduler.schedule(gossipInitialDelay, gossipFrequency) { @@ -369,6 +466,7 @@ class Node(system: ActorSystemImpl) extends Extension { // ======================================================== /** + * State transition to JOINING. * New node joining. */ @tailrec @@ -397,6 +495,31 @@ class Node(system: ActorSystemImpl) extends Extension { } } + /** + * State transition to UP. + */ + private[cluster] final def up() {} + + /** + * State transition to LEAVING. + */ + private[cluster] final def leaving() {} + + /** + * State transition to EXITING. + */ + private[cluster] final def exiting() {} + + /** + * State transition to REMOVED. + */ + private[cluster] final def removing() {} + + /** + * State transition to DOWN. 
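+ * Invoked by the ClusterCommandDaemon FSM on the Up -> Down and Leaving -> Down
+ * transitions; currently a placeholder, like the other transition hooks above.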
+ */ + private[cluster] final def downing() {} + /** * Receive new gossip. */ @@ -444,9 +567,9 @@ class Node(system: ActorSystemImpl) extends Extension { /** * Joins the pre-configured contact point and retrieves current gossip state. */ - private def join() = nodeToJoin foreach { address ⇒ + private def autoJoin() = nodeToJoin foreach { address ⇒ val connection = clusterCommandConnectionFor(address) - val command = Join(remoteAddress) + val command = UserAction.Join(remoteAddress) log.info("Node [{}] - Sending [{}] to [{}] through connection [{}]", remoteAddress, command, address, connection) connection ! command } @@ -501,16 +624,18 @@ class Node(system: ActorSystemImpl) extends Extension { } /** - * Switches the state in the FSM. + * Switches the member status. + * + * @param newStatus the new member status + * @param oldState the state to change the member status in + * @return the updated new state with the new member status */ - @tailrec - final private def switchStatusTo(newStatus: MemberStatus) { + private def switchMemberStatusTo(newStatus: MemberStatus, state: State): State = { log.info("Node [{}] - Switching membership status to [{}]", remoteAddress, newStatus) - val localState = state.get - val localSelf = localState.self + val localSelf = state.self - val localGossip = localState.latestGossip + val localGossip = state.latestGossip val localMembers = localGossip.members val newSelf = localSelf copy (status = newStatus) @@ -526,9 +651,7 @@ class Node(system: ActorSystemImpl) extends Extension { val versionedGossip = newGossip + vclockNode val seenVersionedGossip = versionedGossip seen remoteAddress - val newState = localState copy (self = newSelf, latestGossip = seenVersionedGossip) - - if (!state.compareAndSet(localState, newState)) switchStatusTo(newStatus) // recur if we failed update + state copy (self = newSelf, latestGossip = seenVersionedGossip) } /** diff --git a/akka-cluster/src/test/scala/akka/cluster/AccrualFailureDetectorSpec.scala b/akka-cluster/src/test/scala/akka/cluster/AccrualFailureDetectorSpec.scala index e867bc834b..275cd32c75 100644 --- a/akka-cluster/src/test/scala/akka/cluster/AccrualFailureDetectorSpec.scala +++ b/akka-cluster/src/test/scala/akka/cluster/AccrualFailureDetectorSpec.scala @@ -1,3 +1,7 @@ +/** + * Copyright (C) 2009-2012 Typesafe Inc. + */ + package akka.cluster import java.net.InetSocketAddress diff --git a/akka-cluster/src/test/scala/akka/cluster/ClusterCommandDaemonFSMSpec.scala b/akka-cluster/src/test/scala/akka/cluster/ClusterCommandDaemonFSMSpec.scala new file mode 100644 index 0000000000..69512f3ad9 --- /dev/null +++ b/akka-cluster/src/test/scala/akka/cluster/ClusterCommandDaemonFSMSpec.scala @@ -0,0 +1,156 @@ +/** + * Copyright (C) 2009-2012 Typesafe Inc. + */ + +package akka.cluster + +import akka.testkit._ +import akka.actor.Address + +class ClusterCommandDaemonFSMSpec extends AkkaSpec( + """ + akka { + actor { + provider = akka.remote.RemoteActorRefProvider + } + } + """) with ImplicitSender { + + "A ClusterCommandDaemon FSM" must { + + "start in Joining" in { + val fsm = TestFSMRef(new ClusterCommandDaemon(system, system.node)) + fsm.stateName must be(MemberStatus.Joining) + } + + "be able to switch from Joining to Up" in { + val fsm = TestFSMRef(new ClusterCommandDaemon(system, system.node)) + fsm.stateName must be(MemberStatus.Joining) + fsm ! 
LeaderAction.Up + fsm.stateName must be(MemberStatus.Up) + } + + "be able to switch from Up to Down" in { + val fsm = TestFSMRef(new ClusterCommandDaemon(system, system.node)) + fsm.stateName must be(MemberStatus.Joining) + fsm ! LeaderAction.Up + fsm.stateName must be(MemberStatus.Up) + fsm ! UserAction.Down + fsm.stateName must be(MemberStatus.Down) + } + + "be able to switch from Up to Leaving" in { + val fsm = TestFSMRef(new ClusterCommandDaemon(system, system.node)) + fsm.stateName must be(MemberStatus.Joining) + fsm ! LeaderAction.Up + fsm.stateName must be(MemberStatus.Up) + fsm ! UserAction.Leave + fsm.stateName must be(MemberStatus.Leaving) + } + + "be able to switch from Up to Exiting" in { + val fsm = TestFSMRef(new ClusterCommandDaemon(system, system.node)) + fsm.stateName must be(MemberStatus.Joining) + fsm ! LeaderAction.Up + fsm.stateName must be(MemberStatus.Up) + fsm ! UserAction.Exit + fsm.stateName must be(MemberStatus.Exiting) + } + + "be able to switch from Up to Removed" in { + val fsm = TestFSMRef(new ClusterCommandDaemon(system, system.node)) + fsm.stateName must be(MemberStatus.Joining) + fsm ! LeaderAction.Up + fsm.stateName must be(MemberStatus.Up) + fsm ! LeaderAction.Remove + fsm.stateName must be(MemberStatus.Removed) + } + + "be able to switch from Leaving to Down" in { + val fsm = TestFSMRef(new ClusterCommandDaemon(system, system.node)) + fsm.stateName must be(MemberStatus.Joining) + fsm ! LeaderAction.Up + fsm.stateName must be(MemberStatus.Up) + fsm ! UserAction.Leave + fsm.stateName must be(MemberStatus.Leaving) + fsm ! UserAction.Down + fsm.stateName must be(MemberStatus.Down) + } + + "be able to switch from Leaving to Removed" in { + val fsm = TestFSMRef(new ClusterCommandDaemon(system, system.node)) + fsm.stateName must be(MemberStatus.Joining) + fsm ! LeaderAction.Up + fsm.stateName must be(MemberStatus.Up) + fsm ! UserAction.Leave + fsm.stateName must be(MemberStatus.Leaving) + fsm ! LeaderAction.Remove + fsm.stateName must be(MemberStatus.Removed) + } + + "be able to switch from Exiting to Removed" in { + val fsm = TestFSMRef(new ClusterCommandDaemon(system, system.node)) + fsm.stateName must be(MemberStatus.Joining) + fsm ! LeaderAction.Up + fsm.stateName must be(MemberStatus.Up) + fsm ! UserAction.Exit + fsm.stateName must be(MemberStatus.Exiting) + fsm ! LeaderAction.Remove + fsm.stateName must be(MemberStatus.Removed) + } + + "be able to switch from Down to Removed" in { + val fsm = TestFSMRef(new ClusterCommandDaemon(system, system.node)) + fsm.stateName must be(MemberStatus.Joining) + fsm ! LeaderAction.Up + fsm.stateName must be(MemberStatus.Up) + fsm ! UserAction.Down + fsm.stateName must be(MemberStatus.Down) + fsm ! LeaderAction.Remove + fsm.stateName must be(MemberStatus.Removed) + } + + "not be able to switch from Removed to any other state" in { + val fsm = TestFSMRef(new ClusterCommandDaemon(system, system.node)) + fsm.stateName must be(MemberStatus.Joining) + fsm ! LeaderAction.Up + fsm.stateName must be(MemberStatus.Up) + fsm ! LeaderAction.Remove + fsm.stateName must be(MemberStatus.Removed) + fsm ! LeaderAction.Up + fsm.stateName must be(MemberStatus.Removed) + fsm ! UserAction.Leave + fsm.stateName must be(MemberStatus.Removed) + fsm ! UserAction.Down + fsm.stateName must be(MemberStatus.Removed) + fsm ! UserAction.Exit + fsm.stateName must be(MemberStatus.Removed) + fsm ! 
LeaderAction.Remove + fsm.stateName must be(MemberStatus.Removed) + } + + "remain in the same state when receiving a Join command" in { + val address = Address("akka", system.name) + + val fsm = TestFSMRef(new ClusterCommandDaemon(system, system.node)) + fsm.stateName must be(MemberStatus.Joining) + fsm ! UserAction.Join(address) + fsm.stateName must be(MemberStatus.Joining) + + fsm ! LeaderAction.Up + fsm.stateName must be(MemberStatus.Up) + fsm ! UserAction.Join(address) + fsm.stateName must be(MemberStatus.Up) + + fsm ! UserAction.Leave + fsm.stateName must be(MemberStatus.Leaving) + fsm ! UserAction.Join(address) + fsm.stateName must be(MemberStatus.Leaving) + + fsm ! UserAction.Down + fsm.stateName must be(MemberStatus.Down) + fsm ! UserAction.Join(address) + fsm.stateName must be(MemberStatus.Down) + } + } +} diff --git a/akka-cluster/src/test/scala/akka/cluster/LeaderElectionSpec.scala b/akka-cluster/src/test/scala/akka/cluster/LeaderElectionSpec.scala index 85587f8780..a9a42ef26c 100644 --- a/akka-cluster/src/test/scala/akka/cluster/LeaderElectionSpec.scala +++ b/akka-cluster/src/test/scala/akka/cluster/LeaderElectionSpec.scala @@ -1,6 +1,7 @@ /** * Copyright (C) 2009-2012 Typesafe Inc. */ + package akka.cluster import akka.testkit._ diff --git a/akka-cluster/src/test/scala/akka/cluster/VectorClockSpec.scala b/akka-cluster/src/test/scala/akka/cluster/VectorClockSpec.scala index 65f2aa1d75..d0e4c8da13 100644 --- a/akka-cluster/src/test/scala/akka/cluster/VectorClockSpec.scala +++ b/akka-cluster/src/test/scala/akka/cluster/VectorClockSpec.scala @@ -1,3 +1,7 @@ +/** + * Copyright (C) 2009-2012 Typesafe Inc. + */ + package akka.cluster import java.net.InetSocketAddress diff --git a/akka-docs/scala/fsm.rst b/akka-docs/scala/fsm.rst index f0f2758e89..e44d9cc132 100644 --- a/akka-docs/scala/fsm.rst +++ b/akka-docs/scala/fsm.rst @@ -157,7 +157,7 @@ Defining States A state is defined by one or more invocations of the method :func:`when([, stateTimeout = ])(stateFunction)`. - + The given name must be an object which is type-compatible with the first type parameter given to the :class:`FSM` trait. This object is used as a hash key, so you must ensure that it properly implements :meth:`equals` and @@ -440,7 +440,7 @@ and in the following. Event Tracing ------------- -The setting ``akka.actor.debug.fsm`` in `:ref:`configuration` enables logging of an +The setting ``akka.actor.debug.fsm`` in :ref:`configuration` enables logging of an event trace by :class:`LoggingFSM` instances:: class MyFSM extends Actor with LoggingFSM[X, Z] { From 14d7632771778578aeb59afcef0dcb07b7e5b973 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jonas=20Bone=CC=81r?= Date: Wed, 29 Feb 2012 10:02:00 +0100 Subject: [PATCH 63/72] Cleaned up failure detector fixing minor issues after review. Renamed internal classes in Node. 
MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Jonas Bonér --- .../akka/cluster/AccrualFailureDetector.scala | 18 ++- .../src/main/scala/akka/cluster/Node.scala | 136 ++++++++++++------ .../cluster/ClusterCommandDaemonFSMSpec.scala | 70 ++++----- 3 files changed, 141 insertions(+), 83 deletions(-) diff --git a/akka-cluster/src/main/scala/akka/cluster/AccrualFailureDetector.scala b/akka-cluster/src/main/scala/akka/cluster/AccrualFailureDetector.scala index e0d7cae052..d2dce19a80 100644 --- a/akka-cluster/src/main/scala/akka/cluster/AccrualFailureDetector.scala +++ b/akka-cluster/src/main/scala/akka/cluster/AccrualFailureDetector.scala @@ -88,8 +88,10 @@ class AccrualFailureDetector(system: ActorSystem, address: Address, val threshol val newTimestamps = oldTimestamps + (connection -> timestamp) // record new timestamp - var newIntervalsForConnection = - oldState.intervalHistory.get(connection).getOrElse(Vector.empty[Long]) :+ interval // append the new interval to history + var newIntervalsForConnection = (oldState.intervalHistory.get(connection) match { + case Some(history) ⇒ history + case _ ⇒ Vector.empty[Long] + }) :+ interval if (newIntervalsForConnection.size > maxSampleSize) { // reached max history, drop first interval @@ -100,7 +102,11 @@ class AccrualFailureDetector(system: ActorSystem, address: Address, val threshol if (newIntervalsForConnection.size > 1) { val newMean: Double = newIntervalsForConnection.sum / newIntervalsForConnection.size.toDouble - val oldConnectionFailureStats = oldFailureStats.get(connection).getOrElse(throw new IllegalStateException("Can't calculate new failure statistics due to missing heartbeat history")) + + val oldConnectionFailureStats = oldState.failureStats.get(connection) match { + case Some(stats) ⇒ stats + case _ ⇒ throw new IllegalStateException("Can't calculate new failure statistics due to missing heartbeat history") + } val deviationSum = newIntervalsForConnection @@ -148,8 +154,10 @@ class AccrualFailureDetector(system: ActorSystem, address: Address, val threshol else { val timestampDiff = newTimestamp - oldTimestamp.get - val stats = oldState.failureStats.get(connection) - val mean = stats.getOrElse(throw new IllegalStateException("Can't calculate Failure Detector Phi value for a node that have no heartbeat history")).mean + val mean = oldState.failureStats.get(connection) match { + case Some(FailureStats(mean, _, _)) ⇒ mean + case _ ⇒ throw new IllegalStateException("Can't calculate Failure Detector Phi value for a node that have no heartbeat history") + } if (mean == 0.0D) 0.0D else PhiFactor * timestampDiff / mean diff --git a/akka-cluster/src/main/scala/akka/cluster/Node.scala b/akka-cluster/src/main/scala/akka/cluster/Node.scala index cea128d027..eb4142b066 100644 --- a/akka-cluster/src/main/scala/akka/cluster/Node.scala +++ b/akka-cluster/src/main/scala/akka/cluster/Node.scala @@ -48,7 +48,7 @@ sealed trait ClusterMessage extends Serializable /** * Cluster commands sent by the USER. */ -object UserAction { +object ClusterAction { /** * Command to join the cluster. Sent when a node (reprsesented by 'address') @@ -56,6 +56,11 @@ object UserAction { */ case class Join(address: Address) extends ClusterMessage + /** + * Command to set a node to Up (from Joining). + */ + case object Up extends ClusterMessage + /** * Command to leave the cluster. */ @@ -70,18 +75,6 @@ object UserAction { * Command to mark a node to be removed from the cluster immediately. 
*/ case object Exit extends ClusterMessage -} - -/** - * Cluster commands sent by the LEADER. - * Node: Leader can also send UserActions but not vice versa. - */ -object LeaderAction { - - /** - * Command to set a node to Up (from Joining). - */ - case object Up extends ClusterMessage /** * Command to remove a node from the cluster immediately. @@ -190,7 +183,7 @@ final class ClusterCommandDaemon(system: ActorSystem, node: Node) extends Actor // ======================== // === IN JOINING === when(MemberStatus.Joining) { - case Event(LeaderAction.Up, _) ⇒ + case Event(ClusterAction.Up, _) ⇒ node.up() goto(MemberStatus.Up) } @@ -198,19 +191,19 @@ final class ClusterCommandDaemon(system: ActorSystem, node: Node) extends Actor // ======================== // === IN UP === when(MemberStatus.Up) { - case Event(UserAction.Down, _) ⇒ + case Event(ClusterAction.Down, _) ⇒ node.downing() goto(MemberStatus.Down) - case Event(UserAction.Leave, _) ⇒ + case Event(ClusterAction.Leave, _) ⇒ node.leaving() goto(MemberStatus.Leaving) - case Event(UserAction.Exit, _) ⇒ + case Event(ClusterAction.Exit, _) ⇒ node.exiting() goto(MemberStatus.Exiting) - case Event(LeaderAction.Remove, _) ⇒ + case Event(ClusterAction.Remove, _) ⇒ node.removing() goto(MemberStatus.Removed) } @@ -218,11 +211,11 @@ final class ClusterCommandDaemon(system: ActorSystem, node: Node) extends Actor // ======================== // === IN LEAVING === when(MemberStatus.Leaving) { - case Event(UserAction.Down, _) ⇒ + case Event(ClusterAction.Down, _) ⇒ node.downing() goto(MemberStatus.Down) - case Event(LeaderAction.Remove, _) ⇒ + case Event(ClusterAction.Remove, _) ⇒ node.removing() goto(MemberStatus.Removed) } @@ -230,7 +223,7 @@ final class ClusterCommandDaemon(system: ActorSystem, node: Node) extends Actor // ======================== // === IN EXITING === when(MemberStatus.Exiting) { - case Event(LeaderAction.Remove, _) ⇒ + case Event(ClusterAction.Remove, _) ⇒ node.removing() goto(MemberStatus.Removed) } @@ -239,7 +232,7 @@ final class ClusterCommandDaemon(system: ActorSystem, node: Node) extends Actor // === IN DOWN === when(MemberStatus.Down) { // FIXME How to transition from DOWN => JOINING when node comes back online. Can't just listen to Gossip message since it is received be another actor. How to fix this? - case Event(LeaderAction.Remove, _) ⇒ + case Event(ClusterAction.Remove, _) ⇒ node.removing() goto(MemberStatus.Removed) } @@ -248,7 +241,7 @@ final class ClusterCommandDaemon(system: ActorSystem, node: Node) extends Actor // === IN REMOVED === when(MemberStatus.Removed) { case command ⇒ - log.warning("Removed node [{}] received cluster command [{}]", system.name, command) + log.warning("Removed node [{}] received cluster command [{}]", system.name, command) stay } @@ -256,7 +249,7 @@ final class ClusterCommandDaemon(system: ActorSystem, node: Node) extends Actor // === GENERIC AND UNHANDLED COMMANDS === whenUnhandled { // should be able to handle Join in any state - case Event(UserAction.Join(address), _) ⇒ + case Event(ClusterAction.Join(address), _) ⇒ node.joining(address) stay @@ -288,6 +281,15 @@ final class ClusterGossipDaemon(system: ActorSystem, node: Node) extends Actor { * * if (node.isLeader) { ... } * }}} + * + * Example: + * {{{ + * import akka.cluster._ + * + * val node = system.node // implicit conversion adds 'node' method + * + * if (node.isLeader) { ... 
} + * }}} */ object NodeExtension extends ExtensionId[Node] with ExtensionIdProvider { override def get(system: ActorSystem): Node = super.get(system) @@ -311,14 +313,25 @@ object NodeExtension extends ExtensionId[Node] with ExtensionIdProvider { * 3) If the member gossiped to at (1) was not deputy, or the number of live members is less than number of deputy list, * gossip to random deputy with certain probability depending on number of unreachable, deputy and live members. * + * + * Example: + * {{{ + * val node = NodeExtension(system) + * + * if (node.isLeader) { ... } + * }}} + * + * Example: + * {{{ + * import akka.cluster._ + * + * val node = system.node // implicit conversion adds 'node' method + * + * if (node.isLeader) { ... } + * }}} */ class Node(system: ActorSystemImpl) extends Extension { - if (!system.provider.isInstanceOf[RemoteActorRefProvider]) - throw new ConfigurationException("ActorSystem[" + system + "] needs to have a 'RemoteActorRefProvider' enabled in the configuration") - - val remote: RemoteActorRefProvider = system.provider.asInstanceOf[RemoteActorRefProvider] - /** * Represents the state for this Node. Implemented using optimistic lockless concurrency, * all state is represented by this immutable case class and managed by an AtomicReference. @@ -328,18 +341,23 @@ class Node(system: ActorSystemImpl) extends Extension { latestGossip: Gossip, memberMembershipChangeListeners: Set[MembershipChangeListener] = Set.empty[MembershipChangeListener]) - val remoteSettings = new RemoteSettings(system.settings.config, system.name) - val clusterSettings = new ClusterSettings(system.settings.config, system.name) + if (!system.provider.isInstanceOf[RemoteActorRefProvider]) + throw new ConfigurationException("ActorSystem[" + system + "] needs to have a 'RemoteActorRefProvider' enabled in the configuration") - val remoteAddress = remote.transport.address - val vclockNode = VectorClock.Node(remoteAddress.toString) + private val remote: RemoteActorRefProvider = system.provider.asInstanceOf[RemoteActorRefProvider] - val gossipInitialDelay = clusterSettings.GossipInitialDelay - val gossipFrequency = clusterSettings.GossipFrequency + private val remoteSettings = new RemoteSettings(system.settings.config, system.name) + private val clusterSettings = new ClusterSettings(system.settings.config, system.name) - implicit val memberOrdering = Ordering.fromLessThan[Member](_.address.toString < _.address.toString) + private val remoteAddress = remote.transport.address + private val vclockNode = VectorClock.Node(remoteAddress.toString) - implicit val defaultTimeout = Timeout(remoteSettings.RemoteSystemDaemonAckTimeout) + private val gossipInitialDelay = clusterSettings.GossipInitialDelay + private val gossipFrequency = clusterSettings.GossipFrequency + + implicit private val memberOrdering = Ordering.fromLessThan[Member](_.address.toString < _.address.toString) + + implicit private val defaultTimeout = Timeout(remoteSettings.RemoteSystemDaemonAckTimeout) val failureDetector = new AccrualFailureDetector( system, remoteAddress, clusterSettings.FailureDetectorThreshold, clusterSettings.FailureDetectorMaxSampleSize) @@ -461,6 +479,34 @@ class Node(system: ActorSystemImpl) extends Extension { if (!state.compareAndSet(localState, newState)) unregisterListener(listener) // recur } + /** + * Send command to JOIN one node to another. + */ + def sendJoin(address: Address) { + clusterCommandDaemon ! ClusterAction.Join(address) + } + + /** + * Send command to issue state transition to LEAVING. 
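+ *
+ * Usage sketch (assuming the implicit conversion in the akka.cluster package object): {{{ system.node.sendLeave() }}}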
+ */ + def sendLeave() { + clusterCommandDaemon ! ClusterAction.Leave + } + + /** + * Send command to issue state transition to EXITING. + */ + def sendDown() { + clusterCommandDaemon ! ClusterAction.Down + } + + /** + * Send command to issue state transition to REMOVED. + */ + def sendRemove() { + clusterCommandDaemon ! ClusterAction.Remove + } + // ======================================================== // ===================== INTERNAL API ===================== // ======================================================== @@ -569,7 +615,7 @@ class Node(system: ActorSystemImpl) extends Extension { */ private def autoJoin() = nodeToJoin foreach { address ⇒ val connection = clusterCommandConnectionFor(address) - val command = UserAction.Join(remoteAddress) + val command = ClusterAction.Join(remoteAddress) log.info("Node [{}] - Sending [{}] to [{}] through connection [{}]", remoteAddress, command, address, connection) connection ! command } @@ -727,11 +773,15 @@ class Node(system: ActorSystemImpl) extends Extension { * @returns Some(convergedGossip) if convergence have been reached and None if not */ private def convergence(gossip: Gossip): Option[Gossip] = { - val seen = gossip.overview.seen - val views = Set.empty[VectorClock] ++ seen.values - if (views.size == 1) { - log.debug("Node [{}] - Cluster convergence reached", remoteAddress) - Some(gossip) + val overview = gossip.overview + if (overview.unreachable.isEmpty) { // if there are any unreachable nodes then we can't have a convergence - + // waiting for user to act (issuing DOWN) or leader to act (issuing DOWN through auto-down) + val seen = gossip.overview.seen + val views = Set.empty[VectorClock] ++ seen.values + if (views.size == 1) { + log.debug("Node [{}] - Cluster convergence reached", remoteAddress) + Some(gossip) + } else None } else None } diff --git a/akka-cluster/src/test/scala/akka/cluster/ClusterCommandDaemonFSMSpec.scala b/akka-cluster/src/test/scala/akka/cluster/ClusterCommandDaemonFSMSpec.scala index 69512f3ad9..1cda6ff45b 100644 --- a/akka-cluster/src/test/scala/akka/cluster/ClusterCommandDaemonFSMSpec.scala +++ b/akka-cluster/src/test/scala/akka/cluster/ClusterCommandDaemonFSMSpec.scala @@ -26,106 +26,106 @@ class ClusterCommandDaemonFSMSpec extends AkkaSpec( "be able to switch from Joining to Up" in { val fsm = TestFSMRef(new ClusterCommandDaemon(system, system.node)) fsm.stateName must be(MemberStatus.Joining) - fsm ! LeaderAction.Up + fsm ! ClusterAction.Up fsm.stateName must be(MemberStatus.Up) } "be able to switch from Up to Down" in { val fsm = TestFSMRef(new ClusterCommandDaemon(system, system.node)) fsm.stateName must be(MemberStatus.Joining) - fsm ! LeaderAction.Up + fsm ! ClusterAction.Up fsm.stateName must be(MemberStatus.Up) - fsm ! UserAction.Down + fsm ! ClusterAction.Down fsm.stateName must be(MemberStatus.Down) } "be able to switch from Up to Leaving" in { val fsm = TestFSMRef(new ClusterCommandDaemon(system, system.node)) fsm.stateName must be(MemberStatus.Joining) - fsm ! LeaderAction.Up + fsm ! ClusterAction.Up fsm.stateName must be(MemberStatus.Up) - fsm ! UserAction.Leave + fsm ! ClusterAction.Leave fsm.stateName must be(MemberStatus.Leaving) } "be able to switch from Up to Exiting" in { val fsm = TestFSMRef(new ClusterCommandDaemon(system, system.node)) fsm.stateName must be(MemberStatus.Joining) - fsm ! LeaderAction.Up + fsm ! ClusterAction.Up fsm.stateName must be(MemberStatus.Up) - fsm ! UserAction.Exit + fsm ! 
ClusterAction.Exit fsm.stateName must be(MemberStatus.Exiting) } "be able to switch from Up to Removed" in { val fsm = TestFSMRef(new ClusterCommandDaemon(system, system.node)) fsm.stateName must be(MemberStatus.Joining) - fsm ! LeaderAction.Up + fsm ! ClusterAction.Up fsm.stateName must be(MemberStatus.Up) - fsm ! LeaderAction.Remove + fsm ! ClusterAction.Remove fsm.stateName must be(MemberStatus.Removed) } "be able to switch from Leaving to Down" in { val fsm = TestFSMRef(new ClusterCommandDaemon(system, system.node)) fsm.stateName must be(MemberStatus.Joining) - fsm ! LeaderAction.Up + fsm ! ClusterAction.Up fsm.stateName must be(MemberStatus.Up) - fsm ! UserAction.Leave + fsm ! ClusterAction.Leave fsm.stateName must be(MemberStatus.Leaving) - fsm ! UserAction.Down + fsm ! ClusterAction.Down fsm.stateName must be(MemberStatus.Down) } "be able to switch from Leaving to Removed" in { val fsm = TestFSMRef(new ClusterCommandDaemon(system, system.node)) fsm.stateName must be(MemberStatus.Joining) - fsm ! LeaderAction.Up + fsm ! ClusterAction.Up fsm.stateName must be(MemberStatus.Up) - fsm ! UserAction.Leave + fsm ! ClusterAction.Leave fsm.stateName must be(MemberStatus.Leaving) - fsm ! LeaderAction.Remove + fsm ! ClusterAction.Remove fsm.stateName must be(MemberStatus.Removed) } "be able to switch from Exiting to Removed" in { val fsm = TestFSMRef(new ClusterCommandDaemon(system, system.node)) fsm.stateName must be(MemberStatus.Joining) - fsm ! LeaderAction.Up + fsm ! ClusterAction.Up fsm.stateName must be(MemberStatus.Up) - fsm ! UserAction.Exit + fsm ! ClusterAction.Exit fsm.stateName must be(MemberStatus.Exiting) - fsm ! LeaderAction.Remove + fsm ! ClusterAction.Remove fsm.stateName must be(MemberStatus.Removed) } "be able to switch from Down to Removed" in { val fsm = TestFSMRef(new ClusterCommandDaemon(system, system.node)) fsm.stateName must be(MemberStatus.Joining) - fsm ! LeaderAction.Up + fsm ! ClusterAction.Up fsm.stateName must be(MemberStatus.Up) - fsm ! UserAction.Down + fsm ! ClusterAction.Down fsm.stateName must be(MemberStatus.Down) - fsm ! LeaderAction.Remove + fsm ! ClusterAction.Remove fsm.stateName must be(MemberStatus.Removed) } "not be able to switch from Removed to any other state" in { val fsm = TestFSMRef(new ClusterCommandDaemon(system, system.node)) fsm.stateName must be(MemberStatus.Joining) - fsm ! LeaderAction.Up + fsm ! ClusterAction.Up fsm.stateName must be(MemberStatus.Up) - fsm ! LeaderAction.Remove + fsm ! ClusterAction.Remove fsm.stateName must be(MemberStatus.Removed) - fsm ! LeaderAction.Up + fsm ! ClusterAction.Up fsm.stateName must be(MemberStatus.Removed) - fsm ! UserAction.Leave + fsm ! ClusterAction.Leave fsm.stateName must be(MemberStatus.Removed) - fsm ! UserAction.Down + fsm ! ClusterAction.Down fsm.stateName must be(MemberStatus.Removed) - fsm ! UserAction.Exit + fsm ! ClusterAction.Exit fsm.stateName must be(MemberStatus.Removed) - fsm ! LeaderAction.Remove + fsm ! ClusterAction.Remove fsm.stateName must be(MemberStatus.Removed) } @@ -134,22 +134,22 @@ class ClusterCommandDaemonFSMSpec extends AkkaSpec( val fsm = TestFSMRef(new ClusterCommandDaemon(system, system.node)) fsm.stateName must be(MemberStatus.Joining) - fsm ! UserAction.Join(address) + fsm ! ClusterAction.Join(address) fsm.stateName must be(MemberStatus.Joining) - fsm ! LeaderAction.Up + fsm ! ClusterAction.Up fsm.stateName must be(MemberStatus.Up) - fsm ! UserAction.Join(address) + fsm ! ClusterAction.Join(address) fsm.stateName must be(MemberStatus.Up) - fsm ! 
UserAction.Leave + fsm ! ClusterAction.Leave fsm.stateName must be(MemberStatus.Leaving) - fsm ! UserAction.Join(address) + fsm ! ClusterAction.Join(address) fsm.stateName must be(MemberStatus.Leaving) - fsm ! UserAction.Down + fsm ! ClusterAction.Down fsm.stateName must be(MemberStatus.Down) - fsm ! UserAction.Join(address) + fsm ! ClusterAction.Join(address) fsm.stateName must be(MemberStatus.Down) } } From 06ec519c7c8c8d0f6e139eb5b5417b9012ed8742 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jonas=20Bone=CC=81r?= Date: Wed, 29 Feb 2012 11:34:11 +0100 Subject: [PATCH 64/72] Reverted two lines of code mistakenly pushed to early. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Jonas Bonér --- .../src/main/scala/akka/cluster/Node.scala | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/akka-cluster/src/main/scala/akka/cluster/Node.scala b/akka-cluster/src/main/scala/akka/cluster/Node.scala index eb4142b066..055d7b3b7f 100644 --- a/akka-cluster/src/main/scala/akka/cluster/Node.scala +++ b/akka-cluster/src/main/scala/akka/cluster/Node.scala @@ -774,15 +774,15 @@ class Node(system: ActorSystemImpl) extends Extension { */ private def convergence(gossip: Gossip): Option[Gossip] = { val overview = gossip.overview - if (overview.unreachable.isEmpty) { // if there are any unreachable nodes then we can't have a convergence - - // waiting for user to act (issuing DOWN) or leader to act (issuing DOWN through auto-down) - val seen = gossip.overview.seen - val views = Set.empty[VectorClock] ++ seen.values - if (views.size == 1) { - log.debug("Node [{}] - Cluster convergence reached", remoteAddress) - Some(gossip) - } else None + // if (overview.unreachable.isEmpty) { // if there are any unreachable nodes then we can't have a convergence - + // waiting for user to act (issuing DOWN) or leader to act (issuing DOWN through auto-down) + val seen = gossip.overview.seen + val views = Set.empty[VectorClock] ++ seen.values + if (views.size == 1) { + log.debug("Node [{}] - Cluster convergence reached", remoteAddress) + Some(gossip) } else None + // } else None } /** From a3026b3316dc5b34c3d37ce6fc56cc44bac1d561 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jonas=20Bone=CC=81r?= Date: Fri, 2 Mar 2012 09:55:54 +0100 Subject: [PATCH 65/72] Fixed misc issues after review. 
MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Jonas Bonér --- .../src/main/scala/akka/cluster/Node.scala | 115 +++++++----------- .../main/scala/akka/cluster/VectorClock.scala | 21 +--- .../src/main/scala/akka/package.scala | 20 --- .../cluster/ClusterCommandDaemonFSMSpec.scala | 24 ++-- .../GossipingAccrualFailureDetectorSpec.scala | 31 ++--- .../akka/cluster/LeaderElectionSpec.scala | 10 +- .../MembershipChangeListenerSpec.scala | 29 ++--- .../akka/cluster/NodeMembershipSpec.scala | 29 ++--- .../scala/akka/cluster/NodeStartupSpec.scala | 31 ++--- .../scala/akka/remote/RemoteTransport.scala | 13 -- project/AkkaBuild.scala | 4 +- 11 files changed, 101 insertions(+), 226 deletions(-) delete mode 100644 akka-cluster/src/main/scala/akka/package.scala diff --git a/akka-cluster/src/main/scala/akka/cluster/Node.scala b/akka-cluster/src/main/scala/akka/cluster/Node.scala index 055d7b3b7f..dfa5e6efd7 100644 --- a/akka-cluster/src/main/scala/akka/cluster/Node.scala +++ b/akka-cluster/src/main/scala/akka/cluster/Node.scala @@ -107,18 +107,6 @@ object MemberStatus { case object Removed extends MemberStatus } -// sealed trait PartitioningStatus -// object PartitioningStatus { -// case object Complete extends PartitioningStatus -// case object Awaiting extends PartitioningStatus -// } - -// case class PartitioningChange( -// from: Address, -// to: Address, -// path: PartitionPath, -// status: PartitioningStatus) - /** * Represents the overview of the cluster, holds the cluster convergence table and set with unreachable nodes. */ @@ -138,8 +126,6 @@ case class GossipOverview( case class Gossip( overview: GossipOverview = GossipOverview(), members: SortedSet[Member], // sorted set of members with their status, sorted by name - //partitions: Tree[PartitionPath, Node] = Tree.empty[PartitionPath, Node], // name/partition service - //pending: Set[PartitioningChange] = Set.empty[PartitioningChange], meta: Map[String, Array[Byte]] = Map.empty[String, Array[Byte]], version: VectorClock = VectorClock()) // vector clock version extends ClusterMessage // is a serializable cluster message @@ -159,8 +145,10 @@ case class Gossip( * Marks the gossip as seen by this node (remoteAddress) by updating the address entry in the 'gossip.overview.seen' * Map with the VectorClock for the new gossip. */ - def seen(address: Address): Gossip = - this copy (overview = overview copy (seen = overview.seen + (address -> version))) + def seen(address: Address): Gossip = { + if (overview.seen.contains(address) && overview.seen(address) == version) this + else this copy (overview = overview copy (seen = overview.seen + (address -> version))) + } override def toString = "Gossip(" + @@ -269,34 +257,26 @@ final class ClusterGossipDaemon(system: ActorSystem, node: Node) extends Actor { def receive = { case GossipEnvelope(sender, gossip) ⇒ node.receive(sender, gossip) - case unknown ⇒ log.error("Unknown message sent to cluster daemon [" + unknown + "]") } + + override def unhandled(unknown: Any) = log.error("Unknown message sent to cluster daemon [" + unknown + "]") } /** * Node Extension Id and factory for creating Node extension. * Example: * {{{ - * val node = NodeExtension(system) - * - * if (node.isLeader) { ... } - * }}} - * - * Example: - * {{{ - * import akka.cluster._ - * - * val node = system.node // implicit conversion adds 'node' method + * val node = Node(system) * * if (node.isLeader) { ... 
} * }}} */ -object NodeExtension extends ExtensionId[Node] with ExtensionIdProvider { +object Node extends ExtensionId[Node] with ExtensionIdProvider { override def get(system: ActorSystem): Node = super.get(system) - override def lookup = NodeExtension + override def lookup = Node - override def createExtension(system: ExtendedActorSystem): Node = new Node(system.asInstanceOf[ActorSystemImpl]) // not nice but need API in ActorSystemImpl inside Node + override def createExtension(system: ExtendedActorSystem): Node = new Node(system) } /** @@ -316,21 +296,12 @@ object NodeExtension extends ExtensionId[Node] with ExtensionIdProvider { * * Example: * {{{ - * val node = NodeExtension(system) - * - * if (node.isLeader) { ... } - * }}} - * - * Example: - * {{{ - * import akka.cluster._ - * - * val node = system.node // implicit conversion adds 'node' method + * val node = Node(system) * * if (node.isLeader) { ... } * }}} */ -class Node(system: ActorSystemImpl) extends Extension { +class Node(system: ExtendedActorSystem) extends Extension { /** * Represents the state for this Node. Implemented using optimistic lockless concurrency, @@ -372,10 +343,10 @@ class Node(system: ActorSystemImpl) extends Extension { private val log = Logging(system, "Node") private val random = SecureRandom.getInstance("SHA1PRNG") - private val clusterCommandDaemon = system.systemActorOf( + private val clusterCommandDaemon = systemActorOf( Props(new ClusterCommandDaemon(system, this)), "clusterCommand") - private val clusterGossipDaemon = system.systemActorOf( + private val clusterGossipDaemon = systemActorOf( Props(new ClusterGossipDaemon(system, this)).withRouter(RoundRobinRouter(nrOfGossipDaemons)), "clusterGossip") private val state = { @@ -439,21 +410,13 @@ class Node(system: ActorSystemImpl) extends Extension { * Shuts down all connections to other members, the cluster daemon and the periodic gossip and cleanup tasks. */ def shutdown() { - // FIXME Cheating for now. Can't just shut down. 
Node must first gossip an Leave command, wait for Leader to do proper Handoff and then await an Exit command before switching to Removed - if (isRunning.compareAndSet(true, false)) { log.info("Node [{}] - Shutting down Node and ClusterDaemon...", remoteAddress) - - try system.stop(clusterCommandDaemon) finally { - try system.stop(clusterGossipDaemon) finally { - try gossipCanceller.cancel() finally { - try scrutinizeCanceller.cancel() finally { - log.info("Node [{}] - Node and ClusterDaemon shut down successfully", remoteAddress) - } - } - } - } + gossipCanceller.cancel() + scrutinizeCanceller.cancel() + system.stop(clusterCommandDaemon) + system.stop(clusterGossipDaemon) } } @@ -519,8 +482,6 @@ class Node(system: ActorSystemImpl) extends Extension { private[cluster] final def joining(node: Address) { log.info("Node [{}] - Node [{}] is joining", remoteAddress, node) - failureDetector heartbeat node // update heartbeat in failure detector - val localState = state.get val localGossip = localState.latestGossip val localMembers = localGossip.members @@ -535,8 +496,9 @@ class Node(system: ActorSystemImpl) extends Extension { if (!state.compareAndSet(localState, newState)) joining(node) // recur if we failed update else { + failureDetector heartbeat node // update heartbeat in failure detector if (convergence(newState.latestGossip).isDefined) { - newState.memberMembershipChangeListeners map { _ notify newMembers } + newState.memberMembershipChangeListeners foreach { _ notify newMembers } } } } @@ -571,10 +533,6 @@ class Node(system: ActorSystemImpl) extends Extension { */ @tailrec private[cluster] final def receive(sender: Member, remoteGossip: Gossip) { - log.debug("Node [{}] - Receiving gossip from [{}]", remoteAddress, sender.address) - - failureDetector heartbeat sender.address // update heartbeat in failure detector - val localState = state.get val localGossip = localState.latestGossip @@ -604,8 +562,12 @@ class Node(system: ActorSystemImpl) extends Extension { // if we won the race then update else try again if (!state.compareAndSet(localState, newState)) receive(sender, remoteGossip) // recur if we fail the update else { + log.debug("Node [{}] - Receiving gossip from [{}]", remoteAddress, sender.address) + + failureDetector heartbeat sender.address // update heartbeat in failure detector + if (convergence(newState.latestGossip).isDefined) { - newState.memberMembershipChangeListeners map { _ notify newState.latestGossip.members } + newState.memberMembershipChangeListeners foreach { _ notify newState.latestGossip.members } } } } @@ -639,12 +601,12 @@ class Node(system: ActorSystemImpl) extends Extension { val localUnreachableSize = localUnreachableAddresses.size // 1. gossip to alive members - val gossipedToDeputy = gossipToRandomNodeOf(localMembers.toList map { _.address }) + val gossipedToDeputy = gossipToRandomNodeOf(localMembers map { _.address }) // 2. gossip to unreachable members if (localUnreachableSize > 0) { val probability: Double = localUnreachableSize / (localMembersSize + 1) - if (random.nextDouble() < probability) gossipToRandomNodeOf(localUnreachableAddresses.toList) + if (random.nextDouble() < probability) gossipToRandomNodeOf(localUnreachableAddresses) } // 3. gossip to a deputy nodes for facilitating partition healing @@ -714,8 +676,8 @@ class Node(system: ActorSystemImpl) extends Extension { * * @return 'true' if it gossiped to a "deputy" member. 
*/ - private def gossipToRandomNodeOf(addresses: Seq[Address]): Boolean = { - val peers = addresses filter (_ != remoteAddress) // filter out myself + private def gossipToRandomNodeOf(addresses: Iterable[Address]): Boolean = { + val peers = addresses filterNot (_ == remoteAddress) // filter out myself val peer = selectRandomNode(peers) gossipTo(peer) deputyNodes exists (peer == _) @@ -744,8 +706,6 @@ class Node(system: ActorSystemImpl) extends Extension { val newMembers = localMembers diff newlyDetectedUnreachableMembers val newUnreachableAddresses: Set[Address] = localUnreachableAddresses ++ newlyDetectedUnreachableAddresses - log.info("Node [{}] - Marking node(s) an unreachable [{}]", remoteAddress, newlyDetectedUnreachableAddresses.mkString(", ")) - val newSeen = newUnreachableAddresses.foldLeft(localSeen)((currentSeen, address) ⇒ currentSeen - address) val newOverview = localOverview copy (seen = newSeen, unreachable = newUnreachableAddresses) @@ -759,8 +719,10 @@ class Node(system: ActorSystemImpl) extends Extension { // if we won the race then update else try again if (!state.compareAndSet(localState, newState)) scrutinize() // recur else { + log.info("Node [{}] - Marking node(s) an unreachable [{}]", remoteAddress, newlyDetectedUnreachableAddresses.mkString(", ")) + if (convergence(newState.latestGossip).isDefined) { - newState.memberMembershipChangeListeners map { _ notify newMembers } + newState.memberMembershipChangeListeners foreach { _ notify newMembers } } } } @@ -777,7 +739,7 @@ class Node(system: ActorSystemImpl) extends Extension { // if (overview.unreachable.isEmpty) { // if there are any unreachable nodes then we can't have a convergence - // waiting for user to act (issuing DOWN) or leader to act (issuing DOWN through auto-down) val seen = gossip.overview.seen - val views = Set.empty[VectorClock] ++ seen.values + val views = seen.values.toSet if (views.size == 1) { log.debug("Node [{}] - Cluster convergence reached", remoteAddress) Some(gossip) @@ -785,6 +747,13 @@ class Node(system: ActorSystemImpl) extends Extension { // } else None } + private def systemActorOf(props: Props, name: String): ActorRef = { + Await.result(system.systemGuardian ? CreateChild(props, name), system.settings.CreationTimeout.duration) match { + case ref: ActorRef ⇒ ref + case ex: Exception ⇒ throw ex + } + } + /** * Sets up cluster command connection. 
*/ @@ -795,9 +764,9 @@ class Node(system: ActorSystemImpl) extends Extension { */ private def clusterGossipConnectionFor(address: Address): ActorRef = system.actorFor(RootActorPath(address) / "system" / "clusterGossip") - private def deputyNodes: Seq[Address] = state.get.latestGossip.members.toSeq map (_.address) drop 1 take nrOfDeputyNodes filter (_ != remoteAddress) + private def deputyNodes: Iterable[Address] = state.get.latestGossip.members.toIterable map (_.address) drop 1 take nrOfDeputyNodes filter (_ != remoteAddress) - private def selectRandomNode(addresses: Seq[Address]): Address = addresses(random nextInt addresses.size) + private def selectRandomNode(addresses: Iterable[Address]): Address = addresses.toSeq(random nextInt addresses.size) private def isSingletonCluster(currentState: State): Boolean = currentState.latestGossip.members.size == 1 } diff --git a/akka-cluster/src/main/scala/akka/cluster/VectorClock.scala b/akka-cluster/src/main/scala/akka/cluster/VectorClock.scala index e27215b23c..512d29caad 100644 --- a/akka-cluster/src/main/scala/akka/cluster/VectorClock.scala +++ b/akka-cluster/src/main/scala/akka/cluster/VectorClock.scala @@ -72,25 +72,18 @@ object VectorClock { /** * Hash representation of a versioned node name. */ - class Node private (val name: String) extends Serializable { - override def hashCode = 0 + name.## - - override def equals(other: Any) = Node.unapply(this) == Node.unapply(other) - - override def toString = name.mkString("Node(", "", ")") - } + sealed trait Node extends Serializable object Node { - def apply(name: String): Node = new Node(hash(name)) - - def unapply(other: Any) = other match { - case x: Node ⇒ import x._; Some(name) - case _ ⇒ None + private case class NodeImpl(name: String) extends Node { + override def toString(): String = "Node(" + name + ")" } + def apply(name: String): Node = NodeImpl(hash(name)) + private def hash(name: String): String = { val digester = MessageDigest.getInstance("MD5") - digester update name.getBytes + digester update name.getBytes("UTF-8") digester.digest.map { h ⇒ "%02x".format(0xFF & h) }.mkString } } @@ -144,8 +137,6 @@ case class VectorClock( versions: Map[VectorClock.Node, VectorClock.Timestamp] = Map.empty[VectorClock.Node, VectorClock.Timestamp]) extends PartiallyOrdered[VectorClock] { - // FIXME pruning of VectorClock history - import VectorClock._ /** diff --git a/akka-cluster/src/main/scala/akka/package.scala b/akka-cluster/src/main/scala/akka/package.scala deleted file mode 100644 index 993a12c64f..0000000000 --- a/akka-cluster/src/main/scala/akka/package.scala +++ /dev/null @@ -1,20 +0,0 @@ -/** - * Copyright (C) 2009-2012 Typesafe Inc. - */ - -package akka - -import akka.actor.ActorSystem - -package object cluster { - - /** - * Implicitly creates an augmented [[akka.actor.ActorSystem]] with a method {{{def node: Node}}}. - * - * @param system - * @return An augmented [[akka.actor.ActorSystem]] with a method {{{def node: Node}}}. 
- */ - implicit def actorSystemWithNodeAccessor(system: ActorSystem) = new { - val node = NodeExtension(system) - } -} diff --git a/akka-cluster/src/test/scala/akka/cluster/ClusterCommandDaemonFSMSpec.scala b/akka-cluster/src/test/scala/akka/cluster/ClusterCommandDaemonFSMSpec.scala index 1cda6ff45b..b0b5c6dadf 100644 --- a/akka-cluster/src/test/scala/akka/cluster/ClusterCommandDaemonFSMSpec.scala +++ b/akka-cluster/src/test/scala/akka/cluster/ClusterCommandDaemonFSMSpec.scala @@ -19,19 +19,19 @@ class ClusterCommandDaemonFSMSpec extends AkkaSpec( "A ClusterCommandDaemon FSM" must { "start in Joining" in { - val fsm = TestFSMRef(new ClusterCommandDaemon(system, system.node)) + val fsm = TestFSMRef(new ClusterCommandDaemon(system, Node(system))) fsm.stateName must be(MemberStatus.Joining) } "be able to switch from Joining to Up" in { - val fsm = TestFSMRef(new ClusterCommandDaemon(system, system.node)) + val fsm = TestFSMRef(new ClusterCommandDaemon(system, Node(system))) fsm.stateName must be(MemberStatus.Joining) fsm ! ClusterAction.Up fsm.stateName must be(MemberStatus.Up) } "be able to switch from Up to Down" in { - val fsm = TestFSMRef(new ClusterCommandDaemon(system, system.node)) + val fsm = TestFSMRef(new ClusterCommandDaemon(system, Node(system))) fsm.stateName must be(MemberStatus.Joining) fsm ! ClusterAction.Up fsm.stateName must be(MemberStatus.Up) @@ -40,7 +40,7 @@ class ClusterCommandDaemonFSMSpec extends AkkaSpec( } "be able to switch from Up to Leaving" in { - val fsm = TestFSMRef(new ClusterCommandDaemon(system, system.node)) + val fsm = TestFSMRef(new ClusterCommandDaemon(system, Node(system))) fsm.stateName must be(MemberStatus.Joining) fsm ! ClusterAction.Up fsm.stateName must be(MemberStatus.Up) @@ -49,7 +49,7 @@ class ClusterCommandDaemonFSMSpec extends AkkaSpec( } "be able to switch from Up to Exiting" in { - val fsm = TestFSMRef(new ClusterCommandDaemon(system, system.node)) + val fsm = TestFSMRef(new ClusterCommandDaemon(system, Node(system))) fsm.stateName must be(MemberStatus.Joining) fsm ! ClusterAction.Up fsm.stateName must be(MemberStatus.Up) @@ -58,7 +58,7 @@ class ClusterCommandDaemonFSMSpec extends AkkaSpec( } "be able to switch from Up to Removed" in { - val fsm = TestFSMRef(new ClusterCommandDaemon(system, system.node)) + val fsm = TestFSMRef(new ClusterCommandDaemon(system, Node(system))) fsm.stateName must be(MemberStatus.Joining) fsm ! ClusterAction.Up fsm.stateName must be(MemberStatus.Up) @@ -67,7 +67,7 @@ class ClusterCommandDaemonFSMSpec extends AkkaSpec( } "be able to switch from Leaving to Down" in { - val fsm = TestFSMRef(new ClusterCommandDaemon(system, system.node)) + val fsm = TestFSMRef(new ClusterCommandDaemon(system, Node(system))) fsm.stateName must be(MemberStatus.Joining) fsm ! ClusterAction.Up fsm.stateName must be(MemberStatus.Up) @@ -78,7 +78,7 @@ class ClusterCommandDaemonFSMSpec extends AkkaSpec( } "be able to switch from Leaving to Removed" in { - val fsm = TestFSMRef(new ClusterCommandDaemon(system, system.node)) + val fsm = TestFSMRef(new ClusterCommandDaemon(system, Node(system))) fsm.stateName must be(MemberStatus.Joining) fsm ! ClusterAction.Up fsm.stateName must be(MemberStatus.Up) @@ -89,7 +89,7 @@ class ClusterCommandDaemonFSMSpec extends AkkaSpec( } "be able to switch from Exiting to Removed" in { - val fsm = TestFSMRef(new ClusterCommandDaemon(system, system.node)) + val fsm = TestFSMRef(new ClusterCommandDaemon(system, Node(system))) fsm.stateName must be(MemberStatus.Joining) fsm ! 
ClusterAction.Up fsm.stateName must be(MemberStatus.Up) @@ -100,7 +100,7 @@ class ClusterCommandDaemonFSMSpec extends AkkaSpec( } "be able to switch from Down to Removed" in { - val fsm = TestFSMRef(new ClusterCommandDaemon(system, system.node)) + val fsm = TestFSMRef(new ClusterCommandDaemon(system, Node(system))) fsm.stateName must be(MemberStatus.Joining) fsm ! ClusterAction.Up fsm.stateName must be(MemberStatus.Up) @@ -111,7 +111,7 @@ class ClusterCommandDaemonFSMSpec extends AkkaSpec( } "not be able to switch from Removed to any other state" in { - val fsm = TestFSMRef(new ClusterCommandDaemon(system, system.node)) + val fsm = TestFSMRef(new ClusterCommandDaemon(system, Node(system))) fsm.stateName must be(MemberStatus.Joining) fsm ! ClusterAction.Up fsm.stateName must be(MemberStatus.Up) @@ -132,7 +132,7 @@ class ClusterCommandDaemonFSMSpec extends AkkaSpec( "remain in the same state when receiving a Join command" in { val address = Address("akka", system.name) - val fsm = TestFSMRef(new ClusterCommandDaemon(system, system.node)) + val fsm = TestFSMRef(new ClusterCommandDaemon(system, Node(system))) fsm.stateName must be(MemberStatus.Joining) fsm ! ClusterAction.Join(address) fsm.stateName must be(MemberStatus.Joining) diff --git a/akka-cluster/src/test/scala/akka/cluster/GossipingAccrualFailureDetectorSpec.scala b/akka-cluster/src/test/scala/akka/cluster/GossipingAccrualFailureDetectorSpec.scala index f4deb77706..6c81f8680a 100644 --- a/akka-cluster/src/test/scala/akka/cluster/GossipingAccrualFailureDetectorSpec.scala +++ b/akka-cluster/src/test/scala/akka/cluster/GossipingAccrualFailureDetectorSpec.scala @@ -16,9 +16,11 @@ import java.net.InetSocketAddress class GossipingAccrualFailureDetectorSpec extends AkkaSpec(""" akka { loglevel = "INFO" - cluster.failure-detector.threshold = 3 actor.debug.lifecycle = on actor.debug.autoreceive = on + actor.provider = akka.remote.RemoteActorRefProvider + remote.netty.hostname = localhost + cluster.failure-detector.threshold = 3 } """) with ImplicitSender { @@ -35,18 +37,11 @@ class GossipingAccrualFailureDetectorSpec extends AkkaSpec(""" // ======= NODE 1 ======== system1 = ActorSystem("system1", ConfigFactory - .parseString(""" - akka { - actor.provider = "akka.remote.RemoteActorRefProvider" - remote.netty { - hostname = localhost - port=5550 - } - }""") + .parseString("akka.remote.netty.port=5550") .withFallback(system.settings.config)) .asInstanceOf[ActorSystemImpl] val remote1 = system1.provider.asInstanceOf[RemoteActorRefProvider] - node1 = new Node(system1) + node1 = Node(system1) val fd1 = node1.failureDetector val address1 = node1.self.address @@ -54,17 +49,13 @@ class GossipingAccrualFailureDetectorSpec extends AkkaSpec(""" system2 = ActorSystem("system2", ConfigFactory .parseString(""" akka { - actor.provider = "akka.remote.RemoteActorRefProvider" - remote.netty { - hostname = localhost - port = 5551 - } + remote.netty.port=5551 cluster.node-to-join = "akka://system1@localhost:5550" }""") .withFallback(system.settings.config)) .asInstanceOf[ActorSystemImpl] val remote2 = system2.provider.asInstanceOf[RemoteActorRefProvider] - node2 = new Node(system2) + node2 = Node(system2) val fd2 = node2.failureDetector val address2 = node2.self.address @@ -72,17 +63,13 @@ class GossipingAccrualFailureDetectorSpec extends AkkaSpec(""" system3 = ActorSystem("system3", ConfigFactory .parseString(""" akka { - actor.provider = "akka.remote.RemoteActorRefProvider" - remote.netty { - hostname = localhost - port=5552 - } + remote.netty.port=5552 
cluster.node-to-join = "akka://system1@localhost:5550" }""") .withFallback(system.settings.config)) .asInstanceOf[ActorSystemImpl] val remote3 = system3.provider.asInstanceOf[RemoteActorRefProvider] - node3 = new Node(system3) + node3 = Node(system3) val fd3 = node3.failureDetector val address3 = node3.self.address diff --git a/akka-cluster/src/test/scala/akka/cluster/LeaderElectionSpec.scala b/akka-cluster/src/test/scala/akka/cluster/LeaderElectionSpec.scala index a9a42ef26c..d27611cb79 100644 --- a/akka-cluster/src/test/scala/akka/cluster/LeaderElectionSpec.scala +++ b/akka-cluster/src/test/scala/akka/cluster/LeaderElectionSpec.scala @@ -17,6 +17,7 @@ import java.net.InetSocketAddress class LeaderElectionSpec extends AkkaSpec(""" akka { loglevel = "INFO" + actor.provider = "akka.remote.RemoteActorRefProvider" actor.debug.lifecycle = on actor.debug.autoreceive = on cluster.failure-detector.threshold = 3 @@ -38,7 +39,6 @@ class LeaderElectionSpec extends AkkaSpec(""" system1 = ActorSystem("system1", ConfigFactory .parseString(""" akka { - actor.provider = "akka.remote.RemoteActorRefProvider" remote.netty { hostname = localhost port=5550 @@ -47,7 +47,7 @@ class LeaderElectionSpec extends AkkaSpec(""" .withFallback(system.settings.config)) .asInstanceOf[ActorSystemImpl] val remote1 = system1.provider.asInstanceOf[RemoteActorRefProvider] - node1 = new Node(system1) + node1 = Node(system1) val fd1 = node1.failureDetector val address1 = node1.self.address @@ -55,7 +55,6 @@ class LeaderElectionSpec extends AkkaSpec(""" system2 = ActorSystem("system2", ConfigFactory .parseString(""" akka { - actor.provider = "akka.remote.RemoteActorRefProvider" remote.netty { hostname = localhost port = 5551 @@ -65,7 +64,7 @@ class LeaderElectionSpec extends AkkaSpec(""" .withFallback(system.settings.config)) .asInstanceOf[ActorSystemImpl] val remote2 = system2.provider.asInstanceOf[RemoteActorRefProvider] - node2 = new Node(system2) + node2 = Node(system2) val fd2 = node2.failureDetector val address2 = node2.self.address @@ -73,7 +72,6 @@ class LeaderElectionSpec extends AkkaSpec(""" system3 = ActorSystem("system3", ConfigFactory .parseString(""" akka { - actor.provider = "akka.remote.RemoteActorRefProvider" remote.netty { hostname = localhost port=5552 @@ -83,7 +81,7 @@ class LeaderElectionSpec extends AkkaSpec(""" .withFallback(system.settings.config)) .asInstanceOf[ActorSystemImpl] val remote3 = system3.provider.asInstanceOf[RemoteActorRefProvider] - node3 = new Node(system3) + node3 = Node(system3) val fd3 = node3.failureDetector val address3 = node3.self.address diff --git a/akka-cluster/src/test/scala/akka/cluster/MembershipChangeListenerSpec.scala b/akka-cluster/src/test/scala/akka/cluster/MembershipChangeListenerSpec.scala index e2487de3c8..d43841d2ca 100644 --- a/akka-cluster/src/test/scala/akka/cluster/MembershipChangeListenerSpec.scala +++ b/akka-cluster/src/test/scala/akka/cluster/MembershipChangeListenerSpec.scala @@ -18,6 +18,8 @@ import com.typesafe.config._ class MembershipChangeListenerSpec extends AkkaSpec(""" akka { + actor.provider = akka.remote.RemoteActorRefProvider + remote.netty.hostname = localhost loglevel = "INFO" } """) with ImplicitSender { @@ -34,33 +36,22 @@ class MembershipChangeListenerSpec extends AkkaSpec(""" "A set of connected cluster systems" must { "(when two systems) after cluster convergence updates the membership table then all MembershipChangeListeners should be triggered" taggedAs LongRunningTest in { system0 = ActorSystem("system0", ConfigFactory - .parseString(""" - 
akka { - actor.provider = "akka.remote.RemoteActorRefProvider" - remote.netty { - hostname = localhost - port=5550 - } - }""") + .parseString("akka.remote.netty.port=5550") .withFallback(system.settings.config)) .asInstanceOf[ActorSystemImpl] val remote0 = system0.provider.asInstanceOf[RemoteActorRefProvider] - node0 = new Node(system0) + node0 = Node(system0) system1 = ActorSystem("system1", ConfigFactory .parseString(""" akka { - actor.provider = "akka.remote.RemoteActorRefProvider" - remote.netty { - hostname = localhost - port=5551 - } + remote.netty.port=5551 cluster.node-to-join = "akka://system0@localhost:5550" }""") .withFallback(system.settings.config)) .asInstanceOf[ActorSystemImpl] val remote1 = system1.provider.asInstanceOf[RemoteActorRefProvider] - node1 = new Node(system1) + node1 = Node(system1) val latch = new CountDownLatch(2) @@ -90,17 +81,13 @@ class MembershipChangeListenerSpec extends AkkaSpec(""" system2 = ActorSystem("system2", ConfigFactory .parseString(""" akka { - actor.provider = "akka.remote.RemoteActorRefProvider" - remote.netty { - hostname = localhost - port=5552 - } + remote.netty.port=5552 cluster.node-to-join = "akka://system0@localhost:5550" }""") .withFallback(system.settings.config)) .asInstanceOf[ActorSystemImpl] val remote2 = system2.provider.asInstanceOf[RemoteActorRefProvider] - node2 = new Node(system2) + node2 = Node(system2) val latch = new CountDownLatch(3) node0.registerListener(new MembershipChangeListener { diff --git a/akka-cluster/src/test/scala/akka/cluster/NodeMembershipSpec.scala b/akka-cluster/src/test/scala/akka/cluster/NodeMembershipSpec.scala index ddd773dce8..42ead86dfd 100644 --- a/akka-cluster/src/test/scala/akka/cluster/NodeMembershipSpec.scala +++ b/akka-cluster/src/test/scala/akka/cluster/NodeMembershipSpec.scala @@ -15,6 +15,8 @@ import com.typesafe.config._ class NodeMembershipSpec extends AkkaSpec(""" akka { + actor.provider = akka.remote.RemoteActorRefProvider + remote.netty.hostname = localhost loglevel = "INFO" } """) with ImplicitSender { @@ -33,34 +35,23 @@ class NodeMembershipSpec extends AkkaSpec(""" // ======= NODE 0 ======== system0 = ActorSystem("system0", ConfigFactory - .parseString(""" - akka { - actor.provider = "akka.remote.RemoteActorRefProvider" - remote.netty { - hostname = localhost - port=5550 - } - }""") + .parseString("akka.remote.netty.port=5550") .withFallback(system.settings.config)) .asInstanceOf[ActorSystemImpl] val remote0 = system0.provider.asInstanceOf[RemoteActorRefProvider] - node0 = system0.node + node0 = Node(system0) // ======= NODE 1 ======== system1 = ActorSystem("system1", ConfigFactory .parseString(""" akka { - actor.provider = "akka.remote.RemoteActorRefProvider" - remote.netty { - hostname = localhost - port=5551 - } + remote.netty.port=5551 cluster.node-to-join = "akka://system0@localhost:5550" }""") .withFallback(system.settings.config)) .asInstanceOf[ActorSystemImpl] val remote1 = system1.provider.asInstanceOf[RemoteActorRefProvider] - node1 = system1.node + node1 = Node(system1) Thread.sleep(10.seconds.dilated.toMillis) @@ -89,17 +80,13 @@ class NodeMembershipSpec extends AkkaSpec(""" system2 = ActorSystem("system2", ConfigFactory .parseString(""" akka { - actor.provider = "akka.remote.RemoteActorRefProvider" - remote.netty { - hostname = localhost - port=5552 - } + remote.netty.port=5552 cluster.node-to-join = "akka://system0@localhost:5550" }""") .withFallback(system.settings.config)) .asInstanceOf[ActorSystemImpl] val remote2 = 
system2.provider.asInstanceOf[RemoteActorRefProvider] - node2 = system2.node + node2 = Node(system2) Thread.sleep(10.seconds.dilated.toMillis) diff --git a/akka-cluster/src/test/scala/akka/cluster/NodeStartupSpec.scala b/akka-cluster/src/test/scala/akka/cluster/NodeStartupSpec.scala index ed4b893619..640a541971 100644 --- a/akka-cluster/src/test/scala/akka/cluster/NodeStartupSpec.scala +++ b/akka-cluster/src/test/scala/akka/cluster/NodeStartupSpec.scala @@ -16,6 +16,8 @@ import com.typesafe.config._ class NodeStartupSpec extends AkkaSpec(""" akka { loglevel = "INFO" + actor.provider = akka.remote.RemoteActorRefProvider + remote.netty.hostname = localhost } """) with ImplicitSender { @@ -26,19 +28,12 @@ class NodeStartupSpec extends AkkaSpec(""" try { "A first cluster node with a 'node-to-join' config set to empty string (singleton cluster)" must { - system0 = ActorSystem("NodeStartupSpec", ConfigFactory - .parseString(""" - akka { - actor.provider = "akka.remote.RemoteActorRefProvider" - remote.netty { - hostname = localhost - port=5550 - } - }""") + system0 = ActorSystem("system0", ConfigFactory + .parseString("akka.remote.netty.port=5550") .withFallback(system.settings.config)) .asInstanceOf[ActorSystemImpl] val remote0 = system0.provider.asInstanceOf[RemoteActorRefProvider] - node0 = new Node(system0) + node0 = Node(system0) "be a singleton cluster when started up" in { Thread.sleep(1.seconds.dilated.toMillis) @@ -55,20 +50,16 @@ class NodeStartupSpec extends AkkaSpec(""" "A second cluster node with a 'node-to-join' config defined" must { "join the other node cluster as 'Joining' when sending a Join command" in { - system1 = ActorSystem("NodeStartupSpec", ConfigFactory + system1 = ActorSystem("system1", ConfigFactory .parseString(""" - akka { - actor.provider = "akka.remote.RemoteActorRefProvider" - remote.netty { - hostname = localhost - port=5551 - } - cluster.node-to-join = "akka://NodeStartupSpec@localhost:5550" - }""") + akka { + remote.netty.port=5551 + cluster.node-to-join = "akka://system0@localhost:5550" + }""") .withFallback(system.settings.config)) .asInstanceOf[ActorSystemImpl] val remote1 = system1.provider.asInstanceOf[RemoteActorRefProvider] - node1 = new Node(system1) + node1 = Node(system1) Thread.sleep(1.seconds.dilated.toMillis) // give enough time for node1 to JOIN node0 val members = node0.latestGossip.members diff --git a/akka-remote/src/main/scala/akka/remote/RemoteTransport.scala b/akka-remote/src/main/scala/akka/remote/RemoteTransport.scala index 07a910388b..f71e9feb84 100644 --- a/akka-remote/src/main/scala/akka/remote/RemoteTransport.scala +++ b/akka-remote/src/main/scala/akka/remote/RemoteTransport.scala @@ -133,19 +133,6 @@ case class RemoteServerClientClosed( ": Client[" + clientAddress.getOrElse("no address") + "]" } -case class RemoteServerWriteFailed( - @BeanProperty request: AnyRef, - @BeanProperty cause: Throwable, - @BeanProperty remote: RemoteTransport, - @BeanProperty remoteAddress: Option[Address]) extends RemoteServerLifeCycleEvent { - override def logLevel = Logging.WarningLevel - override def toString = - "RemoteServerWriteFailed@" + remote + - ": ClientAddress[" + remoteAddress + - "] MessageClass[" + (if (request ne null) request.getClass.getName else "no message") + - "] Error[" + cause + "]" -} - /** * Thrown for example when trying to send a message using a RemoteClient that is either not started or shut down. 
*/ diff --git a/project/AkkaBuild.scala b/project/AkkaBuild.scala index a883eb7093..484d6d473e 100644 --- a/project/AkkaBuild.scala +++ b/project/AkkaBuild.scala @@ -327,9 +327,7 @@ object AkkaBuild extends Build { // Settings - override lazy val settings = super.settings ++ buildSettings ++ Seq( - resolvers += "Sonatype Snapshot Repo" at "https://oss.sonatype.org/content/repositories/snapshots/" - ) + override lazy val settings = super.settings ++ buildSettings lazy val baseSettings = Defaults.defaultSettings ++ Publish.settings From 9e5f42c17d00ab53a384b992f0170e4c2f1d80d0 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jonas=20Bone=CC=81r?= Date: Fri, 2 Mar 2012 16:20:30 +0100 Subject: [PATCH 66/72] Added '/system/cluster' top-level supervisor for all cluster daemons. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Jonas Bonér --- .../src/main/scala/akka/cluster/Node.scala | 50 ++++++++++++------- 1 file changed, 31 insertions(+), 19 deletions(-) diff --git a/akka-cluster/src/main/scala/akka/cluster/Node.scala b/akka-cluster/src/main/scala/akka/cluster/Node.scala index dfa5e6efd7..f173bc0a4e 100644 --- a/akka-cluster/src/main/scala/akka/cluster/Node.scala +++ b/akka-cluster/src/main/scala/akka/cluster/Node.scala @@ -12,6 +12,7 @@ import akka.event.Logging import akka.dispatch.Await import akka.pattern.ask import akka.util._ +import akka.util.duration._ import akka.config.ConfigurationException import java.util.concurrent.atomic.{ AtomicReference, AtomicBoolean } @@ -259,7 +260,21 @@ final class ClusterGossipDaemon(system: ActorSystem, node: Node) extends Actor { case GossipEnvelope(sender, gossip) ⇒ node.receive(sender, gossip) } - override def unhandled(unknown: Any) = log.error("Unknown message sent to cluster daemon [" + unknown + "]") + override def unhandled(unknown: Any) = log.error("Unknown message sent to cluster daemon [{}]", unknown) +} + +final class ClusterDaemonSupervisor(system: ActorSystem, node: Node) extends Actor { + val log = Logging(system, "ClusterDaemonSupervisor") + + override val supervisorStrategy = SupervisorStrategy.defaultStrategy + + private val commands = context.actorOf(Props(new ClusterCommandDaemon(system, node)), "commands") + private val gossip = context.actorOf(Props(new ClusterGossipDaemon(system, node)) + .withRouter(RoundRobinRouter(node.clusterSettings.NrOfGossipDaemons)), "gossip") + + def receive = { + case unknown ⇒ log.error("/system/cluster can not respond to messages - received [{}]", unknown) + } } /** @@ -317,8 +332,8 @@ class Node(system: ExtendedActorSystem) extends Extension { private val remote: RemoteActorRefProvider = system.provider.asInstanceOf[RemoteActorRefProvider] - private val remoteSettings = new RemoteSettings(system.settings.config, system.name) - private val clusterSettings = new ClusterSettings(system.settings.config, system.name) + val remoteSettings = new RemoteSettings(system.settings.config, system.name) + val clusterSettings = new ClusterSettings(system.settings.config, system.name) private val remoteAddress = remote.transport.address private val vclockNode = VectorClock.Node(remoteAddress.toString) @@ -343,11 +358,16 @@ class Node(system: ExtendedActorSystem) extends Extension { private val log = Logging(system, "Node") private val random = SecureRandom.getInstance("SHA1PRNG") - private val clusterCommandDaemon = systemActorOf( - Props(new ClusterCommandDaemon(system, this)), "clusterCommand") + // create superisor for daemons under path "/system/cluster" + private 
val clusterDaemons = { + val createChild = CreateChild(Props(new ClusterDaemonSupervisor(system, this)), "cluster") + Await.result(system.systemGuardian ? createChild, defaultTimeout.duration) match { + case a: ActorRef ⇒ a + case e: Exception ⇒ throw e + } + } - private val clusterGossipDaemon = systemActorOf( - Props(new ClusterGossipDaemon(system, this)).withRouter(RoundRobinRouter(nrOfGossipDaemons)), "clusterGossip") + private val clusterCommandDaemon = system.actorFor("/system/cluster/commands") private val state = { val member = Member(remoteAddress, MemberStatus.Joining) @@ -412,11 +432,10 @@ class Node(system: ExtendedActorSystem) extends Extension { def shutdown() { // FIXME Cheating for now. Can't just shut down. Node must first gossip an Leave command, wait for Leader to do proper Handoff and then await an Exit command before switching to Removed if (isRunning.compareAndSet(true, false)) { - log.info("Node [{}] - Shutting down Node and ClusterDaemon...", remoteAddress) + log.info("Node [{}] - Shutting down Node and cluster daemons...", remoteAddress) gossipCanceller.cancel() scrutinizeCanceller.cancel() - system.stop(clusterCommandDaemon) - system.stop(clusterGossipDaemon) + system.stop(clusterDaemons) } } @@ -747,22 +766,15 @@ class Node(system: ExtendedActorSystem) extends Extension { // } else None } - private def systemActorOf(props: Props, name: String): ActorRef = { - Await.result(system.systemGuardian ? CreateChild(props, name), system.settings.CreationTimeout.duration) match { - case ref: ActorRef ⇒ ref - case ex: Exception ⇒ throw ex - } - } - /** * Sets up cluster command connection. */ - private def clusterCommandConnectionFor(address: Address): ActorRef = system.actorFor(RootActorPath(address) / "system" / "clusterCommand") + private def clusterCommandConnectionFor(address: Address): ActorRef = system.actorFor(RootActorPath(address) / "system" / "cluster" / "commands") /** * Sets up cluster gossip connection. */ - private def clusterGossipConnectionFor(address: Address): ActorRef = system.actorFor(RootActorPath(address) / "system" / "clusterGossip") + private def clusterGossipConnectionFor(address: Address): ActorRef = system.actorFor(RootActorPath(address) / "system" / "cluster" / "gossip") private def deputyNodes: Iterable[Address] = state.get.latestGossip.members.toIterable map (_.address) drop 1 take nrOfDeputyNodes filter (_ != remoteAddress) From 0ad00bd699fca27c80567cc6c2af8dcc096d981e Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jonas=20Bone=CC=81r?= Date: Sat, 3 Mar 2012 23:55:48 +0100 Subject: [PATCH 67/72] Cleaned up cluster daemons instantiation. Added address field to all cluster commands. Added more state transitions in Joining phase + tests to cover it. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Jonas Bonér --- .../src/main/scala/akka/cluster/Node.scala | 137 ++++++++++-------- .../cluster/ClusterCommandDaemonFSMSpec.scala | 126 ++++++++-------- 2 files changed, 134 insertions(+), 129 deletions(-) diff --git a/akka-cluster/src/main/scala/akka/cluster/Node.scala b/akka-cluster/src/main/scala/akka/cluster/Node.scala index f173bc0a4e..7642fc39b6 100644 --- a/akka-cluster/src/main/scala/akka/cluster/Node.scala +++ b/akka-cluster/src/main/scala/akka/cluster/Node.scala @@ -60,27 +60,27 @@ object ClusterAction { /** * Command to set a node to Up (from Joining). */ - case object Up extends ClusterMessage + case class Up(address: Address) extends ClusterMessage /** * Command to leave the cluster. 
*/ - case object Leave extends ClusterMessage + case class Leave(address: Address) extends ClusterMessage /** * Command to mark node as temporary down. */ - case object Down extends ClusterMessage + case class Down(address: Address) extends ClusterMessage /** * Command to mark a node to be removed from the cluster immediately. */ - case object Exit extends ClusterMessage + case class Exit(address: Address) extends ClusterMessage /** * Command to remove a node from the cluster immediately. */ - case object Remove extends ClusterMessage + case class Remove(address: Address) extends ClusterMessage } /** @@ -162,75 +162,85 @@ case class Gossip( /** * FSM actor managing the different cluster nodes states. - * Single instance - e.g. serialized access to Node - message after message. + * Instantiated as a single instance for each Node - e.g. commands are serialized to Node message after message. */ -final class ClusterCommandDaemon(system: ActorSystem, node: Node) extends Actor with FSM[MemberStatus, Unit] { +final class ClusterCommandDaemon extends Actor with FSM[MemberStatus, Unit] { + val node = Node(context.system) - // start in JOINING + // ======================== + // === START IN JOINING == startWith(MemberStatus.Joining, Unit) // ======================== - // === IN JOINING === + // === WHEN JOINING === when(MemberStatus.Joining) { - case Event(ClusterAction.Up, _) ⇒ - node.up() + case Event(ClusterAction.Up(address), _) ⇒ + node.up(address) goto(MemberStatus.Up) + + case Event(ClusterAction.Remove(address), _) ⇒ + node.removing(address) + goto(MemberStatus.Removed) + + case Event(ClusterAction.Down(address), _) ⇒ + node.downing(address) + goto(MemberStatus.Down) } // ======================== - // === IN UP === + // === WHEN UP === when(MemberStatus.Up) { - case Event(ClusterAction.Down, _) ⇒ - node.downing() + case Event(ClusterAction.Down(address), _) ⇒ + node.downing(address) goto(MemberStatus.Down) - case Event(ClusterAction.Leave, _) ⇒ - node.leaving() + case Event(ClusterAction.Leave(address), _) ⇒ + node.leaving(address) goto(MemberStatus.Leaving) - case Event(ClusterAction.Exit, _) ⇒ - node.exiting() + case Event(ClusterAction.Exit(address), _) ⇒ + node.exiting(address) goto(MemberStatus.Exiting) - case Event(ClusterAction.Remove, _) ⇒ - node.removing() + case Event(ClusterAction.Remove(address), _) ⇒ + node.removing(address) goto(MemberStatus.Removed) } // ======================== - // === IN LEAVING === + // === WHEN LEAVING === when(MemberStatus.Leaving) { - case Event(ClusterAction.Down, _) ⇒ - node.downing() + case Event(ClusterAction.Down(address), _) ⇒ + node.downing(address) goto(MemberStatus.Down) - case Event(ClusterAction.Remove, _) ⇒ - node.removing() + case Event(ClusterAction.Remove(address), _) ⇒ + node.removing(address) goto(MemberStatus.Removed) } // ======================== - // === IN EXITING === + // === WHEN EXITING === when(MemberStatus.Exiting) { - case Event(ClusterAction.Remove, _) ⇒ - node.removing() + case Event(ClusterAction.Remove(address), _) ⇒ + node.removing(address) goto(MemberStatus.Removed) } // ======================== - // === IN DOWN === + // === WHEN DOWN === when(MemberStatus.Down) { // FIXME How to transition from DOWN => JOINING when node comes back online. Can't just listen to Gossip message since it is received be another actor. How to fix this? 
- case Event(ClusterAction.Remove, _) ⇒ - node.removing() + case Event(ClusterAction.Remove(address), _) ⇒ + node.removing(address) goto(MemberStatus.Removed) } // ======================== - // === IN REMOVED === + // === WHEN REMOVED === when(MemberStatus.Removed) { case command ⇒ - log.warning("Removed node [{}] received cluster command [{}]", system.name, command) + log.warning("Removed node [{}] received cluster command [{}]", context.system.name, command) stay } @@ -242,39 +252,39 @@ final class ClusterCommandDaemon(system: ActorSystem, node: Node) extends Actor node.joining(address) stay - case Event(command, _) ⇒ { - log.warning("Unhandled command [{}] in state [{}]", command, stateName) + case Event(illegal, _) ⇒ { + log.error("Illegal command [{}] in state [{}]", illegal, stateName) stay } } } /** - * Pooled and routed wit N number of configurable instances. + * Pooled and routed with N number of configurable instances. * Concurrent access to Node. */ -final class ClusterGossipDaemon(system: ActorSystem, node: Node) extends Actor { - val log = Logging(system, "ClusterGossipDaemon") +final class ClusterGossipDaemon extends Actor { + val log = Logging(context.system, this) + val node = Node(context.system) def receive = { case GossipEnvelope(sender, gossip) ⇒ node.receive(sender, gossip) } - override def unhandled(unknown: Any) = log.error("Unknown message sent to cluster daemon [{}]", unknown) + override def unhandled(unknown: Any) = log.error("[/system/cluster/gossip] can not respond to messages - received [{}]", unknown) } -final class ClusterDaemonSupervisor(system: ActorSystem, node: Node) extends Actor { - val log = Logging(system, "ClusterDaemonSupervisor") +final class ClusterDaemonSupervisor extends Actor { + val log = Logging(context.system, this) + val node = Node(context.system) - override val supervisorStrategy = SupervisorStrategy.defaultStrategy + private val commands = context.actorOf(Props[ClusterCommandDaemon], "commands") + private val gossip = context.actorOf( + Props[ClusterGossipDaemon].withRouter(RoundRobinRouter(node.clusterSettings.NrOfGossipDaemons)), "gossip") - private val commands = context.actorOf(Props(new ClusterCommandDaemon(system, node)), "commands") - private val gossip = context.actorOf(Props(new ClusterGossipDaemon(system, node)) - .withRouter(RoundRobinRouter(node.clusterSettings.NrOfGossipDaemons)), "gossip") + def receive = Actor.emptyBehavior - def receive = { - case unknown ⇒ log.error("/system/cluster can not respond to messages - received [{}]", unknown) - } + override def unhandled(unknown: Any): Unit = log.error("/system/cluster can not respond to messages - received [{}]", unknown) } /** @@ -360,15 +370,13 @@ class Node(system: ExtendedActorSystem) extends Extension { // create superisor for daemons under path "/system/cluster" private val clusterDaemons = { - val createChild = CreateChild(Props(new ClusterDaemonSupervisor(system, this)), "cluster") + val createChild = CreateChild(Props[ClusterDaemonSupervisor], "cluster") Await.result(system.systemGuardian ? 
createChild, defaultTimeout.duration) match { case a: ActorRef ⇒ a case e: Exception ⇒ throw e } } - private val clusterCommandDaemon = system.actorFor("/system/cluster/commands") - private val state = { val member = Member(remoteAddress, MemberStatus.Joining) val gossip = Gossip(members = SortedSet.empty[Member] + member) + vclockNode // add me as member and update my vector clock @@ -471,22 +479,22 @@ class Node(system: ExtendedActorSystem) extends Extension { /** * Send command to issue state transition to LEAVING. */ - def sendLeave() { - clusterCommandDaemon ! ClusterAction.Leave + def sendLeave(address: Address) { + clusterCommandDaemon ! ClusterAction.Leave(address) } /** * Send command to issue state transition to EXITING. */ - def sendDown() { - clusterCommandDaemon ! ClusterAction.Down + def sendDown(address: Address) { + clusterCommandDaemon ! ClusterAction.Down(address) } /** * Send command to issue state transition to REMOVED. */ - def sendRemove() { - clusterCommandDaemon ! ClusterAction.Remove + def sendRemove(address: Address) { + clusterCommandDaemon ! ClusterAction.Remove(address) } // ======================================================== @@ -525,27 +533,27 @@ class Node(system: ExtendedActorSystem) extends Extension { /** * State transition to UP. */ - private[cluster] final def up() {} + private[cluster] final def up(address: Address) {} /** * State transition to LEAVING. */ - private[cluster] final def leaving() {} + private[cluster] final def leaving(address: Address) {} /** * State transition to EXITING. */ - private[cluster] final def exiting() {} + private[cluster] final def exiting(address: Address) {} /** * State transition to REMOVED. */ - private[cluster] final def removing() {} + private[cluster] final def removing(address: Address) {} /** * State transition to DOWN. */ - private[cluster] final def downing() {} + private[cluster] final def downing(address: Address) {} /** * Receive new gossip. @@ -771,6 +779,11 @@ class Node(system: ExtendedActorSystem) extends Extension { */ private def clusterCommandConnectionFor(address: Address): ActorRef = system.actorFor(RootActorPath(address) / "system" / "cluster" / "commands") + /** + * Sets up local cluster command connection. + */ + private def clusterCommandDaemon = system.actorFor(RootActorPath(remoteAddress) / "system" / "cluster" / "commands") + /** * Sets up cluster gossip connection. 
*/ diff --git a/akka-cluster/src/test/scala/akka/cluster/ClusterCommandDaemonFSMSpec.scala b/akka-cluster/src/test/scala/akka/cluster/ClusterCommandDaemonFSMSpec.scala index b0b5c6dadf..2aeeb1835b 100644 --- a/akka-cluster/src/test/scala/akka/cluster/ClusterCommandDaemonFSMSpec.scala +++ b/akka-cluster/src/test/scala/akka/cluster/ClusterCommandDaemonFSMSpec.scala @@ -7,147 +7,139 @@ package akka.cluster import akka.testkit._ import akka.actor.Address -class ClusterCommandDaemonFSMSpec extends AkkaSpec( - """ - akka { - actor { - provider = akka.remote.RemoteActorRefProvider - } - } - """) with ImplicitSender { +class ClusterCommandDaemonFSMSpec + extends AkkaSpec("akka.actor.provider = akka.remote.RemoteActorRefProvider") + with ImplicitSender { "A ClusterCommandDaemon FSM" must { - + val address = Address("akka", system.name) "start in Joining" in { - val fsm = TestFSMRef(new ClusterCommandDaemon(system, Node(system))) + val fsm = TestFSMRef(new ClusterCommandDaemon) fsm.stateName must be(MemberStatus.Joining) } - "be able to switch from Joining to Up" in { - val fsm = TestFSMRef(new ClusterCommandDaemon(system, Node(system))) + val fsm = TestFSMRef(new ClusterCommandDaemon) fsm.stateName must be(MemberStatus.Joining) - fsm ! ClusterAction.Up + fsm ! ClusterAction.Up(address) fsm.stateName must be(MemberStatus.Up) } - + "be able to switch from Joining to Down" in { + val fsm = TestFSMRef(new ClusterCommandDaemon) + fsm.stateName must be(MemberStatus.Joining) + fsm ! ClusterAction.Down(address) + fsm.stateName must be(MemberStatus.Down) + } + "be able to switch from Joining to Removed" in { + val fsm = TestFSMRef(new ClusterCommandDaemon) + fsm.stateName must be(MemberStatus.Joining) + fsm ! ClusterAction.Remove(address) + fsm.stateName must be(MemberStatus.Removed) + } "be able to switch from Up to Down" in { - val fsm = TestFSMRef(new ClusterCommandDaemon(system, Node(system))) + val fsm = TestFSMRef(new ClusterCommandDaemon) fsm.stateName must be(MemberStatus.Joining) - fsm ! ClusterAction.Up + fsm ! ClusterAction.Up(address) fsm.stateName must be(MemberStatus.Up) - fsm ! ClusterAction.Down + fsm ! ClusterAction.Down(address) fsm.stateName must be(MemberStatus.Down) } - "be able to switch from Up to Leaving" in { - val fsm = TestFSMRef(new ClusterCommandDaemon(system, Node(system))) + val fsm = TestFSMRef(new ClusterCommandDaemon) fsm.stateName must be(MemberStatus.Joining) - fsm ! ClusterAction.Up + fsm ! ClusterAction.Up(address) fsm.stateName must be(MemberStatus.Up) - fsm ! ClusterAction.Leave + fsm ! ClusterAction.Leave(address) fsm.stateName must be(MemberStatus.Leaving) } - "be able to switch from Up to Exiting" in { - val fsm = TestFSMRef(new ClusterCommandDaemon(system, Node(system))) + val fsm = TestFSMRef(new ClusterCommandDaemon) fsm.stateName must be(MemberStatus.Joining) - fsm ! ClusterAction.Up + fsm ! ClusterAction.Up(address) fsm.stateName must be(MemberStatus.Up) - fsm ! ClusterAction.Exit + fsm ! ClusterAction.Exit(address) fsm.stateName must be(MemberStatus.Exiting) } - "be able to switch from Up to Removed" in { - val fsm = TestFSMRef(new ClusterCommandDaemon(system, Node(system))) + val fsm = TestFSMRef(new ClusterCommandDaemon) fsm.stateName must be(MemberStatus.Joining) - fsm ! ClusterAction.Up + fsm ! ClusterAction.Up(address) fsm.stateName must be(MemberStatus.Up) - fsm ! ClusterAction.Remove + fsm ! 
ClusterAction.Remove(address) fsm.stateName must be(MemberStatus.Removed) } - "be able to switch from Leaving to Down" in { - val fsm = TestFSMRef(new ClusterCommandDaemon(system, Node(system))) + val fsm = TestFSMRef(new ClusterCommandDaemon) fsm.stateName must be(MemberStatus.Joining) - fsm ! ClusterAction.Up + fsm ! ClusterAction.Up(address) fsm.stateName must be(MemberStatus.Up) - fsm ! ClusterAction.Leave + fsm ! ClusterAction.Leave(address) fsm.stateName must be(MemberStatus.Leaving) - fsm ! ClusterAction.Down + fsm ! ClusterAction.Down(address) fsm.stateName must be(MemberStatus.Down) } - "be able to switch from Leaving to Removed" in { - val fsm = TestFSMRef(new ClusterCommandDaemon(system, Node(system))) + val fsm = TestFSMRef(new ClusterCommandDaemon) fsm.stateName must be(MemberStatus.Joining) - fsm ! ClusterAction.Up + fsm ! ClusterAction.Up(address) fsm.stateName must be(MemberStatus.Up) - fsm ! ClusterAction.Leave + fsm ! ClusterAction.Leave(address) fsm.stateName must be(MemberStatus.Leaving) - fsm ! ClusterAction.Remove + fsm ! ClusterAction.Remove(address) fsm.stateName must be(MemberStatus.Removed) } - "be able to switch from Exiting to Removed" in { - val fsm = TestFSMRef(new ClusterCommandDaemon(system, Node(system))) + val fsm = TestFSMRef(new ClusterCommandDaemon) fsm.stateName must be(MemberStatus.Joining) - fsm ! ClusterAction.Up + fsm ! ClusterAction.Up(address) fsm.stateName must be(MemberStatus.Up) - fsm ! ClusterAction.Exit + fsm ! ClusterAction.Exit(address) fsm.stateName must be(MemberStatus.Exiting) - fsm ! ClusterAction.Remove + fsm ! ClusterAction.Remove(address) fsm.stateName must be(MemberStatus.Removed) } - "be able to switch from Down to Removed" in { - val fsm = TestFSMRef(new ClusterCommandDaemon(system, Node(system))) + val fsm = TestFSMRef(new ClusterCommandDaemon) fsm.stateName must be(MemberStatus.Joining) - fsm ! ClusterAction.Up + fsm ! ClusterAction.Up(address) fsm.stateName must be(MemberStatus.Up) - fsm ! ClusterAction.Down + fsm ! ClusterAction.Down(address) fsm.stateName must be(MemberStatus.Down) - fsm ! ClusterAction.Remove + fsm ! ClusterAction.Remove(address) fsm.stateName must be(MemberStatus.Removed) } - "not be able to switch from Removed to any other state" in { - val fsm = TestFSMRef(new ClusterCommandDaemon(system, Node(system))) + val fsm = TestFSMRef(new ClusterCommandDaemon) fsm.stateName must be(MemberStatus.Joining) - fsm ! ClusterAction.Up + fsm ! ClusterAction.Up(address) fsm.stateName must be(MemberStatus.Up) - fsm ! ClusterAction.Remove + fsm ! ClusterAction.Remove(address) fsm.stateName must be(MemberStatus.Removed) - fsm ! ClusterAction.Up + fsm ! ClusterAction.Up(address) fsm.stateName must be(MemberStatus.Removed) - fsm ! ClusterAction.Leave + fsm ! ClusterAction.Leave(address) fsm.stateName must be(MemberStatus.Removed) - fsm ! ClusterAction.Down + fsm ! ClusterAction.Down(address) fsm.stateName must be(MemberStatus.Removed) - fsm ! ClusterAction.Exit + fsm ! ClusterAction.Exit(address) fsm.stateName must be(MemberStatus.Removed) - fsm ! ClusterAction.Remove + fsm ! ClusterAction.Remove(address) fsm.stateName must be(MemberStatus.Removed) } "remain in the same state when receiving a Join command" in { - val address = Address("akka", system.name) - - val fsm = TestFSMRef(new ClusterCommandDaemon(system, Node(system))) + val fsm = TestFSMRef(new ClusterCommandDaemon) fsm.stateName must be(MemberStatus.Joining) fsm ! ClusterAction.Join(address) fsm.stateName must be(MemberStatus.Joining) - - fsm ! 
ClusterAction.Up + fsm ! ClusterAction.Up(address) fsm.stateName must be(MemberStatus.Up) fsm ! ClusterAction.Join(address) fsm.stateName must be(MemberStatus.Up) - - fsm ! ClusterAction.Leave + fsm ! ClusterAction.Leave(address) fsm.stateName must be(MemberStatus.Leaving) fsm ! ClusterAction.Join(address) fsm.stateName must be(MemberStatus.Leaving) - - fsm ! ClusterAction.Down + fsm ! ClusterAction.Down(address) fsm.stateName must be(MemberStatus.Down) fsm ! ClusterAction.Join(address) fsm.stateName must be(MemberStatus.Down) From 76f29a80d8a261b030f0b5892a7746ab71c8fa05 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jonas=20Bone=CC=81r?= Date: Mon, 5 Mar 2012 11:23:42 +0100 Subject: [PATCH 68/72] Fixed ugly log printout in ActorRefProvider. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Jonas Bonér --- akka-actor/src/main/scala/akka/actor/ActorRefProvider.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/akka-actor/src/main/scala/akka/actor/ActorRefProvider.scala b/akka-actor/src/main/scala/akka/actor/ActorRefProvider.scala index 5fb8936d4d..d9f710b533 100644 --- a/akka-actor/src/main/scala/akka/actor/ActorRefProvider.scala +++ b/akka-actor/src/main/scala/akka/actor/ActorRefProvider.scala @@ -544,7 +544,7 @@ class LocalActorRefProvider( deadLetters } else ref.getChild(path.iterator) match { case Nobody ⇒ - log.warning("look-up of path sequence [{}] failed", path) + log.warning("look-up of path sequence [/{}] failed", path.mkString("/")) new EmptyLocalActorRef(system.provider, ref.path / path, eventStream) case x ⇒ x } From 81b68e2fc002d2c87f1a47beaae9ad4f8d9c9c40 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jonas=20Bone=CC=81r?= Date: Fri, 9 Mar 2012 12:56:56 +0100 Subject: [PATCH 69/72] Added DOWNING (user downing and auto-downing) and LEADER actions. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit * Added possibility for user to 'down' a node * Added possibility for the leader to 'auto-down' a node. 
* Added leader role actions - Moving nodes from JOINING -> UP - Moving nodes from EXITING -> REMOVED - AUTO-DOWNING * Added tests for user and leader downing * Added 'auto-down' option to turn auto-downing on and off * Fixed bug in semantic Member Ordering * Removed FSM stuff from ClusterCommandDaemon (including the test) since the node status should only be in the converged gossip state Signed-off-by: Jonas Bonér --- .../src/main/resources/reference.conf | 3 + .../scala/akka/cluster/ClusterSettings.scala | 1 + .../src/main/scala/akka/cluster/Node.scala | 632 ++++++++++++------ .../akka/cluster/ClientDowningSpec.scala | 186 ++++++ .../cluster/ClusterCommandDaemonFSMSpec.scala | 148 ---- .../akka/cluster/ClusterConfigSpec.scala | 1 + .../GossipingAccrualFailureDetectorSpec.scala | 6 +- .../akka/cluster/LeaderDowningSpec.scala | 179 +++++ .../akka/cluster/LeaderElectionSpec.scala | 14 +- .../MembershipChangeListenerSpec.scala | 4 +- .../akka/cluster/NodeMembershipSpec.scala | 28 +- 11 files changed, 823 insertions(+), 379 deletions(-) create mode 100644 akka-cluster/src/test/scala/akka/cluster/ClientDowningSpec.scala delete mode 100644 akka-cluster/src/test/scala/akka/cluster/ClusterCommandDaemonFSMSpec.scala create mode 100644 akka-cluster/src/test/scala/akka/cluster/LeaderDowningSpec.scala diff --git a/akka-cluster/src/main/resources/reference.conf b/akka-cluster/src/main/resources/reference.conf index 0917909504..42ce7e4a77 100644 --- a/akka-cluster/src/main/resources/reference.conf +++ b/akka-cluster/src/main/resources/reference.conf @@ -12,6 +12,9 @@ akka { # leave as empty string if the node should be a singleton cluster node-to-join = "" + # should the 'leader' in the cluster be allowed to automatically mark unreachable nodes as DOWN? + auto-down = on + # the number of gossip daemon actors nr-of-gossip-daemons = 4 nr-of-deputy-nodes = 3 diff --git a/akka-cluster/src/main/scala/akka/cluster/ClusterSettings.scala b/akka-cluster/src/main/scala/akka/cluster/ClusterSettings.scala index e05c04b9d7..50b0f5bd0b 100644 --- a/akka-cluster/src/main/scala/akka/cluster/ClusterSettings.scala +++ b/akka-cluster/src/main/scala/akka/cluster/ClusterSettings.scala @@ -23,4 +23,5 @@ class ClusterSettings(val config: Config, val systemName: String) { val GossipFrequency = Duration(getMilliseconds("akka.cluster.gossip.frequency"), MILLISECONDS) val NrOfGossipDaemons = getInt("akka.cluster.nr-of-gossip-daemons") val NrOfDeputyNodes = getInt("akka.cluster.nr-of-deputy-nodes") + val AutoDown = getBoolean("akka.cluster.auto-down") } diff --git a/akka-cluster/src/main/scala/akka/cluster/Node.scala b/akka-cluster/src/main/scala/akka/cluster/Node.scala index 7642fc39b6..03c9c90515 100644 --- a/akka-cluster/src/main/scala/akka/cluster/Node.scala +++ b/akka-cluster/src/main/scala/akka/cluster/Node.scala @@ -29,6 +29,7 @@ import com.google.protobuf.ByteString * Interface for membership change listener. */ trait MembershipChangeListener { + // FIXME bad for Java - convert to Array? def notify(members: SortedSet[Member]): Unit } @@ -36,6 +37,7 @@ trait MembershipChangeListener { * Interface for meta data change listener. */ trait MetaDataChangeListener { // FIXME add management and notification for MetaDataChangeListener + // FIXME bad for Java - convert to what? def notify(meta: Map[String, Array[Byte]]): Unit } @@ -86,7 +88,46 @@ object ClusterAction { /** * Represents the address and the current status of a cluster member node. 
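As an illustrative aside (not part of the patch): the new `akka.cluster.auto-down` flag added to reference.conf above defaults to `on`, and `ClusterSettings.AutoDown` simply reads it. A deployment that wants an operator to issue the DOWN explicitly instead (the scenario exercised by ClientDowningSpec further down) would override it in its own configuration, for example:

    # application.conf (hypothetical override, shown for illustration only):
    # disable leader auto-downing so unreachable nodes stay UNREACHABLE
    # until someone issues an explicit DOWN command for that address
    akka {
      cluster {
        auto-down = off
      }
    }
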
*/ -case class Member(address: Address, status: MemberStatus) extends ClusterMessage +class Member(val address: Address, val status: MemberStatus) extends ClusterMessage { + override def hashCode = address.## + override def equals(other: Any) = Member.unapply(this) == Member.unapply(other) + override def toString = "Member(address = %s, status = %s)" format (address, status) + def copy(address: Address = this.address, status: MemberStatus = this.status): Member = new Member(address, status) +} + +/** + * Factory and Utility module for Member instances. + */ +object Member { + import MemberStatus._ + + implicit val ordering = Ordering.fromLessThan[Member](_.address.toString < _.address.toString) + + def apply(address: Address, status: MemberStatus): Member = new Member(address, status) + + def unapply(other: Any) = other match { + case m: Member ⇒ Some(m.address) + case _ ⇒ None + } + + /** + * Picks the Member with the highest "priority" MemberStatus. + */ + def highestPriorityOf(m1: Member, m2: Member): Member = (m1.status, m2.status) match { + case (Removed, _) ⇒ m1 + case (_, Removed) ⇒ m2 + case (Down, _) ⇒ m1 + case (_, Down) ⇒ m2 + case (Exiting, _) ⇒ m1 + case (_, Exiting) ⇒ m2 + case (Leaving, _) ⇒ m1 + case (_, Leaving) ⇒ m2 + case (Up, Joining) ⇒ m1 + case (Joining, Up) ⇒ m2 + case (Joining, Joining) ⇒ m1 + case (Up, Up) ⇒ m1 + } +} /** * Envelope adding a sender address to the gossip. @@ -106,6 +147,14 @@ object MemberStatus { case object Exiting extends MemberStatus case object Down extends MemberStatus case object Removed extends MemberStatus + + def isUnavailable(status: MemberStatus): Boolean = { + // status == MemberStatus.Joining || + status == MemberStatus.Down || + status == MemberStatus.Exiting || + status == MemberStatus.Removed || + status == MemberStatus.Leaving + } } /** @@ -113,7 +162,9 @@ object MemberStatus { */ case class GossipOverview( seen: Map[Address, VectorClock] = Map.empty[Address, VectorClock], - unreachable: Set[Address] = Set.empty[Address]) { + unreachable: Set[Member] = Set.empty[Member]) { + + // FIXME document when nodes are put in 'unreachable' set and removed from 'members' override def toString = "GossipOverview(seen = [" + seen.mkString(", ") + @@ -151,6 +202,40 @@ case class Gossip( else this copy (overview = overview copy (seen = overview.seen + (address -> version))) } + /** + * Merges two Gossip instances including membership tables, meta-data tables and the VectorClock histories. + */ + def merge(that: Gossip): Gossip = { + import Member.ordering + + // 1. merge vector clocks + val mergedVClock = this.version merge that.version + + // 2. group all members by Address => Vector[Member] + var membersGroupedByAddress = Map.empty[Address, Vector[Member]] + (this.members ++ that.members) foreach { m ⇒ + val ms = membersGroupedByAddress.get(m.address).getOrElse(Vector.empty[Member]) + membersGroupedByAddress += (m.address -> (ms :+ m)) + } + + // 3. merge members by selecting the single Member with highest MemberStatus out of the Member groups + val mergedMembers = + SortedSet.empty[Member] ++ + membersGroupedByAddress.values.foldLeft(Vector.empty[Member]) { (acc, members) ⇒ + acc :+ members.reduceLeft(Member.highestPriorityOf(_, _)) + } + + // 4. merge meta-data + val mergedMeta = this.meta ++ that.meta + + // 5. 
merge gossip overview + val mergedOverview = GossipOverview( + this.overview.seen ++ that.overview.seen, + this.overview.unreachable ++ that.overview.unreachable) + + Gossip(mergedOverview, mergedMembers, mergedMeta, mergedVClock) + } + override def toString = "Gossip(" + "overview = " + overview + @@ -161,102 +246,25 @@ case class Gossip( } /** - * FSM actor managing the different cluster nodes states. + * Manages routing of the different cluster commands. * Instantiated as a single instance for each Node - e.g. commands are serialized to Node message after message. */ -final class ClusterCommandDaemon extends Actor with FSM[MemberStatus, Unit] { +final class ClusterCommandDaemon extends Actor { + import ClusterAction._ + val node = Node(context.system) + val log = Logging(context.system, this) - // ======================== - // === START IN JOINING == - startWith(MemberStatus.Joining, Unit) - - // ======================== - // === WHEN JOINING === - when(MemberStatus.Joining) { - case Event(ClusterAction.Up(address), _) ⇒ - node.up(address) - goto(MemberStatus.Up) - - case Event(ClusterAction.Remove(address), _) ⇒ - node.removing(address) - goto(MemberStatus.Removed) - - case Event(ClusterAction.Down(address), _) ⇒ - node.downing(address) - goto(MemberStatus.Down) + def receive = { + case Join(address) ⇒ node.joining(address) + case Up(address) ⇒ node.up(address) + case Down(address) ⇒ node.downing(address) + case Leave(address) ⇒ node.leaving(address) + case Exit(address) ⇒ node.exiting(address) + case Remove(address) ⇒ node.removing(address) } - // ======================== - // === WHEN UP === - when(MemberStatus.Up) { - case Event(ClusterAction.Down(address), _) ⇒ - node.downing(address) - goto(MemberStatus.Down) - - case Event(ClusterAction.Leave(address), _) ⇒ - node.leaving(address) - goto(MemberStatus.Leaving) - - case Event(ClusterAction.Exit(address), _) ⇒ - node.exiting(address) - goto(MemberStatus.Exiting) - - case Event(ClusterAction.Remove(address), _) ⇒ - node.removing(address) - goto(MemberStatus.Removed) - } - - // ======================== - // === WHEN LEAVING === - when(MemberStatus.Leaving) { - case Event(ClusterAction.Down(address), _) ⇒ - node.downing(address) - goto(MemberStatus.Down) - - case Event(ClusterAction.Remove(address), _) ⇒ - node.removing(address) - goto(MemberStatus.Removed) - } - - // ======================== - // === WHEN EXITING === - when(MemberStatus.Exiting) { - case Event(ClusterAction.Remove(address), _) ⇒ - node.removing(address) - goto(MemberStatus.Removed) - } - - // ======================== - // === WHEN DOWN === - when(MemberStatus.Down) { - // FIXME How to transition from DOWN => JOINING when node comes back online. Can't just listen to Gossip message since it is received be another actor. How to fix this? 
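The merge above resolves conflicting views of the same member by picking the "highest priority" status via Member.highestPriorityOf. A minimal, self-contained sketch of that idea, using simplified stand-in types rather than the actual akka.cluster classes (object and address names are made up for illustration):

    // Two nodes hold divergent views of the same cluster; merging groups members
    // by address and keeps the member with the highest-priority status.
    object MergePrioritySketch extends App {
      sealed trait Status
      case object Joining extends Status
      case object Up      extends Status
      case object Leaving extends Status
      case object Exiting extends Status
      case object Down    extends Status
      case object Removed extends Status

      case class M(address: String, status: Status)

      // same winning order as the patch: Removed > Down > Exiting > Leaving > Up > Joining
      private val priority: Map[Status, Int] =
        Map(Removed -> 5, Down -> 4, Exiting -> 3, Leaving -> 2, Up -> 1, Joining -> 0)

      def highestPriorityOf(m1: M, m2: M): M =
        if (priority(m1.status) >= priority(m2.status)) m1 else m2

      val viewA = Vector(M("a:2551", Up),      M("b:2552", Up))
      val viewB = Vector(M("a:2551", Leaving), M("b:2552", Up), M("c:2553", Joining))

      val merged =
        (viewA ++ viewB)
          .groupBy(_.address)            // Address => all observed views of that member
          .values
          .map(_.reduceLeft(highestPriorityOf)) // keep the highest-priority view per address
          .toVector
          .sortBy(_.address)

      println(merged) // Vector(M(a:2551,Leaving), M(b:2552,Up), M(c:2553,Joining))
    }

Here the conflicting views of a:2551 (Up vs Leaving) resolve to Leaving, mirroring how the real merge never "resurrects" a member that some node has already seen further along in its lifecycle.
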
- case Event(ClusterAction.Remove(address), _) ⇒ - node.removing(address) - goto(MemberStatus.Removed) - } - - // ======================== - // === WHEN REMOVED === - when(MemberStatus.Removed) { - case command ⇒ - log.warning("Removed node [{}] received cluster command [{}]", context.system.name, command) - stay - } - - // ======================== - // === GENERIC AND UNHANDLED COMMANDS === - whenUnhandled { - // should be able to handle Join in any state - case Event(ClusterAction.Join(address), _) ⇒ - node.joining(address) - stay - - case Event(illegal, _) ⇒ { - log.error("Illegal command [{}] in state [{}]", illegal, stateName) - stay - } - } + override def unhandled(unknown: Any) = log.error("Illegal command [{}]", unknown) } /** @@ -274,6 +282,9 @@ final class ClusterGossipDaemon extends Actor { override def unhandled(unknown: Any) = log.error("[/system/cluster/gossip] can not respond to messages - received [{}]", unknown) } +/** + * Supervisor managing the different cluste daemons. + */ final class ClusterDaemonSupervisor extends Actor { val log = Logging(context.system, this) val node = Node(context.system) @@ -333,7 +344,6 @@ class Node(system: ExtendedActorSystem) extends Extension { * all state is represented by this immutable case class and managed by an AtomicReference. */ private case class State( - self: Member, latestGossip: Gossip, memberMembershipChangeListeners: Set[MembershipChangeListener] = Set.empty[MembershipChangeListener]) @@ -345,19 +355,18 @@ class Node(system: ExtendedActorSystem) extends Extension { val remoteSettings = new RemoteSettings(system.settings.config, system.name) val clusterSettings = new ClusterSettings(system.settings.config, system.name) - private val remoteAddress = remote.transport.address + val remoteAddress = remote.transport.address + val failureDetector = new AccrualFailureDetector( + system, remoteAddress, clusterSettings.FailureDetectorThreshold, clusterSettings.FailureDetectorMaxSampleSize) + private val vclockNode = VectorClock.Node(remoteAddress.toString) private val gossipInitialDelay = clusterSettings.GossipInitialDelay private val gossipFrequency = clusterSettings.GossipFrequency - implicit private val memberOrdering = Ordering.fromLessThan[Member](_.address.toString < _.address.toString) - implicit private val defaultTimeout = Timeout(remoteSettings.RemoteSystemDaemonAckTimeout) - val failureDetector = new AccrualFailureDetector( - system, remoteAddress, clusterSettings.FailureDetectorThreshold, clusterSettings.FailureDetectorMaxSampleSize) - + private val autoDown = clusterSettings.AutoDown private val nrOfDeputyNodes = clusterSettings.NrOfDeputyNodes private val nrOfGossipDaemons = clusterSettings.NrOfGossipDaemons private val nodeToJoin: Option[Address] = clusterSettings.NodeToJoin filter (_ != remoteAddress) @@ -368,6 +377,8 @@ class Node(system: ExtendedActorSystem) extends Extension { private val log = Logging(system, "Node") private val random = SecureRandom.getInstance("SHA1PRNG") + log.info("Node [{}] - Starting cluster Node...", remoteAddress) + // create superisor for daemons under path "/system/cluster" private val clusterDaemons = { val createChild = CreateChild(Props[ClusterDaemonSupervisor], "cluster") @@ -380,30 +391,41 @@ class Node(system: ExtendedActorSystem) extends Extension { private val state = { val member = Member(remoteAddress, MemberStatus.Joining) val gossip = Gossip(members = SortedSet.empty[Member] + member) + vclockNode // add me as member and update my vector clock - new 
AtomicReference[State](State(member, gossip)) + new AtomicReference[State](State(gossip)) } - import Versioned.latestVersionOf - - log.info("Node [{}] - Starting cluster Node...", remoteAddress) - // try to join the node defined in the 'akka.cluster.node-to-join' option autoJoin() + // ======================================================== + // ===================== WORK DAEMONS ===================== + // ======================================================== + // start periodic gossip to random nodes in cluster private val gossipCanceller = system.scheduler.schedule(gossipInitialDelay, gossipFrequency) { gossip() } - // start periodic cluster scrutinization (moving nodes condemned by the failure detector to unreachable list) - private val scrutinizeCanceller = system.scheduler.schedule(gossipInitialDelay, gossipFrequency) { - scrutinize() + // start periodic cluster failure detector reaping (moving nodes condemned by the failure detector to unreachable list) + private val failureDetectorReaperCanceller = system.scheduler.schedule(gossipInitialDelay, gossipFrequency) { // TODO: should we use the same gossipFrequency for reaping? + reapUnreachableMembers() } + // start periodic leader action management (only applies for the current leader) + private val leaderActionsCanceller = system.scheduler.schedule(gossipInitialDelay, gossipFrequency) { // TODO: should we use the same gossipFrequency for leaderActions? + leaderActions() + } + + log.info("Node [{}] - Cluster Node started successfully", remoteAddress) + // ====================================================== // ===================== PUBLIC API ===================== // ====================================================== + def self: Member = latestGossip.members + .find(_.address == remoteAddress) + .getOrElse(throw new IllegalStateException("Can't find 'this' Member in the cluster membership ring")) + /** * Latest gossip. */ @@ -412,14 +434,14 @@ class Node(system: ExtendedActorSystem) extends Extension { /** * Member status for this node. */ - def self: Member = state.get.self + def status: MemberStatus = self.status /** * Is this node the leader? */ def isLeader: Boolean = { - val currentState = state.get - remoteAddress == currentState.latestGossip.members.head.address + val members = latestGossip.members + !members.isEmpty && (remoteAddress == members.head.address) } /** @@ -434,6 +456,11 @@ class Node(system: ExtendedActorSystem) extends Extension { */ def convergence: Option[Gossip] = convergence(latestGossip) + /** + * Returns true if the node is UP or JOINING. + */ + def isAvailable: Boolean = !isUnavailable(state.get) + /** * Shuts down all connections to other members, the cluster daemon and the periodic gossip and cleanup tasks. */ @@ -442,7 +469,8 @@ class Node(system: ExtendedActorSystem) extends Extension { if (isRunning.compareAndSet(true, false)) { log.info("Node [{}] - Shutting down Node and cluster daemons...", remoteAddress) gossipCanceller.cancel() - scrutinizeCanceller.cancel() + failureDetectorReaperCanceller.cancel() + leaderActionsCanceller.cancel() system.stop(clusterDaemons) } } @@ -472,28 +500,28 @@ class Node(system: ExtendedActorSystem) extends Extension { /** * Send command to JOIN one node to another. */ - def sendJoin(address: Address) { + def scheduleNodeJoin(address: Address) { clusterCommandDaemon ! ClusterAction.Join(address) } /** * Send command to issue state transition to LEAVING. */ - def sendLeave(address: Address) { + def scheduleNodeLeave(address: Address) { clusterCommandDaemon ! 
ClusterAction.Leave(address) } /** * Send command to issue state transition to EXITING. */ - def sendDown(address: Address) { + def scheduleNodeDown(address: Address) { clusterCommandDaemon ! ClusterAction.Down(address) } /** * Send command to issue state transition to REMOVED. */ - def sendRemove(address: Address) { + def scheduleNodeRemove(address: Address) { clusterCommandDaemon ! ClusterAction.Remove(address) } @@ -512,9 +540,15 @@ class Node(system: ExtendedActorSystem) extends Extension { val localState = state.get val localGossip = localState.latestGossip val localMembers = localGossip.members + val localOverview = localGossip.overview + val localUnreachableMembers = localOverview.unreachable + + // remove the node from the 'unreachable' set in case it is a DOWN node that is rejoining cluster + val newUnreachableMembers = localUnreachableMembers filterNot { _.address == node } + val newOverview = localOverview copy (unreachable = newUnreachableMembers) val newMembers = localMembers + Member(node, MemberStatus.Joining) // add joining node as Joining - val newGossip = localGossip copy (members = newMembers) + val newGossip = localGossip copy (overview = newOverview, members = newMembers) val versionedGossip = newGossip + vclockNode val seenVersionedGossip = versionedGossip seen remoteAddress @@ -533,40 +567,108 @@ class Node(system: ExtendedActorSystem) extends Extension { /** * State transition to UP. */ - private[cluster] final def up(address: Address) {} + private[cluster] final def up(address: Address) { + // FIXME implement me + } /** * State transition to LEAVING. */ - private[cluster] final def leaving(address: Address) {} + private[cluster] final def leaving(address: Address) { + // FIXME implement me + } /** * State transition to EXITING. */ - private[cluster] final def exiting(address: Address) {} + private[cluster] final def exiting(address: Address) { + // FIXME implement me + } /** * State transition to REMOVED. */ - private[cluster] final def removing(address: Address) {} + private[cluster] final def removing(address: Address) { + // FIXME implement me + } /** - * State transition to DOWN. + * The node to DOWN is removed from the 'members' set and put in the 'unreachable' set (if not alread there) + * and its status is set to DOWN. The node is alo removed from the 'seen' table. + * + * The node will reside as DOWN in the 'unreachable' set until an explicit command JOIN command is sent directly + * to this node and it will then go through the normal JOINING procedure. */ - private[cluster] final def downing(address: Address) {} + @tailrec + final private[cluster] def downing(address: Address) { + val localState = state.get + val localGossip = localState.latestGossip + val localMembers = localGossip.members + val localOverview = localGossip.overview + val localSeen = localOverview.seen + val localUnreachableMembers = localOverview.unreachable + + // 1. check if the node to DOWN is in the 'members' set + var downedMember: Option[Member] = None + val newMembers = + localMembers + .map { member ⇒ + if (member.address == address) { + log.info("Node [{}] - Marking node [{}] as DOWN", remoteAddress, member.address) + val newMember = member copy (status = MemberStatus.Down) + downedMember = Some(newMember) + newMember + } else member + } + .filter(_.status != MemberStatus.Down) + + // 2. 
check if the node to DOWN is in the 'unreachable' set + val newUnreachableMembers = + localUnreachableMembers + .filter(_.status != MemberStatus.Down) // no need to DOWN members already DOWN + .map { member ⇒ + if (member.address == address) { + log.info("Node [{}] - Marking unreachable node [{}] as DOWN", remoteAddress, member.address) + member copy (status = MemberStatus.Down) + } else member + } + + // 3. add the newly DOWNED members from the 'members' (in step 1.) to the 'newUnreachableMembers' set. + val newUnreachablePlusNewlyDownedMembers = downedMember match { + case Some(member) ⇒ newUnreachableMembers + member + case None ⇒ newUnreachableMembers + } + + // 4. remove nodes marked as DOWN from the 'seen' table + val newSeen = newUnreachablePlusNewlyDownedMembers.foldLeft(localSeen) { (currentSeen, member) ⇒ + currentSeen - member.address + } + + val newOverview = localOverview copy (seen = newSeen, unreachable = newUnreachablePlusNewlyDownedMembers) // update gossip overview + val newGossip = localGossip copy (overview = newOverview, members = newMembers) // update gossip + val versionedGossip = newGossip + vclockNode + val newState = localState copy (latestGossip = versionedGossip seen remoteAddress) + + if (!state.compareAndSet(localState, newState)) downing(address) // recur if we fail the update + else { + if (convergence(newState.latestGossip).isDefined) { + newState.memberMembershipChangeListeners foreach { _ notify newState.latestGossip.members } + } + } + } /** * Receive new gossip. */ @tailrec - private[cluster] final def receive(sender: Member, remoteGossip: Gossip) { + final private[cluster] def receive(sender: Member, remoteGossip: Gossip) { val localState = state.get val localGossip = localState.latestGossip val winningGossip = if (remoteGossip.version <> localGossip.version) { // concurrent - val mergedGossip = merge(remoteGossip, localGossip) + val mergedGossip = remoteGossip merge localGossip val versionedMergedGossip = mergedGossip + vclockNode log.debug( @@ -609,55 +711,6 @@ class Node(system: ExtendedActorSystem) extends Extension { connection ! command } - /** - * Initates a new round of gossip. - */ - private def gossip() { - val localState = state.get - val localGossip = localState.latestGossip - val localMembers = localGossip.members - - if (!isSingletonCluster(localState)) { // do not gossip if we are a singleton cluster - log.debug("Node [{}] - Initiating new round of gossip", remoteAddress) - - val localGossip = localState.latestGossip - val localMembers = localGossip.members - val localMembersSize = localMembers.size - - val localUnreachableAddresses = localGossip.overview.unreachable - val localUnreachableSize = localUnreachableAddresses.size - - // 1. gossip to alive members - val gossipedToDeputy = gossipToRandomNodeOf(localMembers map { _.address }) - - // 2. gossip to unreachable members - if (localUnreachableSize > 0) { - val probability: Double = localUnreachableSize / (localMembersSize + 1) - if (random.nextDouble() < probability) gossipToRandomNodeOf(localUnreachableAddresses) - } - - // 3. 
gossip to a deputy nodes for facilitating partition healing - val deputies = deputyNodes - if ((!gossipedToDeputy || localMembersSize < 1) && !deputies.isEmpty) { - if (localMembersSize == 0) gossipToRandomNodeOf(deputies) - else { - val probability = 1.0 / localMembersSize + localUnreachableSize - if (random.nextDouble() <= probability) gossipToRandomNodeOf(deputies) - } - } - } - } - - /** - * Merges two Gossip instances including membership tables, meta-data tables and the VectorClock histories. - */ - private def merge(gossip1: Gossip, gossip2: Gossip): Gossip = { - val mergedVClock = gossip1.version merge gossip2.version - val mergedMembers = gossip1.members union gossip2.members - val mergedMeta = gossip1.meta ++ gossip2.meta - Gossip(gossip2.overview, mergedMembers, mergedMeta, mergedVClock) - } - /** * Switches the member status. * @@ -668,12 +721,15 @@ class Node(system: ExtendedActorSystem) extends Extension { private def switchMemberStatusTo(newStatus: MemberStatus, state: State): State = { log.info("Node [{}] - Switching membership status to [{}]", remoteAddress, newStatus) - val localSelf = state.self + val localSelf = self val localGossip = state.latestGossip val localMembers = localGossip.members + // change my state into a "new" self val newSelf = localSelf copy (status = newStatus) + + // change my state in 'gossip.members' val newMembersSet = localMembers map { member ⇒ if (member.address == remoteAddress) newSelf else member @@ -683,10 +739,11 @@ class Node(system: ExtendedActorSystem) extends Extension { val newMembersSortedSet = SortedSet[Member](newMembersSet.toList: _*) val newGossip = localGossip copy (members = newMembersSortedSet) + // version my changes val versionedGossip = newGossip + vclockNode val seenVersionedGossip = versionedGossip seen remoteAddress - state copy (self = newSelf, latestGossip = seenVersionedGossip) + state copy (latestGossip = seenVersionedGossip) } /** @@ -704,49 +761,92 @@ class Node(system: ExtendedActorSystem) extends Extension { * @return 'true' if it gossiped to a "deputy" member. */ private def gossipToRandomNodeOf(addresses: Iterable[Address]): Boolean = { - val peers = addresses filterNot (_ == remoteAddress) // filter out myself - val peer = selectRandomNode(peers) - gossipTo(peer) - deputyNodes exists (peer == _) + if (addresses.isEmpty) false + else { + val peers = addresses filter (_ != remoteAddress) // filter out myself + val peer = selectRandomNode(peers) + gossipTo(peer) + deputyNodes exists (peer == _) + } } /** - * Scrutinizes the cluster; marks members detected by the failure detector as unreachable. + * Initates a new round of gossip. + */ + private def gossip() { + val localState = state.get + val localGossip = localState.latestGossip + val localMembers = localGossip.members + + if (!isSingletonCluster(localState) && isAvailable(localState)) { + // only gossip if we are a non-singleton cluster and available + + log.debug("Node [{}] - Initiating new round of gossip", remoteAddress) + + val localGossip = localState.latestGossip + val localMembers = localGossip.members + val localMembersSize = localMembers.size + + val localUnreachableMembers = localGossip.overview.unreachable + val localUnreachableSize = localUnreachableMembers.size + + // 1. gossip to alive members + val gossipedToDeputy = gossipToRandomNodeOf(localMembers map { _.address }) + + // 2. 
gossip to unreachable members + if (localUnreachableSize > 0) { + val probability: Double = localUnreachableSize / (localMembersSize + 1) + if (random.nextDouble() < probability) gossipToRandomNodeOf(localUnreachableMembers.map(_.address)) + } + + // 3. gossip to a deputy nodes for facilitating partition healing + val deputies = deputyNodes + if ((!gossipedToDeputy || localMembersSize < 1) && !deputies.isEmpty) { + if (localMembersSize == 0) gossipToRandomNodeOf(deputies) + else { + val probability = 1.0 / localMembersSize + localUnreachableSize + if (random.nextDouble() <= probability) gossipToRandomNodeOf(deputies) + } + } + } + } + + /** + * Reaps the unreachable members (moves them to the 'unreachable' list in the cluster overview) according to the failure detector's verdict. */ @tailrec - final private def scrutinize() { + final private def reapUnreachableMembers() { val localState = state.get - if (!isSingletonCluster(localState)) { // do not scrutinize if we are a singleton cluster + if (!isSingletonCluster(localState) && isAvailable(localState)) { + // only scrutinize if we are a non-singleton cluster and available val localGossip = localState.latestGossip val localOverview = localGossip.overview val localSeen = localOverview.seen val localMembers = localGossip.members - val localUnreachableAddresses = localGossip.overview.unreachable + val localUnreachableMembers = localGossip.overview.unreachable val newlyDetectedUnreachableMembers = localMembers filterNot { member ⇒ failureDetector.isAvailable(member.address) } - val newlyDetectedUnreachableAddresses = newlyDetectedUnreachableMembers map { _.address } - if (!newlyDetectedUnreachableAddresses.isEmpty) { // we have newly detected members marked as unavailable + if (!newlyDetectedUnreachableMembers.isEmpty) { // we have newly detected members marked as unavailable val newMembers = localMembers diff newlyDetectedUnreachableMembers - val newUnreachableAddresses: Set[Address] = localUnreachableAddresses ++ newlyDetectedUnreachableAddresses + val newUnreachableMembers: Set[Member] = localUnreachableMembers ++ newlyDetectedUnreachableMembers - val newSeen = newUnreachableAddresses.foldLeft(localSeen)((currentSeen, address) ⇒ currentSeen - address) - - val newOverview = localOverview copy (seen = newSeen, unreachable = newUnreachableAddresses) + val newOverview = localOverview copy (unreachable = newUnreachableMembers) val newGossip = localGossip copy (overview = newOverview, members = newMembers) + // updating vclock and 'seen' table val versionedGossip = newGossip + vclockNode val seenVersionedGossip = versionedGossip seen remoteAddress val newState = localState copy (latestGossip = seenVersionedGossip) // if we won the race then update else try again - if (!state.compareAndSet(localState, newState)) scrutinize() // recur + if (!state.compareAndSet(localState, newState)) reapUnreachableMembers() // recur else { - log.info("Node [{}] - Marking node(s) an unreachable [{}]", remoteAddress, newlyDetectedUnreachableAddresses.mkString(", ")) + log.info("Node [{}] - Marking node(s) as UNREACHABLE [{}]", remoteAddress, newlyDetectedUnreachableMembers.mkString(", ")) if (convergence(newState.latestGossip).isDefined) { newState.memberMembershipChangeListeners foreach { _ notify newMembers } @@ -757,38 +857,156 @@ class Node(system: ExtendedActorSystem) extends Extension { } /** - * Checks if we have a cluster convergence. + * Runs periodic leader actions, such as auto-downing unreachable nodes, assigning partitions etc. 
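Before the full implementation below, a minimal stand-alone sketch of the two status promotions the leader performs once convergence is reached (simplified stand-in types, not the actual akka.cluster classes; the auto-down branch for unreachable members is omitted here for brevity):

    object LeaderActionsSketch extends App {
      sealed trait Status
      case object Joining extends Status
      case object Up      extends Status
      case object Exiting extends Status
      case object Removed extends Status

      case class M(address: String, status: Status)

      def leaderActions(members: Vector[M]): Vector[M] =
        members.map {
          case m @ M(_, Joining) ⇒ m.copy(status = Up)      // 1. JOINING => UP
          case m @ M(_, Exiting) ⇒ m.copy(status = Removed) // 2. EXITING => REMOVED
          case m                 ⇒ m                        // all other members untouched
        }

      val before = Vector(M("a:2551", Up), M("b:2552", Joining), M("c:2553", Exiting))
      println(leaderActions(before))
      // Vector(M(a:2551,Up), M(b:2552,Up), M(c:2553,Removed))
    }
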
+ */ + @tailrec + final private def leaderActions() { + val localState = state.get + val localGossip = localState.latestGossip + val localMembers = localGossip.members + + val isLeader = !localMembers.isEmpty && (remoteAddress == localMembers.head.address) + + if (isLeader && isAvailable(localState)) { + // only run the leader actions if we are the LEADER and available + + val localOverview = localGossip.overview + val localSeen = localOverview.seen + val localUnreachableMembers = localGossip.overview.unreachable + + // Leader actions are as follows: + // 1. Move JOINING => UP + // 2. Move EXITING => REMOVED + // 3. Move UNREACHABLE => DOWN (auto-downing by leader) + // 4. Updating the vclock version for the changes + // 5. Updating the 'seen' table + + var hasChangedState = false + val newGossip = + + if (convergence(localGossip).isDefined) { + // we have convergence - so we can't have unreachable nodes + + val newMembers = + localMembers map { member ⇒ + // 1. Move JOINING => UP + if (member.status == MemberStatus.Joining) { + log.info("Node [{}] - Leader is moving node [{}] from JOINING to UP", remoteAddress, member.address) + hasChangedState = true + member copy (status = MemberStatus.Up) + } else member + } map { member ⇒ + // 2. Move EXITING => REMOVED + if (member.status == MemberStatus.Exiting) { + log.info("Node [{}] - Leader is moving node [{}] from EXITING to REMOVED", remoteAddress, member.address) + hasChangedState = true + member copy (status = MemberStatus.Removed) + } else member + } + localGossip copy (members = newMembers) // update gossip + + } else if (autoDown) { + // we don't have convergence - so we might have unreachable nodes + // if 'auto-down' is turned on, then try to auto-down any unreachable nodes + + // FIXME Should we let the leader auto-down every run (as it is now) or just every X seconds? So we can wait for user to invoke explicit DOWN. + + // 3. Move UNREACHABLE => DOWN (auto-downing by leader) + val newUnreachableMembers = + localUnreachableMembers + .filter(_.status != MemberStatus.Down) // no need to DOWN members already DOWN + .map { member ⇒ + log.info("Node [{}] - Leader is marking unreachable node [{}] as DOWN", remoteAddress, member.address) + hasChangedState = true + member copy (status = MemberStatus.Down) + } + + // removing nodes marked as DOWN from the 'seen' table + // FIXME this needs to be done if user issues DOWN as well + val newSeen = localUnreachableMembers.foldLeft(localSeen)((currentSeen, member) ⇒ currentSeen - member.address) + + val newOverview = localOverview copy (seen = newSeen, unreachable = newUnreachableMembers) // update gossip overview + localGossip copy (overview = newOverview) // update gossip + + } else localGossip + + if (hasChangedState) { // we have a change of state - version it and try to update + + // 4. Updating the vclock version for the changes + val versionedGossip = newGossip + vclockNode + + // 5. Updating the 'seen' table + val seenVersionedGossip = versionedGossip seen remoteAddress + + val newState = localState copy (latestGossip = seenVersionedGossip) + + // if we won the race then update else try again + if (!state.compareAndSet(localState, newState)) leaderActions() // recur + else { + if (convergence(newState.latestGossip).isDefined) { + newState.memberMembershipChangeListeners map { _ notify newGossip.members } + } + } + } + } + } + + /** + * Checks if we have a cluster convergence. 
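In other words, convergence needs two things: every unreachable member (if any) is already DOWN, and every node in the 'seen' table has observed the same gossip version. A minimal, self-contained sketch of that predicate, with an Int standing in for the vector clock version and a boolean flag standing in for MemberStatus.Down (not the actual akka.cluster classes):

    object ConvergenceSketch extends App {
      case class M(address: String, down: Boolean)

      def convergence(seenVersions: Map[String, Int], unreachable: Set[M]): Boolean =
        unreachable.forall(_.down) &&            // every unreachable member is already DOWN
          seenVersions.values.toSet.size == 1    // all nodes have seen the same version

      // all nodes have seen version 7 and the only unreachable node is already DOWN
      println(convergence(Map("a" -> 7, "b" -> 7, "c" -> 7), Set(M("d", down = true)))) // true
      // node 'b' lags behind, so there is no convergence yet
      println(convergence(Map("a" -> 7, "b" -> 6, "c" -> 7), Set.empty))                // false
    }
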
If there are any unreachable nodes then we can't have a convergence - + * waiting for user to act (issuing DOWN) or leader to act (issuing DOWN through auto-down). * * @returns Some(convergedGossip) if convergence have been reached and None if not */ private def convergence(gossip: Gossip): Option[Gossip] = { val overview = gossip.overview - // if (overview.unreachable.isEmpty) { // if there are any unreachable nodes then we can't have a convergence - - // waiting for user to act (issuing DOWN) or leader to act (issuing DOWN through auto-down) - val seen = gossip.overview.seen - val views = seen.values.toSet - if (views.size == 1) { - log.debug("Node [{}] - Cluster convergence reached", remoteAddress) - Some(gossip) + val unreachable = overview.unreachable + + // First check that: + // 1. we don't have any members that are unreachable (unreachable.isEmpty == true), or + // 2. all unreachable members in the set have status DOWN + // Else we can't continue to check for convergence + // When that is done we check that all the entries in the 'seen' table have the same vector clock version + if (unreachable.isEmpty || !unreachable.exists(_.status != MemberStatus.Down)) { + val seen = gossip.overview.seen + val views = Set.empty[VectorClock] ++ seen.values + + if (views.size == 1) { + log.debug("Node [{}] - Cluster convergence reached", remoteAddress) + Some(gossip) + } else None } else None - // } else None + } + + private def isAvailable(state: State): Boolean = !isUnavailable(state) + + private def isUnavailable(state: State): Boolean = { + val localGossip = state.latestGossip + val localOverview = localGossip.overview + val localMembers = localGossip.members + val localUnreachableMembers = localOverview.unreachable + val isUnreachable = localUnreachableMembers exists { _.address == remoteAddress } + val hasUnavailableMemberStatus = localMembers exists { m ⇒ (m == self) && MemberStatus.isUnavailable(m.status) } + isUnreachable || hasUnavailableMemberStatus } /** - * Sets up cluster command connection. - */ - private def clusterCommandConnectionFor(address: Address): ActorRef = system.actorFor(RootActorPath(address) / "system" / "cluster" / "commands") - - /** - * Sets up local cluster command connection. + * Looks up and returns the local cluster command connection. */ private def clusterCommandDaemon = system.actorFor(RootActorPath(remoteAddress) / "system" / "cluster" / "commands") /** - * Sets up cluster gossip connection. + * Looks up and returns the remote cluster command connection for the specific address. + */ + private def clusterCommandConnectionFor(address: Address): ActorRef = system.actorFor(RootActorPath(address) / "system" / "cluster" / "commands") + + /** + * Looks up and returns the remote cluster gossip connection for the specific address. */ private def clusterGossipConnectionFor(address: Address): ActorRef = system.actorFor(RootActorPath(address) / "system" / "cluster" / "gossip") + /** + * Gets an Iterable with the addresses of a all the 'deputy' nodes - excluding this node if part of the group. 
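A small sketch of that selection, mirroring the `drop 1 take nrOfDeputyNodes filter (_ != remoteAddress)` expression in the implementation below (plain strings stand in for member addresses; the values are made up for illustration):

    object DeputyNodesSketch extends App {
      // sorted member ring; the head is the leader
      val members = Vector("a:2551", "b:2552", "c:2553", "d:2554", "e:2555")
      val nrOfDeputyNodes = 3
      val myAddress = "c:2553"

      // the first nrOfDeputyNodes members after the leader, excluding this node
      val deputies = members.drop(1).take(nrOfDeputyNodes).filterNot(_ == myAddress)

      println(deputies) // Vector(b:2552, d:2554)
    }
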
+ */ private def deputyNodes: Iterable[Address] = state.get.latestGossip.members.toIterable map (_.address) drop 1 take nrOfDeputyNodes filter (_ != remoteAddress) private def selectRandomNode(addresses: Iterable[Address]): Address = addresses.toSeq(random nextInt addresses.size) diff --git a/akka-cluster/src/test/scala/akka/cluster/ClientDowningSpec.scala b/akka-cluster/src/test/scala/akka/cluster/ClientDowningSpec.scala new file mode 100644 index 0000000000..16651af9b5 --- /dev/null +++ b/akka-cluster/src/test/scala/akka/cluster/ClientDowningSpec.scala @@ -0,0 +1,186 @@ +/** + * Copyright (C) 2009-2012 Typesafe Inc. + */ + +package akka.cluster + +import akka.testkit._ +import akka.dispatch._ +import akka.actor._ +import akka.remote._ +import akka.util.duration._ + +import com.typesafe.config._ + +import java.net.InetSocketAddress + +class ClientDowningSpec extends AkkaSpec(""" + akka { + loglevel = "INFO" + actor.provider = "akka.remote.RemoteActorRefProvider" + cluster { + failure-detector.threshold = 3 + auto-down = off + } + } + """) with ImplicitSender { + + var node1: Node = _ + var node2: Node = _ + var node3: Node = _ + var node4: Node = _ + + var system1: ActorSystemImpl = _ + var system2: ActorSystemImpl = _ + var system3: ActorSystemImpl = _ + var system4: ActorSystemImpl = _ + + try { + "Client of a 4 node cluster" must { + + // ======= NODE 1 ======== + system1 = ActorSystem("system1", ConfigFactory + .parseString(""" + akka { + remote.netty { + hostname = localhost + port=5550 + } + }""") + .withFallback(system.settings.config)) + .asInstanceOf[ActorSystemImpl] + val remote1 = system1.provider.asInstanceOf[RemoteActorRefProvider] + node1 = Node(system1) + val fd1 = node1.failureDetector + val address1 = node1.remoteAddress + + // ======= NODE 2 ======== + system2 = ActorSystem("system2", ConfigFactory + .parseString(""" + akka { + remote.netty { + hostname = localhost + port = 5551 + } + cluster.node-to-join = "akka://system1@localhost:5550" + }""") + .withFallback(system.settings.config)) + .asInstanceOf[ActorSystemImpl] + val remote2 = system2.provider.asInstanceOf[RemoteActorRefProvider] + node2 = Node(system2) + val fd2 = node2.failureDetector + val address2 = node2.remoteAddress + + // ======= NODE 3 ======== + system3 = ActorSystem("system3", ConfigFactory + .parseString(""" + akka { + remote.netty { + hostname = localhost + port=5552 + } + cluster.node-to-join = "akka://system1@localhost:5550" + }""") + .withFallback(system.settings.config)) + .asInstanceOf[ActorSystemImpl] + val remote3 = system3.provider.asInstanceOf[RemoteActorRefProvider] + node3 = Node(system3) + val fd3 = node3.failureDetector + val address3 = node3.remoteAddress + + // ======= NODE 4 ======== + system4 = ActorSystem("system4", ConfigFactory + .parseString(""" + akka { + remote.netty { + hostname = localhost + port=5553 + } + cluster.node-to-join = "akka://system1@localhost:5550" + }""") + .withFallback(system.settings.config)) + .asInstanceOf[ActorSystemImpl] + val remote4 = system4.provider.asInstanceOf[RemoteActorRefProvider] + node4 = Node(system4) + val fd4 = node4.failureDetector + val address4 = node4.remoteAddress + + "be able to DOWN a node that is UP" taggedAs LongRunningTest in { + + println("Give the system time to converge...") + Thread.sleep(30.seconds.dilated.toMillis) // let them gossip for 30 seconds + + // check cluster convergence + node1.convergence must be('defined) + node2.convergence must be('defined) + node3.convergence must be('defined) + node4.convergence must 
be('defined) + + // shut down node3 + node3.shutdown() + system3.shutdown() + + // wait for convergence + println("Give the system time to converge...") + Thread.sleep(30.seconds.dilated.toMillis) + + // client marks node3 as DOWN + node1.scheduleNodeDown(address3) + + println("Give the system time to converge...") + Thread.sleep(30.seconds.dilated.toMillis) // let them gossip for 30 seconds + + // check cluster convergence + node1.convergence must be('defined) + node2.convergence must be('defined) + node4.convergence must be('defined) + + node1.latestGossip.members.size must be(3) + node1.latestGossip.members.exists(_.address == address3) must be(false) + } + + "be able to DOWN a node that is UNREACHABLE" taggedAs LongRunningTest in { + + // shut down system1 - the leader + node4.shutdown() + system4.shutdown() + + // wait for convergence + println("Give the system time to converge...") + Thread.sleep(30.seconds.dilated.toMillis) + + // clien marks node4 as DOWN + node2.scheduleNodeDown(address4) + + println("Give the system time to converge...") + Thread.sleep(30.seconds.dilated.toMillis) // let them gossip for 30 seconds + + // check cluster convergence + node1.convergence must be('defined) + node2.convergence must be('defined) + + node1.latestGossip.members.size must be(2) + node1.latestGossip.members.exists(_.address == address4) must be(false) + node1.latestGossip.members.exists(_.address == address3) must be(false) + } + } + } catch { + case e: Exception ⇒ + e.printStackTrace + fail(e.toString) + } + + override def atTermination() { + if (node1 ne null) node1.shutdown() + if (system1 ne null) system1.shutdown() + + if (node2 ne null) node2.shutdown() + if (system2 ne null) system2.shutdown() + + if (node3 ne null) node3.shutdown() + if (system3 ne null) system3.shutdown() + + if (node4 ne null) node4.shutdown() + if (system4 ne null) system4.shutdown() + } +} diff --git a/akka-cluster/src/test/scala/akka/cluster/ClusterCommandDaemonFSMSpec.scala b/akka-cluster/src/test/scala/akka/cluster/ClusterCommandDaemonFSMSpec.scala deleted file mode 100644 index 2aeeb1835b..0000000000 --- a/akka-cluster/src/test/scala/akka/cluster/ClusterCommandDaemonFSMSpec.scala +++ /dev/null @@ -1,148 +0,0 @@ -/** - * Copyright (C) 2009-2012 Typesafe Inc. - */ - -package akka.cluster - -import akka.testkit._ -import akka.actor.Address - -class ClusterCommandDaemonFSMSpec - extends AkkaSpec("akka.actor.provider = akka.remote.RemoteActorRefProvider") - with ImplicitSender { - - "A ClusterCommandDaemon FSM" must { - val address = Address("akka", system.name) - "start in Joining" in { - val fsm = TestFSMRef(new ClusterCommandDaemon) - fsm.stateName must be(MemberStatus.Joining) - } - "be able to switch from Joining to Up" in { - val fsm = TestFSMRef(new ClusterCommandDaemon) - fsm.stateName must be(MemberStatus.Joining) - fsm ! ClusterAction.Up(address) - fsm.stateName must be(MemberStatus.Up) - } - "be able to switch from Joining to Down" in { - val fsm = TestFSMRef(new ClusterCommandDaemon) - fsm.stateName must be(MemberStatus.Joining) - fsm ! ClusterAction.Down(address) - fsm.stateName must be(MemberStatus.Down) - } - "be able to switch from Joining to Removed" in { - val fsm = TestFSMRef(new ClusterCommandDaemon) - fsm.stateName must be(MemberStatus.Joining) - fsm ! ClusterAction.Remove(address) - fsm.stateName must be(MemberStatus.Removed) - } - "be able to switch from Up to Down" in { - val fsm = TestFSMRef(new ClusterCommandDaemon) - fsm.stateName must be(MemberStatus.Joining) - fsm ! 
ClusterAction.Up(address) - fsm.stateName must be(MemberStatus.Up) - fsm ! ClusterAction.Down(address) - fsm.stateName must be(MemberStatus.Down) - } - "be able to switch from Up to Leaving" in { - val fsm = TestFSMRef(new ClusterCommandDaemon) - fsm.stateName must be(MemberStatus.Joining) - fsm ! ClusterAction.Up(address) - fsm.stateName must be(MemberStatus.Up) - fsm ! ClusterAction.Leave(address) - fsm.stateName must be(MemberStatus.Leaving) - } - "be able to switch from Up to Exiting" in { - val fsm = TestFSMRef(new ClusterCommandDaemon) - fsm.stateName must be(MemberStatus.Joining) - fsm ! ClusterAction.Up(address) - fsm.stateName must be(MemberStatus.Up) - fsm ! ClusterAction.Exit(address) - fsm.stateName must be(MemberStatus.Exiting) - } - "be able to switch from Up to Removed" in { - val fsm = TestFSMRef(new ClusterCommandDaemon) - fsm.stateName must be(MemberStatus.Joining) - fsm ! ClusterAction.Up(address) - fsm.stateName must be(MemberStatus.Up) - fsm ! ClusterAction.Remove(address) - fsm.stateName must be(MemberStatus.Removed) - } - "be able to switch from Leaving to Down" in { - val fsm = TestFSMRef(new ClusterCommandDaemon) - fsm.stateName must be(MemberStatus.Joining) - fsm ! ClusterAction.Up(address) - fsm.stateName must be(MemberStatus.Up) - fsm ! ClusterAction.Leave(address) - fsm.stateName must be(MemberStatus.Leaving) - fsm ! ClusterAction.Down(address) - fsm.stateName must be(MemberStatus.Down) - } - "be able to switch from Leaving to Removed" in { - val fsm = TestFSMRef(new ClusterCommandDaemon) - fsm.stateName must be(MemberStatus.Joining) - fsm ! ClusterAction.Up(address) - fsm.stateName must be(MemberStatus.Up) - fsm ! ClusterAction.Leave(address) - fsm.stateName must be(MemberStatus.Leaving) - fsm ! ClusterAction.Remove(address) - fsm.stateName must be(MemberStatus.Removed) - } - "be able to switch from Exiting to Removed" in { - val fsm = TestFSMRef(new ClusterCommandDaemon) - fsm.stateName must be(MemberStatus.Joining) - fsm ! ClusterAction.Up(address) - fsm.stateName must be(MemberStatus.Up) - fsm ! ClusterAction.Exit(address) - fsm.stateName must be(MemberStatus.Exiting) - fsm ! ClusterAction.Remove(address) - fsm.stateName must be(MemberStatus.Removed) - } - "be able to switch from Down to Removed" in { - val fsm = TestFSMRef(new ClusterCommandDaemon) - fsm.stateName must be(MemberStatus.Joining) - fsm ! ClusterAction.Up(address) - fsm.stateName must be(MemberStatus.Up) - fsm ! ClusterAction.Down(address) - fsm.stateName must be(MemberStatus.Down) - fsm ! ClusterAction.Remove(address) - fsm.stateName must be(MemberStatus.Removed) - } - "not be able to switch from Removed to any other state" in { - val fsm = TestFSMRef(new ClusterCommandDaemon) - fsm.stateName must be(MemberStatus.Joining) - fsm ! ClusterAction.Up(address) - fsm.stateName must be(MemberStatus.Up) - fsm ! ClusterAction.Remove(address) - fsm.stateName must be(MemberStatus.Removed) - fsm ! ClusterAction.Up(address) - fsm.stateName must be(MemberStatus.Removed) - fsm ! ClusterAction.Leave(address) - fsm.stateName must be(MemberStatus.Removed) - fsm ! ClusterAction.Down(address) - fsm.stateName must be(MemberStatus.Removed) - fsm ! ClusterAction.Exit(address) - fsm.stateName must be(MemberStatus.Removed) - fsm ! ClusterAction.Remove(address) - fsm.stateName must be(MemberStatus.Removed) - } - - "remain in the same state when receiving a Join command" in { - val fsm = TestFSMRef(new ClusterCommandDaemon) - fsm.stateName must be(MemberStatus.Joining) - fsm ! 
ClusterAction.Join(address) - fsm.stateName must be(MemberStatus.Joining) - fsm ! ClusterAction.Up(address) - fsm.stateName must be(MemberStatus.Up) - fsm ! ClusterAction.Join(address) - fsm.stateName must be(MemberStatus.Up) - fsm ! ClusterAction.Leave(address) - fsm.stateName must be(MemberStatus.Leaving) - fsm ! ClusterAction.Join(address) - fsm.stateName must be(MemberStatus.Leaving) - fsm ! ClusterAction.Down(address) - fsm.stateName must be(MemberStatus.Down) - fsm ! ClusterAction.Join(address) - fsm.stateName must be(MemberStatus.Down) - } - } -} diff --git a/akka-cluster/src/test/scala/akka/cluster/ClusterConfigSpec.scala b/akka-cluster/src/test/scala/akka/cluster/ClusterConfigSpec.scala index 6668044f33..c8fd8e6bda 100644 --- a/akka-cluster/src/test/scala/akka/cluster/ClusterConfigSpec.scala +++ b/akka-cluster/src/test/scala/akka/cluster/ClusterConfigSpec.scala @@ -30,6 +30,7 @@ class ClusterConfigSpec extends AkkaSpec( GossipFrequency must be(1 second) NrOfGossipDaemons must be(4) NrOfDeputyNodes must be(3) + AutoDown must be(true) } } } diff --git a/akka-cluster/src/test/scala/akka/cluster/GossipingAccrualFailureDetectorSpec.scala b/akka-cluster/src/test/scala/akka/cluster/GossipingAccrualFailureDetectorSpec.scala index 6c81f8680a..fb9b8408db 100644 --- a/akka-cluster/src/test/scala/akka/cluster/GossipingAccrualFailureDetectorSpec.scala +++ b/akka-cluster/src/test/scala/akka/cluster/GossipingAccrualFailureDetectorSpec.scala @@ -43,7 +43,7 @@ class GossipingAccrualFailureDetectorSpec extends AkkaSpec(""" val remote1 = system1.provider.asInstanceOf[RemoteActorRefProvider] node1 = Node(system1) val fd1 = node1.failureDetector - val address1 = node1.self.address + val address1 = node1.remoteAddress // ======= NODE 2 ======== system2 = ActorSystem("system2", ConfigFactory @@ -57,7 +57,7 @@ class GossipingAccrualFailureDetectorSpec extends AkkaSpec(""" val remote2 = system2.provider.asInstanceOf[RemoteActorRefProvider] node2 = Node(system2) val fd2 = node2.failureDetector - val address2 = node2.self.address + val address2 = node2.remoteAddress // ======= NODE 3 ======== system3 = ActorSystem("system3", ConfigFactory @@ -71,7 +71,7 @@ class GossipingAccrualFailureDetectorSpec extends AkkaSpec(""" val remote3 = system3.provider.asInstanceOf[RemoteActorRefProvider] node3 = Node(system3) val fd3 = node3.failureDetector - val address3 = node3.self.address + val address3 = node3.remoteAddress "receive gossip heartbeats so that all healthy systems in the cluster are marked 'available'" taggedAs LongRunningTest in { println("Let the systems gossip for a while...") diff --git a/akka-cluster/src/test/scala/akka/cluster/LeaderDowningSpec.scala b/akka-cluster/src/test/scala/akka/cluster/LeaderDowningSpec.scala new file mode 100644 index 0000000000..957f8ed4aa --- /dev/null +++ b/akka-cluster/src/test/scala/akka/cluster/LeaderDowningSpec.scala @@ -0,0 +1,179 @@ +/** + * Copyright (C) 2009-2012 Typesafe Inc. 
+ */ + +package akka.cluster + +import akka.testkit._ +import akka.dispatch._ +import akka.actor._ +import akka.remote._ +import akka.util.duration._ + +import com.typesafe.config._ + +import java.net.InetSocketAddress + +class LeaderDowningSpec extends AkkaSpec(""" + akka { + loglevel = "INFO" + actor.provider = "akka.remote.RemoteActorRefProvider" + cluster { + failure-detector.threshold = 3 + auto-down = on + } + } + """) with ImplicitSender { + + var node1: Node = _ + var node2: Node = _ + var node3: Node = _ + var node4: Node = _ + + var system1: ActorSystemImpl = _ + var system2: ActorSystemImpl = _ + var system3: ActorSystemImpl = _ + var system4: ActorSystemImpl = _ + + try { + "The Leader in a 4 node cluster" must { + + // ======= NODE 1 ======== + system1 = ActorSystem("system1", ConfigFactory + .parseString(""" + akka { + remote.netty { + hostname = localhost + port=5550 + } + }""") + .withFallback(system.settings.config)) + .asInstanceOf[ActorSystemImpl] + val remote1 = system1.provider.asInstanceOf[RemoteActorRefProvider] + node1 = Node(system1) + val fd1 = node1.failureDetector + val address1 = node1.remoteAddress + + // ======= NODE 2 ======== + system2 = ActorSystem("system2", ConfigFactory + .parseString(""" + akka { + remote.netty { + hostname = localhost + port = 5551 + } + cluster.node-to-join = "akka://system1@localhost:5550" + }""") + .withFallback(system.settings.config)) + .asInstanceOf[ActorSystemImpl] + val remote2 = system2.provider.asInstanceOf[RemoteActorRefProvider] + node2 = Node(system2) + val fd2 = node2.failureDetector + val address2 = node2.remoteAddress + + // ======= NODE 3 ======== + system3 = ActorSystem("system3", ConfigFactory + .parseString(""" + akka { + remote.netty { + hostname = localhost + port=5552 + } + cluster.node-to-join = "akka://system1@localhost:5550" + }""") + .withFallback(system.settings.config)) + .asInstanceOf[ActorSystemImpl] + val remote3 = system3.provider.asInstanceOf[RemoteActorRefProvider] + node3 = Node(system3) + val fd3 = node3.failureDetector + val address3 = node3.remoteAddress + + // ======= NODE 4 ======== + system4 = ActorSystem("system4", ConfigFactory + .parseString(""" + akka { + remote.netty { + hostname = localhost + port=5553 + } + cluster.node-to-join = "akka://system1@localhost:5550" + }""") + .withFallback(system.settings.config)) + .asInstanceOf[ActorSystemImpl] + val remote4 = system4.provider.asInstanceOf[RemoteActorRefProvider] + node4 = Node(system4) + val fd4 = node4.failureDetector + val address4 = node4.remoteAddress + + "be able to DOWN a (last) node that is UNREACHABLE" taggedAs LongRunningTest in { + + println("Give the system time to converge...") + Thread.sleep(30.seconds.dilated.toMillis) // let them gossip for 30 seconds + + // check cluster convergence + node1.convergence must be('defined) + node2.convergence must be('defined) + node3.convergence must be('defined) + node4.convergence must be('defined) + + // shut down system4 + node4.shutdown() + system4.shutdown() + + // wait for convergence - e.g. 
the leader to auto-down the failed node + println("Give the system time to converge...") + Thread.sleep(30.seconds.dilated.toMillis) // let them gossip for 30 seconds + + // check cluster convergence + node1.convergence must be('defined) + node2.convergence must be('defined) + node3.convergence must be('defined) + + node1.latestGossip.members.size must be(3) + node1.latestGossip.members.exists(_.address == address4) must be(false) + } + + "be able to DOWN a (middle) node that is UNREACHABLE" taggedAs LongRunningTest in { + + // check cluster convergence + node1.convergence must be('defined) + node2.convergence must be('defined) + node3.convergence must be('defined) + + // shut down system4 + node2.shutdown() + system2.shutdown() + + // wait for convergence - e.g. the leader to auto-down the failed node + println("Give the system time to converge...") + Thread.sleep(30.seconds.dilated.toMillis) // let them gossip for 30 seconds + + // check cluster convergence + node1.convergence must be('defined) + node3.convergence must be('defined) + + node1.latestGossip.members.size must be(2) + node1.latestGossip.members.exists(_.address == address4) must be(false) + node1.latestGossip.members.exists(_.address == address2) must be(false) + } + } + } catch { + case e: Exception ⇒ + e.printStackTrace + fail(e.toString) + } + + override def atTermination() { + if (node1 ne null) node1.shutdown() + if (system1 ne null) system1.shutdown() + + if (node2 ne null) node2.shutdown() + if (system2 ne null) system2.shutdown() + + if (node3 ne null) node3.shutdown() + if (system3 ne null) system3.shutdown() + + if (node4 ne null) node4.shutdown() + if (system4 ne null) system4.shutdown() + } +} diff --git a/akka-cluster/src/test/scala/akka/cluster/LeaderElectionSpec.scala b/akka-cluster/src/test/scala/akka/cluster/LeaderElectionSpec.scala index d27611cb79..08d4201bd3 100644 --- a/akka-cluster/src/test/scala/akka/cluster/LeaderElectionSpec.scala +++ b/akka-cluster/src/test/scala/akka/cluster/LeaderElectionSpec.scala @@ -18,8 +18,6 @@ class LeaderElectionSpec extends AkkaSpec(""" akka { loglevel = "INFO" actor.provider = "akka.remote.RemoteActorRefProvider" - actor.debug.lifecycle = on - actor.debug.autoreceive = on cluster.failure-detector.threshold = 3 } """) with ImplicitSender { @@ -49,7 +47,7 @@ class LeaderElectionSpec extends AkkaSpec(""" val remote1 = system1.provider.asInstanceOf[RemoteActorRefProvider] node1 = Node(system1) val fd1 = node1.failureDetector - val address1 = node1.self.address + val address1 = node1.remoteAddress // ======= NODE 2 ======== system2 = ActorSystem("system2", ConfigFactory @@ -66,7 +64,7 @@ class LeaderElectionSpec extends AkkaSpec(""" val remote2 = system2.provider.asInstanceOf[RemoteActorRefProvider] node2 = Node(system2) val fd2 = node2.failureDetector - val address2 = node2.self.address + val address2 = node2.remoteAddress // ======= NODE 3 ======== system3 = ActorSystem("system3", ConfigFactory @@ -83,7 +81,7 @@ class LeaderElectionSpec extends AkkaSpec(""" val remote3 = system3.provider.asInstanceOf[RemoteActorRefProvider] node3 = Node(system3) val fd3 = node3.failureDetector - val address3 = node3.self.address + val address3 = node3.remoteAddress "be able to 'elect' a single leader" taggedAs LongRunningTest in { @@ -107,6 +105,9 @@ class LeaderElectionSpec extends AkkaSpec(""" node1.shutdown() system1.shutdown() + // user marks node1 as DOWN + node2.scheduleNodeDown(address1) + println("Give the system time to converge...") Thread.sleep(30.seconds.dilated.toMillis) // give 
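// Both the downing and election specs follow the same pattern: let the nodes gossip,
// then require that every surviving node reports a converged view via `convergence`
// (an Option, per the assertions in these tests). A small polling helper in the same
// spirit, as a sketch only; the specs themselves use fixed 30 second sleeps:
import akka.cluster.Node
import akka.util.Duration
import akka.util.duration._

def awaitConvergence(nodes: Seq[Node], maxWait: Duration = 30.seconds) {
  val deadline = System.currentTimeMillis + maxWait.toMillis
  while (nodes.exists(_.convergence.isEmpty)) {
    if (System.currentTimeMillis > deadline)
      sys.error("Cluster did not converge within " + maxWait)
    Thread.sleep(1000) // poll once per second, roughly the gossip frequency
  }
}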
them 30 seconds to detect failure of system3 @@ -125,6 +126,9 @@ class LeaderElectionSpec extends AkkaSpec(""" node2.shutdown() system2.shutdown() + // user marks node2 as DOWN + node3.scheduleNodeDown(address2) + println("Give the system time to converge...") Thread.sleep(30.seconds.dilated.toMillis) // give them 30 seconds to detect failure of system3 diff --git a/akka-cluster/src/test/scala/akka/cluster/MembershipChangeListenerSpec.scala b/akka-cluster/src/test/scala/akka/cluster/MembershipChangeListenerSpec.scala index d43841d2ca..f3f34e19c1 100644 --- a/akka-cluster/src/test/scala/akka/cluster/MembershipChangeListenerSpec.scala +++ b/akka-cluster/src/test/scala/akka/cluster/MembershipChangeListenerSpec.scala @@ -106,9 +106,9 @@ class MembershipChangeListenerSpec extends AkkaSpec(""" } }) - latch.await(10.seconds.dilated.toMillis, TimeUnit.MILLISECONDS) + latch.await(30.seconds.dilated.toMillis, TimeUnit.MILLISECONDS) - Thread.sleep(10.seconds.dilated.toMillis) + Thread.sleep(30.seconds.dilated.toMillis) // check cluster convergence node0.convergence must be('defined) diff --git a/akka-cluster/src/test/scala/akka/cluster/NodeMembershipSpec.scala b/akka-cluster/src/test/scala/akka/cluster/NodeMembershipSpec.scala index 42ead86dfd..fd3e31e83e 100644 --- a/akka-cluster/src/test/scala/akka/cluster/NodeMembershipSpec.scala +++ b/akka-cluster/src/test/scala/akka/cluster/NodeMembershipSpec.scala @@ -62,19 +62,19 @@ class NodeMembershipSpec extends AkkaSpec(""" val members0 = node0.latestGossip.members.toArray members0.size must be(2) members0(0).address.port.get must be(5550) - members0(0).status must be(MemberStatus.Joining) + members0(0).status must be(MemberStatus.Up) members0(1).address.port.get must be(5551) - members0(1).status must be(MemberStatus.Joining) + members0(1).status must be(MemberStatus.Up) val members1 = node1.latestGossip.members.toArray members1.size must be(2) members1(0).address.port.get must be(5550) - members1(0).status must be(MemberStatus.Joining) + members1(0).status must be(MemberStatus.Up) members1(1).address.port.get must be(5551) - members1(1).status must be(MemberStatus.Joining) + members1(1).status must be(MemberStatus.Up) } - "(when three systems) start gossiping to each other so that both systems gets the same gossip info" taggedAs LongRunningTest in { + "(when three systems) start gossiping to each other so that both systems gets the same gossip info" taggedAs LongRunningTest ignore { // ======= NODE 2 ======== system2 = ActorSystem("system2", ConfigFactory @@ -99,29 +99,29 @@ class NodeMembershipSpec extends AkkaSpec(""" val version = node0.latestGossip.version members0.size must be(3) members0(0).address.port.get must be(5550) - members0(0).status must be(MemberStatus.Joining) + members0(0).status must be(MemberStatus.Up) members0(1).address.port.get must be(5551) - members0(1).status must be(MemberStatus.Joining) + members0(1).status must be(MemberStatus.Up) members0(2).address.port.get must be(5552) - members0(2).status must be(MemberStatus.Joining) + members0(2).status must be(MemberStatus.Up) val members1 = node1.latestGossip.members.toArray members1.size must be(3) members1(0).address.port.get must be(5550) - members1(0).status must be(MemberStatus.Joining) + members1(0).status must be(MemberStatus.Up) members1(1).address.port.get must be(5551) - members1(1).status must be(MemberStatus.Joining) + members1(1).status must be(MemberStatus.Up) members1(2).address.port.get must be(5552) - members1(2).status must be(MemberStatus.Joining) + 
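// With the leader now moving members from Joining to Up (and, when auto-down is off,
// a user marking crashed nodes DOWN via `scheduleNodeDown` as in LeaderElectionSpec
// above), NodeMembershipSpec expects every gossiped member to end up in the Up state.
// The repeated assertions can be read as this helper (a sketch using the patch's own
// Node and MemberStatus types; the helper name is illustrative):
import akka.cluster.{ Node, MemberStatus }

def assertAllUp(node: Node, expectedPorts: Seq[Int]) {
  val members = node.latestGossip.members.toArray
  assert(members.map(_.address.port.get).toSeq == expectedPorts)
  assert(members.forall(_.status == MemberStatus.Up))
}

// e.g. assertAllUp(node0, Seq(5550, 5551, 5552))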
members1(2).status must be(MemberStatus.Up) val members2 = node2.latestGossip.members.toArray members2.size must be(3) members2(0).address.port.get must be(5550) - members2(0).status must be(MemberStatus.Joining) + members2(0).status must be(MemberStatus.Up) members2(1).address.port.get must be(5551) - members2(1).status must be(MemberStatus.Joining) + members2(1).status must be(MemberStatus.Up) members2(2).address.port.get must be(5552) - members2(2).status must be(MemberStatus.Joining) + members2(2).status must be(MemberStatus.Up) } } } catch { From fbce64cb763602d26af05136c905338c947c26c8 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jonas=20Bone=CC=81r?= Date: Fri, 9 Mar 2012 16:51:30 +0100 Subject: [PATCH 70/72] minor edit --- akka-cluster/src/main/scala/akka/cluster/Node.scala | 1 - 1 file changed, 1 deletion(-) diff --git a/akka-cluster/src/main/scala/akka/cluster/Node.scala b/akka-cluster/src/main/scala/akka/cluster/Node.scala index 03c9c90515..5d00db61e9 100644 --- a/akka-cluster/src/main/scala/akka/cluster/Node.scala +++ b/akka-cluster/src/main/scala/akka/cluster/Node.scala @@ -149,7 +149,6 @@ object MemberStatus { case object Removed extends MemberStatus def isUnavailable(status: MemberStatus): Boolean = { - // status == MemberStatus.Joining || status == MemberStatus.Down || status == MemberStatus.Exiting || status == MemberStatus.Removed || From cf3fa9fa3ce9e9312db0922370c93ce1af9db7c8 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jonas=20Bone=CC=81r?= Date: Mon, 12 Mar 2012 19:22:02 +0100 Subject: [PATCH 71/72] Moved FIXMEs into tickets. Hardened convergence. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Jonas Bonér --- .../src/main/scala/akka/cluster/Node.scala | 28 +++++++------------ 1 file changed, 10 insertions(+), 18 deletions(-) diff --git a/akka-cluster/src/main/scala/akka/cluster/Node.scala b/akka-cluster/src/main/scala/akka/cluster/Node.scala index 5d00db61e9..8286e4e1d8 100644 --- a/akka-cluster/src/main/scala/akka/cluster/Node.scala +++ b/akka-cluster/src/main/scala/akka/cluster/Node.scala @@ -29,20 +29,16 @@ import com.google.protobuf.ByteString * Interface for membership change listener. */ trait MembershipChangeListener { - // FIXME bad for Java - convert to Array? def notify(members: SortedSet[Member]): Unit } /** * Interface for meta data change listener. */ -trait MetaDataChangeListener { // FIXME add management and notification for MetaDataChangeListener - // FIXME bad for Java - convert to what? +trait MetaDataChangeListener { def notify(meta: Map[String, Array[Byte]]): Unit } -// FIXME create Protobuf messages out of all the Gossip stuff - but wait until the prototol is fully stablized. - /** * Base trait for all cluster messages. All ClusterMessage's are serializable. 
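// The MembershipChangeListener trait above is the hook that MembershipChangeListenerSpec
// drives with a CountDownLatch. A minimal listener along those lines (a sketch: the
// trait comes from this patch, while the way a listener is registered with a Node is
// assumed and not shown here):
import java.util.concurrent.CountDownLatch
import scala.collection.immutable.SortedSet
import akka.cluster.{ Member, MemberStatus, MembershipChangeListener }

class UpMembersLatch(expectedUpMembers: Int) extends MembershipChangeListener {
  val latch = new CountDownLatch(1)
  def notify(members: SortedSet[Member]): Unit =
    if (members.size == expectedUpMembers && members.forall(_.status == MemberStatus.Up))
      latch.countDown() // release the waiting test once all members are Up
}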
*/ @@ -294,7 +290,7 @@ final class ClusterDaemonSupervisor extends Actor { def receive = Actor.emptyBehavior - override def unhandled(unknown: Any): Unit = log.error("/system/cluster can not respond to messages - received [{}]", unknown) + override def unhandled(unknown: Any): Unit = log.error("[/system/cluster] can not respond to messages - received [{}]", unknown) } /** @@ -376,7 +372,7 @@ class Node(system: ExtendedActorSystem) extends Extension { private val log = Logging(system, "Node") private val random = SecureRandom.getInstance("SHA1PRNG") - log.info("Node [{}] - Starting cluster Node...", remoteAddress) + log.info("Node [{}] - is JOINING cluster...", remoteAddress) // create superisor for daemons under path "/system/cluster" private val clusterDaemons = { @@ -415,7 +411,7 @@ class Node(system: ExtendedActorSystem) extends Extension { leaderActions() } - log.info("Node [{}] - Cluster Node started successfully", remoteAddress) + log.info("Node [{}] - have JOINED cluster successfully", remoteAddress) // ====================================================== // ===================== PUBLIC API ===================== @@ -464,7 +460,6 @@ class Node(system: ExtendedActorSystem) extends Extension { * Shuts down all connections to other members, the cluster daemon and the periodic gossip and cleanup tasks. */ def shutdown() { - // FIXME Cheating for now. Can't just shut down. Node must first gossip an Leave command, wait for Leader to do proper Handoff and then await an Exit command before switching to Removed if (isRunning.compareAndSet(true, false)) { log.info("Node [{}] - Shutting down Node and cluster daemons...", remoteAddress) gossipCanceller.cancel() @@ -534,7 +529,7 @@ class Node(system: ExtendedActorSystem) extends Extension { */ @tailrec private[cluster] final def joining(node: Address) { - log.info("Node [{}] - Node [{}] is joining", remoteAddress, node) + log.info("Node [{}] - Node [{}] is JOINING", remoteAddress, node) val localState = state.get val localGossip = localState.latestGossip @@ -567,28 +562,28 @@ class Node(system: ExtendedActorSystem) extends Extension { * State transition to UP. */ private[cluster] final def up(address: Address) { - // FIXME implement me + log.info("Node [{}] - Marking node [{}] as UP", remoteAddress, address) } /** * State transition to LEAVING. */ private[cluster] final def leaving(address: Address) { - // FIXME implement me + log.info("Node [{}] - Marking node [{}] as LEAVING", remoteAddress, address) } /** * State transition to EXITING. */ private[cluster] final def exiting(address: Address) { - // FIXME implement me + log.info("Node [{}] - Marking node [{}] as EXITING", remoteAddress, address) } /** * State transition to REMOVED. */ private[cluster] final def removing(address: Address) { - // FIXME implement me + log.info("Node [{}] - Marking node [{}] as REMOVED", remoteAddress, address) } /** @@ -908,8 +903,6 @@ class Node(system: ExtendedActorSystem) extends Extension { // we don't have convergence - so we might have unreachable nodes // if 'auto-down' is turned on, then try to auto-down any unreachable nodes - // FIXME Should we let the leader auto-down every run (as it is now) or just every X seconds? So we can wait for user to invoke explicit DOWN. - // 3. 
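// The transition methods above (joining/up/leaving/exiting/removing) together with the
// state-transition assertions at the top of this patch outline the member lifecycle.
// A compact "legal next states" sketch: the Joining/Up/Leaving/Down legs are taken from
// the spec assertions, while the Exiting and Removed legs are assumptions based only on
// the transition methods above:
import akka.cluster.MemberStatus
import akka.cluster.MemberStatus._

def allowedNext(status: MemberStatus): Set[MemberStatus] = status match {
  case Joining ⇒ Set(Up, Down)
  case Up      ⇒ Set(Leaving, Down)
  case Leaving ⇒ Set(Exiting, Down)
  case Exiting ⇒ Set(Removed, Down)
  case Down    ⇒ Set(Removed)
  case Removed ⇒ Set.empty
}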
Move UNREACHABLE => DOWN (auto-downing by leader) val newUnreachableMembers = localUnreachableMembers @@ -921,7 +914,6 @@ class Node(system: ExtendedActorSystem) extends Extension { } // removing nodes marked as DOWN from the 'seen' table - // FIXME this needs to be done if user issues DOWN as well val newSeen = localUnreachableMembers.foldLeft(localSeen)((currentSeen, member) ⇒ currentSeen - member.address) val newOverview = localOverview copy (seen = newSeen, unreachable = newUnreachableMembers) // update gossip overview @@ -965,7 +957,7 @@ class Node(system: ExtendedActorSystem) extends Extension { // 2. all unreachable members in the set have status DOWN // Else we can't continue to check for convergence // When that is done we check that all the entries in the 'seen' table have the same vector clock version - if (unreachable.isEmpty || !unreachable.exists(_.status != MemberStatus.Down)) { + if (unreachable.isEmpty || !unreachable.exists(m ⇒ m.status != MemberStatus.Down || m.status != MemberStatus.Removed)) { val seen = gossip.overview.seen val views = Set.empty[VectorClock] ++ seen.values From f5da25fab4250b66a963f955ecda8c8691ae8d1c Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jonas=20Bone=CC=81r?= Date: Wed, 14 Mar 2012 14:09:09 +0100 Subject: [PATCH 72/72] fixed misspelling --- akka-cluster/src/main/scala/akka/cluster/Node.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/akka-cluster/src/main/scala/akka/cluster/Node.scala b/akka-cluster/src/main/scala/akka/cluster/Node.scala index 8286e4e1d8..61ad9a10d4 100644 --- a/akka-cluster/src/main/scala/akka/cluster/Node.scala +++ b/akka-cluster/src/main/scala/akka/cluster/Node.scala @@ -411,7 +411,7 @@ class Node(system: ExtendedActorSystem) extends Extension { leaderActions() } - log.info("Node [{}] - have JOINED cluster successfully", remoteAddress) + log.info("Node [{}] - has JOINED cluster successfully", remoteAddress) // ====================================================== // ===================== PUBLIC API =====================
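// A note on the hardened convergence check above: for any one member,
// `m.status != MemberStatus.Down || m.status != MemberStatus.Removed` always holds
// (no status equals both), so that guard only passes when `unreachable` is empty; the
// comment preceding it suggests the intended reading is "every unreachable member is
// Down (or Removed)". A sketch of that reading, reduced to a Boolean (the specs expose
// convergence as an Option) and using the Gossip/Member/VectorClock types from Node.scala:
import akka.cluster.{ Gossip, MemberStatus, VectorClock }

def seemsConverged(gossip: Gossip): Boolean = {
  val unreachable = gossip.overview.unreachable
  val allUnreachableDowned = unreachable.forall(m ⇒
    m.status == MemberStatus.Down || m.status == MemberStatus.Removed)
  // everyone in the 'seen' table must also agree on the same gossip version
  val views = Set.empty[VectorClock] ++ gossip.overview.seen.values
  allUnreachableDowned && views.size == 1
}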