Merge pull request #25405 from chbatey/cluster-sharding-docs

Break up cluster sharding docs into sections
Patrik Nordwall 2018-08-21 20:37:12 +02:00 committed by GitHub
commit 3fcfb8d699

@@ -132,7 +132,7 @@ tutorial named [Akka Cluster Sharding with Scala!](https://github.com/typesafehu
The `ShardRegion` actor is started on each node in the cluster, or group of nodes
tagged with a specific role. The `ShardRegion` is created with two application specific
functions to extract the entity identifier and the shard identifier from incoming messages.
A `Shard` is a group of entities that will be managed together. For the first message in a
specific shard the `ShardRegion` requests the location of the shard from a central coordinator,
the `ShardCoordinator`.
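
For illustration only (not part of this commit), here is a rough Scala sketch of those two functions for a made-up `Counter` entity type; the message types, shard count, and `typeName` are all assumptions:

```scala
import akka.actor.{ Actor, ActorSystem, Props }
import akka.cluster.sharding.{ ClusterSharding, ClusterShardingSettings, ShardRegion }

// Hypothetical entity and messages, used only for this sketch.
final case class Get(counterId: String)
final case class Increment(counterId: String)

class Counter extends Actor {
  private var count = 0
  def receive: Receive = {
    case Increment(_) => count += 1
    case Get(_)       => sender() ! count
  }
}

object CounterSharding {
  // Maps an incoming message to the entity identifier plus the message to forward.
  val extractEntityId: ShardRegion.ExtractEntityId = {
    case msg @ Get(id)       => (id, msg)
    case msg @ Increment(id) => (id, msg)
  }

  private val numberOfShards = 100

  // Maps an incoming message to a shard identifier; entities whose ids hash to the
  // same shard are managed together.
  val extractShardId: ShardRegion.ExtractShardId = {
    case Get(id)       => (math.abs(id.hashCode) % numberOfShards).toString
    case Increment(id) => (math.abs(id.hashCode) % numberOfShards).toString
  }

  // Starts the ShardRegion actor for this entity type on the local node.
  def start(system: ActorSystem) =
    ClusterSharding(system).start(
      typeName = "Counter",
      entityProps = Props[Counter],
      settings = ClusterShardingSettings(system),
      extractEntityId = extractEntityId,
      extractShardId = extractShardId)
}
```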
@@ -148,22 +148,38 @@ shard incoming messages for that shard are buffered and later delivered when the
shard home is known. Subsequent messages to the resolved shard can be delivered
to the target destination immediately without involving the `ShardCoordinator`.
### Scenarios

Once a `Shard`'s location is known, `ShardRegion`s send messages to it directly. Here are the
scenarios for getting to this state. In the scenarios the following notation is used:

* `SC` - ShardCoordinator
* `M#` - Message 1, 2, 3, etc.
* `SR#` - ShardRegion 1, 2, 3, etc.
* `S#` - Shard 1, 2, 3, etc.
* `E#` - Entity 1, 2, 3, etc. An entity refers to an Actor managed by Cluster Sharding.

Where `#` is a number to distinguish between instances, as there are multiple in the Cluster.

#### Scenario 1: Message to an unknown shard that belongs to the local ShardRegion

1. Incoming message `M1` to `ShardRegion` instance `SR1`.
2. `M1` is mapped to shard `S1`. `SR1` doesn't know about `S1`, so it asks the `SC` for the location of `S1`.
3. `SC` answers that the home of `S1` is `SR1`.
4. `SR1` creates a child actor for the entity `E1` and sends buffered messages for `S1` to the `E1` child.
5. All incoming messages for `S1` which arrive at `SR1` can be handled by `SR1` without `SC`. It creates entity children as needed, and forwards messages to them.

#### Scenario 2: Message to an unknown shard that belongs to a remote ShardRegion

1. Incoming message `M2` to `ShardRegion` instance `SR1`.
2. `M2` is mapped to `S2`. `SR1` doesn't know about `S2`, so it asks `SC` for the location of `S2`.
3. `SC` answers that the home of `S2` is `SR2`.
4. `SR1` sends buffered messages for `S2` to `SR2`.
5. All incoming messages for `S2` which arrive at `SR1` can be handled by `SR1` without `SC`. It forwards messages to `SR2`.
6. `SR2` receives the message for `S2`, asks `SC`, which answers that the home of `S2` is `SR2`, and we are in Scenario 1 (but for `SR2`).
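
Connecting the scenarios to code: a sender only ever talks to its local `ShardRegion` reference, whichever scenario applies. A hedged sketch, assuming the hypothetical `Counter` sharding (including its `Get` and `Increment` messages) from the earlier example has been started on this node:

```scala
import akka.actor.{ ActorRef, ActorSystem }
import akka.cluster.sharding.ClusterSharding

object SendViaRegion {
  def main(args: Array[String]): Unit = {
    val system = ActorSystem("ClusterSystem")
    // Assumes the hypothetical CounterSharding.start(system) from the earlier
    // sketch has already been called on this node.
    val counterRegion: ActorRef = ClusterSharding(system).shardRegion("Counter")

    // The sender never addresses the entity or its node directly. The local
    // ShardRegion resolves the shard's home via the ShardCoordinator on the first
    // message (Scenario 1 or 2) and forwards later messages without the coordinator.
    counterRegion ! Increment("counter-42")
    counterRegion ! Get("counter-42")
  }
}
```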
### Shard location
To make sure that at most one instance of a specific entity actor is running somewhere
in the cluster it is important that all nodes have the same view of where the shards
@@ -177,6 +193,8 @@ allocation strategy. The default implementation `ShardCoordinator.LeastShardAllo
allocates new shards to the `ShardRegion` with the least number of previously allocated shards.
This strategy can be replaced by an application specific implementation.
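
As a hedged sketch of what such a replacement looks like (illustrative only, not this commit's code): a custom strategy extends `ShardCoordinator.ShardAllocationStrategy` and answers two questions, where to place a new shard and which shards to hand off during rebalancing.

```scala
import scala.collection.immutable
import scala.concurrent.Future

import akka.actor.ActorRef
import akka.cluster.sharding.ShardCoordinator.ShardAllocationStrategy
import akka.cluster.sharding.ShardRegion.ShardId

// Illustrative only: always place a new shard on the region that asked for it,
// and never propose any shards for rebalancing.
class StickyAllocationStrategy extends ShardAllocationStrategy {

  override def allocateShard(
      requester: ActorRef,
      shardId: ShardId,
      currentShardAllocations: Map[ActorRef, immutable.IndexedSeq[ShardId]]): Future[ActorRef] =
    Future.successful(requester)

  override def rebalance(
      currentShardAllocations: Map[ActorRef, immutable.IndexedSeq[ShardId]],
      rebalanceInProgress: Set[ShardId]): Future[Set[ShardId]] =
    Future.successful(Set.empty)
}
```

Such a strategy can then be passed to the `ClusterSharding(system).start` overload that also takes an allocation strategy and a hand-off stop message.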
### Shard Rebalancing
To be able to use newly added members in the cluster the coordinator facilitates rebalancing
of shards, i.e. migrate entities from one node to another. In the rebalance process the
coordinator first notifies all `ShardRegion` actors that a handoff for a shard has started.
@@ -202,6 +220,8 @@ i.e. new members in the cluster. There is a configurable threshold of how large
must be to begin the rebalancing. This strategy can be replaced by an application specific
implementation.
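
The threshold for the default `LeastShardAllocationStrategy` is ordinary configuration. A minimal sketch of tuning it when creating the actor system; the numbers are arbitrary examples, not recommended values:

```scala
import akka.actor.ActorSystem
import com.typesafe.config.ConfigFactory

object TunedSharding {
  // Illustrative values only; tune for your own workload.
  private val tuning = ConfigFactory.parseString("""
    akka.cluster.sharding.least-shard-allocation-strategy {
      # begin rebalancing when the most loaded region has this many more shards
      # than the least loaded one
      rebalance-threshold = 3
      # limit how many shards are moved in a single rebalance round
      max-simultaneous-rebalance = 5
    }
  """)

  def createSystem(): ActorSystem =
    ActorSystem("ClusterSystem", tuning.withFallback(ConfigFactory.load()))
}
```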
### Shard Coordinator State
The state of shard locations in the `ShardCoordinator` is persistent (durable) with
@ref:[Distributed Data](distributed-data.md) or @ref:[Persistence](persistence.md) to survive failures. When a crashed or
unreachable coordinator node has been removed (via down) from the cluster a new `ShardCoordinator` singleton
@@ -209,16 +229,21 @@ actor will take over and the state is recovered. During such a failure period sh
with known location are still available, while messages for new (unknown) shards
are buffered until the new `ShardCoordinator` becomes available.
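
Which of the two storage options is used is selected with the `akka.cluster.sharding.state-store-mode` setting, discussed further in the mode section below. A minimal sketch of setting it programmatically:

```scala
import akka.actor.ActorSystem
import com.typesafe.config.ConfigFactory

object CoordinatorStateMode {
  // Valid values are "ddata" (Distributed Data) and "persistence"; see
  // "Distributed Data vs. Persistence Mode" below for the trade-offs.
  private val mode = ConfigFactory.parseString(
    "akka.cluster.sharding.state-store-mode = ddata")

  def createSystem(): ActorSystem =
    ActorSystem("ClusterSystem", mode.withFallback(ConfigFactory.load()))
}
```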
### Message ordering
As long as a sender uses the same `ShardRegion` actor to deliver messages to an entity
actor, the order of the messages is preserved. As long as the buffer limit is not reached,
messages are delivered on a best effort basis, with at-most-once delivery semantics,
in the same way as ordinary message sending. Reliable end-to-end messaging, with
at-least-once semantics, can be added by using `AtLeastOnceDelivery` in @ref:[Persistence](persistence.md).
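
One possible shape of such a sender, sketched with made-up message and event types (the receiving entity is assumed to reply with `Confirm(deliveryId)`, and the sharding `extractEntityId` is assumed to understand `Envelope`); this is not code from these docs:

```scala
import akka.actor.ActorPath
import akka.persistence.{ AtLeastOnceDelivery, PersistentActor }

// Hypothetical protocol: the sharded entity replies with Confirm(deliveryId).
final case class Send(entityId: String, payload: Any)
final case class Envelope(entityId: String, deliveryId: Long, payload: Any)
final case class Confirm(deliveryId: Long)

// Events persisted by the sender so unconfirmed sends are redelivered after a restart.
final case class MsgSent(entityId: String, payload: Any)
final case class MsgConfirmed(deliveryId: Long)

class ReliableSender(shardRegionPath: ActorPath) extends PersistentActor with AtLeastOnceDelivery {
  override def persistenceId: String = "reliable-sender"

  override def receiveCommand: Receive = {
    case Send(entityId, payload) => persist(MsgSent(entityId, payload))(updateState)
    case Confirm(deliveryId)     => persist(MsgConfirmed(deliveryId))(updateState)
  }

  override def receiveRecover: Receive = {
    case event @ (_: MsgSent | _: MsgConfirmed) => updateState(event)
  }

  private def updateState(event: Any): Unit = event match {
    case MsgSent(entityId, payload) =>
      // deliver() keeps retrying the message until confirmDelivery() is called
      // with the deliveryId that was handed out here.
      deliver(shardRegionPath)(deliveryId => Envelope(entityId, deliveryId, payload))
    case MsgConfirmed(deliveryId) =>
      confirmDelivery(deliveryId)
  }
}
```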
### Overhead
Some additional latency is introduced for messages targeted to new or previously
unused shards due to the round-trip to the coordinator. Rebalancing of shards may
also add latency. This should be considered when designing the application specific
shard resolution, e.g. to avoid too fine grained shards. Once a shard's location is known
the only overhead is sending a message via the `ShardRegion` rather than directly.
<a id="cluster-sharding-mode"></a>
## Distributed Data vs. Persistence Mode