# Classic Distributed Data

@@include includes.md { #actor-api }

For the full documentation of this feature and for new projects see @ref:Distributed Data.
## Dependency
To use Pekko Distributed Data, you must add the following dependency in your project:
@@dependency[sbt,Maven,Gradle] {
bomGroup=org.apache.pekko bomArtifact=pekko-bom_$scala.binary.version$ bomVersionSymbols=PekkoVersion
symbol1=PekkoVersion
value1="$pekko.version$"
group="org.apache.pekko"
artifact="pekko-distributed-data_$scala.binary.version$"
version=PekkoVersion
}
## Introduction
For the full documentation of this feature and for new projects see @ref:Distributed Data - Introduction.
## Using the Replicator
The @apidoc[cluster.ddata.Replicator] actor provides the API for interacting with the data.
The Replicator actor must be started on each node in the cluster, or on the group of nodes tagged
with a specific role. It communicates with other Replicator instances with the same path
(without address) that are running on other nodes. For convenience it can be used with the
@apidoc[cluster.ddata.DistributedData] extension, but it can also be started as an ordinary
actor using @apidoc[Replicator.props] {scala="#props(settings:org.apache.pekko.cluster.ddata.ReplicatorSettings):org.apache.pekko.actor.Props" java="#props(org.apache.pekko.cluster.ddata.ReplicatorSettings)"}. If it is started as an ordinary actor it is important
that it is given the same name, and thereby started on the same path, on all nodes.
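
As a minimal sketch of the two options (the system name `ClusterSystem` and the actor name `ddataReplicator` are only illustrative, not something the documentation prescribes):

```scala
import org.apache.pekko.actor.{ ActorRef, ActorSystem }
import org.apache.pekko.cluster.ddata.{ DistributedData, Replicator, ReplicatorSettings }

val system = ActorSystem("ClusterSystem")

// Recommended: obtain the Replicator through the DistributedData extension
val replicator: ActorRef = DistributedData(system).replicator

// Alternative: start it as an ordinary actor; the name (and thereby the path)
// must be the same on all nodes so the instances can reach each other
val ownReplicator: ActorRef =
  system.actorOf(Replicator.props(ReplicatorSettings(system)), "ddataReplicator")
```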
Cluster members with status @ref:WeaklyUp will participate in Distributed Data. This means that the data will be
replicated to the WeaklyUp nodes with the background gossip protocol. Note that WeaklyUp nodes
will not participate in any actions where the consistency mode is to read/write from all
nodes or a majority of nodes. A WeaklyUp node is not counted
as part of the cluster, so 3 nodes + 5 WeaklyUp nodes is essentially a
3-node cluster as far as consistent actions are concerned.
Below is an example of an actor that schedules tick messages to itself and for each tick adds or removes elements from an @apidoc[ORSet] (observed-remove set). It also subscribes to changes of this key.
- Scala
- @@snip DistributedDataDocSpec.scala { #data-bot }
- Java
- @@snip DataBot.java { #data-bot }
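
If the referenced snippets are not at hand, a condensed Scala sketch of such a data bot could look like the following; the key name, tick interval and logging are arbitrary choices for illustration:

```scala
import java.util.concurrent.ThreadLocalRandom
import scala.concurrent.duration._
import org.apache.pekko.actor.{ Actor, ActorLogging }
import org.apache.pekko.cluster.ddata._
import org.apache.pekko.cluster.ddata.Replicator._

object DataBot {
  private case object Tick
}

class DataBot extends Actor with ActorLogging {
  import DataBot._
  import context.dispatcher

  private val replicator = DistributedData(context.system).replicator
  private implicit val node: SelfUniqueAddress = DistributedData(context.system).selfUniqueAddress

  private val DataKey = ORSetKey[String]("key")
  private val tickTask =
    context.system.scheduler.scheduleWithFixedDelay(5.seconds, 5.seconds, self, Tick)

  // register for change notifications of the key
  replicator ! Subscribe(DataKey, self)

  def receive: Receive = {
    case Tick =>
      val s = ThreadLocalRandom.current().nextInt(97, 123).toChar.toString
      if (ThreadLocalRandom.current().nextBoolean()) {
        log.info("Adding: {}", s)
        replicator ! Update(DataKey, ORSet.empty[String], WriteLocal)(_ :+ s)
      } else {
        log.info("Removing: {}", s)
        replicator ! Update(DataKey, ORSet.empty[String], WriteLocal)(_.remove(s))
      }
    case _: UpdateResponse[_] => // ignore write acknowledgements in this bot
    case c @ Changed(DataKey) =>
      log.info("Current elements: {}", c.get(DataKey).elements)
  }

  override def postStop(): Unit = tickTask.cancel()
}
```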
### Update
For the full documentation of this feature and for new projects see @ref:Distributed Data - Update.
To modify and replicate a data value you send a @apidoc[Replicator.Update] message to the local @apidoc[cluster.ddata.Replicator].
The current data value for the key of the Update is passed as parameter to the @scala[`modify`]@java[`modify()`]
function of the Update. The function is supposed to return the new value of the data, which
will then be replicated according to the given consistency level.

The modify function is called by the Replicator actor and must therefore be a pure
function that only uses the data parameter and stable fields from the enclosing scope. It must,
for example, not access the @scala[`sender()`]@java[`getSender()`] reference of an enclosing actor.
Update is intended to only be sent from an actor running in the same local @apidoc[actor.ActorSystem]
as the Replicator, because the modify function is typically not serializable.
- Scala
- @@snip DistributedDataDocSpec.scala { #update }
- Java
- @@snip DistributedDataDocTest.java { #update }
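
As a rough illustration of what such updates look like, assuming the code runs inside an actor (so the replies come back to it) and using hypothetical key names `counter1` and `set1`:

```scala
import scala.concurrent.duration._
import org.apache.pekko.cluster.ddata._
import org.apache.pekko.cluster.ddata.Replicator._

// inside an Actor
implicit val node: SelfUniqueAddress = DistributedData(context.system).selfUniqueAddress
val replicator = DistributedData(context.system).replicator

val Counter1Key = PNCounterKey("counter1")
val Set1Key = ORSetKey[String]("set1")

// increment a counter, acknowledged as soon as the local replica has stored it
replicator ! Update(Counter1Key, PNCounter(), WriteLocal)(_ :+ 1)

// add an element to a set, acknowledged once a majority of nodes have it
replicator ! Update(Set1Key, ORSet.empty[String], WriteMajority(timeout = 5.seconds))(_ :+ "hello")
```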
In reply to the Update, a @apidoc[Replicator.UpdateSuccess] is sent to the sender of the
Update if the value was successfully replicated according to the supplied
@ref:write consistency level within the supplied timeout. Otherwise a @apidoc[Replicator.UpdateFailure] subclass is
sent back. Note that a @apidoc[Replicator.UpdateTimeout] reply does not mean that the update completely failed
or was rolled back. It may still have been replicated to some nodes, and it will eventually
be replicated to all nodes by the gossip protocol.
- Scala
- @@snip DistributedDataDocSpec.scala { #update-response1 }
- Java
- @@snip DistributedDataDocTest.java { #update-response1 }
- Scala
- @@snip DistributedDataDocSpec.scala { #update-response2 }
- Java
- @@snip DistributedDataDocTest.java { #update-response2 }
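
Handling these replies in the sending actor could, for the hypothetical `Counter1Key` from the earlier sketch and assuming `ActorLogging` is mixed in, look roughly like:

```scala
import org.apache.pekko.cluster.ddata.Replicator.{ ModifyFailure, UpdateSuccess, UpdateTimeout }

def receive: Receive = {
  case UpdateSuccess(Counter1Key, _) =>
    // replicated according to the requested write consistency within the timeout
  case UpdateTimeout(Counter1Key, _) =>
    // not confirmed in time, but the write may still spread to all nodes via gossip
  case e: ModifyFailure[_] =>
    // the modify function threw an exception; the update was not performed
    log.error(e.cause, "Update failed: {}", e.errorMessage)
}
```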
You will always see your own writes. For example if you send two @apidoc[cluster.ddata.Replicator.Update] messages
changing the value of the same key, the modify function of the second message will
see the change that was performed by the first Update message.
It is possible to abort the Update by throwing an exception when inspecting the state parameter
that is passed in to the modify function. That happens before the update is performed, and
a @apidoc[Replicator.ModifyFailure] is sent back as reply.
In the Update message you can pass an optional request context, which the @apidoc[cluster.ddata.Replicator]
does not care about, but is included in the reply messages. This is a convenient
way to pass contextual information (e.g. original sender) without having to use ask
or maintain local correlation data structures.
- Scala
- @@snip DistributedDataDocSpec.scala { #update-request-context }
- Java
- @@snip DistributedDataDocTest.java { #update-request-context }
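
For example, the original sender could be carried as the request context. The sketch below builds on the names (`replicator`, the implicit `node`, `Counter1Key`) from the earlier sketches; the `"increment"` command and the `WriteTo` settings are assumptions made for illustration:

```scala
import scala.concurrent.duration._
import org.apache.pekko.actor.ActorRef
import org.apache.pekko.cluster.ddata.PNCounter
import org.apache.pekko.cluster.ddata.Replicator.{ Update, UpdateSuccess, UpdateTimeout, WriteTo }

val writeTwo = WriteTo(n = 2, timeout = 3.seconds)

def receive: Receive = {
  case "increment" =>
    // the optional request context travels with the Update and comes back in the reply
    replicator ! Update(Counter1Key, PNCounter(), writeTwo, request = Some(sender()))(_ :+ 1)

  case UpdateSuccess(Counter1Key, Some(replyTo: ActorRef)) =>
    replyTo ! "ack"
  case UpdateTimeout(Counter1Key, Some(replyTo: ActorRef)) =>
    replyTo ! "nack"
}
```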
### Get
For the full documentation of this feature and for new projects see @ref:Distributed Data - Get.
To retrieve the current value of a data entry you send a @apidoc[Replicator.Get] message to the
Replicator. You supply a consistency level, which has the following meaning:
- Scala
- @@snip DistributedDataDocSpec.scala { #get }
- Java
- @@snip DistributedDataDocTest.java { #get }
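
For illustration, a few Get requests with different read consistency levels, again using the hypothetical keys and `replicator` from the earlier sketches (the timeouts are arbitrary):

```scala
import scala.concurrent.duration._
import org.apache.pekko.cluster.ddata.Replicator.{ Get, ReadAll, ReadFrom, ReadLocal, ReadMajority }

// read from the local replica only
replicator ! Get(Counter1Key, ReadLocal)

// read from at least 3 replicas, waiting up to 1 second
replicator ! Get(Set1Key, ReadFrom(n = 3, timeout = 1.second))

// read from a majority of nodes, waiting up to 5 seconds
replicator ! Get(Counter1Key, ReadMajority(timeout = 5.seconds))

// read from all nodes, waiting up to 5 seconds
replicator ! Get(Counter1Key, ReadAll(timeout = 5.seconds))
```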
In reply to the Get, a @apidoc[Replicator.GetSuccess] is sent to the sender of the
Get if the value was successfully retrieved according to the supplied @ref:read consistency level within the supplied timeout. Otherwise a @apidoc[Replicator.GetFailure] is sent.
If the key does not exist the reply will be @apidoc[Replicator.NotFound].
- Scala
- @@snip DistributedDataDocSpec.scala { #get-response1 }
- Java
- @@snip DistributedDataDocTest.java { #get-response1 }
- Scala
- @@snip DistributedDataDocSpec.scala { #get-response2 }
- Java
- @@snip DistributedDataDocTest.java { #get-response2 }
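
Handling those replies could look roughly like this for the hypothetical `Counter1Key`, again assuming an actor with `ActorLogging`:

```scala
import org.apache.pekko.cluster.ddata.Replicator.{ GetFailure, GetSuccess, NotFound }

def receive: Receive = {
  case g @ GetSuccess(Counter1Key, _) =>
    val value = g.get(Counter1Key).value
    log.info("Current counter value: {}", value)
  case NotFound(Counter1Key, _) =>
    log.info("Key does not exist")
  case GetFailure(Counter1Key, _) =>
    log.warning("Value could not be retrieved within the given consistency level and timeout")
}
```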
In the @apidoc[cluster.ddata.Replicator.Get] message you can pass an optional request context in the same way as for the @apidoc[cluster.ddata.Replicator.Update] message, described above. For example the original sender can be passed and replied to after receiving and transforming @apidoc[cluster.ddata.Replicator.GetSuccess].
- Scala
- @@snip DistributedDataDocSpec.scala { #get-request-context }
- Java
- @@snip DistributedDataDocTest.java { #get-request-context }
### Subscribe
For the full documentation of this feature and for new projects see @ref:Distributed Data - Subscribe.
You may also register interest in change notifications by sending a @apidoc[Replicator.Subscribe]
message to the Replicator. It will send @apidoc[Replicator.Changed] messages to the registered
subscriber when the data for the subscribed key is updated. Subscribers will be notified
periodically with the configured `notify-subscribers-interval`, and it is also possible to
send an explicit Replicator.FlushChanges message to the Replicator to notify the subscribers
immediately.

A subscriber is automatically removed if it is terminated. A subscriber can also be deregistered with the @apidoc[Replicator.Unsubscribe] message.
- Scala
- @@snip DistributedDataDocSpec.scala { #subscribe }
- Java
- @@snip DistributedDataDocTest.java { #subscribe }
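
As a sketch, a subscriber that caches the latest value of the hypothetical `counter1` key could look like this (the class name is only illustrative; the explicit Unsubscribe in postStop is shown for completeness, since terminated subscribers are removed automatically anyway):

```scala
import org.apache.pekko.actor.{ Actor, ActorLogging }
import org.apache.pekko.cluster.ddata.{ DistributedData, PNCounterKey }
import org.apache.pekko.cluster.ddata.Replicator.{ Changed, Subscribe, Unsubscribe }

class CounterSubscriber extends Actor with ActorLogging {
  private val replicator = DistributedData(context.system).replicator
  private val Counter1Key = PNCounterKey("counter1")
  private var currentValue = BigInt(0)

  // Changed messages arrive at the configured notify-subscribers-interval
  replicator ! Subscribe(Counter1Key, self)

  def receive: Receive = {
    case c @ Changed(Counter1Key) =>
      currentValue = c.get(Counter1Key).value
      log.info("Counter is now {}", currentValue)
  }

  override def postStop(): Unit =
    replicator ! Unsubscribe(Counter1Key, self)
}
```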
### Consistency
For the full documentation of this feature and for new projects see @ref:Distributed Data Consistency.
Here is an example of using @apidoc[cluster.ddata.Replicator.WriteMajority] and @apidoc[cluster.ddata.Replicator.ReadMajority]:
- Scala
- @@snip ShoppingCart.scala { #read-write-majority }
- Java
- @@snip ShoppingCart.java { #read-write-majority }
- Scala
- @@snip ShoppingCart.scala { #get-cart }
- Java
- @@snip ShoppingCart.java { #get-cart }
- Scala
- @@snip ShoppingCart.scala { #add-item }
- Java
- @@snip ShoppingCart.java { #add-item }
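
A condensed sketch of the idea, loosely modelled on the shopping cart example but with a simplified `LWWMap[String, Int]` value; the key name, timeouts and the `"get-cart"`/`"add-apples"` commands are assumptions for illustration:

```scala
import scala.concurrent.duration._
import org.apache.pekko.cluster.ddata.{ DistributedData, LWWMap, LWWMapKey, SelfUniqueAddress }
import org.apache.pekko.cluster.ddata.Replicator.{ Get, ReadMajority, Update, WriteMajority }

// inside an Actor
private implicit val node: SelfUniqueAddress = DistributedData(context.system).selfUniqueAddress
private val replicator = DistributedData(context.system).replicator

private val timeout = 3.seconds
private val readMajority = ReadMajority(timeout)
private val writeMajority = WriteMajority(timeout)

private val DataKey = LWWMapKey[String, Int]("cart")

def receive: Receive = {
  case "get-cart" =>
    // read the whole cart from a majority of nodes; reply carries the original sender
    replicator ! Get(DataKey, readMajority, request = Some(sender()))

  case "add-apples" =>
    // add or overwrite an item, acknowledged by a majority of nodes
    replicator ! Update(DataKey, LWWMap.empty[String, Int], writeMajority) { cart =>
      cart :+ ("apples" -> 2)
    }
}
```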
In some rare cases, when performing an @apidoc[cluster.ddata.Replicator.Update] it is necessary to first try to fetch the latest data from
other nodes. That can be done by first sending a @apidoc[cluster.ddata.Replicator.Get] with @apidoc[cluster.ddata.Replicator.ReadMajority] and then continuing with
the @apidoc[cluster.ddata.Replicator.Update] when the @apidoc[cluster.ddata.Replicator.GetSuccess], @apidoc[cluster.ddata.Replicator.GetFailure] or @apidoc[cluster.ddata.Replicator.NotFound] reply is received. This might be
needed when you need to base a decision on the latest information or when removing entries from an @apidoc[cluster.ddata.ORSet]
or @apidoc[cluster.ddata.ORMap]. If an entry is added to an ORSet or ORMap from one node and removed from another
node, the entry will only be removed if the added entry is visible on the node where the removal is
performed (hence the name observed-removed set).
The following example illustrates how to do that:
- Scala
- @@snip ShoppingCart.scala { #remove-item }
- Java
- @@snip ShoppingCart.java { #remove-item }
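
Building on the names from the previous sketch (`replicator`, `node`, `DataKey`, `readMajority`, `writeMajority`), the read-before-remove pattern could look roughly like this; `RemoveItem` is a hypothetical command class, not something defined by the library:

```scala
import org.apache.pekko.cluster.ddata.LWWMap
import org.apache.pekko.cluster.ddata.Replicator.{ Get, GetFailure, GetSuccess, NotFound, Update }

final case class RemoveItem(productId: String)

def receive: Receive = {
  case cmd @ RemoveItem(_) =>
    // fetch the latest state first, so the removal observes the earlier addition
    replicator ! Get(DataKey, readMajority, request = Some(cmd))

  case GetSuccess(DataKey, Some(RemoveItem(productId))) =>
    replicator ! Update(DataKey, LWWMap.empty[String, Int], writeMajority) { cart =>
      cart.remove(node, productId)
    }

  case GetFailure(DataKey, Some(RemoveItem(productId))) =>
    // the majority read failed within the timeout; remove based on the local state instead
    replicator ! Update(DataKey, LWWMap.empty[String, Int], writeMajority) { cart =>
      cart.remove(node, productId)
    }

  case NotFound(DataKey, Some(RemoveItem(_))) => // nothing to remove
}
```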
@@@ warning
Caveat: Even if you use WriteMajority and ReadMajority there is a small risk that you may
read stale data if the cluster membership has changed between the Update and the Get.
For example, in a cluster of 5 nodes, an Update may be written to 3 nodes: n1, n2, n3. If 2 more nodes
are then added, a Get request reading from 4 nodes may happen to read from n4, n5, n6, n7, i.e. the
value on n1, n2, n3 is not seen in the response of the Get request.
@@@
### Delete
For the full documentation of this feature and for new projects see @ref:Distributed Data - Delete.
- Scala
- @@snip DistributedDataDocSpec.scala { #delete }
- Java
- @@snip DistributedDataDocTest.java { #delete }
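
As a small sketch, reusing the hypothetical `replicator` and `Counter1Key` from the earlier snippets, an entry can be deleted and the replies handled roughly like this:

```scala
import org.apache.pekko.cluster.ddata.Replicator.{ Delete, DeleteSuccess, ReplicationDeleteFailure, WriteLocal }

// delete with local acknowledgement only; replies come back to the sending actor
replicator ! Delete(Counter1Key, WriteLocal)

def receive: Receive = {
  case DeleteSuccess(_, _)            => // the entry was deleted according to the write consistency
  case ReplicationDeleteFailure(_, _) => // not enough nodes acknowledged the delete within the timeout
}
```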
@@@ warning
As deleted keys continue to be included in the stored data on each node as well as in gossip messages, a continuous series of updates and deletes of top-level entities will result in growing memory usage until an ActorSystem runs out of memory. To use Pekko Distributed Data where frequent adds and removes are required, you should use a fixed number of top-level data types that support both updates and removals, for example @apidoc[cluster.ddata.ORMap] or @apidoc[cluster.ddata.ORSet].
@@@
## Replicated data types
Pekko contains a set of useful replicated data types and it is fully possible to implement custom replicated data types. For the full documentation of this feature and for new projects see @ref:Distributed Data Replicated data types.
### Delta-CRDT
For the full documentation of this feature and for new projects see @ref:Distributed Data Delta CRDT.
### Custom Data Type
You can implement your own data types. For the full documentation of this feature and for new projects see @ref:Distributed Data custom data type.
## Durable Storage
For the full documentation of this feature and for new projects see @ref:Durable Storage.
## Limitations
For the full documentation of this feature and for new projects see @ref:Limitations.
## Learn More about CRDTs
- Strong Eventual Consistency and Conflict-free Replicated Data Types (video) talk by Mark Shapiro
- A comprehensive study of Convergent and Commutative Replicated Data Types paper by Mark Shapiro et al.
## Configuration
The @apidoc[cluster.ddata.DistributedData] extension can be configured with the following properties:
@@snip reference.conf { #distributed-data }