+ akka-cluster-metrics: new akka module
* new akka module split from akka-cluster
* provide sigar provisioning
* fix ewma usage
* resolve #16121
* see #16354
parent baca3644e2
commit 7b9f77a073
121 changed files with 10462 additions and 215 deletions
155
akka-docs/rst/scala/cluster-metrics.rst
Normal file
@ -0,0 +1,155 @@
.. _cluster_metrics_scala:

Cluster Metrics Extension
=========================

Introduction
------------

With the help of the Cluster Metrics Extension, the member nodes of the cluster can collect system health metrics
and publish them to other cluster nodes and to the registered subscribers on the system event bus.

Cluster metrics information is primarily used for load-balancing routers,
and can also be used to implement advanced metrics-based node life cycles,
such as "Node Let-it-crash" when CPU steal time becomes excessive.

Cluster Metrics Extension is a separate Akka module delivered in the ``akka-cluster-metrics`` jar.

To enable usage of the extension you need to add the following dependency to your project:
::

  "com.typesafe.akka" % "akka-cluster-metrics_@binVersion@" % "@version@"

and add the following configuration stanza to your ``application.conf``
::

  akka.extensions = [ "akka.cluster.metrics.ClusterMetricsExtension" ]

Make sure to disable legacy metrics in akka-cluster: ``akka.cluster.metrics.enabled=off``,
since it is still enabled in akka-cluster by default (for compatibility with past releases).

Metrics Collector
-----------------

Metrics collection is delegated to an implementation of ``akka.cluster.metrics.MetricsCollector``.

Different collector implementations provide different subsets of metrics published to the cluster.
Certain message routing and let-it-crash functions may not work when Sigar is not provisioned.

The cluster metrics extension comes with two built-in collector implementations:

#. ``akka.cluster.metrics.SigarMetricsCollector``, which requires Sigar provisioning and provides richer, more precise metrics
#. ``akka.cluster.metrics.JmxMetricsCollector``, which is used as a fallback and provides fewer, less precise metrics

You can also plug in your own metrics collector implementation.

By default, the metrics extension uses collector provider fallback and tries to load the collectors in this order:

#. configured user-provided collector
#. built-in ``akka.cluster.metrics.SigarMetricsCollector``
#. and finally ``akka.cluster.metrics.JmxMetricsCollector``
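
The fallback order above can be illustrated with a minimal, library-free sketch. All names in it
(``Collector``, ``Candidate``, ``firstAvailable``) are hypothetical and are not part of the akka API;
the sketch only assumes that a collector which cannot be created fails with an exception.

.. code-block:: scala

   import scala.util.{ Success, Try }

   object CollectorFallbackSketch {

     // Hypothetical stand-in for a metrics collector, not the akka trait.
     trait Collector { def name: String }

     // A named factory that may throw when the collector cannot be created.
     final case class Candidate(name: String, create: () => Collector)

     // Tries the candidates in order and returns the first one that can be created.
     // The iterator is lazy, so later candidates are not touched after a success.
     def firstAvailable(candidates: List[Candidate]): Option[Collector] =
       candidates.iterator
         .map(candidate => Try(candidate.create()))
         .collectFirst { case Success(collector) => collector }

     def main(args: Array[String]): Unit = {
       val candidates = List(
         Candidate("user", () => throw new IllegalStateException("no user collector configured")),
         Candidate("sigar", () => throw new IllegalStateException("Sigar is not provisioned")),
         Candidate("jmx", () => new Collector { val name = "jmx" }))

       // The first two candidates fail, so the JMX stand-in is selected.
       println(firstAvailable(candidates).map(_.name))
     }
   }

Note that ``Try`` only captures non-fatal exceptions. Code that really loads native libraries would
have to treat ``LinkageError`` separately, since ``Try`` does not catch it.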

Metrics Events
--------------

The metrics extension periodically publishes the current snapshot of the cluster metrics to the node system event bus.

The publication period is controlled by the ``akka.cluster.metrics.collector.sample-period`` setting.

The payload of the ``akka.cluster.metrics.ClusterMetricsChanged`` event contains
the latest metrics of the node as well as the metrics gossip of other cluster member nodes
that was received during the collector sample period.

You can subscribe your metrics listener actors to these events in order to implement a custom node lifecycle:
::

  ClusterMetricsExtension(system).subscribe(metricsListenerActor)

Hyperic Sigar Provisioning
--------------------------

Both user-provided and built-in metrics collectors can optionally use `Hyperic Sigar <http://www.hyperic.com/products/sigar>`_
for a wider and more accurate range of metrics compared to what can be retrieved from ordinary JMX MBeans.

Sigar uses a native OS library and requires library provisioning, i.e.
deployment, extraction and loading of the native OS library into the JVM at runtime.

You can provision the Sigar classes and native library in one of the following ways:

#. Use `Kamon sigar-loader <https://github.com/kamon-io/sigar-loader>`_ as a project dependency for the user project.
   The metrics extension will extract and load the Sigar library on demand with the help of the Kamon sigar provisioner.
#. Use `Kamon sigar-loader <https://github.com/kamon-io/sigar-loader>`_ as a Java agent: ``java -javaagent:/path/to/sigar-loader.jar``.
   The Kamon sigar loader agent will extract and load the Sigar library during JVM start.
#. Place ``sigar.jar`` on the ``classpath`` and the Sigar native library for the OS on the ``java.library.path``.
   You are required to manage both the project dependency and the library deployment manually.

To enable usage of Sigar you can add the following dependency to the user project
::

  "io.kamon" % "sigar-loader" % "@sigarLoaderVersion@"

You can download Kamon sigar-loader from `Maven Central <http://search.maven.org/#search%7Cga%7C1%7Csigar-loader>`_
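
For the manual provisioning option described above, it can help to check what the JVM actually sees on
``java.library.path``. The following is a minimal sketch that uses only the Java standard library.
The object and method names are hypothetical, and matching files by the substring ``sigar`` is an
assumption about how the native library files are named, not something defined by this module.

.. code-block:: scala

   import java.io.File

   object NativeLibraryPathSketch {

     // Splits a library path (such as the value of java.library.path) into directories.
     def directories(libraryPath: String): List[File] =
       libraryPath.split(File.pathSeparator).toList.filter(_.nonEmpty).map(new File(_))

     // Lists files whose name contains "sigar" (assumed naming convention).
     // listFiles() returns null for missing or non-directory paths, hence the Option.
     def sigarCandidates(dirs: List[File]): List[File] =
       for {
         dir <- dirs
         file <- Option(dir.listFiles()).map(_.toList).getOrElse(Nil)
         if file.getName.toLowerCase.contains("sigar")
       } yield file

     def main(args: Array[String]): Unit = {
       val libraryPath = System.getProperty("java.library.path", "")
       val found = sigarCandidates(directories(libraryPath))
       if (found.isEmpty) println("No Sigar native library found on java.library.path")
       else found.foreach(file => println(s"Found ${file.getAbsolutePath}"))
     }
   }

The sketch only inspects the file system; it does not load the library or verify that it matches the platform.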

Adaptive Load Balancing
-----------------------

The ``AdaptiveLoadBalancingPool`` / ``AdaptiveLoadBalancingGroup`` performs load balancing of messages to cluster nodes based on the cluster metrics data.
It uses random selection of routees with probabilities derived from the remaining capacity of the corresponding node.
It can be configured to use a specific MetricsSelector to produce the probabilities, a.k.a. weights:

* ``heap`` / ``HeapMetricsSelector`` - Used and max JVM heap memory. Weights based on remaining heap capacity; (max - used) / max
* ``load`` / ``SystemLoadAverageMetricsSelector`` - System load average for the past 1 minute; the corresponding value can be found in ``top`` on Linux systems. The system is possibly nearing a bottleneck if the system load average is nearing the number of CPUs/cores. Weights based on remaining load capacity; 1 - (load / processors)
* ``cpu`` / ``CpuMetricsSelector`` - CPU utilization in percentage, sum of User + Sys + Nice + Wait. Weights based on remaining cpu capacity; 1 - utilization
* ``mix`` / ``MixMetricsSelector`` - Combines heap, cpu and load. Weights based on mean of remaining capacity of the combined selectors.
* Any custom implementation of ``akka.cluster.metrics.MetricsSelector``

The collected metrics values are smoothed with `exponential weighted moving average <http://en.wikipedia.org/wiki/Moving_average#Exponential_moving_average>`_. In the :ref:`cluster_configuration_scala` you can adjust how quickly past data is decayed compared to new data.
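
The capacity formulas of the selectors and the smoothing described above can be sketched in plain Scala.
This is an illustration only and not the implementation used by the module: all names are hypothetical,
CPU utilization is assumed to be a fraction between 0 and 1, and deriving the smoothing factor from a
half-life is one common parametrization that is assumed here.

.. code-block:: scala

   object MetricsWeightSketch {

     // heap: (max - used) / max
     def heapCapacity(used: Long, max: Long): Double =
       (max - used).toDouble / max

     // load: 1 - (load / processors)
     def loadCapacity(load: Double, processors: Int): Double =
       1.0 - load / processors

     // cpu: 1 - utilization, with utilization assumed to be a fraction in [0, 1]
     def cpuCapacity(utilization: Double): Double =
       1.0 - utilization

     // mix: mean of the remaining capacities of the combined selectors
     def mixCapacity(capacities: Seq[Double]): Double =
       capacities.sum / capacities.size

     // Picks a routee index for a random number r in [0, 1),
     // with probability proportional to the given weights.
     def pick(weights: Seq[Double], r: Double): Int = {
       val target = r * weights.sum
       val cumulative = weights.scanLeft(0.0)(_ + _).tail
       val index = cumulative.indexWhere(target < _)
       if (index >= 0) index else weights.size - 1
     }

     // Assumed parametrization: the weight of an old sample halves after halfLifeMillis.
     def alpha(halfLifeMillis: Double, sampleIntervalMillis: Double): Double =
       1.0 - math.exp(-math.log(2.0) / halfLifeMillis * sampleIntervalMillis)

     // Exponentially weighted moving average step.
     def smooth(previous: Double, sample: Double, alpha: Double): Double =
       alpha * sample + (1.0 - alpha) * previous

     def main(args: Array[String]): Unit = {
       println(heapCapacity(used = 256, max = 1024)) // 0.75
       println(loadCapacity(1.0, 4))                 // 0.75
       println(mixCapacity(Seq(0.75, 0.5, 0.25)))    // 0.5
       println(pick(Seq(0.75, 0.25), 0.9))           // 1
       println(smooth(10.0, 20.0, 0.5))              // 15.0
     }
   }

A larger ``alpha`` makes the smoothed value follow new samples more quickly, while a smaller one keeps more of the past data.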

Let's take a look at this router in action. What can be more demanding than calculating factorials?

The backend worker that performs the factorial calculation:

.. includecode:: ../../../akka-samples/akka-sample-cluster-scala/src/main/scala/sample/cluster/factorial/FactorialBackend.scala#backend

The frontend that receives user jobs and delegates to the backends via the router:

.. includecode:: ../../../akka-samples/akka-sample-cluster-scala/src/main/scala/sample/cluster/factorial/FactorialFrontend.scala#frontend


As you can see, the router is defined in the same way as other routers, and in this case it is configured as follows:

.. includecode:: ../../../akka-samples/akka-sample-cluster-scala/src/main/resources/factorial.conf#adaptive-router

Only the ``router`` type and the ``metrics-selector`` parameter are specific to this router;
everything else works in the same way as for other routers.

The same type of router could also have been defined in code:

.. includecode:: ../../../akka-samples/akka-sample-cluster-scala/src/main/scala/sample/cluster/factorial/Extra.scala#router-lookup-in-code

.. includecode:: ../../../akka-samples/akka-sample-cluster-scala/src/main/scala/sample/cluster/factorial/Extra.scala#router-deploy-in-code

The `Typesafe Activator <http://www.typesafe.com/platform/getstarted>`_ tutorial named
`Akka Cluster Samples with Scala <http://www.typesafe.com/activator/template/akka-sample-cluster-scala>`_
contains the full source code and instructions of how to run the **Adaptive Load Balancing** sample.

Subscribe to Metrics Events
---------------------------

It is possible to subscribe to the metrics events directly to implement other functionality.

.. includecode:: ../../../akka-samples/akka-sample-cluster-scala/src/main/scala/sample/cluster/factorial/MetricsListener.scala#metrics-listener

Custom Metrics Collector
------------------------

Metrics collection is delegated to the implementation of ``akka.cluster.metrics.MetricsCollector``.

You can plug in your own metrics collector instead of the built-in
``akka.cluster.metrics.SigarMetricsCollector`` or ``akka.cluster.metrics.JmxMetricsCollector``.

Look at those two implementations for inspiration.

The custom metrics collector implementation class must be specified in the :ref:`cluster_metrics_configuration_scala`.

@ -532,77 +532,9 @@ contains the full source code and instructions of how to run the **Router Exampl
Cluster Metrics
^^^^^^^^^^^^^^^

The member nodes of the cluster collects system health metrics and publishes that to other nodes and to
registered subscribers. This information is primarily used for load-balancing routers.
The member nodes of the cluster can collect system health metrics and publish that to other cluster nodes
and to the registered subscribers on the system event bus with the help of :doc:`cluster-metrics`.

Hyperic Sigar
-------------

The built-in metrics is gathered from JMX MBeans, and optionally you can use `Hyperic Sigar <http://www.hyperic.com/products/sigar>`_
for a wider and more accurate range of metrics compared to what can be retrieved from ordinary MBeans.
Sigar is using a native OS library. To enable usage of Sigar you need to add the directory of the native library to
``-Djava.library.path=<path_of_sigar_libs>`` and add the following dependency::

  "org.fusesource" % "sigar" % "@sigarVersion@"

Download the native Sigar libraries from `Maven Central <http://repo1.maven.org/maven2/org/fusesource/sigar/@sigarVersion@/>`_

Adaptive Load Balancing
-----------------------

The ``AdaptiveLoadBalancingPool`` / ``AdaptiveLoadBalancingGroup`` performs load balancing of messages to cluster nodes based on the cluster metrics data.
It uses random selection of routees with probabilities derived from the remaining capacity of the corresponding node.
It can be configured to use a specific MetricsSelector to produce the probabilities, a.k.a. weights:

* ``heap`` / ``HeapMetricsSelector`` - Used and max JVM heap memory. Weights based on remaining heap capacity; (max - used) / max
* ``load`` / ``SystemLoadAverageMetricsSelector`` - System load average for the past 1 minute, corresponding value can be found in ``top`` of Linux systems. The system is possibly nearing a bottleneck if the system load average is nearing number of cpus/cores. Weights based on remaining load capacity; 1 - (load / processors)
* ``cpu`` / ``CpuMetricsSelector`` - CPU utilization in percentage, sum of User + Sys + Nice + Wait. Weights based on remaining cpu capacity; 1 - utilization
* ``mix`` / ``MixMetricsSelector`` - Combines heap, cpu and load. Weights based on mean of remaining capacity of the combined selectors.
* Any custom implementation of ``akka.cluster.routing.MetricsSelector``

The collected metrics values are smoothed with `exponential weighted moving average <http://en.wikipedia.org/wiki/Moving_average#Exponential_moving_average>`_. In the :ref:`cluster_configuration_scala` you can adjust how quickly past data is decayed compared to new data.

Let's take a look at this router in action. What can be more demanding than calculating factorials?

The backend worker that performs the factorial calculation:

.. includecode:: ../../../akka-samples/akka-sample-cluster-scala/src/main/scala/sample/cluster/factorial/FactorialBackend.scala#backend

The frontend that receives user jobs and delegates to the backends via the router:

.. includecode:: ../../../akka-samples/akka-sample-cluster-scala/src/main/scala/sample/cluster/factorial/FactorialFrontend.scala#frontend


As you can see, the router is defined in the same way as other routers, and in this case it is configured as follows:

.. includecode:: ../../../akka-samples/akka-sample-cluster-scala/src/main/resources/factorial.conf#adaptive-router

It is only router type ``adaptive`` and the ``metrics-selector`` that is specific to this router, other things work
in the same way as other routers.

The same type of router could also have been defined in code:

.. includecode:: ../../../akka-samples/akka-sample-cluster-scala/src/main/scala/sample/cluster/factorial/Extra.scala#router-lookup-in-code

.. includecode:: ../../../akka-samples/akka-sample-cluster-scala/src/main/scala/sample/cluster/factorial/Extra.scala#router-deploy-in-code

The `Typesafe Activator <http://www.typesafe.com/platform/getstarted>`_ tutorial named
`Akka Cluster Samples with Scala <http://www.typesafe.com/activator/template/akka-sample-cluster-scala>`_
contains the full source code and instructions of how to run the **Adaptive Load Balancing** sample.

Subscribe to Metrics Events
---------------------------

It is possible to subscribe to the metrics events directly to implement other functionality.

.. includecode:: ../../../akka-samples/akka-sample-cluster-scala/src/main/scala/sample/cluster/factorial/MetricsListener.scala#metrics-listener

Custom Metrics Collector
------------------------

You can plug-in your own metrics collector instead of
``akka.cluster.SigarMetricsCollector`` or ``akka.cluster.JmxMetricsCollector``. Look at those two implementations
for inspiration. The implementation class can be defined in the :ref:`cluster_configuration_scala`.

How to Test
^^^^^^^^^^^

@ -291,4 +291,4 @@ trait PersistenceDocSpec {
//#view-update
}

}
}
@ -6,6 +6,7 @@ Networking

../common/cluster
cluster-usage
cluster-metrics
remoting
serialization
io