Jobs Load Balancing Properties


The properties below define the load balancing criteria. The load balancer decides which Cluster node executes a graph. Any node may accept a request for execution, but the graph may run on that node or on a different one, depending on the current load of the nodes and on these properties. Each Cluster node periodically sends its load information to all other nodes; the interval is set by the cluster.node.sendinfo.interval property.

Each node of the Cluster may have different load balancing properties. Any node may process incoming requests for transformation execution and each may apply criteria for load balancing in a different way according to its own configuration.

These properties aren't vital for Cluster configuration and the default values are sufficient, but if you want to change the load balancing configuration, see the example below the table for more details about the node selection algorithm.

Table 42.4. Load balancing properties

Name	Type	Default	Description
cluster.lb.memory.weight	float	3	The memory weight multiplier used in the Cluster load balancing calculation. Determines the importance of a node's free heap memory compared to CPU utilization.
cluster.lb.memory.exponent	float	3	The exponent applied to a node's free heap memory ratio. A value of 1 makes the dependency on free heap memory linear; higher values make it progressively steeper. With the default value of 3, the chance of choosing the node for job execution rises with the cube of the node's free heap memory ratio.
cluster.lb.memory.limit	float	0.9	The upper limit of a node's heap memory usage. Nodes exceeding this limit are omitted from the selection during the load balancing process. If there is no node with heap memory usage below this limit, all nodes will be used in the selection. The node with a lower heap memory usage has a higher preference. The property can be set to any value between 0 (0%) and 1 (100%).
cluster.lb.cpu.weight	float	1	The CPU weight multiplier used in the Cluster load balancing calculation. Determines the importance of a node's CPU utilization compared to free heap memory.
cluster.lb.cpu.exponent	float	1	The exponent applied to a node's free CPU ratio. A value of 1 makes the dependency on available CPU linear; higher values make it progressively steeper. With the default value of 1, the chance of choosing the node for job execution rises linearly as the node's CPU usage falls.
cluster.lb.cpu.limit	float	0.9	The upper limit of a node's CPU usage. Nodes exceeding this limit are omitted from the selection during the load balancing process. If there is no node with CPU usage below this limit, all nodes will be used in the selection. The node with a lower CPU usage has a higher preference. The property can be set to any value between 0 (0%) and 1 (100%).
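
The fallback behavior of the two limit properties can be illustrated with a minimal Python sketch. It is not CloverDX code: the function name eligible_nodes and the input dictionary are hypothetical, and the sketch covers a single metric only, because the table does not say how the memory and CPU limits are combined.

```python
def eligible_nodes(usage_by_node, limit=0.9):
    """Return the nodes whose usage (0.0 to 1.0) does not exceed `limit`.

    If every node exceeds the limit, all nodes stay in the selection,
    mirroring the fallback described in the table. Treating a usage
    exactly equal to the limit as acceptable is an assumption.
    """
    below = {node: usage for node, usage in usage_by_node.items() if usage <= limit}
    return below if below else dict(usage_by_node)


# One node over the limit: only the other node remains.
print(sorted(eligible_nodes({"node01": 0.95, "node02": 0.40})))  # ['node02']

# Both nodes over the limit: both stay in the selection.
print(sorted(eligible_nodes({"node01": 0.95, "node02": 0.97})))  # ['node01', 'node02']
```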

2-node Cluster Load Balancing Example

In the following example, you can see the load balancing algorithm on a 2-node Cluster. Each node sends the information about its current load status (this status is updated at an interval set by cluster.node.sendinfo.interval).

In this example, the current load status states that:

node01 has the maximum heap memory set to 4,000 MB; and at the time, the node has 1,000 MB free heap memory with an average CPU usage of 10%.

node02 also has the maximum heap memory set to 4,000 MB; but at the time, the node has 3,000 MB free heap memory with an average CPU usage of 10%.

Node selection process:

Computing nodes' metric ratios
- Heap memory ratios ([selected node's free memory] / [least loaded node's free memory])
  node01: 1000 / 3000 = 0.33
  node02: 3000 / 3000 = 1
- CPU ratios ([selected node's free CPU average] / [least loaded node's free CPU average])
  node01: 0.9 / 0.9 = 1
  node02: 0.9 / 0.9 = 1
Exponentiation of the ratios
- Heap memory ratio exponentiation ([heap memory ratio] ^ [cluster.lb.memory.exponent])
  node01: 0.33^3 = 0.035937
  node02: 1^3 = 1
- CPU ratio exponentiation ([CPU ratio] ^ [cluster.lb.cpu.exponent])
  node01: 1^1 = 1
  node02: 1^1 = 1
Resolving target job distribution
- The sum of heap memory ratios: 0.035937 + 1 = 1.035937
  The sum of CPU ratios: 1 + 1 = 2
- Computing available weighted resources ([cluster.lb.memory.weight] * ([exponentiated memory ratio] / [sum of heap memory ratios]) + [cluster.lb.cpu.weight] * ([exponentiated CPU ratio] / [sum of CPU ratios]))
  node01: 3 * (0.035937 / 1.035937) + 1 * (1 / 2) = 0.604
  node02: 3 * (1 / 1.035937) + 1 * (1 / 2) = 3.396
- Resolving target job ratios by rescaling weighted resources to sum up to 1, that is, dividing each by their total (0.604 + 3.396 = 4).
  node01: 0.604 / 4 = 0.151
  node02: 3.396 / 4 = 0.849
Resulting target job distributions are:
node01: 15.1%
node02: 84.9%
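
The steps above can be reproduced with a short Python sketch. It follows the formulas of this example and is not the Server's actual implementation; the function name and the input dictionaries are hypothetical. Because it does not round 1000 / 3000 to 0.33, it yields about 15.2% and 84.8% rather than the 15.1% and 84.9% obtained above.

```python
def target_job_distribution(free_heap_mb, free_cpu,
                            memory_weight=3.0, memory_exponent=3.0,
                            cpu_weight=1.0, cpu_exponent=1.0):
    """Return {node: target share of jobs} using the steps of the example.

    Defaults mirror the cluster.lb.* defaults from the table. The sketch
    assumes at least one node has a non-zero free resource.
    """
    def shares(free, exponent):
        best = max(free.values())  # the least loaded node's free resource
        powered = {n: (v / best) ** exponent for n, v in free.items()}
        total = sum(powered.values())
        return {n: p / total for n, p in powered.items()}

    mem = shares(free_heap_mb, memory_exponent)
    cpu = shares(free_cpu, cpu_exponent)
    weighted = {n: memory_weight * mem[n] + cpu_weight * cpu[n] for n in free_heap_mb}
    total = sum(weighted.values())  # equals memory_weight + cpu_weight
    return {n: w / total for n, w in weighted.items()}


dist = target_job_distribution(
    free_heap_mb={"node01": 1000, "node02": 3000},
    free_cpu={"node01": 0.9, "node02": 0.9},
)
for node, share in dist.items():
    print(f"{node}: {share:.1%}")
# node01: 15.2%
# node02: 84.8%
```

Raising cluster.lb.memory.exponent in this sketch shifts even more jobs to node02, while setting it to 1 makes the memory term proportional to free heap memory.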

Therefore, for the duration of cluster.node.sendinfo.interval (2 seconds by default), the load balancer stores this information and distributes incoming jobs between the nodes in an attempt to meet the target job distribution of each node. For example, out of 100 jobs processed by the load balancer within those 2 seconds, approximately 15 would be sent to node01 and 85 to node02.

After this interval, the load status of each node is updated and a new target job distribution is calculated.
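
How incoming jobs might be steered toward a target distribution within one interval can be sketched as follows. The documentation does not describe the actual selection strategy, so the rule used here (send each job to the node whose share of recent jobs lags furthest behind its target) is purely an assumption for illustration, and pick_node is a hypothetical name.

```python
def pick_node(target_share, recent_jobs):
    """Pick the node whose recent share of jobs is furthest below its target.

    Assumed strategy for illustration only, not the Server's algorithm.
    """
    total = sum(recent_jobs.values())

    def deficit(node):
        actual = recent_jobs[node] / total if total else 0.0
        return target_share[node] - actual

    return max(target_share, key=deficit)


targets = {"node01": 0.151, "node02": 0.849}  # from the example above
recent = {"node01": 0, "node02": 0}
for _ in range(100):
    recent[pick_node(targets, recent)] += 1

print(recent)  # {'node01': 15, 'node02': 85}
```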

Cluster Load Balancer Debugging

CloverDX Server can log the Cluster load balancing decisions, so if the load balancer acts unexpectedly (e.g. sends too many jobs to one node), you can enable the logging and debug the load balancer based on the content of the node.log file.

To enable the logging, change the level attribute of the following logger in log4j2.xml to "debug" or "trace":

<Logger name="LoadBalancerLogger" level="trace" additivity="false">
    <AppenderRef ref="nodeStatusAppender" />
</Logger>

Note that the logging must be set in the Server Core Log4j 2 configuration file. For more information, see the Main Logs section.

Below is an example of a Cluster load balancing decision from the node.log file.

2019-06-21 15:11:54,600[1525675507-2344] rJobLoadBalancer TRACE Selecting one node for execution, excluded nodes are: [], viable nodes are: [node2, node3, node1]
2019-06-21 15:11:54,600[1525675507-2344] rJobLoadBalancer DEBUG NodeId {node3} selected for execution, from nodes:
    NodeProbability#node2 { execProbability: 0,340, freeMemRatio: 0,795, freeCpuRatio: 0,999 } Jobs#node2 { running: 24, recent: 10 }
    NodeProbability#node3 { execProbability: 0,253, freeMemRatio: 0,527, freeCpuRatio: 0,999 } Jobs#node3 { running: 15, recent: 8 } *
    NodeProbability#node1 { execProbability: 0,406, freeMemRatio: 1,000, freeCpuRatio: 1,000 } Jobs#node1 { running: 26, recent: 12 }
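
When many such entries have to be compared, it may help to extract the numbers programmatically. The following Python sketch parses the NodeProbability entries of the DEBUG message shown above. The regular expression is derived only from this sample, so it is an assumption about the log format rather than a documented contract; note that the sample uses a decimal comma, which the sketch converts to a decimal point.

```python
import re

# Pattern inferred from the sample log entry above (an assumption).
ENTRY = re.compile(
    r"NodeProbability#(?P<node>\w+) \{ "
    r"execProbability: (?P<execProbability>\d+[.,]\d+), "
    r"freeMemRatio: (?P<freeMemRatio>\d+[.,]\d+), "
    r"freeCpuRatio: (?P<freeCpuRatio>\d+[.,]\d+) \}"
)
FIELDS = ("execProbability", "freeMemRatio", "freeCpuRatio")


def parse_probabilities(log_text):
    """Return {node: {field: float}} for every NodeProbability entry found."""
    result = {}
    for match in ENTRY.finditer(log_text):
        result[match["node"]] = {
            field: float(match[field].replace(",", ".")) for field in FIELDS
        }
    return result


SAMPLE = (
    "NodeId {node3} selected for execution, from nodes: "
    "NodeProbability#node2 { execProbability: 0,340, freeMemRatio: 0,795, freeCpuRatio: 0,999 } "
    "Jobs#node2 { running: 24, recent: 10 } "
    "NodeProbability#node3 { execProbability: 0,253, freeMemRatio: 0,527, freeCpuRatio: 0,999 } "
    "Jobs#node3 { running: 15, recent: 8 } * "
    "NodeProbability#node1 { execProbability: 0,406, freeMemRatio: 1,000, freeCpuRatio: 1,000 } "
    "Jobs#node1 { running: 26, recent: 12 }"
)

parsed = parse_probabilities(SAMPLE)
print(parsed["node3"])
# {'execProbability': 0.253, 'freeMemRatio': 0.527, 'freeCpuRatio': 0.999}
```

In this sample, the three execProbability values add up to 0.999, that is, to 1 within rounding.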