Version

    Jobs Load Balancing Properties

    Properties of load balancing criteria. A load balancer decides which Cluster node executes the graph. It means, that any node may process a request for execution, but a graph may be executed on the same or on different node according to current load of the nodes and according to these properties. Cluster node's load information is sent periodically to all other nodes - this interval is set by the cluster.node.sendinfo.interval property.

    Each node of the Cluster may have different load balancing properties. Any node may process incoming requests for transformation execution and each may apply criteria for load balancing in a different way according to its own configuration.

    These properties aren't vital for Cluster configuration and the default values are sufficient, but if you want to change the load balancing configuration, see the example below the table for more details about the node selection algorithm.

    Table 42.4. Load balancing properties

    NameTypeDefaultDescription
    cluster.lb.memory.weightfloat3

    The memory weight multiplier used in the Cluster load balancing calculation. Determines the importance of a node's free heap memory compared to CPU utilization.

    cluster.lb.memory.exponentfloat3

    Changes the dependency on free heap memory between linear and exponential. Using the default value, it means that the chance for choosing the node for job execution rises exponentially with the amount of the node's free heap memory.

    cluster.lb.memory.limitfloat0.9

    The upper limit of a node's heap memory usage. Nodes exceeding this limit are omitted from the selection during the load balancing process. If there is no node with heap memory usage below this limit, all nodes will be used in the selection. The node with a lower heap memory usage has a higher preference. The property can be set to any value between 0 (0%) and 1 (100%).

    cluster.lb.cpu.weightfloat1

    The CPU weight multiplier used in the Cluster load balancing calculation. Determines the importance of a node's CPU utilization compared to free heap memory.

    cluster.lb.cpu.exponentfloat1

    Changes the dependency on available CPU between linear and exponential. Using the default value, it means that the chance for choosing the node for job execution rises linearly with the node's lower CPU usage.

    cluster.lb.cpu.limitfloat0.9

    The upper limit of a node's CPU usage. Nodes exceeding this limit are omitted from the selection during the load balancing process. If there is no node with CPU usage below this limit, all nodes will be used in the selection. The node with a lower CPU usage has a higher preference. The property can be set to any value between 0 (0%) and 1 (100%).


    2-node Cluster Load Balancing Example

    In the following example, you can see the load balancing algorithm on a 2-node Cluster. Each node sends the information about its current load status (this status is updated at an interval set by cluster.node.sendinfo.interval).

    In this example, the current load status states that:

    node01 has the maximum heap memory set to 4,000 MB; and at the time, the node has 1,000 MB free heap memory with an average CPU usage of 10%.

    node02 also has the maximum heap memory set to 4,000 MB; but at the time, the node has 3,000 MB free heap memory with an average CPU usage of 10%.

    Node selection process:

    1. Computing nodes' metric ratios

      • Heap memory ratios ([selected node's free memory] / [least loaded node's free memory])

        node01: 1000 / 3000 = 0.33

        node02: 3000 / 3000 = 1

      • CPU ratios ([selected node's free CPU average] / [least loaded node's free CPU average])

        node01: 0.9 / 0.9 = 1

        node02: 0.9 / 0.9 = 1

    2. Exponentiation of the ratios

    3. Resolving target job distribution

      • The sum of heap memory ratios: 0.035937 + 1 = 1.035937

        The sum of CPU ratios: 1 + 1 = 2

      • Computing available weighted resources ([cluster.lb.memory.weight] * ([exponentiated memory ratio] / [sum of heap ratios]) + [cluster.lb.cpu.weight] * ([exponentiated CPU ratio] / [sum of CPU ratios])

        node01: 3 * (0.035937/1.035937) + 1*(1/2) = 0.604

        node02: 3 * (1/1,035937) + 1 * (1/2) = 3.396

      • Resolving target job ratios by rescaling weighted resources to sum up to 1.

        node01: 0.604 / 4 = 0.151

        node02: 3.396 / 4 = 0.849

      Resulting target job distributions are:

      node01: 15.1%

      node02: 84.9%

    Therefore, for the duration of cluster.node.sendinfo.interval (by default 2 seconds), the load balancer stores this information and distributes incoming jobs between nodes in an attempt to meet the target job distribution for each node (i.e. out of 100 jobs processed by the load balancer in the last 2 seconds, approximately 15 would be sent to node01 and 85 to node02).

    After this interval, the load status of each node is updated and a new target job distribution is calculated.

    Cluster Load Balancer Debugging

    CloverDX Server can log the Cluster load balancing decisions, so if the load balancer acts unexpectedly (i.e. sends too many jobs to one node), you can enable the logging and debug the load balancer based on the content of the node.log file.

    To enable the logging in log4j2.xml, change the level attribute to "debug" or "trace":

    Note that the logging must be set in the Server Core Log4j 2 configuration file. For more information, see the Main Logs section.

    <Logger name="LoadBalancerLogger" level="trace" additivity="false">
        <AppenderRef ref="nodeStatusAppender" />
    </Logger>

    Below is an example of Cluster load balancing decision from the node.log file.

    2019-06-21 15:11:54,600[1525675507-2344] rJobLoadBalancer TRACE Selecting one node for execution, excluded nodes are: [], viable nodes are: [node2, node3, node1]
    2019-06-21 15:11:54,600[1525675507-2344] rJobLoadBalancer DEBUG NodeId {node3} selected for execution, from nodes:
    NodeProbability#node2 { execProbability: 0,340, freeMemRatio: 0,795, freeCpuRatio: 0,999 } Jobs#node2 { running: 24, recent: 10 }
    NodeProbability#node3 { execProbability: 0,253, freeMemRatio: 0,527, freeCpuRatio: 0,999 } Jobs#node3 { running: 15, recent: 8 } *
    NodeProbability#node1 { execProbability: 0,406, freeMemRatio: 1,000, freeCpuRatio: 1,000 } Jobs#node1 { running: 26, recent: 12 }