
    NodeA Cannot Establish TCP Connection (Port 7800 by Default) to NodeB

    A TCP connection is used for asynchronous messaging. When NodeB cannot send or receive asynchronous messages, the other nodes are not notified about started or finished jobs, so a parent jobflow running on NodeA keeps waiting for the event from NodeB. A heart-beat is vital for meaningful load-balancing, so the same check-task mentioned above also checks the heart-beat from all Cluster nodes.
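Before looking at heart-beat settings, it can help to confirm whether the messaging port is reachable from NodeA at all. The following is a minimal diagnostic sketch and not part of the product; the function name `can_connect` is hypothetical, and 7800 is only the default port mentioned in the heading above.

```python
import socket


def can_connect(host: str, port: int = 7800, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

Calling `can_connect("<address of NodeB>")` on NodeA returns `False` when the port is blocked or NodeB is unreachable. A `True` result only shows that the port accepts connections; it does not prove that asynchronous messaging itself works.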

    Time-line describing the scenario:

    • 0s - the network connection between NodeA and NodeB is down;

    • 60s - NodeA uses the last available NodeB heart-beat;

    • 0-40s later - a check-task running on NodeA detects the missing heart-beat from NodeB;

    • the status of NodeA or NodeB (the one with shorter uptime) is changed to suspended.
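The time-line above can be turned into a rough estimate of when a node becomes suspended. This is a sketch under two assumptions: that the 0-40s check-task delay is counted after the point where the last heart-beat stops being usable, and that the check-task may fire at any moment within its interval. The function name is hypothetical; the 60s and 40s figures are the ones from the time-line.

```python
def suspension_window(heartbeat_timeout_ms: int, check_interval_ms: int) -> tuple[int, int]:
    """Return (earliest, latest) time in ms after the outage at which
    a check-task run can notice the missing heart-beat."""
    # Best case: the check-task runs right when the heart-beat expires.
    earliest = heartbeat_timeout_ms
    # Worst case: the check-task ran just before expiry, so a full
    # check interval passes before the next run.
    latest = heartbeat_timeout_ms + check_interval_ms
    return earliest, latest


print(suspension_window(60_000, 40_000))  # (60000, 100000)
```

With the figures from the time-line, the suspension would happen somewhere between 60 and 100 seconds after the network connection goes down.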

    The following configuration properties set the time intervals mentioned above:

    cluster.node.check.checkMinInterval

    The periodicity of Cluster node checks, in milliseconds.

    Default: 40000

    cluster.node.sendinfo.interval

    The periodicity of heart-beat messages, in milliseconds.

    Default: 2000

    cluster.node.sendinfo.min_interval

    A heart-beat may occasionally be sent more often than specified by cluster.node.sendinfo.interval. This property specifies the minimum interval in milliseconds.

    Default: 500

    cluster.node.remove.interval

    The maximum interval during which a heart-beat may be missing, in milliseconds.

    Default: 50000
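When overriding these properties, it is easy to pick values that contradict each other. The sketch below merges overrides with the defaults listed above and reports two plausible inconsistencies. The helper names and both rules are assumptions inferred from the property descriptions, not checks performed by the server.

```python
# Defaults as listed in this section (all values in milliseconds).
DEFAULTS = {
    "cluster.node.check.checkMinInterval": 40000,
    "cluster.node.sendinfo.interval": 2000,
    "cluster.node.sendinfo.min_interval": 500,
    "cluster.node.remove.interval": 50000,
}


def effective_settings(overrides: dict[str, int]) -> dict[str, int]:
    """Merge user overrides with the defaults; reject unknown property names."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise KeyError(f"unknown properties: {sorted(unknown)}")
    return {**DEFAULTS, **overrides}


def sanity_warnings(settings: dict[str, int]) -> list[str]:
    """Return human-readable warnings for settings that look inconsistent."""
    warnings = []
    send = settings["cluster.node.sendinfo.interval"]
    # Assumed rule: the minimum send interval should not exceed the regular one.
    if settings["cluster.node.sendinfo.min_interval"] > send:
        warnings.append("sendinfo.min_interval is larger than sendinfo.interval")
    # Assumed rule: a node must be allowed to miss at least one full heart-beat period.
    if settings["cluster.node.remove.interval"] <= send:
        warnings.append("remove.interval is not larger than sendinfo.interval")
    return warnings


print(sanity_warnings(effective_settings({})))  # []
```

For example, `sanity_warnings(effective_settings({"cluster.node.remove.interval": 1000}))` returns one warning, because a heart-beat sent every 2000 ms could never arrive within a 1000 ms allowance.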