    NodeA Cannot Establish TCP Connection (Port 7800 by Default) to NodeB

    The TCP connection is used for asynchronous messaging. When NodeB can't send or receive asynchronous messages, the other nodes aren't notified about started or finished jobs, so a parent jobflow running on NodeA keeps waiting for the event from NodeB. Because a heart-beat is vital for meaningful load-balancing, the same check-task mentioned above also checks the heart-beat from all Cluster nodes.
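When diagnosing this scenario, it can help to confirm from NodeA whether the messaging port on NodeB is reachable at all. The following is a minimal sketch, not part of the product: the function name `can_connect` and the 3-second timeout are assumptions made for illustration, and only the default port 7800 comes from the text above.

```python
import socket


def can_connect(host: str, port: int = 7800, timeout_s: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper for manual diagnostics; 7800 is the default
    port mentioned in the documentation, the timeout is an assumption.
    """
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Run from NodeA, a call such as `can_connect("nodeB.example.com")` (a placeholder host name) returning `False` suggests a network or firewall problem rather than a problem inside the Cluster itself.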

    Timeline describing the scenario:
    • 0s: network connection between NodeA and NodeB is down
    • 60s: NodeA uses the last available NodeB heart-beat
    • 0-40s: check-task running on NodeA detects missing heart-beat from NodeB
    • status of NodeA or NodeB (the one with shorter uptime) is changed to suspended

    The following configuration properties set the time intervals mentioned above:
    cluster.node.check.checkMinInterval

    Periodicity of Cluster node checks, in milliseconds.

    Default: 40000

    cluster.node.sendinfo.interval

    Periodicity of heart-beat messages, in milliseconds.

    Default: 2000

    cluster.node.sendinfo.min_interval

    A heart-beat may occasionally be sent more often than specified by cluster.node.sendinfo.interval. This property specifies the minimum interval in milliseconds.

    Default: 500

    cluster.node.remove.interval

    The maximum interval for missing a heart-beat, in milliseconds.

    Default: 50000
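To illustrate how a maximum heart-beat interval such as cluster.node.remove.interval can be applied, here is a small sketch of a staleness check. It is an assumption about the general technique, not the server's actual implementation: the function name `heartbeat_missing` and the strict "greater than" comparison are hypothetical, and only the 50000 ms default comes from the list above.

```python
# Default of cluster.node.remove.interval, in milliseconds (from the docs).
REMOVE_INTERVAL_MS = 50_000


def heartbeat_missing(
    last_heartbeat_ms: int,
    now_ms: int,
    remove_interval_ms: int = REMOVE_INTERVAL_MS,
) -> bool:
    """Return True if the last heart-beat is older than the allowed interval.

    Hypothetical helper: whether the real check-task uses '>' or '>='
    at the boundary is an assumption.
    """
    return now_ms - last_heartbeat_ms > remove_interval_ms
```

With the default value, a heart-beat last seen exactly 50000 ms ago is still accepted in this sketch, while one seen 50001 ms ago is treated as missing.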