
    Auto-Resuming in an Unreliable Network

    Version 4.4 introduced auto-resuming of suspended nodes.

    The following timeline describes the scenario:

    • NodeB is suspended after a connection loss;

    • 0s - NodeA successfully re-establishes the connection to NodeB;

    • 120s - NodeA changes NodeB's status to forced_resume;

    • NodeB attempts to resume itself if the maximum auto-resume count has not been reached;

    • If the connection is lost again, the cycle repeats. Once the maximum auto-resume count is reached, the node remains suspended until the counter is reset; this prevents endless suspend-resume cycles;

    • 240m - the auto-resume counter is reset.
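    The timeline above can be modelled as a small state machine. The sketch below is illustrative only and is not GraphDB code: the class SuspendedNode and its methods are hypothetical, and it assumes that the 240-minute reset window starts at the first auto-resume of a cycle (the text does not say exactly when the window begins). Time is passed in explicitly as milliseconds so the logic can be exercised without real clocks.

```python
from dataclasses import dataclass
from typing import Optional

# Documented defaults (see the configuration properties below).
INTERVAL_BEFORE_AUTORESUME_MS = 120_000  # cluster.node.check.intervalBeforeAutoresume
MAX_AUTORESUME_COUNT = 3                 # cluster.node.check.maxAutoresumeCount
INTERVAL_RESET_AUTORESUME_MIN = 240      # cluster.node.check.intervalResetAutoresumeCount


@dataclass
class SuspendedNode:
    """Hypothetical model of one node's auto-resume state."""
    autoresume_count: int = 0
    counter_started_ms: Optional[int] = None  # assumption: window starts at first auto-resume
    suspended: bool = True

    def maybe_reset_counter(self, now_ms: int) -> None:
        """Reset the counter once the reset interval has elapsed."""
        if self.counter_started_ms is None:
            return
        if now_ms - self.counter_started_ms >= INTERVAL_RESET_AUTORESUME_MIN * 60_000:
            self.autoresume_count = 0
            self.counter_started_ms = None

    def try_force_resume(self, reconnected_at_ms: int, now_ms: int) -> bool:
        """Resume if the node has been reachable long enough and the limit is not reached."""
        self.maybe_reset_counter(now_ms)
        if not self.suspended:
            return False
        if now_ms - reconnected_at_ms < INTERVAL_BEFORE_AUTORESUME_MS:
            return False  # not accessible for long enough yet
        if self.autoresume_count >= MAX_AUTORESUME_COUNT:
            return False  # stays suspended until the counter is reset
        self.autoresume_count += 1
        if self.counter_started_ms is None:
            self.counter_started_ms = now_ms
        self.suspended = False
        return True

    def lose_connection(self) -> None:
        self.suspended = True


# Usage: four loss/reconnect cycles, ten minutes apart.
node = SuspendedNode()
results = []
t = 0
for _ in range(4):
    # NodeA reconnected at t and checks again 120 s later.
    results.append(node.try_force_resume(reconnected_at_ms=t, now_ms=t + 120_000))
    node.lose_connection()
    t += 600_000
print(results)  # [True, True, True, False]

# 240 minutes after the first auto-resume the counter is reset, so resuming works again.
resumed_after_reset = node.try_force_resume(
    reconnected_at_ms=t, now_ms=120_000 + INTERVAL_RESET_AUTORESUME_MIN * 60_000)
```

    The fourth attempt is refused because the count has reached the maximum of 3, which is exactly the suspend-resume loop the limit is meant to break.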

    The following configuration properties control the intervals and the limit mentioned above:

    cluster.node.check.intervalBeforeAutoresume

    How long a node must remain accessible before it is forcibly resumed, in milliseconds.

    Default: 120000

    cluster.node.check.maxAutoresumeCount

    Maximum number of times a node may attempt to auto-resume itself before the counter is reset.

    Default: 3

    cluster.node.check.intervalResetAutoresumeCount

    Time after which the auto-resume counter is reset, in minutes.

    Default: 240
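    Note that the two interval properties use different units (milliseconds and minutes), which is easy to mix up. The sketch below normalises them into Python timedelta values and falls back to the documented defaults. How the properties are actually supplied to the product is not described here, so the helper load_autoresume_settings is hypothetical and simply takes a plain mapping of property names to string values; the non-negative check is a choice of this sketch, not a documented rule.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Mapping

# Property names and defaults as documented above.
DEFAULTS = {
    "cluster.node.check.intervalBeforeAutoresume": "120000",   # milliseconds
    "cluster.node.check.maxAutoresumeCount": "3",              # attempts
    "cluster.node.check.intervalResetAutoresumeCount": "240",  # minutes
}


@dataclass(frozen=True)
class AutoresumeSettings:
    interval_before_autoresume: timedelta
    max_autoresume_count: int
    interval_reset_autoresume_count: timedelta


def load_autoresume_settings(props: Mapping[str, str]) -> AutoresumeSettings:
    """Hypothetical helper: apply documented defaults and convert units."""
    def get(name: str) -> int:
        value = int(props.get(name, DEFAULTS[name]))
        if value < 0:
            raise ValueError(f"{name} must be non-negative, got {value}")
        return value

    return AutoresumeSettings(
        interval_before_autoresume=timedelta(
            milliseconds=get("cluster.node.check.intervalBeforeAutoresume")),
        max_autoresume_count=get("cluster.node.check.maxAutoresumeCount"),
        interval_reset_autoresume_count=timedelta(
            minutes=get("cluster.node.check.intervalResetAutoresumeCount")),
    )


settings = load_autoresume_settings({})
print(settings.interval_before_autoresume)       # 0:02:00
print(settings.interval_reset_autoresume_count)  # 4:00:00
```

    With no overrides, the defaults correspond to the 120s and 240m marks in the timeline.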