NodeB is Killed or It Cannot Connect to the Database
Access to the database is vital for running jobs, running the scheduler, and cooperating with other nodes. Touching the database is also used to detect dead processes. When the JVM process of NodeB is killed, it stops touching the database, and the other nodes can detect this.
Timeline describing the scenario:
0s-30s last touch on DB
NodeB or its connection to the database is down
90s NodeA sees the last touch
0-40s check-task running on NodeA detects obsolete touch from NodeB
status of NodeB is changed to stopped, jobs running on NodeB are solved, which means that their status is changed to UNKNOWN, and the event is dispatched among the Cluster nodes. The job result is considered an error.
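The detection step in the timeline can be sketched as a simple age check on the last touch. This is a minimal illustration, not the server's actual implementation: the class and method names are hypothetical, and it assumes that a touch becomes obsolete once it is older than cluster.node.touch.forced_stop.interval and that the check task runs once per cluster.node.check.checkMinInterval.

```java
import java.time.Duration;

public class NodeTouchCheck {
    // Default values of the properties described in this section.
    public static final Duration FORCED_STOP_INTERVAL = Duration.ofMillis(60000); // cluster.node.touch.forced_stop.interval
    public static final Duration CHECK_MIN_INTERVAL = Duration.ofMillis(40000);   // cluster.node.check.checkMinInterval

    /** Hypothetical helper: true when the last touch is older than the accepted interval. */
    public static boolean isTouchObsolete(long lastTouchMillis, long nowMillis, Duration acceptedInterval) {
        return nowMillis - lastTouchMillis > acceptedInterval.toMillis();
    }

    /**
     * Assumed upper bound on the time between the last touch and its detection:
     * the touch must first expire, then the next periodic check must run.
     */
    public static Duration worstCaseDetectionDelay(Duration acceptedInterval, Duration checkInterval) {
        return acceptedInterval.plus(checkInterval);
    }

    public static void main(String[] args) {
        long lastTouch = 0L; // NodeB's last touch, in milliseconds
        System.out.println(isTouchObsolete(lastTouch, 30_000L, FORCED_STOP_INTERVAL)); // false
        System.out.println(isTouchObsolete(lastTouch, 90_000L, FORCED_STOP_INTERVAL)); // true
        System.out.println(worstCaseDetectionDelay(FORCED_STOP_INTERVAL, CHECK_MIN_INTERVAL).getSeconds()); // 100
    }
}
```

Under these assumptions and the default values, another node marks NodeB as stopped no later than about 100 seconds after NodeB's last touch.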
The following configuration properties set the time intervals mentioned above:
cluster.node.touch.interval
Periodicity of a database touch, in milliseconds.
Default: 20000
cluster.node.touch.forced_stop.interval
The interval for which the other nodes accept the last touch, in milliseconds.
Default: 60000
cluster.node.check.checkMinInterval
Periodicity of Cluster node checks, in milliseconds.
Default: 40000
cluster.node.touch.forced_stop.solve_running_jobs.enabled
A boolean value that switches the solving of running jobs mentioned above on or off.
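For illustration, the properties above could be written out as follows. This is a sketch that assumes the usual key=value properties syntax. The three intervals are the defaults listed in this section. The default of the last property is not stated here, so the value true is only an example.

```properties
# Intervals in milliseconds; values are the defaults listed in this section.
cluster.node.touch.interval=20000
cluster.node.touch.forced_stop.interval=60000
cluster.node.check.checkMinInterval=40000
# Example value only; the default of this property is not given in this section.
cluster.node.touch.forced_stop.solve_running_jobs.enabled=true
```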