57. Troubleshooting
- NodeA Cannot Establish HTTP Connection to NodeB
- NodeA Cannot Establish TCP Connection (Port 7800 by Default) to NodeB
- NodeB is Killed or It Cannot Connect to the Database
- Node cannot access the sandboxes home directory
- Auto-Resuming in Unreliable Network
- Long-Term Network Malfunction May Cause Jobs to Hang on
Cluster Reliability in Unreliable Network Environment
CloverDX Server instances must cooperate with each other to form a Cluster. If the connection between nodes does not work at all, or if it is not configured, the Cluster cannot work properly. This chapter describes how Cluster nodes behave in an environment where the connection between nodes is unreliable.
Nodes use three channels to exchange status information or data:
- synchronous calls (via HTTP/HTTPS)
  Typically, NodeA requests some operation on NodeB, e.g. job execution. HTTP/HTTPS is also used for streaming data between workers of a parallel execution.
- asynchronous messaging (TCP connection on port 7800 by default)
  Typically heartbeats or events, e.g. job started or finished.
- shared database (each node must be able to create a DB connection)
  Shared configuration data, execution history, etc.
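When diagnosing which channel is failing, it can help to check from one node whether the other node's ports are reachable at all. The following is a minimal sketch, not a CloverDX tool: the function names `can_connect` and `check_node_channels` are hypothetical, and the HTTP port must be supplied by you because it depends on your application server. Only the messaging port default of 7800 comes from the text above. The sketch tests plain TCP reachability; it does not verify that the Server answers HTTP requests correctly, and it does not test the database connection.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_node_channels(host: str, http_port: int, messaging_port: int = 7800) -> dict:
    """Probe the HTTP(S) port and the messaging port of another Cluster node.

    http_port is an assumption supplied by the caller; 7800 is the default
    messaging port mentioned in this chapter.
    """
    return {
        "http": can_connect(host, http_port),
        "messaging": can_connect(host, messaging_port),
    }
```

For example, running `check_node_channels("nodeB.example.com", 8080)` on NodeA (both the host name and port 8080 are placeholders) returns a dictionary showing which of the two ports accepted a connection. A `False` value narrows the problem to a firewall, routing, or listener issue on that particular channel.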