Chapter 35. Troubleshooting

Graph Hangs and Cannot Be Killed

A graph can sometimes hang and become impossible to kill if one of its network connections hangs. To make such a connection time out sooner, set a shorter TCP keepalive time. The Linux default (net.ipv4.tcp_keepalive_time) is 2 hours (7,200 seconds). You can lower it to, for example, 10 minutes (600 seconds) with sysctl -w net.ipv4.tcp_keepalive_time=600.

See Using TCP keepalive under Linux.
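The effect of the setting can be estimated with a little arithmetic. According to tcp(7), once a connection has been idle for tcp_keepalive_time seconds, the kernel sends up to tcp_keepalive_probes unanswered probes, tcp_keepalive_intvl seconds apart, before dropping the connection; the Linux defaults for the last two are 75 seconds and 9 probes. This only applies to sockets that have SO_KEEPALIVE enabled. The Python sketch below is illustrative (the function names are hypothetical, not part of any product API): it reads the current values from /proc/sys/net/ipv4, falling back to the defaults, and estimates how long a silently dead connection lingers.

```python
from pathlib import Path

# Linux defaults documented in tcp(7).
DEFAULTS = {"tcp_keepalive_time": 7200, "tcp_keepalive_intvl": 75, "tcp_keepalive_probes": 9}


def read_keepalive_settings(proc_dir="/proc/sys/net/ipv4"):
    """Read the kernel keepalive settings, using the tcp(7) defaults for any value that cannot be read."""
    settings = {}
    for name, default in DEFAULTS.items():
        try:
            settings[name] = int(Path(proc_dir, name).read_text().strip())
        except (OSError, ValueError):
            settings[name] = default
    return settings


def dead_peer_timeout(settings):
    """Approximate seconds until an idle connection to a vanished peer is dropped.

    Idle time before the first probe, plus one interval per unanswered probe.
    """
    return settings["tcp_keepalive_time"] + settings["tcp_keepalive_intvl"] * settings["tcp_keepalive_probes"]
```

With the defaults this gives 7,875 seconds (about 2 h 11 min); with tcp_keepalive_time lowered to 600 it drops to 1,275 seconds (about 21 minutes).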

Alternatively, the file descriptor of the hung connection can be closed manually by attaching gdb to the process that holds it. See How to close file descriptor via Linux shell command.
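Closing the descriptor first requires knowing which one it is. The Python sketch below assumes a Linux /proc filesystem and sufficient privileges (the same user as the process, or root); all helper names are hypothetical. It lists the socket descriptors of a process together with their inodes, which can be matched against the inode column of /proc/net/tcp to identify the hung connection, and builds a gdb command line that calls close() inside the process and then detaches. Closing a descriptor under a running JVM is a last resort; code that still uses it will get errors.

```python
import os
import re

# Entries in /proc/<pid>/fd/ that are network sockets link to "socket:[<inode>]".
SOCKET_LINK = re.compile(r"^socket:\[(\d+)\]$")


def socket_inode(link_target):
    """Return the socket inode from a /proc fd link target, or None if it is not a socket."""
    match = SOCKET_LINK.match(link_target)
    return int(match.group(1)) if match else None


def socket_fds(pid):
    """Map each socket file descriptor of process `pid` to its inode (Linux /proc only)."""
    fd_dir = f"/proc/{pid}/fd"
    result = {}
    for name in os.listdir(fd_dir):
        try:
            inode = socket_inode(os.readlink(os.path.join(fd_dir, name)))
        except OSError:
            continue  # the descriptor was closed while we were listing
        if inode is not None:
            result[int(name)] = inode
    return result


def gdb_close_command(pid, fd):
    """Argument list that attaches gdb to `pid`, calls close(fd) in it, then detaches."""
    return [
        "gdb", "-p", str(pid), "--batch",
        # The cast is needed because gdb usually has no debug info for libc's close().
        "-ex", f"call (int) close({fd})",
        "-ex", "detach",
    ]
```

The returned list can be passed to subprocess.run() once the descriptor has been identified.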

SSL/TLS Issues

SSL-related Failures on WebLogic 12

Certain graphs that use SSL-encrypted connections may fail on WebLogic 12 because of a damaged library distributed with this application server. The issue can be identified by a SHA-1 digest error in the graph execution stack trace:

Caused by: Could not convert socket to TLS
    at com.sun.mail.pop3.Protocol.stls(
    at com.sun.mail.pop3.POP3Store.getPort(
    at com.sun.mail.pop3.POP3Store.protocolConnect(
Caused by: java.lang.SecurityException:
    SHA1 digest error for org/bouncycastle/jce/provider/JCEECPublicKey.class
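When many graph logs need to be checked, this failure can be spotted by searching for the digest error. A minimal Python sketch; the regular expression is an assumption based on the message shown above, accepts the message either on the exception line or wrapped onto the next one, and also matches other algorithm names such as SHA-256:

```python
import re

# "java.lang.SecurityException: SHA1 digest error for <entry>", possibly wrapped after the colon.
DIGEST_ERROR = re.compile(r"java\.lang\.SecurityException:\s*(SHA-?\d+) digest error for (\S+)")


def digest_errors_in_log(log_text):
    """Return (algorithm, jar entry) pairs for every JAR digest error found in a log or stack trace."""
    return DIGEST_ERROR.findall(log_text)
```

An entry under org/bouncycastle/ points at the Bouncy Castle provider JAR.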

To fix the issue, replace the library [MW_HOME]/oracle_common/modules/bcprov-jdk16-1.45.jar with a copy downloaded directly from the Bouncy Castle home page. Then restart the application server so that it loads the new library.
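The "SHA1 digest error" means that the bytes of a JAR entry no longer match the SHA1-Digest recorded for it in META-INF/MANIFEST.MF. To check a JAR before or after replacing it, the Python sketch below recomputes those per-entry digests (helper names are hypothetical). It does not verify the signature files or certificates; the JDK's jarsigner -verify does the full check.

```python
import base64
import hashlib
import zipfile

# Manifest digest attributes handled by this sketch, mapped to hashlib algorithm names.
DIGEST_ATTRS = {"SHA1-Digest": "sha1", "SHA-256-Digest": "sha256"}


def parse_manifest(text):
    """Split MANIFEST.MF text into a list of {attribute: value} sections."""
    sections, current, last_key = [], {}, None
    for line in text.replace("\r\n", "\n").replace("\r", "\n").split("\n"):
        if not line:
            # A blank line ends the current section.
            if current:
                sections.append(current)
            current, last_key = {}, None
        elif line.startswith(" ") and last_key is not None:
            # Continuation line: drop the single leading space, append to the previous value.
            current[last_key] += line[1:]
        else:
            key, _, value = line.partition(": ")
            current[key] = value
            last_key = key
    if current:
        sections.append(current)
    return sections


def digest_mismatches(manifest_text, entries):
    """Return names of entries whose bytes differ from the digest recorded in the manifest.

    `entries` maps entry names to their contents; names absent from it are skipped.
    """
    bad = []
    for section in parse_manifest(manifest_text):
        name = section.get("Name")
        if name is None or name not in entries:
            continue  # main section, or an entry that is not in the archive
        for attr, algorithm in DIGEST_ATTRS.items():
            if attr in section:
                digest = hashlib.new(algorithm, entries[name]).digest()
                if base64.b64encode(digest).decode("ascii") != section[attr]:
                    bad.append(name)
                break  # check only the first supported digest of each entry
    return bad


def check_jar(jar_path):
    """Return the entries of a JAR whose contents do not match their manifest digest."""
    with zipfile.ZipFile(jar_path) as jar:
        manifest = jar.read("META-INF/MANIFEST.MF").decode("utf-8")
        entries = {info.filename: jar.read(info) for info in jar.infolist() if not info.is_dir()}
    return digest_mismatches(manifest, entries)
```

Calling check_jar() on the bcprov JAR should return an empty list for an intact copy.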

Graph Run in Worker Is Slow

This may be caused by slow data storage. To check, run vmstat, e.g. vmstat 1 30 (30 samples at one-second intervals). Consistently high values in the bi or bo columns (under io) suggest that storage is the bottleneck. Another tool that can confirm or rule out slow storage as the cause is iotop.
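As a rough aid, the Python sketch below parses the output of vmstat 1 30 and lists the samples whose bi (blocks received from block devices per second) or bo (blocks sent to block devices per second) exceed a threshold. The threshold is an assumption you must choose for your hardware; there is no universal "high" value. The first sample is skipped because vmstat reports it as an average since boot. The function names are hypothetical.

```python
import subprocess


def run_vmstat(delay=1, count=30):
    """Run `vmstat <delay> <count>` and return its text output."""
    result = subprocess.run(["vmstat", str(delay), str(count)],
                            capture_output=True, text=True, check=True)
    return result.stdout


def parse_vmstat(output):
    """Return one {column: int} dict per sample line of vmstat output."""
    header, samples = None, []
    for line in output.strip().splitlines():
        fields = line.split()
        if "bi" in fields and "bo" in fields:
            header = fields  # column-name line (vmstat may repeat it)
        elif header and len(fields) == len(header) and all(f.isdigit() for f in fields):
            samples.append(dict(zip(header, map(int, fields))))
    return samples


def high_io_samples(samples, threshold):
    """Samples (excluding the since-boot first one) with bi or bo above `threshold`."""
    return [s for s in samples[1:] if s["bi"] > threshold or s["bo"] > threshold]
```

For example, high_io_samples(parse_vmstat(run_vmstat()), threshold=5000) flags every second in which more than 5,000 blocks were read or written; if most samples are flagged, slow storage is a likely cause.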