Kerberos Authentication for Hadoop

For user authentication in Hadoop, CloverDX can use the Kerberos authentication protocol.

To use Kerberos, you have to set up your Java environment, your project, and the HDFS connection. For more information, see Kerberos requirements and setting.

Note that the following instructions apply to the Apache Tomcat application server and Unix-like systems.

Java Setting

There are several ways to configure Java for Kerberos. For the first two options (configuration via system properties and via a configuration file), you must modify both setenv.sh in CloverDX Server and CloverDXDesigner.ini in CloverDX Designer.

Additionally, in CloverDX Designer, add the parameters to the VM parameters pane in Window → Preferences → CloverDX Runtime.

  • Configuration via system properties

    Set the Java system property java.security.krb5.realm to the name of your Kerberos realm, for example:

    -Djava.security.krb5.realm=EXAMPLE.COM

    Set the Java system property java.security.krb5.kdc to the hostname of your Kerberos key distribution center, for example:

    -Djava.security.krb5.kdc=kerberos.example.com
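For a Tomcat deployment, both system properties can be appended to CATALINA_OPTS in setenv.sh; a minimal sketch using the placeholder realm and KDC hostname from the examples above (substitute your own values):

```shell
# setenv.sh - pass the Kerberos realm and KDC to the Tomcat JVM.
# EXAMPLE.COM and kerberos.example.com are placeholders, not real values.
export CATALINA_OPTS="$CATALINA_OPTS -Djava.security.krb5.realm=EXAMPLE.COM"
export CATALINA_OPTS="$CATALINA_OPTS -Djava.security.krb5.kdc=kerberos.example.com"
```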
  • Configuration via config file

    Set the Java system property java.security.krb5.conf to point to the location of your Kerberos configuration file, for example:

    -Djava.security.krb5.conf="/path/to/krb5.conf"
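A minimal krb5.conf for the example realm might look like the following sketch; the realm and hostnames are illustrative placeholders, and a real file is typically provided by your Kerberos administrator:

```ini
[libdefaults]
    default_realm = EXAMPLE.COM

[realms]
    EXAMPLE.COM = {
        kdc = kerberos.example.com
        admin_server = kerberos.example.com
    }

[domain_realm]
    .example.com = EXAMPLE.COM
    example.com = EXAMPLE.COM
```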
  • Configuration via config file in Java installation directory

    Put the krb5.conf file into the %JAVA_HOME%/lib/security directory, e.g. /opt/jdk1.8.0_144/jre/lib/security/krb5.conf.

    Note: If you are using AES256 in Kerberos, install the JCE Unlimited Strength Jurisdiction Policy Files (Java 8) into your Java installation. For more information, see the README.txt in the downloaded zip archive.

Project Setting
  • Copy the .keytab file into the project, e.g. conn/clover.keytab.
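To verify that the keytab copied into the project contains the expected principal, you can list its entries with the MIT Kerberos klist tool (a sketch; the path and principal below are the placeholders used in this section):

```shell
# List the principals stored in the project keytab.
# conn/clover.keytab and clover/clover@EXAMPLE.COM are placeholders from this section.
klist -k conn/clover.keytab
# Each entry should show the principal name, e.g. clover/clover@EXAMPLE.COM.
```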
Connection Setting
Note: Kerberos authentication requires the hadoop-auth-*.jar library on the classpath of both the HDFS and MapReduce connection and the Hive connection.

  • HDFS and MapReduce Connection

    1. Set Username to the principal name, e.g. clover/clover@EXAMPLE.COM.
    2. Set the following parameters in the Hadoop Parameters pane:

      cloveretl.hadoop.kerberos.keytab=${CONN_DIR}/clover.keytab
      hadoop.security.authentication=Kerberos
      yarn.resourcemanager.principal=yarn/_HOST@EXAMPLE.COM

      Example 33.1. Properties needed to connect to a Hadoop High Availability (HA) cluster in Hadoop connection

      mapreduce.app-submission.cross-platform\=true
      
      yarn.application.classpath\=\:$HADOOP_CONF_DIR, $HADOOP_COMMON_HOME/*, $HADOOP_COMMON_HOME/lib/*, $HADOOP_HDFS_HOME/*, $HADOOP_HDFS_HOME/lib/*, $HADOOP_MAPRED_HOME/*, $HADOOP_MAPRED_HOME/lib/*, $HADOOP_YARN_HOME/*, $HADOOP_YARN_HOME/lib/*\:
      yarn.app.mapreduce.am.resource.mb\=512
      mapreduce.map.memory.mb\=512
      mapreduce.reduce.memory.mb\=512
      mapreduce.framework.name\=yarn
      yarn.log.aggregation-enable\=true
      
      mapreduce.jobhistory.address\=example.com\:port
      
      yarn.resourcemanager.ha.enabled\=true
      yarn.resourcemanager.ha.rm-ids\=rm1,rm2
      yarn.resourcemanager.hostname.rm1\=example.com
      yarn.resourcemanager.hostname.rm2\=example.com
      yarn.resourcemanager.scheduler.address.rm1\=example.com\:port
      yarn.resourcemanager.scheduler.address.rm2\=example.com\:port
      
      fs.permissions.umask-mode\=000
      fs.defaultFS\=hdfs\://nameservice1
      fs.default.name\=hdfs\://nameservice1
      fs.nameservices\=nameservice1
      fs.ha.namenodes.nameservice1\=namenode1,namenode2
      fs.namenode.rpc-address.nameservice1.namenode1\=example.com\:port
      fs.namenode.rpc-address.nameservice1.namenode2\=example.com\:port
      fs.client.failover.proxy.provider.nameservice1\=org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
      
      type=HADOOP
      host=nameservice1
      username=clover/clover@EXAMPLE.COM
      hostMapred=Not needed for YARN

      Tip: The _HOST string in yarn/_HOST@EXAMPLE.COM and hive/_HOST@EXAMPLE.COM is a placeholder that is automatically replaced with the actual hostname. This is the recommended approach, as it works even with a high-availability Hadoop cluster setup.

    3. If you encounter the following error:

      No common protection layer between client and server

      set the hadoop.rpc.protection parameter to match your Hadoop cluster configuration.
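The property goes into the same Hadoop Parameters pane as the others. In standard Hadoop, the valid values are authentication, integrity, and privacy; the value must match what your cluster enforces, so privacy below is only an illustrative choice:

```properties
# Must match the cluster-side setting; "privacy" is an example value.
hadoop.rpc.protection=privacy
```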

  • Hive Connection

    1. Add ;principal=hive/_HOST@EXAMPLE.COM to the URL, e.g.

      jdbc:hive2://hive.example.com:10000/default;principal=hive/_HOST@EXAMPLE.COM

    2. Set User to the principal name, e.g. clover/clover@EXAMPLE.COM
    3. Set cloveretl.hadoop.kerberos.keytab=${CONN_DIR}/clover.keytab in Advanced JDBC properties.
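Putting the three Hive steps together, the resulting connection settings might look like the following sketch. The url and user keys are illustrative labels for the corresponding dialog fields, and the hostname, port, and principals are the placeholders from this section:

```properties
# Hive connection sketch using the placeholder values above.
url=jdbc:hive2://hive.example.com:10000/default;principal=hive/_HOST@EXAMPLE.COM
user=clover/clover@EXAMPLE.COM
# Advanced JDBC properties:
cloveretl.hadoop.kerberos.keytab=${CONN_DIR}/clover.keytab
```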