    Kerberos Authentication for Hadoop

    For user authentication in Hadoop, CloverDX can use the Kerberos authentication protocol.

    To use Kerberos, you must configure Java, your project, and the Hadoop (HDFS or Hive) connection. For more information, see Kerberos requirements and setting.

    Note that the following instructions apply to the Tomcat application server and Unix-like systems.

    Java Setting

    There are several ways of setting Java for Kerberos. In the case of the first two options (configuration via system properties and via configuration file), you must modify both setenv.sh in CloverDX Server and CloverDXDesigner.ini in CloverDX Designer.

    Additionally, in CloverDX Designer, add the parameters to the Window > Preferences > CloverDX Runtime > VM parameters pane.

    • Configuration via system properties

      Set the Java system property java.security.krb5.realm to the name of your Kerberos realm, for example:

      -Djava.security.krb5.realm=EXAMPLE.COM

      Set the Java system property java.security.krb5.kdc to the hostname of your Kerberos key distribution center, for example:

      -Djava.security.krb5.kdc=kerberos.example.com
    • Configuration via configuration file

      Set the Java system property java.security.krb5.conf to point to the location of your Kerberos configuration file, for example:

      -Djava.security.krb5.conf="/path/to/krb5.conf"
    • Configuration via configuration file in the Java installation directory

      Put the krb5.conf file into the %JAVA_HOME%/lib/security directory, e.g. /opt/jdk1.8.0_144/jre/lib/security/krb5.conf.

      If you are using AES256 in Kerberos, install the JCE Unlimited Strength Jurisdiction Policy Files for Java 8 into your Java installation.

      For more information, see the README.txt in the downloaded zip archive.
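
    To see which of the three options a given JVM would pick up, the relevant system properties can be printed from a small Java program. The following is a minimal sketch, not part of CloverDX: the class name Krb5SettingsCheck and the describe method are hypothetical, and the order of the checks simply mirrors the list above rather than the JVM's exact lookup rules. On a Java 8 JDK, the java.home property normally points to the jre subdirectory, which is why the example path above contains jre/lib/security.

```java
import java.nio.file.Paths;

public class Krb5SettingsCheck {

    // Reports which of the three configuration options the given values correspond to.
    // The order of the checks is an assumption made for this sketch.
    static String describe(String realm, String kdc, String confFile, String javaHome) {
        if (realm != null && kdc != null) {
            return "system properties: realm=" + realm + ", kdc=" + kdc;
        }
        if (confFile != null) {
            return "configuration file: " + confFile;
        }
        // Fallback: the krb5.conf file inside the Java installation directory.
        return "Java installation directory: "
                + Paths.get(javaHome, "lib", "security", "krb5.conf");
    }

    public static void main(String[] args) {
        // Read the values that were passed to this JVM via -D parameters.
        System.out.println(describe(
                System.getProperty("java.security.krb5.realm"),
                System.getProperty("java.security.krb5.kdc"),
                System.getProperty("java.security.krb5.conf"),
                System.getProperty("java.home")));
    }
}
```

    Run it with the same -D parameters that you added to setenv.sh or CloverDXDesigner.ini to confirm that they reach the JVM.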

    Project Setting
    • Copy the .keytab file into the project, e.g. conn/clover.keytab.
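
    Before using the keytab in a connection, it can be useful to check that the keytab and the principal actually authenticate with the Java settings described above. The following is a minimal sketch that attempts a keytab login through the JDK's Krb5LoginModule. The class name KeytabLoginCheck and the JAAS entry name keytab-check are hypothetical, and the login itself only succeeds if the key distribution center is reachable from the machine where the program runs.

```java
import java.util.HashMap;
import java.util.Map;
import javax.security.auth.Subject;
import javax.security.auth.login.AppConfigurationEntry;
import javax.security.auth.login.Configuration;
import javax.security.auth.login.LoginContext;
import javax.security.auth.login.LoginException;

public class KeytabLoginCheck {

    // Krb5LoginModule options for a non-interactive, keytab-based login.
    static Map<String, String> keytabOptions(String principal, String keytabPath) {
        Map<String, String> options = new HashMap<>();
        options.put("useKeyTab", "true");
        options.put("keyTab", keytabPath);
        options.put("principal", principal);
        options.put("storeKey", "true");
        options.put("doNotPrompt", "true");
        return options;
    }

    // Attempts the login; throws LoginException if the keytab or principal is rejected.
    static void login(String principal, String keytabPath) throws LoginException {
        final Map<String, String> options = keytabOptions(principal, keytabPath);
        // In-memory JAAS configuration, so no separate jaas.conf file is needed.
        Configuration config = new Configuration() {
            @Override
            public AppConfigurationEntry[] getAppConfigurationEntry(String name) {
                return new AppConfigurationEntry[] {
                    new AppConfigurationEntry(
                            "com.sun.security.auth.module.Krb5LoginModule",
                            AppConfigurationEntry.LoginModuleControlFlag.REQUIRED,
                            options)
                };
            }
        };
        LoginContext context = new LoginContext("keytab-check", new Subject(), null, config);
        context.login();
        System.out.println("Logged in as: " + context.getSubject().getPrincipals());
        context.logout();
    }

    public static void main(String[] args) throws LoginException {
        if (args.length != 2) {
            System.err.println("Usage: KeytabLoginCheck <principal> <keytab path>");
            return;
        }
        login(args[0], args[1]);
    }
}
```

    For example, run it with the arguments clover/clover@EXAMPLE.COM and conn/clover.keytab, using the same Kerberos -D parameters as the server or Designer.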

    Connection Setting

    Kerberos authentication requires the hadoop-auth-*.jar library on the classpath of both the HDFS and MapReduce connection and the Hive connection.

    • HDFS and MapReduce Connection

      1. Set Username to the principal name, e.g. clover/clover@EXAMPLE.COM.

      2. Set the following parameters in the Hadoop Parameters pane:

        cloveretl.hadoop.kerberos.keytab=${CONN_DIR}/clover.keytab
        hadoop.security.authentication=Kerberos
        yarn.resourcemanager.principal=yarn/_HOST@EXAMPLE.COM

        Example 10. Properties needed to connect to a Hadoop High Availability (HA) cluster in a Hadoop connection

        mapreduce.app-submission.cross-platform\=true
        
        yarn.application.classpath\=\:$HADOOP_CONF_DIR, $HADOOP_COMMON_HOME/*, $HADOOP_COMMON_HOME/lib/*, $HADOOP_HDFS_HOME/*, $HADOOP_HDFS_HOME/lib/*, $HADOOP_MAPRED_HOME/*, $HADOOP_MAPRED_HOME/lib/*, $HADOOP_YARN_HOME/*, $HADOOP_YARN_HOME/lib/*\:
        yarn.app.mapreduce.am.resource.mb\=512
        mapreduce.map.memory.mb\=512
        mapreduce.reduce.memory.mb\=512
        mapreduce.framework.name\=yarn
        yarn.log.aggregation-enable\=true
        
        mapreduce.jobhistory.address\=example.com\:port
        
        yarn.resourcemanager.ha.enabled\=true
        yarn.resourcemanager.ha.rm-ids\=rm1,rm2
        yarn.resourcemanager.hostname.rm1\=example.com
        yarn.resourcemanager.hostname.rm2\=example.com
        yarn.resourcemanager.scheduler.address.rm1\=example.com\:port
        yarn.resourcemanager.scheduler.address.rm2\=example.com\:port
        
        fs.permissions.umask-mode\=000
        fs.defaultFS\=hdfs\://nameservice1
        fs.default.name\=hdfs\://nameservice1
        dfs.nameservices\=nameservice1
        dfs.ha.namenodes.nameservice1\=namenode1,namenode2
        dfs.namenode.rpc-address.nameservice1.namenode1\=example.com\:port
        dfs.namenode.rpc-address.nameservice1.namenode2\=example.com\:port
        dfs.client.failover.proxy.provider.nameservice1\=org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
        
        type=HADOOP
        host=nameservice1
        username=clover/clover@EXAMPLE.COM
        hostMapred=Not needed for YARN

        The _HOST string in yarn/_HOST@EXAMPLE.COM and hive/_HOST@EXAMPLE.COM is a placeholder that is automatically replaced with the actual hostname. Using the placeholder is recommended because it also works with a high-availability Hadoop cluster setup.

      3. If you encounter the error "No common protection layer between client and server", set the hadoop.rpc.protection parameter (authentication, integrity or privacy) to match your Hadoop cluster configuration.

    • Hive Connection

      1. Add ;principal=hive/_HOST@EXAMPLE.COM to the URL, e.g. jdbc:hive2://hive.example.com:10000/default;principal=hive/_HOST@EXAMPLE.COM

      2. Set User to the principal name, e.g. clover/clover@EXAMPLE.COM

      3. Set cloveretl.hadoop.kerberos.keytab=${CONN_DIR}/clover.keytab in Advanced JDBC properties.
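
    The connection settings above rely on two string conventions: the _HOST placeholder in service principals and the ;principal=... suffix of the Hive URL. The following is a minimal sketch that illustrates both. The class and method names are hypothetical. The real placeholder substitution is performed inside the Hadoop and Hive client libraries, so this code only shows the expected result and is not what CloverDX or Hadoop executes.

```java
import java.util.Locale;

public class PrincipalPlaceholder {

    // Illustrates the substitution: _HOST becomes the lower-cased hostname of the service.
    static String resolveHost(String principal, String hostname) {
        return principal.replace("_HOST", hostname.toLowerCase(Locale.ROOT));
    }

    // Builds a HiveServer2 JDBC URL with the service principal appended.
    static String hiveUrl(String host, int port, String database, String servicePrincipal) {
        return "jdbc:hive2://" + host + ":" + port + "/" + database
                + ";principal=" + servicePrincipal;
    }

    public static void main(String[] args) {
        // rm1.example.com is a made-up hostname used only for this illustration.
        System.out.println(resolveHost("yarn/_HOST@EXAMPLE.COM", "rm1.example.com"));
        // prints: yarn/rm1.example.com@EXAMPLE.COM

        System.out.println(hiveUrl("hive.example.com", 10000, "default", "hive/_HOST@EXAMPLE.COM"));
        // prints: jdbc:hive2://hive.example.com:10000/default;principal=hive/_HOST@EXAMPLE.COM
    }
}
```

    Note that the principal in the Hive URL is the service principal of the Hive server, while User (or Username in the HDFS connection) is the principal that CloverDX authenticates as.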