Kerberos Authentication for Hadoop

For user authentication in Hadoop, CloverDX can use the Kerberos authentication protocol.

To use Kerberos, you have to set up your Java environment, your project, and the HDFS connection. For more information, see Kerberos requirements and setting.

Note that the following instructions apply to the Apache Tomcat application server and Unix-like systems.

Java Setting

There are several ways to configure Java for Kerberos. For the first two options (configuration via system properties and via a configuration file), you must modify both setenv.sh in CloverDX Server and CloverDXDesigner.ini in CloverDX Designer.

Additionally, in CloverDX Designer, add the parameters to the VM parameters pane in Window → Preferences → CloverDX Runtime.

  • Configuration via system properties

    Set the Java system property java.security.krb5.realm to the name of your Kerberos realm, for example:

    -Djava.security.krb5.realm=EXAMPLE.COM

    Set the Java system property java.security.krb5.kdc to the hostname of your Kerberos key distribution center, for example:

    -Djava.security.krb5.kdc=kerberos.example.com
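For a Tomcat deployment, both system properties can be appended to CATALINA_OPTS in setenv.sh; a minimal sketch using the placeholder realm and KDC hostname from the examples above (substitute your own values):

```shell
# setenv.sh - pass the Kerberos realm and KDC to the Tomcat JVM.
# EXAMPLE.COM and kerberos.example.com are placeholders, not real values.
export CATALINA_OPTS="$CATALINA_OPTS -Djava.security.krb5.realm=EXAMPLE.COM"
export CATALINA_OPTS="$CATALINA_OPTS -Djava.security.krb5.kdc=kerberos.example.com"
```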
  • Configuration via config file

    Set the Java system property java.security.krb5.conf to point to the location of your Kerberos configuration file, for example:

    -Djava.security.krb5.conf="/path/to/krb5.conf"
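A minimal krb5.conf for the example realm might look like the following sketch; the realm and hostnames are illustrative placeholders, and a real file is typically provided by your Kerberos administrator:

```ini
[libdefaults]
    default_realm = EXAMPLE.COM

[realms]
    EXAMPLE.COM = {
        kdc = kerberos.example.com
        admin_server = kerberos.example.com
    }

[domain_realm]
    .example.com = EXAMPLE.COM
    example.com = EXAMPLE.COM
```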
  • Configuration via config file in Java installation directory

    Put the krb5.conf file into the %JAVA_HOME%/lib/security directory, e.g. /opt/jdk1.8.0_144/jre/lib/security/krb5.conf.

    Note: If you are using AES256 in Kerberos, install the JCE Unlimited Strength Jurisdiction Policy Files (Java 8) into your Java installation. For more information, see the README.txt in the downloaded zip archive.

Project Setting
  • Copy the .keytab file into the project, e.g. conn/clover.keytab.
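To verify that the keytab copied into the project contains the expected principal, you can list its entries with the MIT Kerberos klist tool (a sketch; the path and principal below are the placeholders used in this section):

```shell
# List the principals stored in the project keytab.
# conn/clover.keytab and clover/clover@EXAMPLE.COM are placeholders from this section.
klist -k conn/clover.keytab
# Each entry should show the principal name, e.g. clover/clover@EXAMPLE.COM.
```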
Connection Setting
Note: Kerberos authentication requires the hadoop-auth-*.jar library on the classpath of both the HDFS and MapReduce connection and the Hive connection.

  • HDFS and MapReduce Connection

    1. Set Username to the principal name, e.g. clover/clover@EXAMPLE.COM.
    2. Set the following parameters in the Hadoop Parameters pane:

      cloveretl.hadoop.kerberos.keytab=${CONN_DIR}/clover.keytab
      hadoop.security.authentication=Kerberos
      yarn.resourcemanager.principal=yarn/_HOST@EXAMPLE.COM

      Example 33.1. Properties needed to connect to a Hadoop High Availability (HA) cluster in Hadoop connection

      mapreduce.app-submission.cross-platform\=true
      
      yarn.application.classpath\=\:$HADOOP_CONF_DIR, $HADOOP_COMMON_HOME/*, $HADOOP_COMMON_HOME/lib/*, $HADOOP_HDFS_HOME/*, $HADOOP_HDFS_HOME/lib/*, $HADOOP_MAPRED_HOME/*, $HADOOP_MAPRED_HOME/lib/*, $HADOOP_YARN_HOME/*, $HADOOP_YARN_HOME/lib/*\:
      yarn.app.mapreduce.am.resource.mb\=512
      mapreduce.map.memory.mb\=512
      mapreduce.reduce.memory.mb\=512
      mapreduce.framework.name\=yarn
      yarn.log.aggregation-enable\=true
      
      mapreduce.jobhistory.address\=example.com\:port
      
      yarn.resourcemanager.ha.enabled\=true
      yarn.resourcemanager.ha.rm-ids\=rm1,rm2
      yarn.resourcemanager.hostname.rm1\=example.com
      yarn.resourcemanager.hostname.rm2\=example.com
      yarn.resourcemanager.scheduler.address.rm1\=example.com\:port
      yarn.resourcemanager.scheduler.address.rm2\=example.com\:port
      
      fs.permissions.umask-mode\=000
      fs.defaultFS\=hdfs\://nameservice1
      fs.default.name\=hdfs\://nameservice1
      fs.nameservices\=nameservice1
      fs.ha.namenodes.nameservice1\=namenode1,namenode2
      fs.namenode.rpc-address.nameservice1.namenode1\=example.com\:port
      fs.namenode.rpc-address.nameservice1.namenode2\=example.com\:port
      fs.client.failover.proxy.provider.nameservice1\=org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
      
      type=HADOOP
      host=nameservice1
      username=clover/clover@EXAMPLE.COM
      hostMapred=Not needed for YARN

      Tip: The _HOST string in yarn/_HOST@EXAMPLE.COM and hive/_HOST@EXAMPLE.COM is a placeholder that is automatically replaced with the actual hostname. This is the recommended approach, as it works even with a high-availability Hadoop cluster setup.

    3. If you encounter the following error:

      No common protection layer between client and server

      set the hadoop.rpc.protection parameter to match your Hadoop cluster configuration.
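The property goes into the same Hadoop Parameters pane as the others. In standard Hadoop, the valid values are authentication, integrity, and privacy; the value must match what your cluster enforces, so privacy below is only an illustrative choice:

```properties
# Must match the cluster-side setting; "privacy" is an example value.
hadoop.rpc.protection=privacy
```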

  • Hive Connection

    1. Add ;principal=hive/_HOST@EXAMPLE.COM to the URL, e.g.

      jdbc:hive2://hive.example.com:10000/default;principal=hive/_HOST@EXAMPLE.COM

    2. Set User to the principal name, e.g. clover/clover@EXAMPLE.COM
    3. Set cloveretl.hadoop.kerberos.keytab=${CONN_DIR}/clover.keytab in Advanced JDBC properties.
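Putting the three Hive steps together, the resulting connection settings might look like the following sketch. The url and user keys are illustrative labels for the corresponding dialog fields, and the hostname, port, and principals are the placeholders from this section:

```properties
# Hive connection sketch using the placeholder values above.
url=jdbc:hive2://hive.example.com:10000/default;principal=hive/_HOST@EXAMPLE.COM
user=clover/clover@EXAMPLE.COM
# Advanced JDBC properties:
cloveretl.hadoop.kerberos.keytab=${CONN_DIR}/clover.keytab
```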