When running a Talend 6.4.1 Spark 2.1 Job that uses a tHBaseInput component against a Kerberized CDH 5.10 Hadoop cluster, the Job seems to hang. Running the Job again produces the following repeated log messages:
"
[2018-01-17 23:00:00,587]----[INFO]-testjob : The Spark job with the id <1> is still in progress... Elapsed time: 00:00:09.660.
[2018-01-17 23:00:05,987]----[INFO]-testjob : The Spark job with the id <1> is still in progress... Elapsed time: 00:00:15.060.
..
[2018-01-17 23:00:20,549]----[INFO]-testjob : The Spark job with the id <1> is still in progress... Elapsed time: 00:00:29.622.
[2018-01-17 23:00:25,982]----[INFO]-testjob : The Spark job with the id <1> is still in progress... Elapsed time: 00:00:35.055.
"
Then, the Job fails with the following exceptions:
"
Caused by: java.lang.RuntimeException: SASL authentication failed. The most likely cause is missing or invalid credentials. Consider 'kinit'.
    at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$1.run(RpcClientImpl.java:686)
    at java.security.AccessController.doPrivileged(Native Method)
    ...
Caused by: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
    at sun.security.jgss.krb5.Krb5InitCredential.getInstance(Krb5InitCredential.java:147)
    at sun.security.jgss.krb5.Krb5MechFactory.getCredentialElement(Krb5MechFactory.java:122)
"
The Job is executed with a built-in Hadoop cluster configuration. As a result, the Hadoop cluster properties that are normally defined in the cluster configuration files and embedded in the hadoop-conf-cluster_name.jar file are not all taken into account by the Job. Every required property therefore has to be defined manually in the Job.
The issue is that not all of the required properties are defined in the Job. In this case, some of the Kerberos properties needed to connect to the Hadoop cluster are missing.
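As a rough illustration of what "missing Kerberos properties" can mean in practice, the sketch below checks a set of Job properties against a short list of standard Hadoop and HBase security keys. The class name, the method name, and the choice of keys are assumptions made for this example; the article does not list which properties were missing, and the exact set depends on the cluster.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Talend: reports which Kerberos-related
// properties are absent from a manually defined (built-in) configuration.
public class KerberosPropertyCheck {

    // Assumed list of keys; a real cluster may require more or different ones.
    static final List<String> EXPECTED_KEYS = Arrays.asList(
            "hadoop.security.authentication",
            "hbase.security.authentication",
            "hbase.master.kerberos.principal",
            "hbase.regionserver.kerberos.principal");

    // Returns the expected keys that are missing or blank, in list order.
    static List<String> missingProperties(Map<String, String> jobProperties) {
        List<String> missing = new ArrayList<>();
        for (String key : EXPECTED_KEYS) {
            String value = jobProperties.get(key);
            if (value == null || value.trim().isEmpty()) {
                missing.add(key);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Example of an incomplete built-in configuration.
        Map<String, String> jobProperties = new HashMap<>();
        jobProperties.put("hadoop.security.authentication", "kerberos");
        jobProperties.put("hbase.security.authentication", "kerberos");

        System.out.println("Missing Kerberos properties: "
                + missingProperties(jobProperties));
    }
}
```

Comparing the Job's property list against the cluster's hbase-site.xml and core-site.xml in this way can help confirm the cause before changing the Job.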
In the Talend Studio Repository, define a Hadoop Cluster Connection in which all Hadoop cluster properties are retrieved, for instance through Cloudera Manager, and then use that Repository Hadoop Cluster Connection in the Job.