[resolved] tImpalaInput - java.lang.ClassNotFoundException: org.apache.hadoop.hiv

I'm using CDH 5.2.0 with Impala 2.0.0+cdh5.2.0+0 and Hive 0.13.1+cdh5.2.0+221, and my cluster has Kerberos enabled. I can run the query below on this Impala cluster through Hue, but not through Talend Open Studio for Big Data 5.6.0.20141024_1545, where I run it with the tImpalaInput component:
Query:  select code, sum(salary) as salarysum from sample_07 group by code order by code;
Error from Talend:
Starting job TOS_ImpalaTesting at 09:53 18/12/2014.
connecting to socket on port 3993
connected
: org.apache.hadoop.util.Shell - Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:324)
at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:339)
at org.apache.hadoop.util.Shell.<clinit>(Shell.java:332)
at org.apache.hadoop.hive.conf.HiveConf$ConfVars.findHadoopBinary(HiveConf.java:918)
at org.apache.hadoop.hive.conf.HiveConf$ConfVars.<clinit>(HiveConf.java:228)
at org.apache.hive.jdbc.HiveConnection.isHttpTransportMode(HiveConnection.java:304)
at org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:181)
at org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:164)
at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105)
at java.sql.DriverManager.getConnection(DriverManager.java:571)
at java.sql.DriverManager.getConnection(DriverManager.java:215)
at hadooptesting.tos_impalatesting_0_1.TOS_ImpalaTesting.tImpalaConnection_1Process(TOS_ImpalaTesting.java:354)
at hadooptesting.tos_impalatesting_0_1.TOS_ImpalaTesting.runJobInTOS(TOS_ImpalaTesting.java:1047)
at hadooptesting.tos_impalatesting_0_1.TOS_ImpalaTesting.main(TOS_ImpalaTesting.java:904)
disconnected
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hive/shims/ShimLoader
at org.apache.hive.service.auth.KerberosSaslHelper.getKerberosTransport(KerberosSaslHelper.java:68)
at org.apache.hive.jdbc.HiveConnection.createBinaryTransport(HiveConnection.java:250)
at org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:181)
at org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:164)
at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105)
at java.sql.DriverManager.getConnection(DriverManager.java:571)
at java.sql.DriverManager.getConnection(DriverManager.java:215)
at hadooptesting.tos_impalatesting_0_1.TOS_ImpalaTesting.tImpalaConnection_1Process(TOS_ImpalaTesting.java:354)
at hadooptesting.tos_impalatesting_0_1.TOS_ImpalaTesting.runJobInTOS(TOS_ImpalaTesting.java:1047)
at hadooptesting.tos_impalatesting_0_1.TOS_ImpalaTesting.main(TOS_ImpalaTesting.java:904)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.shims.ShimLoader
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 10 more
Job TOS_ImpalaTesting ended at 09:53 18/12/2014.

I did find a Hive JIRA stating that this (or a similar) issue is fixed in Hive 0.14, which was recently released.
Any help would be appreciated. Screenshots of my components and process are attached below. Thank you.
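The stack trace shows that the job's connection step ends up in `java.sql.DriverManager.getConnection`, which hands off to `org.apache.hive.jdbc.HiveDriver`. Running the same connection outside Talend can help tell a job problem apart from a classpath problem. The sketch below is a minimal, hypothetical stand-alone client, assuming: the Hive JDBC jars are on the classpath, Impala accepts HiveServer2-protocol clients on its default port 21050, the Kerberos service principal follows the `impala/<host>@<REALM>` pattern, and the JVM already has a valid Kerberos ticket. The class name, host, and realm are placeholders; check the URL format against Cloudera's Impala JDBC documentation for your release.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical stand-alone client; not generated by Talend.
public class ImpalaKerberosQuery {

    // Builds a HiveServer2-style JDBC URL with a Kerberos service principal.
    // Assumption: the principal is impala/<host>@<REALM>, as on many CDH clusters.
    static String buildUrl(String host, int port, String realm) {
        return "jdbc:hive2://" + host + ":" + port + "/;principal=impala/" + host + "@" + realm;
    }

    public static void main(String[] args) throws ClassNotFoundException, SQLException {
        if (args.length < 2) {
            System.err.println("usage: ImpalaKerberosQuery <impalad-host> <KERBEROS.REALM>");
            return;
        }
        // Throws ClassNotFoundException right away if hive-jdbc is not on the classpath.
        Class.forName("org.apache.hive.jdbc.HiveDriver");

        // 21050 is assumed to be the unchanged default Impala port for JDBC clients.
        String url = buildUrl(args[0], 21050, args[1]);
        // Same query as in Hue, without the trailing semicolon.
        String sql = "select code, sum(salary) as salarysum from sample_07 group by code order by code";

        // try-with-resources closes the result set, statement and connection.
        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            while (rs.next()) {
                System.out.println(rs.getString("code") + "\t" + rs.getString("salarysum"));
            }
        }
    }
}
```

If this client fails with the same `NoClassDefFoundError` for `ShimLoader`, the problem is the jar set rather than the Talend job design.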


Accepted Solutions
Re: [resolved] tImpalaInput - java.lang.ClassNotFoundException: org.apache.hadoop.hiv

The first error is not really an error: it shows up all over the place when running Hadoop on Windows and is an upstream Hadoop issue. The second error occurs because you are using CDH 5.2 (Impala 2.0), which the Talend components do not currently support. Hadoop, Cloudera and Hortonworks are all very picky about the libraries and versions being used; they need to be correct and match the cluster versions. To connect to Impala 2.0 on CDH 5.2 you will need hive-jdbc-0.13.0.jar or Cloudera's JDBC driver, neither of which is included in the components in Talend 5.6. (The components also do not appear to include the hive-exec dependency, which is a bug in the component, but fixing that alone wouldn't save you. :) ) You can either use a supported version of CDH (5.1) or update the components yourself to include the correct libraries (hive-jdbc-0.13.x.jar and hive-exec-0.13.x.jar). Welcome to the Hadoop arms race. :)
http://www.cloudera.com/content/cloudera/en/documentation/cloudera-impala/latest/topics/impala_jdbc....
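Both errors in the original log come down to what the job can find at startup: the winutils binary, which Hadoop looks for under the `hadoop.home.dir` system property and then the `HADOOP_HOME` environment variable (when neither is set you get the `null\bin\winutils.exe` path in the log), and the Hive classes named in the stack traces. The sketch below is a hypothetical preflight check, not part of Talend or Hadoop. It mirrors that lookup order in simplified form and tries to load each class without initializing it, so missing jars show up before the job runs. The class list comes from the stack traces; which jar provides each class depends on the distribution.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical preflight check; names are illustrative.
public class HivePreflight {

    // Classes taken from the stack traces in the job log.
    static final String[] REQUIRED_CLASSES = {
        "org.apache.hive.jdbc.HiveDriver",
        "org.apache.hadoop.hive.conf.HiveConf",
        "org.apache.hadoop.hive.shims.ShimLoader"
    };

    // Simplified Hadoop lookup: hadoop.home.dir property first, then HADOOP_HOME.
    // Returns null when neither is set.
    static File winutilsPath(String hadoopHomeProperty, String hadoopHomeEnv) {
        String home = hadoopHomeProperty != null ? hadoopHomeProperty : hadoopHomeEnv;
        if (home == null) {
            return null;
        }
        return new File(new File(home, "bin"), "winutils.exe");
    }

    // Returns the class names that the given loader cannot load.
    static List<String> missingClasses(String[] names, ClassLoader loader) {
        List<String> missing = new ArrayList<>();
        for (String name : names) {
            try {
                // initialize=false: check presence only, do not run static initializers.
                Class.forName(name, false, loader);
            } catch (ClassNotFoundException | LinkageError e) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        boolean windows = System.getProperty("os.name", "").toLowerCase().startsWith("windows");
        if (windows) {
            File winutils = winutilsPath(System.getProperty("hadoop.home.dir"), System.getenv("HADOOP_HOME"));
            if (winutils == null || !winutils.isFile()) {
                System.out.println("winutils.exe not found (" + winutils + "); expect the Shell IOException");
            }
        }
        List<String> missing = missingClasses(REQUIRED_CLASSES, HivePreflight.class.getClassLoader());
        if (missing.isEmpty()) {
            System.out.println("All required Hive classes are on the classpath");
        } else {
            System.out.println("Missing from classpath: " + missing);
        }
    }
}
```

Pointing `-Dhadoop.home.dir` at a directory that contains `bin\winutils.exe` is a common way to quiet the first message on Windows; it does nothing for the missing `ShimLoader` class, which still needs the matching Hive jars described above.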

Re: [resolved] tImpalaInput - java.lang.ClassNotFoundException: org.apache.hadoop.hiv

jholman - Thank you for the input.  I thought that might be the case based on the error and the Hive JIRA I found. I also replicated the same functionality with the same setup using tHive components and did not run into any issues.
I appreciate your help!