Using Hive components to connect to SSL-enabled Impala

Overview

This article shows how to use Talend Hive components to connect to an Impala service that has SSL enabled.

 

Environment

  • Talend Studio 6.4.1
  • Cloudera 5.7.2

 

Configure the Hadoop cluster connection in metadata in Studio

  1. Right-click Hadoop Cluster, then click Create Hadoop Cluster.
  2. Select the distribution and version of your Hadoop cluster, then select Retrieve configuration from Ambari or Cloudera.

    ambari.png

     

  3. Enter your Cloudera Manager URL along with your user credentials, then click Next.
  4. Studio retrieves the cluster information and populates the connection fields.

    cluster.png

     

  5. Once the cluster information is populated, click Check Services to ensure that Studio can connect successfully to the cluster.

    checking.png

     

 

Build the Job

According to the Cloudera documentation, you can configure Impala to work with JDBC using either of two drivers:

  • Cloudera JDBC driver
  • Hive JDBC driver

Because the Talend Hive components use the Hive JDBC driver, you can use them to connect to Impala. Start by creating a Job that creates a file in HDFS, loads that file into Impala, and then reads it back.

  1. Right-click Job Designs, click Create Standard Job, then give it a name.
  2. In the Designer, add a tPreJob component, then attach your HDFS connection and Hive connection to it, linked by an On Component Ok connection. These connections are reused throughout the Job.

    1. For the tHDFSConnection, drag the HDFS connection from the Hadoop cluster connection created above to the Designer, then select tHDFSConnection as the component to create.
    2. Manually add a tHiveConnection and connect it to the tHDFSConnection component using an On Component Ok connection.
  3. Enter your Impala SSL information in the tHiveConnection component to establish the connection. To check the settings first, you can run the beeline utility that the cluster provides for connecting to HiveServer2 over JDBC, substituting the Impala connection information. Beeline then connects through the Hive JDBC driver with the following information:

    hiveserver2.png

     

  4. The JDBC URL above shows that the connection to Impala was made with the Hive JDBC driver. Enter the same information in your tHiveConnection component as follows:

    thiveconnection.png

     

  5. In the settings above, the host is the Impala daemon, the port is the Impala port, and the Additional JDBC Settings field tells the JDBC connection to use SSL and identifies the certificate it needs to establish the connection.
  6. Your Job should look like this:

    job.png

     

  7. Add a tRowGenerator that generates 10 rows of data with two columns (firstname and lastname), using the Talend Data Generator functions:

    trowgenerator.png

     

  8. Connect the tRowGenerator with a Main row to a tHDFSOutput component that uses the tHDFSConnection you created above, so that the generated data is written directly to HDFS:

    thdfsoutput.png

     

  9. Use an On Component Ok connection to connect the tHDFSOutput to a tHiveRow component. The tHiveRow inserts the data written by the tHDFSOutput into an existing Impala table, using the Hive connection you set up in the tHiveConnection:

    thiverow.png

     

  10. Use an On Component Ok connection to connect the tHiveRow to a tHiveInput component that reads the data from the table above and outputs it to a tLogRow:

    thiveinput.png

     

  11. Finally, add a tPostJob component and connect it to a tHiveClose component with an On Component Ok connection, so that the connection you opened is closed:

    tpostjob.png

     

  12. The completed Job should look like this:

    jobdone.png
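
The connection settings from steps 3 to 5 can also be written as a plain Hive JDBC URL. The following is a minimal Java sketch rather than the exact values shown in the screenshots: the host name, trust store path, and password are hypothetical placeholders, 21050 is the default port on which Impala accepts HiveServer2-protocol JDBC connections, and `ssl`, `sslTrustStore`, and `trustStorePassword` are the Hive JDBC driver's SSL session settings. It assumes that the Additional JDBC Settings value is appended to the URL after the database name; the class and method names are illustrative only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative helper (not part of Talend): assembles a Hive JDBC URL for Impala.
class ImpalaSslUrl {

    // Builds jdbc:hive2://host:port/database followed by ;key=value settings.
    static String build(String host, int port, String database,
                        Map<String, String> settings) {
        StringBuilder url = new StringBuilder("jdbc:hive2://")
                .append(host).append(':').append(port)
                .append('/').append(database);
        for (Map.Entry<String, String> entry : settings.entrySet()) {
            url.append(';').append(entry.getKey())
               .append('=').append(entry.getValue());
        }
        return url.toString();
    }

    // SSL session settings understood by the Hive JDBC driver.
    // The trust store must contain the certificate used by the Impala daemon.
    static Map<String, String> sslSettings(String trustStorePath,
                                           String trustStorePassword) {
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("ssl", "true");
        settings.put("sslTrustStore", trustStorePath);
        settings.put("trustStorePassword", trustStorePassword);
        return settings;
    }

    public static void main(String[] args) {
        // Hypothetical host, trust store path, and password: replace with your own.
        String url = build("impalad.example.com", 21050, "default",
                sslSettings("/path/to/truststore.jks", "changeit"));
        System.out.println(url);
    }
}
```

Passing an empty settings map yields a URL without the SSL parameters. Depending on how authentication is configured on your cluster, further settings may be required.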

     

 

Run the Job

  1. Run your Job and confirm that it connects to the Impala daemon over SSL, loads data into your table, and reads it back:

     run.png

     

    runlog.png
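
If you want to check the same load-and-read sequence outside Studio, it can be approximated with plain JDBC. This is a rough sketch under several assumptions: the Hive JDBC driver jar is on the classpath, the JDBC URL, HDFS path, and table name are supplied by you as arguments, and the `LOAD DATA INPATH` and `SELECT` statements stand in for whatever queries your tHiveRow and tHiveInput components actually run. The class and method names are illustrative only.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Illustrative check (not generated by Talend): load a file, then read it back.
class ImpalaSmokeTest {

    // Statement comparable to the tHiveRow step: move an HDFS file into a table.
    static String loadStatement(String hdfsPath, String table) {
        return "LOAD DATA INPATH '" + hdfsPath + "' INTO TABLE " + table;
    }

    // Statement comparable to the tHiveInput step: read the two generated columns.
    static String readStatement(String table) {
        return "SELECT firstname, lastname FROM " + table;
    }

    public static void main(String[] args) throws SQLException {
        if (args.length < 3) {
            System.out.println("usage: ImpalaSmokeTest <jdbc-url> <hdfs-path> <table>");
            return;
        }
        String jdbcUrl = args[0];
        String hdfsPath = args[1];
        String table = args[2];

        // try-with-resources closes the connection, as tHiveClose does in the Job.
        try (Connection connection = DriverManager.getConnection(jdbcUrl);
             Statement statement = connection.createStatement()) {
            statement.execute(loadStatement(hdfsPath, table));
            try (ResultSet rows = statement.executeQuery(readStatement(table))) {
                while (rows.next()) {
                    System.out.println(rows.getString(1) + "|" + rows.getString(2));
                }
            }
        }
    }
}
```

Run without arguments, the program only prints its usage line; it connects only when given a JDBC URL, an HDFS file path, and a table name.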

Additional Notes

  1. The same Job design works for any Impala configuration, not just for SSL-enabled Impala.
  2. You can use this design on any version of Talend 6; it is not specific to version 6.4.

Version history

Revision 5 of 5. Last update: 04-20-2018 12:58 PM