This article explains how to use JDBC components to connect to a Kerberos-enabled Impala instance. The same Job design works for any Impala configuration, not just a Kerberos-enabled one.
Configure the Hadoop Cluster connection in metadata in Studio.
Select the distribution and version of your Hadoop cluster, then select Retrieve configuration from Ambari or Cloudera.
Enter the Cloudera Manager URL, along with your user credentials, then click Next. Cluster information will be retrieved and populated.
Once the cluster information is populated, click Check Services to ensure that Studio can successfully connect to the cluster.
According to the Cloudera documentation, when configuring Impala to work with JDBC, you can connect using either of two drivers: the Cloudera Impala JDBC driver or the Hive JDBC driver. Based on this, you can use the JDBC components with the Impala JDBC driver to connect to Impala. Start by creating a Job that writes a file to HDFS, creates a table in Impala, loads that file into the table, and then reads it back.
Download the Impala JDBC driver that you will use in your tJDBCConnection component from Cloudera's website:
Download the version of the driver that is compatible with the version of Impala you have on the cluster.
After downloading and unzipping the driver Zip file, you should see the following list of libraries under the folder for the JDBC4 version of the driver:
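The exact contents vary by driver release, but a typical Impala JDBC4 bundle contains a set of JARs along these lines (file names and versions below are illustrative, not an exact listing):

```
commons-codec-*.jar
commons-logging-*.jar
hive_metastore.jar
hive_service.jar
ImpalaJDBC4.jar
libfb303-*.jar
libthrift-*.jar
log4j-*.jar
ql.jar
slf4j-api-*.jar
slf4j-log4j12-*.jar
TCLIServiceClient.jar
zookeeper-*.jar
```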
You need to add all of these libraries to your tJDBCConnection component, in the Driver JAR section, as shown below:
With the driver libraries added, configure the JDBC URL for the connection.
To identify the URL string that you need to use, follow the Cloudera instructions for structuring the URL based on the authentication method, in the Cloudera JDBC Driver for Impala documentation. Since you are configuring your components to connect to a Kerberized Impala, the JDBC URL to use is:
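Following Cloudera's documented connection-string format for Kerberos authentication (AuthMech=1), the URL takes the following shape; the host, port, realm, and FQDN below are placeholders for your cluster's values:

```
jdbc:impala://impala-host.example.com:21050;AuthMech=1;KrbRealm=EXAMPLE.COM;KrbHostFQDN=impala-host.example.com;KrbServiceName=impala
```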
This URL specifies:
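Per Cloudera's driver documentation, the components of the URL break down roughly as follows (values shown are placeholders):

```
jdbc:impala://<host>:<port>   -- the Impala daemon to connect to (21050 is the default JDBC port)
AuthMech=1                    -- selects Kerberos authentication
KrbRealm=<realm>              -- the Kerberos realm of the cluster
KrbHostFQDN=<host-fqdn>       -- the fully qualified domain name of the Impala daemon host
KrbServiceName=impala         -- the service name of the Impala server principal (impala by default)
```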
If you leave the JDBC URL as it appears above, then wherever you launch this Job, it will look for a Kerberos ticket to use for the connection. To control whether it uses a Kerberos ticket or a keytab, add one additional parameter, KrbAuthType. The property takes different values depending on what you are trying to achieve:
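Per the Cloudera JDBC driver documentation, the KrbAuthType values are, in outline (confirm against the documentation for your driver version):

```
KrbAuthType=0   -- default: the driver automatically detects which method to use
KrbAuthType=1   -- the driver uses a JAAS configuration (for example, with a keytab)
KrbAuthType=2   -- the driver uses the Kerberos ticket cache
```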
For this example, use 1 as the value for this property, as you want to use a JAAS configuration with a keytab. When you are done configuring your JDBC URL, it should look like this:
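That is, the earlier URL with KrbAuthType=1 appended; host, realm, and FQDN remain placeholders for your cluster's values:

```
jdbc:impala://impala-host.example.com:21050;AuthMech=1;KrbRealm=EXAMPLE.COM;KrbHostFQDN=impala-host.example.com;KrbServiceName=impala;KrbAuthType=1
```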
The full configuration of the component should look like this:
At this point, your Job should look like this:
Add a tRowGenerator component that uses TalendDataGenerator functions to generate 100 rows of data in two columns: one named fname, and the other lname.
Configure the tRowGenerator to write the data directly to HDFS using a tHDFSOutput component that uses the tHDFSConnection you created above, connecting to it using a main row:
Connect the tHDFSOutput to the tJDBCRow component using an On Component Ok connection. The tJDBCRow creates the table in Impala into which your data will be loaded, using the JDBC connection you set up in the tJDBCConnection:
Set up the tJDBCRow as follows:
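The component settings are not reproduced here; as an illustration, the query executed by the tJDBCRow could be a CREATE TABLE statement along these lines, followed by a LOAD DATA statement to pull in the file written to HDFS (the table name, field delimiter, and HDFS path below are assumptions, not values from the original Job):

```
CREATE TABLE IF NOT EXISTS customers (
  fname STRING,
  lname STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ';';

LOAD DATA INPATH '/user/talend/customers' INTO TABLE customers;
```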
The final addition to the Job is to connect a tPostJob component to a tJDBCClose component with an On Component Ok connection, so you can close the connection you opened:
The complete Job should look like this:
You need a JAAS file with the information for the keytab, such as the one below, residing on the system that you will use to run the Job:
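A minimal sketch of such a JAAS file, assuming the standard Krb5LoginModule; the entry name (Client), keytab path, and principal are placeholders, so check the driver documentation for the entry name your driver version expects:

```
Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  keyTab="/path/to/user.keytab"
  principal="user@EXAMPLE.COM"
  doNotPrompt=true;
};
```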
On the Run tab of the Job, in Advanced Settings > Use Specific JVM Arguments, add the following JVM parameter to specify the JAAS file you will use:
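This is the standard Java system property for specifying a JAAS login configuration; the file path is a placeholder for wherever your JAAS file resides:

```
-Djava.security.auth.login.config=/path/to/jaas.conf
```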
Run the Job to verify that you can successfully connect to the Impala daemon using Kerberos, load data into the table, and read it back: