Executing a Map/Reduce Job against a Hadoop High-Availability cluster

Overview

This article explains how to configure the High-Availability properties of your Hadoop cluster in Talend Studio in order to avoid errors when executing a Map/Reduce Job.

Environment

This article applies to version 5.3.1 of all subscription-based Talend Big Data solutions, used in a Cloudera Hadoop environment.

 

Symptoms/Description

When a Map/Reduce Job is executed against a Hadoop High Availability (HA) cluster, the following error is displayed:

No encryption was performed by peer.
java.lang.IllegalArgumentException: java.net.UnknownHostException: fajita
    at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:414)
    at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:164)
    at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:129)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:389)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:356)
    at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:124)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2218)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:80)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2252)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2234)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:300)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:194)
    at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:103)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:902)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:896)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:896)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:870)
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1319)
    at hadoop.scenario3_nohadoop_mr_0_1.Scenario3_NoHadoop_mr.runMRJob(Scenario3_NoHadoop_mr.java:11881)
    at hadoop.scenario3_nohadoop_mr_0_1.Scenario3_NoHadoop_mr.tHDFSInput_1Process(Scenario3_NoHadoop_mr.java:5431)
    at hadoop.scenario3_nohadoop_mr_0_1.Scenario3_NoHadoop_mr.run(Scenario3_NoHadoop_mr.java:11849)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at hadoop.scenario3_nohadoop_mr_0_1.Scenario3_NoHadoop_mr.runJobInTOS(Scenario3_NoHadoop_mr.java:11825)
    at hadoop.scenario3_nohadoop_mr_0_1.Scenario3_NoHadoop_mr.main(Scenario3_NoHadoop_mr.java:11812)
    Caused by: java.net.UnknownHostException: fajita

 

Resolution

Setting the Hadoop High Availability cluster properties:

The Job fails because the client-side HA properties are not set, so the logical NameNode address cannot be resolved. You need to gather these properties from your Hadoop High Availability cluster configuration and set their values in your Talend Studio.

Open the hdfs-site.xml file of your cluster and locate the property named dfs.nameservices:

 

<property>
  <name>dfs.nameservices</name>
  <value>nameservice1</value>
</property>
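If it is more convenient, you can also read this value programmatically with the standard Hadoop Configuration API. The sketch below is only an illustration: it assumes you have a local copy of the cluster's hdfs-site.xml (the file path is an example), and the same conf.get call works for the other properties described in the next steps.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

// Minimal sketch: load a local copy of the cluster's hdfs-site.xml and
// print the dfs.nameservices value. The file path is an assumption.
public class ReadHaProperties {
    public static void main(String[] args) {
        Configuration conf = new Configuration(false); // do not load Hadoop defaults
        conf.addResource(new Path("/path/to/hdfs-site.xml"));
        System.out.println("dfs.nameservices = " + conf.get("dfs.nameservices"));
    }
}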

 

In this example, the value is nameservice1. This value is used as a suffix in the names of the other HA-related properties. Find the properties named dfs.client.failover.proxy.provider.nameservice1 and dfs.ha.namenodes.nameservice1, and note their values:

 

<property>
  <name>dfs.client.failover.proxy.provider.nameservice1</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
  <name>dfs.ha.namenodes.nameservice1</name>
  <value>namenode90,namenode96</value>
</property>

 

For each value xxx listed in dfs.ha.namenodes.nameservice1 (here, namenode90 and namenode96), search for the property named dfs.namenode.rpc-address.nameservice1.xxx:

 

<property>
  <name>dfs.namenode.rpc-address.nameservice1.namenode90</name>
  <value>namenode:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.nameservice1.namenode96</name>
  <value>namenode2:8020</value>
</property>

 

With all these values, you can now configure these properties in your Talend Studio, in the Hadoop Configuration tab.

 

(Screenshot ha.png: the Hadoop Configuration tab in Talend Studio with the HA properties set)
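For reference, the following minimal Java sketch shows the client-side configuration that these Studio settings correspond to, using the property names and example values gathered above. It is not the code generated by Talend; it only illustrates how the HA properties allow the logical nameservice URI to resolve instead of failing with an UnknownHostException.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import java.net.URI;

// Minimal sketch of the equivalent client-side HA configuration.
// The values (nameservice1, namenode90, namenode96, namenode:8020, namenode2:8020)
// come from the hdfs-site.xml example above; this is not Talend-generated code.
public class HaClientSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("dfs.nameservices", "nameservice1");
        conf.set("dfs.ha.namenodes.nameservice1", "namenode90,namenode96");
        conf.set("dfs.namenode.rpc-address.nameservice1.namenode90", "namenode:8020");
        conf.set("dfs.namenode.rpc-address.nameservice1.namenode96", "namenode2:8020");
        conf.set("dfs.client.failover.proxy.provider.nameservice1",
                 "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");

        // With these properties set, the logical nameservice name resolves
        // through the failover proxy provider instead of being treated as a hostname.
        FileSystem fs = FileSystem.get(new URI("hdfs://nameservice1"), conf);
        System.out.println("Connected to: " + fs.getUri());
    }
}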

Comments
darvesh

Hi,

 

May I know what the configuration settings would be for CDH5 Resource Manager HA in Talend?
Thanks in advance.

agershenson

Hi darvesh,

 

Sorry for the delay in replying. Could you please ask your question again in one of the Community discussion boards? You will get more attention there.

 

Thanks,

Alyce