Executing a Map/Reduce Job against a Hadoop High-Availability cluster

Overview

This article explains how to configure Talend Studio with the properties of your Hadoop High-Availability cluster so that Map/Reduce Jobs run against it without errors.

Environment

This article applies to version 5.3.1 of all subscription-based Talend Big Data solutions, used in a Cloudera Hadoop environment.

 

Symptoms/Description

When a Map/Reduce Job is executed against a Hadoop High Availability (HA) cluster, the following error is displayed:

No encryption was performed by peer.
java.lang.IllegalArgumentException: java.net.UnknownHostException: fajita
    at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:414)
    at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:164)
    at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:129)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:389)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:356)
    at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:124)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2218)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:80)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2252)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2234)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:300)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:194)
    at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:103)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:902)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:896)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:896)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:870)
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1319)
    at hadoop.scenario3_nohadoop_mr_0_1.Scenario3_NoHadoop_mr.runMRJob(Scenario3_NoHadoop_mr.java:11881)
    at hadoop.scenario3_nohadoop_mr_0_1.Scenario3_NoHadoop_mr.tHDFSInput_1Process(Scenario3_NoHadoop_mr.java:5431)
    at hadoop.scenario3_nohadoop_mr_0_1.Scenario3_NoHadoop_mr.run(Scenario3_NoHadoop_mr.java:11849)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at hadoop.scenario3_nohadoop_mr_0_1.Scenario3_NoHadoop_mr.runJobInTOS(Scenario3_NoHadoop_mr.java:11825)
    at hadoop.scenario3_nohadoop_mr_0_1.Scenario3_NoHadoop_mr.main(Scenario3_NoHadoop_mr.java:11812)
    Caused by: java.net.UnknownHostException: fajita

The createNonHAProxy frame in this stack trace shows that the HDFS client did not find a failover proxy provider for the target URI. It therefore tried to resolve fajita, most likely the logical nameservice name of the cluster, as an ordinary host name, which fails with UnknownHostException.

 

Resolution

Setting the properties of the Hadoop High Availability cluster:

Gather the following properties from your Hadoop High Availability cluster configuration, then set the same values in your Talend Studio.

Open the hdfs-site.xml file located in your cluster and find the property named dfs.nameservices:

 

<property>
  <name>dfs.nameservices</name>
  <value>nameservice1</value>
</property>

 

In this example, the value is nameservice1. This value forms the suffix of the names of the other HA properties. Find the properties named dfs.client.failover.proxy.provider.nameservice1 and dfs.ha.namenodes.nameservice1, and note their values:

 

<property>
  <name>dfs.client.failover.proxy.provider.nameservice1</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
  <name>dfs.ha.namenodes.nameservice1</name>
  <value>namenode90,namenode96</value>
</property>

 

For each NameNode ID listed in the value of dfs.ha.namenodes.nameservice1 (namenode90 and namenode96 in this example), find the property named dfs.namenode.rpc-address.nameservice1.xxx, where xxx is the NameNode ID:

 

<property>
  <name>dfs.namenode.rpc-address.nameservice1.namenode90</name>
  <value>namenode:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.nameservice1.namenode96</name>
  <value>namenode2:8020</value>
</property>
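
The lookup steps above can also be scripted. The following is a minimal Python sketch, not part of any Talend or Hadoop tooling: the function names load_properties and collect_ha_properties are hypothetical, and the sample XML simply wraps the excerpts from this article in a <configuration> root element, which is an assumption about the layout of your hdfs-site.xml file.

```python
import xml.etree.ElementTree as ET

# Excerpts from this article, wrapped in a <configuration> root (assumed layout).
SAMPLE_HDFS_SITE = """<configuration>
<property>
  <name>dfs.nameservices</name>
  <value>nameservice1</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.nameservice1</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
  <name>dfs.ha.namenodes.nameservice1</name>
  <value>namenode90,namenode96</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.nameservice1.namenode90</name>
  <value>namenode:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.nameservice1.namenode96</name>
  <value>namenode2:8020</value>
</property>
</configuration>"""


def load_properties(xml_text):
    """Return {name: value} for every <property> element in the XML text."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def collect_ha_properties(props):
    """Follow the steps of this article and return the HA properties in order.

    Raises KeyError if one of the expected properties is missing.
    """
    collected = {"dfs.nameservices": props["dfs.nameservices"]}
    for ns in (s.strip() for s in props["dfs.nameservices"].split(",")):
        if not ns:
            continue
        provider_key = f"dfs.client.failover.proxy.provider.{ns}"
        namenodes_key = f"dfs.ha.namenodes.{ns}"
        collected[provider_key] = props[provider_key]
        collected[namenodes_key] = props[namenodes_key]
        # One rpc-address property per NameNode ID.
        for nn in (s.strip() for s in props[namenodes_key].split(",")):
            if nn:
                rpc_key = f"dfs.namenode.rpc-address.{ns}.{nn}"
                collected[rpc_key] = props[rpc_key]
    return collected


HA_PROPERTIES = collect_ha_properties(load_properties(SAMPLE_HDFS_SITE))
for key, value in HA_PROPERTIES.items():
    print(f"{key}={value}")
```

To run it against a real cluster file, read the content of your own hdfs-site.xml into a string and pass it to load_properties instead of the sample.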

 

Using these values, add each of these properties, together with its value, in the Hadoop Configuration tab of your Talend Studio.
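
As a text summary of what is entered in Studio, the short Python sketch below prints the five property/value pairs from the example in this article as table rows. The helper names as_table_rows and render are hypothetical, and wrapping each cell in double quotes is an assumption about how your Studio version expects string values to be typed; set quote=False if that does not apply.

```python
# Values copied from the hdfs-site.xml excerpts in this article.
HA_PROPERTIES = [
    ("dfs.nameservices", "nameservice1"),
    ("dfs.client.failover.proxy.provider.nameservice1",
     "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"),
    ("dfs.ha.namenodes.nameservice1", "namenode90,namenode96"),
    ("dfs.namenode.rpc-address.nameservice1.namenode90", "namenode:8020"),
    ("dfs.namenode.rpc-address.nameservice1.namenode96", "namenode2:8020"),
]


def as_table_rows(properties, quote=True):
    """Return (property, value) rows, optionally wrapped in double quotes.

    Quoting is an assumption about the Studio input format, not a confirmed rule.
    """
    rows = []
    for name, value in properties:
        if quote:
            name, value = f'"{name}"', f'"{value}"'
        rows.append((name, value))
    return rows


def render(rows):
    """Format the rows as a two-column plain-text table with a header line."""
    width = max(len(name) for name, _ in rows)
    lines = [f"{'Property'.ljust(width)}  Value"]
    lines += [f"{name.ljust(width)}  {value}" for name, value in rows]
    return "\n".join(lines)


ROWS = as_table_rows(HA_PROPERTIES)
TABLE = render(ROWS)
print(TABLE)
```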

 

ha.png

Contributed by Quihong Wei.

Last update: 04-21-2017 05:03 PM