Hadoop Cluster on HA cluster

Hello,
Currently I'm testing HA (High Availability) on a MapR cluster. I've created 3 nodes on MapR and defined an HA RM (ResourceManager) on those machines. Now, every time the cluster comes up, it selects one machine as the active RM. Below is the yarn-site.xml for this setting:
<?xml version="1.0"?>
<!--
 Licensed under the Apache License, Version 2.0 (the "License");
 you may not use this file except in compliance with the License.
 You may obtain a copy of the License at
   
 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License. See accompanying LICENSE file.
-->
<configuration>
 <!-- Resource Manager HA Configs -->
 <property>
   <name>yarn.resourcemanager.ha.enabled</name>
   <value>true</value>
 </property>
 <property>
   <name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
   <value>true</value>
 </property>
 <property>
   <name>yarn.resourcemanager.ha.automatic-failover.embedded</name>
   <value>true</value>
 </property>
 <property>
   <name>yarn.resourcemanager.recovery.enabled</name>
   <value>true</value>
 </property>
 <property>
   <name>yarn.resourcemanager.cluster-id</name>
   <value>yarn-mapr.xq</value>
 </property>
 <property>
   <name>yarn.resourcemanager.ha.rm-ids</name>
   <value>rm1,rm2,rm3</value>
 </property>
 <property>
   <name>yarn.resourcemanager.ha.id</name>
   <value>rm1</value>
 </property>
 <property>
   <name>yarn.resourcemanager.zk-address</name>
   <value>mapr1.mapr.xq:5181,mapr2.mapr.xq:5181,mapr3.mapr.xq:5181</value>
 </property>

 <!-- Configuration for rm1 -->
 <property>
   <name>yarn.resourcemanager.scheduler.address.rm1</name>
   <value>mapr1.mapr.xq:8030</value>
 </property>
 <property>
   <name>yarn.resourcemanager.resource-tracker.address.rm1</name>
   <value>mapr1.mapr.xq:8031</value>
 </property>
 <property>
   <name>yarn.resourcemanager.address.rm1</name>
   <value>mapr1.mapr.xq:8032</value>
 </property>
 <property>
   <name>yarn.resourcemanager.admin.address.rm1</name>
   <value>mapr1.mapr.xq:8033</value>
 </property>
 <property>
   <name>yarn.resourcemanager.webapp.address.rm1</name>
   <value>mapr1.mapr.xq:8088</value>
 </property>
 <property>
   <name>yarn.resourcemanager.webapp.https.address.rm1</name>
   <value>mapr1.mapr.xq:8090</value>
 </property>
 <!-- Configuration for rm2 -->
 <property>
   <name>yarn.resourcemanager.scheduler.address.rm2</name>
   <value>mapr2.mapr.xq:8030</value>
 </property>
 <property>
   <name>yarn.resourcemanager.resource-tracker.address.rm2</name>
   <value>mapr2.mapr.xq:8031</value>
 </property>
 <property>
   <name>yarn.resourcemanager.address.rm2</name>
   <value>mapr2.mapr.xq:8032</value>
 </property>
 <property>
   <name>yarn.resourcemanager.admin.address.rm2</name>
   <value>mapr2.mapr.xq:8033</value>
 </property>
 <property>
   <name>yarn.resourcemanager.webapp.address.rm2</name>
   <value>mapr2.mapr.xq:8088</value>
 </property>
 <property>
   <name>yarn.resourcemanager.webapp.https.address.rm2</name>
   <value>mapr2.mapr.xq:8090</value>
 </property>
 <!-- Configuration for rm3 -->
 <property>
   <name>yarn.resourcemanager.scheduler.address.rm3</name>
   <value>mapr3.mapr.xq:8030</value>
 </property>
 <property>
   <name>yarn.resourcemanager.resource-tracker.address.rm3</name>
   <value>mapr3.mapr.xq:8031</value>
 </property>
 <property>
   <name>yarn.resourcemanager.address.rm3</name>
   <value>mapr3.mapr.xq:8032</value>
 </property>
 <property>
   <name>yarn.resourcemanager.admin.address.rm3</name>
   <value>mapr3.mapr.xq:8033</value>
 </property>
 <property>
   <name>yarn.resourcemanager.webapp.address.rm3</name>
   <value>mapr3.mapr.xq:8088</value>
 </property>
 <property>
   <name>yarn.resourcemanager.webapp.https.address.rm3</name>
   <value>mapr3.mapr.xq:8090</value>
 </property>
 <!-- :::CAUTION::: DO NOT EDIT ANYTHING ON OR ABOVE THIS LINE -->
</configuration>

The problem is that when I create a Hadoop Cluster connection from Talend, it prompts me for the RM and RM scheduler addresses.

I can easily find the address by running "clush -a netstat -tulpn | grep 8030" on the server. The real problem is that when the cluster changes its active RM, I must manually locate the new active RM and update the value in the cluster connection.
One thing I noticed about RM (YARN) behavior is that in an HA configuration it ALWAYS redirects web requests (to server:8088) to the active RM. Can we use that server's address instead? Or is there a correct way to create a connection to an HA cluster from Talend?

Re: Hadoop Cluster on HA cluster

You need to set the RM property to ${yarn.resourcemanager.hostname}:8032 and add the yarn.client.failover-proxy-provider Hadoop property.
With those properties set, Talend will connect to the active RM node automatically.
Check the MapR documentation for further details.
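
For reference, here is a minimal sketch of what the second property might look like in yarn-site.xml form. The provider class shown is the stock Apache Hadoop one and is an assumption here; a MapR distribution may ship its own provider class, so confirm the value against the MapR documentation. In Talend, the same name/value pair would presumably be entered as a Hadoop property on the cluster connection rather than as XML.

```xml
<!-- Sketch only: the class below is the Apache Hadoop default and is assumed here;
     a MapR cluster may require a different provider class. -->
<property>
  <name>yarn.client.failover-proxy-provider</name>
  <value>org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider</value>
</property>
```

With the stock provider, the client needs the same HA settings as the cluster (yarn.resourcemanager.ha.enabled, yarn.resourcemanager.ha.rm-ids, and the per-ID addresses such as yarn.resourcemanager.address.rm1). It then tries the listed RM IDs in turn until it reaches the active one, so the connection does not have to be edited after a failover.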