Hadoop Cluster on HA cluster

Hello,
Currently I'm testing HA (High Availability) on a MapR cluster. I've created 3 nodes on MapR and defined an HA RM (ResourceManager) on those machines. Now, every time the cluster comes up, it selects 1 machine as the active RM. Below is the yarn-site.xml for this setting:
<?xml version="1.0"?>
<!--
 Licensed under the Apache License, Version 2.0 (the "License");
 you may not use this file except in compliance with the License.
 You may obtain a copy of the License at
   
 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License. See accompanying LICENSE file.
-->
<configuration>
 <!-- Resource Manager HA Configs -->
 <property>
   <name>yarn.resourcemanager.ha.enabled</name>
   <value>true</value>
 </property>
 <property>
   <name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
   <value>true</value>
 </property>
 <property>
   <name>yarn.resourcemanager.ha.automatic-failover.embedded</name>
   <value>true</value>
 </property>
 <property>
   <name>yarn.resourcemanager.recovery.enabled</name>
   <value>true</value>
 </property>
 <property>
   <name>yarn.resourcemanager.cluster-id</name>
   <value>yarn-mapr.xq</value>
 </property>
 <property>
   <name>yarn.resourcemanager.ha.rm-ids</name>
   <value>rm1,rm2,rm3</value>
 </property>
 <property>
   <name>yarn.resourcemanager.ha.id</name>
   <value>rm1</value>
 </property>
 <property>
   <name>yarn.resourcemanager.zk-address</name>
   <value>mapr1.mapr.xq:5181,mapr2.mapr.xq:5181,mapr3.mapr.xq:5181</value>
 </property>

 <!-- Configuration for rm1 -->
 <property>
   <name>yarn.resourcemanager.scheduler.address.rm1</name>
   <value>mapr1.mapr.xq:8030</value>
 </property>
 <property>
   <name>yarn.resourcemanager.resource-tracker.address.rm1</name>
   <value>mapr1.mapr.xq:8031</value>
 </property>
 <property>
   <name>yarn.resourcemanager.address.rm1</name>
   <value>mapr1.mapr.xq:8032</value>
 </property>
 <property>
   <name>yarn.resourcemanager.admin.address.rm1</name>
   <value>mapr1.mapr.xq:8033</value>
 </property>
 <property>
   <name>yarn.resourcemanager.webapp.address.rm1</name>
   <value>mapr1.mapr.xq:8088</value>
 </property>
 <property>
   <name>yarn.resourcemanager.webapp.https.address.rm1</name>
   <value>mapr1.mapr.xq:8090</value>
 </property>
 <!-- Configuration for rm2 -->
 <property>
   <name>yarn.resourcemanager.scheduler.address.rm2</name>
   <value>mapr2.mapr.xq:8030</value>
 </property>
 <property>
   <name>yarn.resourcemanager.resource-tracker.address.rm2</name>
   <value>mapr2.mapr.xq:8031</value>
 </property>
 <property>
   <name>yarn.resourcemanager.address.rm2</name>
   <value>mapr2.mapr.xq:8032</value>
 </property>
 <property>
   <name>yarn.resourcemanager.admin.address.rm2</name>
   <value>mapr2.mapr.xq:8033</value>
 </property>
 <property>
   <name>yarn.resourcemanager.webapp.address.rm2</name>
   <value>mapr2.mapr.xq:8088</value>
 </property>
 <property>
   <name>yarn.resourcemanager.webapp.https.address.rm2</name>
   <value>mapr2.mapr.xq:8090</value>
 </property>
 <!-- Configuration for rm3 -->
 <property>
   <name>yarn.resourcemanager.scheduler.address.rm3</name>
   <value>mapr3.mapr.xq:8030</value>
 </property>
 <property>
   <name>yarn.resourcemanager.resource-tracker.address.rm3</name>
   <value>mapr3.mapr.xq:8031</value>
 </property>
 <property>
   <name>yarn.resourcemanager.address.rm3</name>
   <value>mapr3.mapr.xq:8032</value>
 </property>
 <property>
   <name>yarn.resourcemanager.admin.address.rm3</name>
   <value>mapr3.mapr.xq:8033</value>
 </property>
 <property>
   <name>yarn.resourcemanager.webapp.address.rm3</name>
   <value>mapr3.mapr.xq:8088</value>
 </property>
 <property>
   <name>yarn.resourcemanager.webapp.https.address.rm3</name>
   <value>mapr3.mapr.xq:8090</value>
 </property>
 <!-- :::CAUTION::: DO NOT EDIT ANYTHING ON OR ABOVE THIS LINE -->
</configuration>

The problem is, when I want to create a Hadoop Cluster connection from Talend, it prompts me to enter the RM and RM scheduler addresses.

I can find the address easily using the command "clush -a netstat -tulpn | grep 8030" on the server. But the real problem is that when the cluster changes its active RM, I must manually locate the active RM and change the value in the cluster connection.
One thing I've noticed about RM (YARN) behavior is that, when it's set up in an HA configuration, it will ALWAYS redirect web requests (requests to the web address server:8088) to its active RM. Can we use the value of this server instead? Or is there a correct way to create a connection to an HA cluster from Talend?

Re: Hadoop Cluster on HA cluster

You need to set the RM property to ${yarn.resourcemanager.hostname}:8032 and add the yarn.client.failover-proxy-provider Hadoop property.
With those properties set, Talend will connect to the active RM node automatically.
Check MapR documentation for further details.
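
For illustration, the extra Hadoop property might look roughly like the fragment below. This is only a sketch: the class shown is the stock Apache Hadoop failover provider, and it is an assumption that it applies here, since MapR distributions may ship their own provider class. Confirm the value against the MapR documentation for your version.

```xml
<!-- Sketch only: client-side property that lets the YARN client fail over between rm1, rm2 and rm3 -->
<property>
  <name>yarn.client.failover-proxy-provider</name>
  <!-- Assumption: stock Apache Hadoop provider class; a MapR cluster may require a MapR-specific class instead -->
  <value>org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider</value>
</property>
```

In Talend, the same name/value pair would go into the Hadoop properties table of the cluster connection rather than into a file.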
