Pre-load “Spark.Yarn.Jar” to speed up execution of a Spark Job
Each run of a Spark Job uploads a talend-spark-assembly-x.x.x-SNAPSHOT-hadoopx.x.x-cdhx.x.x.jar package to HDFS, which degrades HDFS performance and takes up HDFS space.
Cloudera 5.7 and above, HDP 2.5 and above
Spark 1.6 and above
Talend Data Fabric
Spark Job Setting
Running a Spark Job from Talend Studio can be very time-consuming, especially for Spark Jobs that interact with a Hadoop server installed in a remote location. Manually uploading the spark-assembly-xxx jar (over 100 MB) to a target HDFS directory once, for example from a PuTTY or other SSH session, speeds up execution considerably because the jar no longer has to be transferred on every run.
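The manual upload described above can be sketched in Python as follows. This is an illustrative sketch, not part of the original procedure: the helper names, the local jar path, and the HDFS directory are all hypothetical placeholders. It assumes the `hdfs` command-line client is available on the machine holding the jar. The resulting HDFS path is the kind of value the `spark.yarn.jar` property expects in Spark 1.6.

```python
import posixpath
import subprocess


def build_upload_commands(local_jar, hdfs_dir):
    """Return the `hdfs dfs` commands that create hdfs_dir and copy local_jar into it."""
    return [
        # Create the target directory (and any missing parents) on HDFS.
        ["hdfs", "dfs", "-mkdir", "-p", hdfs_dir],
        # Copy the jar; -f overwrites a previous copy of the same file.
        ["hdfs", "dfs", "-put", "-f", local_jar, hdfs_dir],
    ]


def spark_yarn_jar_value(local_jar, hdfs_dir, namenode=""):
    """Build the HDFS location of the uploaded jar, optionally prefixed
    with a NameNode URI such as hdfs://<host>:<port> (hypothetical)."""
    jar_name = posixpath.basename(local_jar)
    return namenode + posixpath.join(hdfs_dir, jar_name)


def upload(local_jar, hdfs_dir):
    """Run the upload commands; raises CalledProcessError if one fails."""
    for cmd in build_upload_commands(local_jar, hdfs_dir):
        subprocess.run(cmd, check=True)


# Example call (placeholder paths, adjust to your environment):
# upload("/tmp/spark-assembly-xxx.jar", "/user/talend/lib")
```

Because the upload happens once, later runs can reference the jar already on HDFS rather than re-sending more than 100 MB from Talend Studio each time.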
Problem root cause
The jars listed below are large, and uploading them on every run affects performance.