Failure to run a job designed in Talend Studio with Spark


I created the two jobs described below, but neither runs successfully. Please help!

 

  • Both jobs were adapted from the tutorial videos linked below
  • The error logs for both jobs are attached
  • The jobs were created in the Japanese version of Talend Studio, so I apologize if any terms are translated poorly

 

Configuration:

  • macOS Mojave 10.14.6
  • java version 1.8.0_241
  • Talend Studio: Talend Cloud Real-Time Big Data Platform (7.2.1)
  • Hadoop Cluster: Amazon EMR 5.15.0 (Hadoop 2.8.3)
  • AWS Network: open port (in / out) for local PC
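
To double-check the network item above, a quick reachability test from the local machine can be sketched as follows. The host name is a placeholder, and the port numbers (8020 for the HDFS NameNode, 8032 for the YARN ResourceManager, 8088 for the ResourceManager web UI) are assumptions based on common Hadoop/EMR defaults, not values confirmed for this cluster.

```python
import socket


def check_ports(host, ports, timeout=3.0):
    """Return {port: bool}, True when a TCP connection to host:port succeeds."""
    results = {}
    for port in ports:
        try:
            # create_connection resolves the host and attempts a TCP connect
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            # Covers refused connections, timeouts, and DNS failures
            results[port] = False
    return results


if __name__ == "__main__":
    # Hypothetical placeholder: replace with the EMR master node's public DNS name.
    emr_master = "emr-master.example.invalid"
    # Assumed default ports: NameNode RPC, ResourceManager RPC, ResourceManager UI.
    for port, is_open in check_ports(emr_master, [8020, 8032, 8088]).items():
        print(f"{emr_master}:{port} -> {'open' if is_open else 'unreachable'}")
```

This only tests the local PC to cluster direction. It does not show whether the cluster nodes can connect back to the local machine.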

 

Job 1) Writing to HDFS

Tutorial video: https://www.talend.com/resources/writing-reading-data-hdfs/

 

Status: The cluster is set up in the repository and recognized by Studio. Pressing the execute (Run) button causes an error.

 

Job 2) Running Big Data Batch job on Spark

Tutorial video: https://www.talend.com/resources/running-job-spark/

 

Status: The cluster is set up in the repository and recognized by Studio. Pressing the execute (Run) button causes an error.

 

Error log for Job 2 is below:

 

Failed 2020-03-25 11:11:35

Task 5e79a3b8db994d34de3b78fc/4.9 failed unexpectedly.
org.talend.ipaas.rt.flow.controller.impl.FlowExecutionException: Step 2221021a-7d3a-4301-a79f-b9a629c4eb76 failed with code -1 and error Job stopped with errors or unable to run. (check the task execution logs for the error details)
    at org.talend.ipaas.rt.flow.controller.impl.JobControllerImpl.doRun(JobControllerImpl.java:98) ~[?:?]
    at org.talend.ipaas.rt.flow.controller.impl.AbstractControllerImpl.lambda$run$0(AbstractControllerImpl.java:106) ~[?:?]
    at java.lang.Thread.run(Thread.java:748) [?:?]
Caused by: java.lang.Exception: Job stopped with errors or unable to run. (check the task execution logs for the error details)
    ... 3 more

Failed 2020-03-25 11:11:35

TalendJob: 'EMR' - Failed with exit code: 1.

Failed 2020-03-25 11:11:35

Error initializing SparkContext.
org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:89)
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:63)
    at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:164)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:500)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
    at aaa.emr_0_1.EMR.runJobInTOS(EMR.java:1501)
    at aaa.emr_0_1.EMR.main(EMR.java:1394)
