Running the same Spark Job in multiple Instances/threads

Hi All,

I have a use case with around 100 input files in an S3 bucket. All the source file information (file name, source, source directory, target, etc.) is stored in a metadata table and passed to a Spark batch job as parameters via context variables. I need the job to process all the files in parallel instead of iterating over them, so that the job is triggered in multiple instances at the same time with 100 different parameter sets (one per file).

Can this be achieved with a Talend Spark batch job?
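One way to sketch the fan-out outside of Talend itself is a small launcher that starts one instance of the exported job per metadata row. This is a minimal sketch, assuming the Spark batch job has been exported as a shell script (the name `my_spark_job.sh` and the context parameter names `fileName`/`target` are hypothetical, not from the original post):

```python
from concurrent.futures import ThreadPoolExecutor
import subprocess

def launch_all(file_params, run_one, max_workers=10):
    """Run one job instance per parameter set, up to max_workers at a time.

    file_params: list of dicts, one per metadata row.
    run_one: callable that launches a single job instance and returns
             its result (e.g. the process exit code).
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with file_params.
        return list(pool.map(run_one, file_params))

def run_talend_job(params):
    """Invoke the exported job, passing this file's values as context
    parameters. Script name and parameter names are assumptions."""
    cmd = [
        "./my_spark_job.sh",
        "--context_param", f"fileName={params['file']}",
        "--context_param", f"target={params['target']}",
    ]
    return subprocess.call(cmd)

# Example: 100 metadata rows would fan out as
#   launch_all(rows, run_talend_job, max_workers=10)
# which keeps at most 10 job instances running at once.
```

Each worker thread simply blocks on its subprocess, so `max_workers` caps how many job instances (and therefore Spark applications) run concurrently; with 100 files and `max_workers=10`, the launcher works through them ten at a time.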

 

About the process in the job: the job fetches a file, partitions it, and pushes it to another S3 bucket in Parquet format.

Moderator

Re: Running the same Spark Job in multiple Instances/threads

Hello,

Are you referring to calling a child Spark job via tRunJob, i.e. a parent Standard job with a child Spark job?

Let us know if this article is what you are looking for.

https://community.talend.com/t5/Architecture-Best-Practices-and/Spark-Dynamic-Context/ta-p/33038 

Best regards

Sabrina

