I have a use case with around 100 input files in an S3 bucket. All of the source file information is stored in a metadata table (file name, source, source directory, target, etc.), which is passed as a parameter to a Spark batch job via a context variable. I need the job to process all the files in parallel instead of iterating over them, so that the job is triggered in multiple instances at the same time with 100 different parameters (one per file).
Can this be achieved with a Talend Spark batch job?
About the process in the job: the job fetches each file and writes it in Parquet format to another S3 bucket after partitioning.
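To make the goal concrete, here is a minimal sketch of the intended behavior in plain Spark (Scala), outside Talend. It assumes the metadata table can be read as a CSV in S3 with sourceDir and fileName columns; the bucket paths, column names, and partition column are hypothetical placeholders, and a Talend Spark batch job would express the same pattern with its graphical components rather than hand-written code.

```scala
import org.apache.spark.sql.SparkSession

object ParallelS3ToParquet {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ParallelS3ToParquet")
      .getOrCreate()

    // Load the metadata table; assumed here to be a CSV with
    // (hypothetical) columns sourceDir and fileName, among others.
    val metadata = spark.read
      .option("header", "true")
      .csv("s3a://my-meta-bucket/file_metadata.csv")

    // Build the full S3 path for every input file listed in the table.
    val paths = metadata
      .select("sourceDir", "fileName")
      .collect()
      .map(r => s"${r.getString(0)}/${r.getString(1)}")

    // One read over all ~100 paths: Spark splits the work across
    // executors, so the files are processed in parallel with no
    // per-file loop and no separate job instance per file.
    val input = spark.read
      .option("header", "true")
      .csv(paths: _*)

    // Write the combined data as partitioned Parquet to the target
    // bucket; "partition_col" stands in for a real column in the data.
    input.write
      .mode("overwrite")
      .partitionBy("partition_col")
      .parquet("s3a://my-target-bucket/output/")

    spark.stop()
  }
}
```

The point of the sketch is that a single Spark job handed all the paths at once already parallelizes the work across the cluster, so triggering 100 separate job instances is not required to avoid iteration.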
Are you referring to calling a child Spark job with tRunJob, i.e., a parent standard job and a child Spark job?
Let us know if this article is what you are looking for.