How to get the name of each file in a folder in a Talend Big Data Spark job
Here are the details:
In a DI (Standard) job this can be achieved with the tFileList component and ((String)globalMap.get("tFileList_1_CURRENT_FILE")), but I need to achieve the same thing in a Talend Spark Big Data Batch job.
Please share any suggestions for doing this in Talend Spark Big Data Batch jobs.
Please find the attachment.
tFileList is a file orchestration component and is available only in Standard jobs. It performs no Spark-specific work, so it rightly belongs in Standard jobs.
So if your processing needs the file names passed in as parameters, you will have to use a parent-child relationship, in which the Standard job calls the BD job as an independent child job. Alternatively, you will have to orchestrate both jobs so that the scheduler calls the BD job once per file, passing the file name as a parameter each time.
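As a rough illustration of the "one call per file" idea, here is a minimal plain-Java sketch that lists the files in a folder and builds one parameter string per file. The class name ChildJobArgs, the parameter name file_name, and the use of the --context_param name=value form are assumptions made for this example; in an actual Standard job the iteration would normally be done by tFileList itself rather than by hand-written code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Talend: builds one argument per input file.
final class ChildJobArgs {

    // Lists the regular files directly under `folder`, sorted by path.
    static List<Path> listFiles(Path folder) throws IOException {
        try (Stream<Path> entries = Files.list(folder)) {
            return entries.filter(Files::isRegularFile)
                          .sorted()
                          .collect(Collectors.toList());
        }
    }

    // Builds "--context_param <paramName>=<fileName>" for every file in `folder`.
    // The parameter name is an assumption; use whatever context variable the child job declares.
    static List<String> contextArgs(Path folder, String paramName) throws IOException {
        List<String> args = new ArrayList<>();
        for (Path file : listFiles(folder)) {
            args.add("--context_param " + paramName + "=" + file.getFileName());
        }
        return args;
    }
}
```

Because the list is built in a loop, the number of files does not change the amount of manual work: the same job definition is invoked once per entry.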
Thanks for your reply.
In my use case I have more than 100 files to process, and passing 100 file names as context from a Standard job to the Spark job would be a huge task. The input component in a Spark job offers a file/folder option (I am using the folder option). Is there a way, within the Spark job itself, to iterate over and process each file in that folder?
I want to pick up the name of the file currently being processed in the flow and load it into an output column, FILE_NAME.
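For reference, Spark SQL itself provides an input_file_name() function that returns the full path of the file each row was read from; whether it can be reached from the components of a Talend Spark Batch job is not established in this thread. Assuming the full path is available by some means, a minimal Java sketch for reducing it to the value wanted in FILE_NAME could look like this. The class name FileNameColumn and the method baseName are hypothetical.

```java
// Hypothetical helper, not part of Talend or Spark.
final class FileNameColumn {

    // Returns the last path segment of a path or URI string,
    // e.g. "hdfs://namenode:8020/data/in/orders.csv" -> "orders.csv".
    // Null or empty input yields an empty string.
    static String baseName(String pathOrUri) {
        if (pathOrUri == null || pathOrUri.isEmpty()) {
            return "";
        }
        int lastSlash = pathOrUri.lastIndexOf('/');
        return pathOrUri.substring(lastSlash + 1);
    }
}
```

A routine like this could be called from an expression that populates FILE_NAME, provided the full path is present as a column in the flow.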
Please find the attached screenshot.