How to get filename from Folder containing multiple files in a directory in Talend big data Spark job


Here are the details

  1. All the files in the directory ("C:\data\product") have a similar schema.
  2. I can extract the data from the 3 files and write it to delimited files, but I cannot extract the filenames (product_Jan.txt, product_Feb.txt, product_Mar.txt) from the directory ("C:\data\product") and write those names to a delimited file.
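Outside Talend, the second step is essentially just listing the directory and writing the names to a delimited file. A minimal plain-Python sketch of that logic (the directory path and output name are only illustrative):

```python
import csv
import os

def write_filenames(src_dir, out_path):
    """List the files in src_dir and write their names, one per row,
    to a delimited output file."""
    names = sorted(f for f in os.listdir(src_dir)
                   if os.path.isfile(os.path.join(src_dir, f)))
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out, delimiter=";")
        for name in names:
            writer.writerow([name])
    return names
```

Calling `write_filenames(r"C:\data\product", "file_names.csv")` would produce one row per file in the folder. The question, of course, is how to get the same result inside a Spark batch job rather than in local code.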

This can be achieved in a DI job using the tFileList component and ((String)globalMap.get("tFileList_1_CURRENT_FILE")), but I need to achieve it in a Talend Spark Big Data batch job.

Please share some suggestions on how to achieve this in Talend Spark Big Data batch jobs.

Please find the attachment.


Re: How to get filename from Folder containing multiple files in a directory in Talend big data Spark job

Hi,

 

    tFileList is a file orchestration component and it is available only in Standard jobs. There is no Spark-specific equivalent of this component, so it was rightly placed in the Standard job.

 

    So if you want to pass the file names as parameters to the processing, you will have to use a parent-child relationship, where the Standard job calls the BD job as an independent child job. Alternatively, you can orchestrate both jobs so that a scheduler calls the BD job once per file, passing the file name as a parameter.
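The scheduler approach above can be sketched as a small driver script. Talend's exported jobs accept `--context_param name=value` on the command line; the launcher path `/jobs/BDJob/BDJob_run.sh` and the context variable `current_file` below are hypothetical names that would need to match your exported job:

```python
import os
import subprocess

def build_job_commands(src_dir, job_script="/jobs/BDJob/BDJob_run.sh"):
    """Build one launcher command per file in src_dir, passing the file
    name via a (hypothetical) context parameter named current_file."""
    commands = []
    for name in sorted(os.listdir(src_dir)):
        if os.path.isfile(os.path.join(src_dir, name)):
            commands.append([job_script,
                             "--context_param", "current_file=" + name])
    return commands

def run_jobs(src_dir, job_script):
    # Launch the BD job once per file, as a scheduler would.
    for cmd in build_job_commands(src_dir, job_script):
        subprocess.run(cmd, check=True)
```

This keeps the Standard/BD split Nikhil describes: the driver (or scheduler) owns the file iteration, and each BD job run sees exactly one file name.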

 

Warm Regards,
Nikhil Thampi

Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved :-)


Re: How to get filename from Folder containing multiple files in a directory in Talend big data Spark job

Hi Nikhil,

 

Thanks for your reply.

 

In my use case, I have more than 100 files to process, so it would be a huge task to pass 100 file names as context variables from a Standard job to a Spark job. Is there a solution within the Spark job itself? The input component has a file/folder option (I am using the folder option in the Spark job), and each file in the folder is iterated over and processed.

I want to extract the filename being processed in the stream/flow and load it into an output column FILE_NAME.
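For reference, Spark SQL itself exposes the source file of each record through its built-in `input_file_name()` function. So one possible approach, assuming your Talend version lets you run Spark SQL via a tSqlRow component and the incoming connection is named row1, is a query along these lines:

```sql
SELECT row1.*, input_file_name() AS FILE_NAME FROM row1
```

Note that `input_file_name()` returns the full path of the file each row was read from, so you may still need a string function (for example Spark SQL's `substring_index`) to strip the directory portion down to just the file name.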

Please find the attached screenshot.

 

Thanks

