Hello, I cannot figure out how to iterate in a Big Data Batch job. Is this possible somehow? Use cases:

1. I have folder paths written on separate rows in a specific file. I would like to read the file and run a Spark job for each path (for each row), but I cannot find the tFlowToIterate component.

2. I can create temp folders inside a specified folder in HDFS, but I cannot find the tFileList component to run a Spark job for each of the temp folders.

The number of folders/rows can change with each execution. I cannot find any solution, nor google one. Is it possible? The Talend Studio version used is Talend Real-time Big Data Platform 6.2.1. Thank you for your help, Mira
Thanks for the response. I can also see these components in Talend Big Data Studio, and it is possible to use them in a Standard job. The problem is that I cannot find them in a Big Data Batch job. See screenshots:

1. The components are not in the Palette for a Big Data Batch job.

2. The components are in a Standard job; however, a Standard job cannot run a Big Data Batch job as a subjob.
OK, there are two possibilities. They are not very nice, but they work.

a) Put all the files into one temporary folder, use a tMap component that adds a new field to each row of these files, and then set the Partitioning option in the output component based on this field (I used tFileOutputParquet). This produces output separated into folders. It is then necessary to create a Standard job that processes these partitioned folders and moves the files to the preferred locations.

b) (not tested) Use a tLoop component followed by a tJavaRow, and then use the path specified in the tJavaRow component. However, tLoop must iterate over the maximal number of potentially produced subfolders, even if only one subfolder actually exists.
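Option (b) amounts to looping up to a fixed upper bound of candidate folder names and skipping the ones that were not produced in this run. A minimal plain-Java sketch of that control flow, outside Talend (the `part_` naming scheme, the `maxFolders` bound, and the existence check are all hypothetical; in the actual job the tLoop would drive the iteration and the surviving paths would be handed to the Spark subjob):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class LoopOverSubfolders {

    // Iterate up to maxFolders candidate paths (the tLoop bound), keep only
    // those that actually exist (the check a tJavaRow could perform), and
    // return the paths the Spark subjob would then be run against.
    static List<String> existingFolders(String baseDir, int maxFolders,
                                        Predicate<String> exists) {
        List<String> result = new ArrayList<>();
        for (int i = 1; i <= maxFolders; i++) {        // tLoop: fixed upper bound
            String candidate = baseDir + "/part_" + i; // hypothetical naming scheme
            if (exists.test(candidate)) {              // skip folders not produced this run
                result.add(candidate);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Pretend only two of five possible folders were produced this run.
        List<String> found = existingFolders("/tmp/out", 5,
                p -> p.endsWith("part_1") || p.endsWith("part_3"));
        System.out.println(found); // [/tmp/out/part_1, /tmp/out/part_3]
    }
}
```

The existence check is injected as a `Predicate` here only so the sketch runs anywhere; against real HDFS it would be a filesystem lookup instead.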