Hi, I am looping over the files in a folder and I want to process them in parallel. I am connecting the sub-jobs with "On Component OK" links, and they execute in order (Order:1, then Order:2). Example: folder "A" contains "A_20140502.txt", "B_20140502.txt" and "C_20140502.txt". With the setup described above, "B_20140502.txt" waits until "A_20140502.txt" has finished processing, so the execution is sequential. I want to process all three files at the same time within the loop. I need help making this run in parallel. Thanks---->D
Hi Ashok, If you want to run processes in parallel in your TOS job, you have to enable the Multi thread execution feature as follows: go to Job settings --> select the Extra tab --> enable the Multi thread execution checkbox.
Now you need to make sure that all the sub-jobs run independently of each other; only then will the main job make use of the multi-thread feature. After enabling multi thread execution, your main job should look like this (this is just the basic idea):
Main job:
Read file names from the directory "A" --> Read the contents of file1
Read file names from the directory "B" --> Read the contents of file2
Use the global variable ((String)globalMap.get("tFileList_1_CURRENT_FILEPATH")) to read the current file path, and configure it in the tFileInput* component.
tFileList configuration :
tFileInput* configuration :
Hope that helps you. Follow the above procedure and get back to us in case of any issues.
Hi Sayagoud, I tried your suggestion, but no luck. Maybe I am taking the wrong route to achieve my scenario. Let's dig deeper. I have the following directory with two sets of files, i.e. 20140502 and 20140503.
First I loop over the date 20140502, as shown above, with three parallel executions for files A, B and C; then I loop over the second date, 20140503, again with three parallel executions for A, B and C. I designed the job as shown below.
After enabling multi thread execution, it behaves the same as before.
In the tFileList of my sub-jobs I am using the file masks "*_A_*"+((String)globalMap.get("row9.newColumn"))+".txt" and "*_B_*"+((String)globalMap.get("row9.newColumn"))+".txt". Please tell me whether I am doing this the right way or not. I hope it is clear now. Thanks---->D
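To see what such a file mask expression evaluates to, here is a minimal plain-Java sketch. It is not Talend code: the `globalMap` here is an ordinary `HashMap` standing in for Talend's map, Java's built-in glob matcher stands in for tFileList's own matching (which may differ in details), and the class name, method names and the file name "X_A_20140502.txt" are made up for illustration.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

class FileMaskSketch {
    // Builds the mask with the same concatenation as the expression in the post.
    static String buildMask(String letter, Map<String, Object> globalMap) {
        return "*_" + letter + "_*" + ((String) globalMap.get("row9.newColumn")) + ".txt";
    }

    // Assumption: Java's glob matcher is used as a stand-in for tFileList's matching.
    static boolean matches(String mask, String fileName) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + mask);
        return matcher.matches(Paths.get(fileName));
    }

    public static void main(String[] args) {
        Map<String, Object> globalMap = new HashMap<>(); // stand-in for Talend's globalMap
        globalMap.put("row9.newColumn", "20140502");

        String mask = buildMask("A", globalMap);
        System.out.println(mask);                              // *_A_*20140502.txt
        System.out.println(matches(mask, "X_A_20140502.txt")); // true
        System.out.println(matches(mask, "A_20140502.txt"));   // false: no "_A_" in the name

        // If the key has not been set, the concatenation yields the text "null".
        System.out.println(buildMask("A", new HashMap<>()));   // *_A_*null.txt
    }
}
```

Two things are worth checking with a mask like this: whether the file names really contain "_A_" (with an underscore in front of the letter), and whether "row9.newColumn" is already set in globalMap at the moment the tFileList starts.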
This design cannot be optimized for parallelization. You have to build a job that processes only one file, with the file path passed to the job as a context variable (call it e.g. file_worker). Now create a job that iterates through the files via tFileList, starts the worker job from the iterate trigger, and hands over the file name via the context settings of the tRunJob component. Now select the iterate trigger and enable parallel execution. This design can handle a nearly unlimited number of files, processed in parallel. One good thing is the way Talend handles the parallelization: the number of threads is kept stable if possible, meaning that if you set the parallelization to 5, 5 files are always processed in parallel. Another good thing: you can test the processing of a single file separately.
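The idea behind this design can be sketched in plain Java. This is only an analogy, not the code Talend generates: `processFile` is a hypothetical stand-in for the file_worker job, and a fixed-size thread pool plays the role of the parallel iterate trigger with its thread count.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelFileWorkerSketch {
    // Hypothetical stand-in for the file_worker job: handles exactly one file.
    static String processFile(String filePath) {
        return "processed " + filePath + " on " + Thread.currentThread().getName();
    }

    // Processes all files with at most `parallelism` files in flight at a time.
    static List<String> processAll(List<String> filePaths, int parallelism) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String path : filePaths) {
                // One task per file, like one tRunJob call per iteration.
                Callable<String> task = () -> processFile(path);
                futures.add(pool.submit(task));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                results.add(future.get()); // waits for that file to finish
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> files = Arrays.asList("A_20140502.txt", "B_20140502.txt", "C_20140502.txt");
        for (String line : processAll(files, 3)) {
            System.out.println(line);
        }
    }
}
```

Because the worker only knows about one file path, it can be run and tested on its own, and the degree of parallelism is a single setting outside the worker.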
One hint: I suggest you persist the data in the database within one transaction (if the number of rows is reasonable). This way your job can fail without violating the consistency of the database. Next thing: I would always try to delete previous data belonging to the same data realm first. You may want to reload a file, and this way no records from the previous run survive.