Six Stars SJ

Parallel execution with iterate

Hi,

The following job runs fine without parallel execution:

[Screenshot: 121.PNG]

 

But with parallel execution, this job stops updating the Hadoop file (in tHDFSOutput_1) after its first iteration. I have enabled multi-thread execution as well (under the Job's Extra tab). I am not using any context variable in this subjob, but it is still not updating that Hadoop file. Now I wonder if this is a design issue. I would appreciate any help. Thanks!

 

SJ

1 ACCEPTED SOLUTION

Moderator

Re: Parallel execution with iterate

Hello,

Actually, Talend does not support transferring the data directly ("on the fly") in this case. This means you have to get these files into a local directory first and then load them into the HDFS directory.

Best regards

Sabrina

 

 

--
Don't forget to give kudos when a reply is helpful and click Accept the solution when you think you're good with it.
7 REPLIES
Moderator

Re: Parallel execution with iterate

Hello,

From your screenshot, we can see that you are using the tSCPFileList component to iterate over and list the files and folders in an SCP root directory. How did you get these files from the SCP root directory to a local directory without using the tSCPGet component?

Best regards

Sabrina

Six Stars SJ

Re: Parallel execution with iterate

I am not pulling these files into my local directory. I am putting the files directly into the Hadoop directory, because I don't want to store the data anywhere in between. But I am not sure why this parallel iteration is not working. It just stops after the first execution. Thanks though!

 

SJ

Moderator

Re: Parallel execution with iterate

Hello,

Actually, Talend does not support transferring the data directly ("on the fly") in this case. This means you have to get these files into a local directory first and then load them into the HDFS directory.

Best regards

Sabrina

 

 

Six Stars SJ

Re: Parallel execution with iterate

Hi,

Thanks for the reply. But parallel execution does not work even when I try to pull those files into my local directory:

[Screenshot: 121.PNG]

Maybe there is something wrong in this version of Talend.

 

SJ

Moderator

Re: Parallel execution with iterate

Hello,

Are you able to update all your Hadoop files (in tHDFSOutput_1) when you pull those files into your local directory?

Actually, if the 'Multi-thread execution' box is checked, the different subjobs in the main job execute in parallel. You need to make sure that all the subjobs can run independently of each other; that is how the main job benefits from the multi-thread feature.

Here is a community knowledge article: https://community.talend.com/t5/Design-and-Development/Can-I-run-different-subjobs-in-parallel-in-a-....
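
To illustrate what "running independently" means for subjobs that execute in parallel, here is a minimal plain-Java sketch. It is not the code Talend generates; the class `IndependentSubjobs` and the methods `runSubjob` and `runInParallel` are hypothetical names made up for this example. Each task works only on its own input and returns its own result, so the tasks can safely run on separate threads.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class IndependentSubjobs {

    // Hypothetical stand-in for one subjob: it uses only its own input
    // and shares no mutable state with the other subjobs.
    static String runSubjob(String name) {
        return name + " finished";
    }

    // Runs every subjob on its own thread and collects the results
    // in the order the subjobs were submitted.
    static List<String> runInParallel(List<String> subjobNames) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, subjobNames.size()));
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (String name : subjobNames) {
                Callable<String> task = () -> runSubjob(name);
                pending.add(pool.submit(task));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : pending) {
                results.add(f.get()); // blocks until that subjob has completed
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runInParallel(Arrays.asList("subjob_1", "subjob_2")));
    }
}
```

If two such tasks wrote to the same shared resource instead, they would no longer be independent and would need coordination.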

Best regards

Sabrina

 

Six Stars SJ

Re: Parallel execution with iterate

Thanks for sharing this solution! But the tHDFSOutput_1 file update doesn't take that long in my case. I can still use the multi-thread process though.

And you were right: parallel execution with the iterate flow works when the files are pulled into a local directory instead:

[Screenshot: 121.PNG]

My file names are too long here, so I am using tFileOutputDelimited instead of tSCPGet. Thanks xdshi!
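
For readers who want to see the "stage locally first, then load" pattern outside of Talend, here is a rough plain-Java sketch under stated assumptions. The class `StageThenLoad` and its methods are hypothetical; the "remote files" are just an in-memory map, and the final load step copies into a second local directory as a stand-in for the HDFS load, since no real SCP or Hadoop calls are used here.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class StageThenLoad {

    // Stage 1: each parallel iteration writes its own file into the local
    // staging directory, so no two iterations touch the same target.
    static List<Path> stageInParallel(Map<String, String> remoteFiles,
                                      Path stagingDir, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Path>> pending = new ArrayList<>();
            for (Map.Entry<String, String> entry : remoteFiles.entrySet()) {
                Callable<Path> iteration = () -> Files.write(
                        stagingDir.resolve(entry.getKey()),
                        entry.getValue().getBytes(StandardCharsets.UTF_8));
                pending.add(pool.submit(iteration));
            }
            List<Path> staged = new ArrayList<>();
            for (Future<Path> f : pending) {
                staged.add(f.get()); // wait for each iteration to finish
            }
            return staged;
        } finally {
            pool.shutdown();
        }
    }

    // Stage 2: placeholder for the HDFS load (assumption: a plain local
    // copy stands in for it). Runs only after staging has completed.
    static int loadAll(List<Path> staged, Path destination) throws IOException {
        int loaded = 0;
        for (Path file : staged) {
            Files.copy(file, destination.resolve(file.getFileName()));
            loaded++;
        }
        return loaded;
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> remote = new TreeMap<>();
        remote.put("a.csv", "1;2\n");
        remote.put("b.csv", "3;4\n");
        Path staging = Files.createTempDirectory("staging");
        Path destination = Files.createTempDirectory("hdfs-stand-in");
        List<Path> staged = stageInParallel(remote, staging, 2);
        System.out.println(loadAll(staged, destination) + " files loaded");
    }
}
```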

 

SJ

Seventeen Stars

Re: Parallel execution with iterate

It might be related to your job or the SCP stuff. I use this approach to parallelize jobs a lot and it works very well.