Parent Job to execute child job x number of times

I have a job that runs multiple tDBRow components containing CREATE TEMP TABLE statements with SELECTs embedded in them. These temp tables (created in a Teradata database) are used later in the job to insert into two tables.

I am trying to figure out the best design for a parent job that takes 100 accounts at a time (to start) and loops until it has processed the total number of accounts. So far I have a prejob that sets a context variable to the total number of accounts. I'm stuck on how to process 100 accounts, loop, and pick up at the next 100 accounts until it reaches that total. Any thoughts?

Re: Parent Job to execute child job x number of times

I haven't tried this, but I think you could accomplish it with two tLoops. The first one sets the context variables context.innerStart and context.innerEnd; the second one iterates from innerStart to innerEnd. For example, the outer loop would set the variables like this:

Setup:

innerStart = 0;
innerEnd = 0;

First iteration:

innerStart = innerStart + 1;
innerEnd = innerEnd + 100;

Subsequent iterations:

innerStart = innerEnd + 1;
innerEnd = innerEnd + 100; // check for overshooting the last record
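A minimal plain-Java sketch of the outer-loop bookkeeping described above, assuming the total number of accounts is already known (for example, from the context variable set in the prejob). It clamps innerEnd with Math.min, which is one way to handle the overshoot check; BatchBounds and batches are hypothetical names, not Talend APIs.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchBounds {

    // Hypothetical helper: returns one {innerStart, innerEnd} pair per batch.
    // innerEnd is clamped to total so the last batch does not overshoot.
    static List<int[]> batches(int total, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<int[]> result = new ArrayList<>();
        int innerEnd = 0; // same starting point as the "Setup" step
        while (innerEnd < total) {
            int innerStart = innerEnd + 1;
            innerEnd = Math.min(innerEnd + batchSize, total);
            result.add(new int[] {innerStart, innerEnd});
        }
        return result;
    }

    public static void main(String[] args) {
        // 134 accounts in batches of 10: 14 batches, the last one is 131..134
        for (int[] b : batches(134, 10)) {
            System.out.println("Inner start: " + b[0] + ", inner end: " + b[1]);
        }
    }
}
```

With a batch size of 100, each pair is what the outer loop would write into context.innerStart and context.innerEnd before the inner loop runs.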

Re: Parent Job to execute child job x number of times

Hi,

By adding multiple tFlowToIterate components, you can create multilevel loops in Talend.

My understanding is that you are adding the extra loop to process 100 accounts at a time because the underlying Teradata statements in your tTeradataRow components are now taking more time or more system resources.

Since you are using Teradata, I advise you to use Teradata's MPP capability to its maximum extent rather than adding extra loops in Talend. Check the explain plan of each statement you execute and verify whether any AMP skew is occurring in your queries. It would also be a good idea to consult your Teradata DBA team about the impact of processing a higher volume of accounts at the same time; they can advise you using the Teradata Tuning Assistant utility.

Warm Regards,

Nikhil Thampi

Re: Parent Job to execute child job x number of times

Hmm, thanks Jose. I see what you're trying to do, but I'm new to Talend, so I'm not quite sure where these tLoop components should go. Would you mind giving me a visual of what you mean? I'd greatly appreciate it.

Re: Parent Job to execute child job x number of times

Sure, here's a simple job that prints the numbers from 1 to 134 inclusive in batches of 10:

[Image: loops.PNG]

The context looks like this:

[Image: loopContext.PNG]

First loop:

[Image: firstLoop.PNG]

First tJava:

// Runs once per outer iteration and prints the current batch bounds
System.out.println( "In outer loop" );
System.out.println( "Inner start: " + context.innerStart );
System.out.println( "Inner end: " + context.innerEnd );

Second loop:

[Image: secondLoop.PNG]

Second tJava:

// tLoop_2_CURRENT_ITERATION starts at 1, so subtract 1 to begin the batch at innerStart
int curIter = context.innerStart + ((Integer)globalMap.get("tLoop_2_CURRENT_ITERATION")) - 1;
System.out.println( "Inner loop variable: " + curIter );

Output:

Starting job InnerLoop at 17:16 25/07/2018.

[statistics] connecting to socket on port 3911
[statistics] connected
In outer loop
Inner start: 1
Inner end: 10
Inner loop variable: 1
Inner loop variable: 2
Inner loop variable: 3
Inner loop variable: 4
Inner loop variable: 5
Inner loop variable: 6
Inner loop variable: 7
Inner loop variable: 8
Inner loop variable: 9
Inner loop variable: 10
In outer loop
Inner start: 11
Inner end: 20
Inner loop variable: 11
Inner loop variable: 12
Inner loop variable: 13
Inner loop variable: 14
Inner loop variable: 15
Inner loop variable: 16
Inner loop variable: 17
Inner loop variable: 18
Inner loop variable: 19
Inner loop variable: 20
In outer loop
Inner start: 21
Inner end: 30
...
Inner start: 121
Inner end: 130
Inner loop variable: 121
Inner loop variable: 122
Inner loop variable: 123
Inner loop variable: 124
Inner loop variable: 125
Inner loop variable: 126
Inner loop variable: 127
Inner loop variable: 128
Inner loop variable: 129
Inner loop variable: 130
In outer loop
Inner start: 131
Inner end: 140
Inner loop variable: 131
Inner loop variable: 132
Inner loop variable: 133
Inner loop variable: 134
In outer loop
Inner start: 141
Inner end: 150
In outer loop
Inner start: 151
Inner end: 160
In outer loop
Inner start: 161
Inner end: 170
In outer loop
Inner start: 171
Inner end: 180
In outer loop
Inner start: 181
Inner end: 190
[statistics] disconnected
Job InnerLoop ended at 17:16 25/07/2018. [exit code=0]
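In the output above, the outer loop keeps running after the last account: the passes starting at 141 through 181 print their bounds but process nothing. A minimal plain-Java sketch of the same two-level structure, assuming the outer loop is bounded by the total instead. The inner counter is 1-based like tLoop_2_CURRENT_ITERATION, and curIter uses the same offset as the second tJava; NestedLoopSketch and visit are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

public class NestedLoopSketch {

    // Outer loop: one pass per batch, stopping once innerStart passes total.
    // Inner loop: a 1-based counter offset from innerStart, as in the tJava.
    static List<Integer> visit(int total, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<Integer> visited = new ArrayList<>();
        for (int innerStart = 1; innerStart <= total; innerStart += batchSize) {
            int innerEnd = Math.min(innerStart + batchSize - 1, total);
            int iterations = innerEnd - innerStart + 1;
            for (int currentIteration = 1; currentIteration <= iterations; currentIteration++) {
                int curIter = innerStart + currentIteration - 1;
                visited.add(curIter);
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        List<Integer> visited = visit(134, 10);
        System.out.println("Visited " + visited.size() + " accounts, last = "
                + visited.get(visited.size() - 1));
    }
}
```

Running main prints "Visited 134 accounts, last = 134", with no empty outer passes after the final batch.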

Re: Parent Job to execute child job x number of times

Oh, I see. Cool! Thanks for your help, Jose.

Re: Parent Job to execute child job x number of times

Glad to be of help.

Re: Parent Job to execute child job x number of times

Hey @Jose_Quinteiro,

In your design above, the batches run one after the other. I am trying to push the limits of Talend's performance, which already far outperforms our current load process. I am trying to figure out how to determine the number of batches that will run and then have them all load at once. For example, if a job requires 5 batches of 5,000, how can I submit them all at once? I noticed that the Iterate link has an "Enable Parallel Execution" option in its basic settings, where you can specify how many executions to run. Would that be a viable option? Please let me know your thoughts, and thanks for your help!
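Outside Talend, submitting all batches at once corresponds to handing each batch to a thread pool. A minimal sketch with java.util.concurrent, assuming the batches are independent of each other: processBatch is a hypothetical stand-in for the child job (for example, the Teradata statements for one account range), and the pool size plays the same role as a parallel-execution count.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelBatches {

    // Hypothetical stand-in for the child job; here it only counts the
    // accounts in its range instead of running real queries.
    static int processBatch(int start, int end) {
        return end - start + 1;
    }

    // Submits every batch up front; at most `parallelism` run at the same time.
    // Returns the total number of accounts processed.
    static int runAll(int total, int batchSize, int parallelism) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int start = 1; start <= total; start += batchSize) {
                final int s = start; // per-batch copies captured by the lambda
                final int e = Math.min(start + batchSize - 1, total);
                futures.add(pool.submit(() -> processBatch(s, e)));
            }
            int processed = 0;
            for (Future<Integer> f : futures) {
                try {
                    processed += f.get(); // blocks until that batch finishes
                } catch (InterruptedException ex) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting", ex);
                } catch (ExecutionException ex) {
                    throw new IllegalStateException("a batch failed", ex.getCause());
                }
            }
            return processed;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // 5 batches of 5,000 submitted at once on 5 threads
        System.out.println("Processed " + runAll(25000, 5000, 5) + " accounts");
    }
}
```

The pool size caps concurrency: if there are more batches than threads, the extra batches wait in the pool's queue rather than all running simultaneously.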

Re: Parent Job to execute child job x number of times

@astridhern14 - Yes! This is a very good option. Check "Enable Parallel Execution" in tFlowToIterate, and in the advanced settings you can set the number of concurrent executions, for example 5 or 10 depending on your needs. Also, don't forget memory management if the process is memory-intensive: set the Xms and Xmx JVM arguments appropriately in the Run tab.
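To confirm that the Xms/Xmx values set in the Run tab actually reached the job's JVM, the job could print its heap limits, for example from a tJava. A minimal standalone sketch using the standard java.lang.Runtime methods; maxMemory() roughly reflects the -Xmx limit (it can be slightly lower, and it returns Long.MAX_VALUE when there is no limit). HeapCheck is a hypothetical class name.

```java
public class HeapCheck {

    static final long MB = 1024L * 1024L;

    // Maximum heap the JVM will try to use, in megabytes (normally set by -Xmx).
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / MB;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.out.println("Max heap (MB): " + maxHeapMb());
        // totalMemory is the heap currently committed; it starts near -Xms
        System.out.println("Committed heap (MB): " + rt.totalMemory() / MB);
        System.out.println("Free within committed heap (MB): " + rt.freeMemory() / MB);
    }
}
```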

Re: Parent Job to execute child job x number of times

Unfortunately, I haven't messed with parallel execution yet. I'd be interested to hear what you find out.

Re: Parent Job to execute child job x number of times

@arragupathy - Thanks for that; however, the "Enable Parallel Execution" option is disabled in the tFlowToIterate component for me. I'm using the Enterprise version of Talend (Real-time Big Data Platform 6.4.1).

The issue I'm having is this: I have two context variables through which I pass a min start and a min end number into a WHERE clause of a query in a subjob. In the screenshot below, the tLoop changes the values of those variables and passes them to the tRunJob component. Say I have 4 iterations to run based on the condition in the loop: I want to execute that job 4 times, each run keeping its own variable values. Right now, even with Iterate x4 (and the multi-thread option checked), the query in the subjob appears to run one iteration after the other. I want the 4 queries, each with different variable values, to run at once.

[Image: Capture.PNG]
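If the runs do execute in parallel, each one needs its own start and end values rather than values read from variables that the loop keeps changing. A minimal plain-Java sketch of building one batch's query from parameters, assuming a hypothetical table accounts and key column account_id. Since the query field of a Talend DB component is also a Java string expression, the same concatenation can be used there with context variables in place of the parameters.

```java
public class BatchQuery {

    // Hypothetical table and column names; replace with the real ones.
    static final String TABLE = "accounts";
    static final String KEY = "account_id";

    // Builds the query from values passed in as parameters,
    // not from shared variables that another iteration might change.
    static String queryFor(int minStart, int minEnd) {
        if (minStart > minEnd) {
            throw new IllegalArgumentException("start must not exceed end");
        }
        return "SELECT * FROM " + TABLE
                + " WHERE " + KEY + " BETWEEN " + minStart + " AND " + minEnd;
    }

    public static void main(String[] args) {
        int batchSize = 5000;
        // Four iterations: 1-5000, 5001-10000, 10001-15000, 15001-20000
        for (int start = 1; start <= 20000; start += batchSize) {
            System.out.println(queryFor(start, start + batchSize - 1));
        }
    }
}
```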

Re: Parent Job to execute child job x number of times

@astridhern14 - "Enable parallel execution" is set on the Iterate link, not on the tFlowToIterate component; it is available in Studio 6.4. When you connect tFlowToIterate to tRunJob with an Iterate link, right-click the link and go to its settings. Check "Enable parallel execution" and enter an appropriate integer in "Number of parallel executions". If you determine the number of iterations dynamically, you can supply a context variable in "Number of parallel executions".
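When the number of iterations is determined at runtime, the value for that context variable is the number of batches: the account count divided by the batch size, rounded up. A minimal sketch, assuming a hypothetical upper limit maxParallel so a large account count does not launch more concurrent child jobs than the server or the database can handle; ParallelCount and parallelExecutions are illustrative names, not Talend settings.

```java
public class ParallelCount {

    // Number of batches needed to cover `total` accounts, capped at maxParallel
    // and never less than 1.
    static int parallelExecutions(long total, int batchSize, int maxParallel) {
        if (batchSize <= 0 || maxParallel <= 0) {
            throw new IllegalArgumentException("batchSize and maxParallel must be positive");
        }
        long batches = (total + batchSize - 1) / batchSize; // ceiling division
        return (int) Math.max(1, Math.min(batches, maxParallel));
    }

    public static void main(String[] args) {
        // 25,000 accounts in batches of 5,000
        System.out.println(parallelExecutions(25000, 5000, 10));
    }
}
```

For the example earlier in the thread (5 batches of 5,000), parallelExecutions(25000, 5000, 10) returns 5; one more account would make it 6.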