I want to create a job that loops for every record in a database table.
Then execute a sub job for each iteration of the loop.
This I can do easily, but I do not want to have to wait for the subjob to finish.
Is it possible to fire off a sub job and not have to wait for it to finish before iterating and executing the sub job again?
What is the logic you want to do asynchronously? Why?
Let's dissect and understand what you are trying to do:
For each record, you want to call a subjob with some complex logic asynchronously, i.e. without waiting for it to finish. Just to confirm, I am assuming a subjob here is just logic within the same job, not a tRunJob (a tRunJob would be a bad choice if you call it for each record and have thousands of records to process).
The problem with doing it asynchronously is that you need a thread for each iteration. If your complex logic takes very long and you have thousands of records, you can easily end up with thousands of threads, and the CPU time available to each thread shrinks as the CPU juggles all of them. You could also get deadlocks if the threads write to shared resources, so you need to be careful. Asynchronous threads are also harder to deal with when it comes to debugging etc.
Normally, most systems deal with this by having a thread pool. You define a pool size, and once you max out the pool, your program will have to wait anyway.
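To make the pool idea concrete, here is a minimal sketch in plain Java (the language Talend jobs are generated in). It is not Talend code: `PoolSketch`, `runAll` and `process` are hypothetical names, and `process` only stands in for whatever the subjob does per record. The loop submits each record without waiting; once all pool threads are busy, further tasks wait in the pool's queue.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class PoolSketch {
    // Hypothetical stand-in for the per-record work (e.g. calling one stored procedure).
    static void process(String record, AtomicInteger done) {
        done.incrementAndGet();
    }

    // Submits one task per record to a bounded pool and returns how many tasks completed.
    static int runAll(List<String> records, int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        AtomicInteger done = new AtomicInteger();
        for (String record : records) {
            // submit() returns immediately; the loop does not wait for the task to finish.
            pool.submit(() -> process(record, done));
        }
        pool.shutdown(); // accept no new tasks, but let the queued ones run
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                pool.shutdownNow(); // give up on tasks that are still running
            }
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
        }
        return done.get();
    }

    public static void main(String[] args) {
        System.out.println(runAll(List.of("proc_a", "proc_b", "proc_c"), 2));
    }
}
```

With a pool size of 2 and three records, at most two records are processed at the same time and the third waits for a free thread, which is the "you have to wait anyway" effect described above.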
Another approach is to parallelize. For iteration over a subjob, Talend provides a parallel execution option. You can click on the Iterate link, like I showed above, and increase the number of parallel executions. The number of parallel executions should be (number of threads/cores on your system - 1), so it is bound by the number of threads/cores on your system; generally it is a number around 3, 5 or 7. If you set it higher than the number of hyperthreads, you just cause your CPU to switch threads more often and slice the available time more finely, which often leads to throughput or performance degradation. This has the same effect as a thread pool.
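If you want to check what that rule of thumb gives on a particular machine, a small Java sketch like the one below can help. `ParallelismSketch` and `suggestedParallelism` are hypothetical names, and the floor of 1 is my own assumption so that a single-core machine still gets one execution.

```java
public class ParallelismSketch {
    // Rule of thumb from the post: logical cores minus one.
    // The lower bound of 1 is an added assumption for single-core machines.
    static int suggestedParallelism(int logicalCores) {
        return Math.max(1, logicalCores - 1);
    }

    public static void main(String[] args) {
        // availableProcessors() reports the logical processors visible to the JVM.
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("Logical cores: " + cores);
        System.out.println("Suggested parallel executions: " + suggestedParallelism(cores));
    }
}
```

The resulting number is only a starting point for the parallel execution setting on the Iterate link; the right value still depends on how much of the work happens in the database rather than on the Talend side.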
Thanks for your response.
The main job is an Orchestration job designed to execute a generic job multiple times, which then executes a stored procedure in a database.
We have a control table in which each record represents a stored procedure, so there is no complex logic in Talend to execute, just orchestration. If we have a new stored procedure, we add a record to the control table without having to change code. There will not be thousands of records, so I am not worried about the number of records; I can also configure how many iterations I run.
I have executed sub jobs in parallel with a fixed number of parallel executions before. This, however, is dynamic, so I want to loop through the table and fire off the job without having to wait for completion.