We are trying to refactor some jobs that we inherited. There are a handful of datasets that are reused throughout the job for different fraud checks. We'd like to keep reusing them, but break the fraud checks into separate child jobs and pass the reusable datasets to those child jobs.
We aren't seeing a good way to pass the reusable datasets to the child jobs. I'm seeing lots of posts about passing a single value to or from a child job via context variables or the globalMap, but that's not what we are trying to accomplish. For example, we want to run a handful of fraud checks on recent orders. The list of orders to check, with the pertinent information, would be passed to the child job to be processed.
Seems like a feasible design pattern, but we are missing the ability to pass the datasets.
Any help you can provide is appreciated.
You can build a reusable mechanism for this using a bit of Java. I have written a tutorial to demonstrate how to achieve "Connect By Prior" functionality in Talend (https://www.rilhia.com/tutorials/talend-connect-example). This is not what you want, but I do use a technique that might work for you. I create a data class to hold my data using routines and store that data in an ArrayList. I pass this ArrayList to the child job via an Object Context Variable and simply cast it back to the required type inside the child job.
Look at the sections headed "Load data to ArrayList" and "Convert input array to a datarow". These show how I load the data into the ArrayList in the parent job, and how I read it back from the ArrayList within the child job.
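To make the idea concrete, here is a minimal plain-Java sketch of the ArrayList approach described above. It is not the code from the tutorial: `OrderRow`, its fields, `loadOrders` and `flagLargeOrders` are hypothetical names, and the `Object` value stands in for an Object-typed context variable that the parent would hand to the child job.

```java
import java.util.ArrayList;
import java.util.List;

class DatasetPassingSketch {

    // Hypothetical data class; in Talend this would typically live in a routine.
    static class OrderRow {
        final int orderId;
        final String customer;
        final double amount;

        OrderRow(int orderId, String customer, double amount) {
            this.orderId = orderId;
            this.customer = customer;
            this.amount = amount;
        }
    }

    // Parent side: collect the rows into an ArrayList and expose it as a plain Object,
    // which is what an Object-typed context variable would hold.
    static Object loadOrders() {
        List<OrderRow> orders = new ArrayList<>();
        orders.add(new OrderRow(1001, "alice", 25.00));
        orders.add(new OrderRow(1002, "bob", 9800.00));
        return orders;
    }

    // Child side: cast the Object back to the expected type and run one check on it.
    @SuppressWarnings("unchecked")
    static List<Integer> flagLargeOrders(Object contextValue, double threshold) {
        List<OrderRow> orders = (List<OrderRow>) contextValue;
        List<Integer> flagged = new ArrayList<>();
        for (OrderRow order : orders) {
            if (order.amount > threshold) {
                flagged.add(order.orderId);
            }
        }
        return flagged;
    }

    public static void main(String[] args) {
        Object contextOrders = loadOrders();  // parent job fills the context value
        System.out.println(flagLargeOrders(contextOrders, 5000.00));  // child job reads it
    }
}
```

The cast is unchecked, so the parent and child have to agree on the row class. Keeping that class in a shared routine is one way to make sure both jobs see the same definition.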
If you are using large datasets, you may need to think about memory with this technique, because of how Context Variables are passed to child jobs. If you want to receive data back from the child job, you can return it using a tBufferOutput.
Alternatively, if you want to get a bit more technical and pass the data back (and be a little more thread safe), you can use a ConcurrentHashMap to hold the data you are sending to the child job. This is a really neat way of handling datarows being passed to and from child jobs. You can use the same method I have used in the tutorial, but you will have to work with the data in a slightly different way (if you know Java, this will be easy). Take a look at this tutorial (https://www.talendbyexample.com/talend-returning-values-from-subjobs.html) from one of my old colleagues and skip to the section talking about the "ConcurrentHashMap".
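A rough sketch of the ConcurrentHashMap variant might look like the following. It is plain Java rather than the code from the linked tutorial, and it assumes the child job runs in the same JVM as the parent so that both hold a reference to the same map. The class name, `childJob`, and the map keys `ordersToCheck` and `checkedCount` are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class SharedMapSketch {

    // Child side: read the input dataset from the shared map and write a result back.
    @SuppressWarnings("unchecked")
    static void childJob(Object contextValue) {
        Map<String, Object> shared = (Map<String, Object>) contextValue;
        List<Integer> orderIds = (List<Integer>) shared.get("ordersToCheck");
        // Placeholder for the real fraud check: just record how many orders were seen.
        shared.put("checkedCount", orderIds.size());
    }

    public static void main(String[] args) {
        // Parent side: one map carries data in both directions.
        Map<String, Object> shared = new ConcurrentHashMap<>();
        shared.put("ordersToCheck", Arrays.asList(1001, 1002, 1003));

        childJob(shared);  // stands in for calling the child job with the map as a context value

        // The parent sees what the child wrote because both hold the same map instance.
        System.out.println(shared.get("checkedCount"));
    }
}
```

Note that ConcurrentHashMap rejects null keys and null values, so an empty result needs to be represented some other way, for example with an empty list.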