My question is whether there is a way (or what the best way is) to reuse data in subjobs without loading it again.
I have multiple jobs, each performing a specific task, for example importX, importY, importZ, exportA, exportB, ... (~30-40 of them).
They should be able to run on their own, but I also have a parent job that calls all of them in a row.
Most of the subjobs need data from a fairly big MySQL table, so there is a tMysqlInput loading this table in almost every job. The query takes about 5-10 seconds each time.
The data in this table does not change during the process, so it would be fine to load it just once.
Is there a way to load this data into memory once and then reuse it in any subjob, while still keeping the possibility to run each subjob separately?
I'm looking for something like: "Was the data already loaded in another job? Use it. If not, load it now."
So if I start each of these jobs separately, they load the data from the DB themselves. But if I start the parent job, the data is loaded only once and all the subjobs reuse it.
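To illustrate the "load once, reuse everywhere" logic: when subjobs are called from a parent job via tRunJob without spawning a separate process, they run in the same JVM, so a static field in a custom routine can hold the table across them. The sketch below is a hypothetical illustration (the class name `SharedLookupCache`, the `loadFromDb` stub, and the `String` row type are assumptions, not Talend API); in a real job, `loadFromDb` would run the actual MySQL query, and Talend's tHashOutput/tHashInput components are another option to look at for in-memory reuse.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a lazy, load-once cache usable from a Talend routine.
// A static field survives across subjobs in the same JVM (parent job run),
// but each standalone job execution starts fresh and loads for itself.
public class SharedLookupCache {
    private static List<String> rows;   // null until the first load
    static int loadCount = 0;           // for demonstration only

    // Stand-in for the real MySQL query that takes 5-10 seconds.
    private static List<String> loadFromDb() {
        loadCount++;
        List<String> data = new ArrayList<>();
        data.add("row1");
        data.add("row2");
        return data;
    }

    // "Was the data already loaded? Use it. If not, load it now."
    public static synchronized List<String> getData() {
        if (rows == null) {
            rows = loadFromDb();
        }
        return rows;
    }
}
```

Every subjob calls `SharedLookupCache.getData()` instead of its own tMysqlInput; only the first caller in a JVM pays the query cost. Note this breaks if tRunJob is set to run the child in an independent process, since separate JVMs do not share statics.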
Thanks for your help.
I don't know of such a component.
I think it should be possible with big data tools: Spark provides a way to keep data in memory and manipulate it there.