My question is whether there is a way (or what the best way is) to reuse data in subjobs without loading it again.
I have multiple jobs, each performing a specific task, for example importX, importY, importZ, exportA, exportB, ... (~30-40 of them).
They should be able to run on their own, but I also have a parent job that calls all of them in a row.
Most of the subjobs need data from a pretty big MySQL table, so almost every job has a tMysqlInput that loads this table. The query takes about 5-10 seconds each time.
The data in this table does not change during the process, so it would be fine to load it just once.
Is there a way to load this data into memory once and then reuse it in any subjob, while still keeping the possibility to run each subjob separately?
I'm looking for something like: "Was the data already loaded by another job? Use it. If not, load it now."
So if I start one of these jobs separately, it loads the data from the DB. But if I start the parent job, the data is loaded only once and all the other subjobs reuse it.
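The "use it if already loaded, otherwise load it now" behaviour described above is essentially a lazily initialised, process-wide cache. A minimal sketch of that idea in plain Java follows, written as it might look in a Talend routine. It rests on assumptions, not on anything confirmed in the question: `SharedTableCache`, `getOrLoad` and `clear` are hypothetical names; the `loader` stands in for whatever actually runs the 5-10 second query (for example plain JDBC code); and sharing only works if the parent starts the child jobs with tRunJob inside the same JVM (not as an independent process), so that the routine's static field is the same for every child.

```java
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

/**
 * Hypothetical Talend routine: a load-once holder for the rows of the big MySQL table.
 * ASSUMPTION: all child jobs run in the same JVM as the parent, so this static
 * field is shared between them. A job started on its own gets a fresh, empty cache.
 */
public class SharedTableCache {

    // volatile so rows published by one thread are visible to all others
    private static volatile List<Object[]> rows;

    private SharedTableCache() {
    }

    /**
     * Returns the cached rows. The loader is called only if nothing has been
     * loaded yet in this JVM (double-checked locking keeps this to one call).
     */
    public static List<Object[]> getOrLoad(Supplier<List<Object[]>> loader) {
        List<Object[]> result = rows;
        if (result == null) {
            synchronized (SharedTableCache.class) {
                result = rows;
                if (result == null) {
                    // wrap so callers cannot modify the shared list by accident
                    result = Collections.unmodifiableList(loader.get());
                    rows = result;
                }
            }
        }
        return result;
    }

    /** Reports whether the table has already been loaded in this JVM. */
    public static boolean isLoaded() {
        return rows != null;
    }

    /** Drops the cached rows, e.g. at the end of the parent job to free memory. */
    public static synchronized void clear() {
        rows = null;
    }
}
```

Started on its own, a child job finds the cache empty and pays for the query once; started from the parent, only the first child that calls `getOrLoad` runs the query and the rest reuse the same list. The trade-off is that the whole table stays in heap memory for the lifetime of the parent job.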
Thanks for your help
I don't know of such a component.
I think it should be possible with big data tools: Spark provides a way to keep data in memory and manipulate it.