Hi! I've got a question about the efficiency of my job structure.
This is my current project hierarchy:
Green rectangles are tRunJob components, and inside them are all the processes that build the output tables.
My doubt concerns the fact tables tRunJob. To build these tables I need to read all the previously loaded dimension tables and connect their input to hashes. This way I can use all the dimension table data with only a single read per table, and thanks to the hashes I can build all the fact tables.
Here is the question: if my dimension tables begin to grow, I may eventually hit an out-of-memory error caused by excessive tHash use.
Thinking about that, the other choice I have in mind is the following.
Instead of developing all the fact tables in a single job, with one read of each dimension table and the tHash components, I could achieve the same result by developing the fact table processes in separate tRunJobs. Each tRunJob would build one fact table without any tHash, reading the data it needs directly from each dimension table.
And here is the final question: which is the most efficient way to do this? Using tHash, with the memory problems it could cause in the future and a job that is harder to understand because all the development is concentrated in a single job? Or creating multiple tRunJobs that contain no tHash but read the dimension tables multiple times, which could increase job execution time?
Thanks for your time!
@PataToT, let's suppose your Fact table 1 depends only on Dimension 1. In that case you can move it to run right after Dimension 1 completes. Similarly, you need to check the dependencies of the other facts and arrange them the same way.
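The arrangement described above can be sketched in a few lines of Python. This is only an illustration of the ordering idea, not Talend code: the table names and the dependency map are hypothetical, and in a real project each name would correspond to a subjob or tRunJob call.

```python
# Hypothetical dependency map: each fact table -> the dimensions it needs.
FACT_DEPENDENCIES = {
    "fact_1": {"dim_1"},
    "fact_2": {"dim_1", "dim_2"},
    "fact_3": {"dim_2", "dim_3"},
}


def schedule(dimension_order, fact_dependencies):
    """Return a run order where each fact follows the last dimension it needs."""
    loaded = set()    # dimensions already loaded
    started = set()   # facts already placed in the run order
    order = []
    for dim in dimension_order:
        order.append(dim)
        loaded.add(dim)
        # Place every fact whose dimensions are now all available.
        for fact in sorted(fact_dependencies):
            if fact not in started and fact_dependencies[fact] <= loaded:
                order.append(fact)
                started.add(fact)
    return order


run_order = schedule(["dim_1", "dim_2", "dim_3"], FACT_DEPENDENCIES)
print(" -> ".join(run_order))
```

With this map, fact_1 runs as soon as dim_1 is done instead of waiting for every dimension, so each fact job only needs the dimensions it actually uses.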
Personally, I would never use hash components while loading data into data warehouse fact and dimension tables, because of possible memory issues.
I would park the data in temporary files and then do a bulk load to the target table. This way I can avoid possible memory issues and at the same time keep a grip on overall job performance.
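As a rough illustration of the "park in a temporary file, then bulk load" pattern, here is a minimal Python sketch. It is not Talend code: the table name `fact_sales`, its columns, and the generated rows are made up for the example, and a batched `executemany` on an in-memory SQLite database stands in for the real database's bulk loader. In Talend the equivalent would be a delimited file output followed by the target database's bulk-load component.

```python
import csv
import os
import sqlite3
import tempfile

INSERT_SQL = "INSERT INTO fact_sales VALUES (?, ?, ?)"  # hypothetical target table


def stage_rows(rows, path):
    """Write fact rows to a delimited staging file, one row at a time."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        for row in rows:
            writer.writerow(row)


def bulk_load(path, connection, batch_size=1000):
    """Load the staging file in batches so only one batch is held in memory."""
    loaded = 0
    batch = []
    with open(path, newline="") as handle:
        for row in csv.reader(handle):
            batch.append(row)
            if len(batch) >= batch_size:
                connection.executemany(INSERT_SQL, batch)
                loaded += len(batch)
                batch = []
    if batch:  # flush the final partial batch
        connection.executemany(INSERT_SQL, batch)
        loaded += len(batch)
    connection.commit()
    return loaded


def generate_rows(count):
    """Stand-in for the fact-building step; yields rows lazily."""
    for i in range(count):
        yield (i, i % 10, i * 1.5)


connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE fact_sales (sale_id INTEGER, dim_1_id INTEGER, amount REAL)"
)

with tempfile.TemporaryDirectory() as tmp:
    staging_path = os.path.join(tmp, "fact_sales.csv")
    stage_rows(generate_rows(2500), staging_path)
    loaded_count = bulk_load(staging_path, connection)

row_count = connection.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0]
print(loaded_count, row_count)
```

The point of the design is that neither step keeps the whole data set in memory: rows are streamed to disk as they are produced, and the load reads them back in fixed-size batches.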