What is the best way to do this process more efficiently?


Hi! I've got a question about the efficiency of my job structure.

This is my current project hierarchy:

 

[Image: ConsultaTalend(1).png]

 

Green rectangles are tRunJob components, and inside them are all the processes that build the output tables.

 

My doubt concerns the fact tables' tRunJob. To build those tables, I need to read every dimension table loaded previously and connect each input to a tHash component. This way I can use all the dimension table data with only a single read per table, and thanks to the hashes I can build all the fact tables.
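Conceptually, the tHashOutput/tHashInput pair works like caching each dimension in an in-memory map once and then serving key lookups to every fact build. A minimal Java sketch of that idea (class, keys, and attribute values are hypothetical, not from the actual job):

```java
import java.util.HashMap;
import java.util.Map;

// Conceptual sketch of the tHash pattern: cache a dimension in RAM
// once, then resolve fact-row keys against it with no extra reads.
public class DimCache {
    // tHashOutput step: read the dimension once and keep it in memory.
    static Map<Integer, String> loadDimension() {
        Map<Integer, String> dim = new HashMap<>();
        dim.put(1, "Widget");   // surrogate key -> attribute
        dim.put(2, "Gadget");
        return dim;
    }

    // tHashInput step: each fact row resolves its dimension key from
    // the cache instead of re-reading the source table.
    static String resolve(Map<Integer, String> dim, int key) {
        return dim.get(key);
    }

    public static void main(String[] args) {
        Map<Integer, String> dim = loadDimension();
        System.out.println(resolve(dim, 2));
    }
}
```

The trade-off discussed below follows directly from this picture: the whole dimension lives on the heap for the duration of the job.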

 

Here is the concern: if my dimension tables begin to grow, I may eventually hit an out-of-memory error caused by the heavy tHash usage.

 

With that in mind, the other option I am considering is the following.

 

[Image: ConsultaTalend.png]

 

Instead of building all the fact tables in a single job, from one read of the dimension tables plus tHash components, I can achieve the same result by developing each fact table process in a separate tRunJob. There, instead of using tHash, I build one fact table per tRunJob, re-reading the data from each dimension table I need.



And here is the final question: which approach is more efficient? Using tHash, with the memory problems it could cause in the future and the reduced readability of concentrating all the development in a single job, or creating multiple tRunJobs with no tHash inside, but with multiple reads of the dimension tables that could increase the job execution time...

 

Thanks for your time!


Re: What is the best way to do this process more efficiently?

@PataToT Let's suppose your Fact table 1 depends only on Dimension 1; then you can move its load to run right after Dimension 1 completes. Similarly, verify the dependencies of the other fact tables and arrange them the same way.
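The scheduling idea above can be sketched as: launch each fact load as soon as the last dimension it depends on has finished, instead of waiting for all dimensions. A small Java illustration under assumed job names (Dim1, Fact1, etc. are hypothetical):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch of dependency-driven orchestration: a fact job becomes
// eligible the moment all dimensions it needs are done.
public class DependencyOrder {
    static List<String> schedule(List<String> dimFinishOrder,
                                 Map<String, Set<String>> factDeps) {
        Set<String> done = new HashSet<>();
        List<String> launches = new ArrayList<>();
        for (String dim : dimFinishOrder) {
            done.add(dim);
            for (var e : factDeps.entrySet()) {
                // Launch a fact as soon as its last dependency completes.
                if (!launches.contains(e.getKey()) && done.containsAll(e.getValue())) {
                    launches.add(e.getKey());
                }
            }
        }
        return launches;
    }

    public static void main(String[] args) {
        var order = schedule(List.of("Dim1", "Dim2"),
                Map.of("Fact1", Set.of("Dim1"),
                       "Fact2", Set.of("Dim1", "Dim2")));
        System.out.println(order);  // Fact1 is launched before Dim2 finishes
    }
}
```

In Talend terms this just means wiring OnSubjobOk triggers from each dimension subjob to the facts that need it, rather than funneling everything through one barrier.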

Manohar B
Don't forget to give kudos/accept the solution when a reply is helpful.
Employee

Re: What is the best way to do this process more efficiently?

@PataToT 

 

Personally, I would never use hash components while loading data into data warehouse fact and dimension tables, due to possible memory issues.

 

I would park the data in temporary files and then do a bulk load into the target table. This way I can avoid possible memory issues and, at the same time, keep a grip on overall job performance.
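The "park to file, then bulk load" pattern keeps rows off the JVM heap: stream them to a delimited staging file, then hand that file to the database's bulk loader in one pass (in Talend, typically tFileOutputDelimited followed by a t&lt;DB&gt;BulkExec component). A minimal Java sketch, with hypothetical table and row values:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Sketch of staging rows to disk instead of holding them in memory,
// so the database's bulk loader can ingest them in a single pass.
public class BulkLoadSketch {
    static Path stageRows(List<String> rows) {
        try {
            Path tmp = Files.createTempFile("fact_sales", ".csv");
            Files.write(tmp, rows);   // rows leave the JVM heap here
            return tmp;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path staged = stageRows(List.of("1;2024-01-01;9.99",
                                        "2;2024-01-02;4.50"));
        // A bulk loader would then ingest the staged file, e.g. in
        // PostgreSQL: COPY fact_sales FROM '<path>' WITH (FORMAT csv, DELIMITER ';')
        System.out.println("staged " + Files.readAllLines(staged).size()
                + " rows at " + staged);
    }
}
```

Memory use then stays flat regardless of dimension size, at the cost of extra disk I/O for the staging files.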

 

Warm Regards,
Nikhil Thampi

Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved :-)


