What is the best way to do this process more efficiently?

Five Stars

Hi! I've got a question about my jobs structure efficiency.

This is my current project hierarchy:

 

[Image: ConsultaTalend(1).png — current project hierarchy]

 

The green rectangles are tRunJob components, and inside them are all the processes that build the output tables.

 

My doubt concerns the fact tables tRunJob: to build those tables I need to read every dimension table loaded previously and connect each input to a tHashOutput. That way I can use all the dimension table data with only a single read per table, and thanks to the hashes I can build all the fact tables.
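To make that concrete, here is roughly what the hash option amounts to under the hood (a minimal Java sketch, not Talend-generated code; the DimCustomer record and the sample rows are made up for illustration):

```java
import java.util.HashMap;
import java.util.Map;

// Conceptual analogue of tHashOutput/tHashInput: each dimension is read
// once into an in-memory map, then every fact-table build does O(1) key
// lookups against that cache instead of re-reading the table.
public class DimensionCacheSketch {

    // Hypothetical dimension record; real Talend rows are generated POJOs.
    record DimCustomer(long surrogateKey, String naturalKey, String name) {}

    public static void main(String[] args) {
        // Single read of the dimension (faked here with two rows).
        Map<String, DimCustomer> customerByNaturalKey = new HashMap<>();
        customerByNaturalKey.put("C-001", new DimCustomer(1L, "C-001", "Acme"));
        customerByNaturalKey.put("C-002", new DimCustomer(2L, "C-002", "Globex"));

        // Every fact row resolves its surrogate key from the cache, so the
        // dimension is never re-read -- but the whole map stays on the heap
        // for the lifetime of the job.
        DimCustomer dim = customerByNaturalKey.get("C-002");
        long factForeignKey = (dim != null) ? dim.surrogateKey() : -1L;
        System.out.println("fact row gets customer_sk = " + factForeignKey);
    }
}
```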

 

Here is the concern: if my dimension tables keep growing, I may eventually hit an out-of-memory error caused by the excessive tHash use.
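For a rough sense of scale (assuming something like 200 bytes per cached row, which is only a guess): a dimension with 5 million rows already costs on the order of 1 GB of heap, and since all the hashes live in the same JVM, every dimension I cache adds to that total at once.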

 

With that in mind, the other option I have in mind is the following.

 

[Image: ConsultaTalend.png — alternative hierarchy with a separate tRunJob per fact table]

 

Here, instead of building all the fact tables in a single job from one read of the dimension tables plus tHash components, I would achieve the same result by developing each fact-table process in a separate tRunJob. Instead of using tHash, each tRunJob would build one fact table, with the consequent re-reads of whichever dimension tables it needs; a sketch of that pattern follows below.
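One way to picture this second option: since each fact tRunJob re-reads its dimensions anyway, the key lookup can even be pushed down to the database as a join, so the job streams one row at a time and holds nothing in memory. A hedged sketch, assuming a PostgreSQL source and made-up table/column names (staging_sales, dim_customer):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Sketch of the "re-read per fact job" option: no tHash, the database
// resolves the surrogate keys with a join and the JVM just streams rows.
// Connection string and table/column names are placeholders.
public class ReReadPerFactSketch {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/dwh", "etl", "secret")) {
            String sql = "SELECT d.customer_sk, s.product_id, s.amount "
                       + "FROM staging_sales s "
                       + "JOIN dim_customer d ON d.natural_key = s.customer_key";
            try (Statement st = conn.createStatement();
                 ResultSet rs = st.executeQuery(sql)) {
                while (rs.next()) {
                    // ... write each resolved fact row to the fact table here
                    System.out.printf("fact row: %d, %s, %s%n",
                            rs.getLong("customer_sk"),
                            rs.getString("product_id"),
                            rs.getBigDecimal("amount"));
                }
            }
        }
    }
}
```

The trade is explicit: memory use stays flat, but every fact job repeats the dimension reads, so total I/O and execution time grow with the number of fact tables.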



And here is the final question: which is the more efficient approach? Using tHash, with the memory problems it could cause in the future and the harder-to-understand job that results from concentrating all the development in a single job, or creating multiple tRunJobs with no tHash inside them but with multiple reads of the dimension tables, which could increase the job execution time?

 

Thanks for your time!

Fourteen Stars

Re: What is the best way to do this process more efficiently?

@PataToT, let's suppose your Fact table 1 depends only on Dimension 1; you can then move its load to run right after Dimension 1 completes. Similarly, verify the dependencies of the other facts and arrange them the same way, e.g. Dimension 1 → Fact 1 can run in parallel with Dimension 2 → Fact 2.

Manohar B
Don't forget to give kudos/accept the solution when a reply is helpful.
Employee

Re: What is the best way to do this process more efficiently?

@PataToT 

 

Personally, I would never use hash components while loading data into data warehouse fact and dimension tables, due to possible memory issues.

 

I would park the data in temporary files and then bulk-load it into the target table. This way I can avoid possible memory issues, and at the same time I keep a grip on overall job performance.
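For example (a minimal sketch assuming a PostgreSQL target; in Talend itself this would be tFileOutputDelimited followed by a bulk-exec component such as tDBBulkExec, and the connection details and fact_sales table here are placeholders):

```java
import java.io.FileReader;
import java.io.PrintWriter;
import java.nio.file.Files;
import java.nio.file.Path;
import java.sql.Connection;
import java.sql.DriverManager;

import org.postgresql.PGConnection;
import org.postgresql.copy.CopyManager;

// Stage-then-bulk-load pattern: stream rows to a temp file, then let the
// database load the file in one shot instead of holding data in the JVM.
public class StageAndBulkLoadSketch {
    public static void main(String[] args) throws Exception {
        // 1. Stream transformed fact rows to a temp file instead of keeping
        //    them (or their dimensions) in memory.
        Path staging = Files.createTempFile("fact_sales", ".csv");
        try (PrintWriter out = new PrintWriter(Files.newBufferedWriter(staging))) {
            out.println("1,2,2019-08-01,99.90"); // customer_sk,product_sk,date,amount
            out.println("2,1,2019-08-01,15.00");
        }

        // 2. Bulk-load the staged file; the database, not the JVM heap,
        //    does the heavy lifting.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/dwh", "etl", "secret")) {
            CopyManager copy = conn.unwrap(PGConnection.class).getCopyAPI();
            long rows = copy.copyIn(
                "COPY fact_sales FROM STDIN WITH (FORMAT csv)",
                new FileReader(staging.toFile()));
            System.out.println("bulk-loaded " + rows + " rows");
        }
    }
}
```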

 

Warm Regards,
Nikhil Thampi

Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved :-)
