to load and some transformation on 40 tables

Six Stars


Hi ,

I am trying to design the ETLs for 40 tables, with an individual mapping job for each table as a sub-job.

Can I use tRunJob components in the parent job to call the 40 sub-jobs? Are there any performance issues to expect?

Is it a sound design to call the different jobs from a parent job, given that I am handling 10 to 40 million records per table?

 

 

Please let me know.
Sixteen Stars

Re: to load and some transformation on 40 tables

Due to the huge number of rows per table, you could also think about parallelizing operations. Probably not for all of them, but you could define an appropriate strategy to avoid deadlocks and/or performance issues.

TRF
Employee

Re: to load and some transformation on 40 tables

@gpinumalla

I agree with @TRF. Running 40 parallel child jobs from a single parent job might be overwhelming. At the end of the day, your job server's memory will play a crucial role in the overall scheme of things. You will have to do thorough capacity planning to make sure the system can handle the parallel load, and a series of performance tests to make sure everything is fine.

I would stagger the job flow rather than take a big-bang approach, since your data volume per table is high. It would also be ideal to extract only delta records rather than doing a full refresh every day.
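To illustrate the staggered approach: a minimal orchestration sketch (the table names, wave size, and the idea that each child job is launched per table are all hypothetical; in practice the child jobs would be tRunJob calls or exported job scripts) that groups the tables into waves and lets each wave finish before the next one starts:

```python
from concurrent.futures import ThreadPoolExecutor

def make_waves(tables, wave_size):
    """Split the table list into consecutive waves of at most wave_size."""
    return [tables[i:i + wave_size] for i in range(0, len(tables), wave_size)]

def run_job(table):
    # Placeholder for launching the child load job for one table
    # (e.g. an exported Talend job script); simulated here.
    return f"{table}: done"

def run_staggered(tables, wave_size=10):
    """Run jobs wave by wave; a wave completes before the next begins,
    so at most wave_size loads hit the server at once."""
    results = []
    for wave in make_waves(tables, wave_size):
        with ThreadPoolExecutor(max_workers=wave_size) as pool:
            results.extend(pool.map(run_job, wave))
    return results

tables = [f"table_{n:02d}" for n in range(40)]
print(len(run_staggered(tables, wave_size=10)))  # prints 40: all jobs ran, 10 at a time
```

The same wave pattern maps onto a Talend parent job as sequential groups of tRunJob components, with OnSubjobOk links between the groups.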

 

Warm Regards,
Nikhil Thampi

Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved.

Six Stars

Re: to load and some transformation on 40 tables

Thank you for the info, @TRF and @nikhilthampi. I now have a requirement to add 40 more tables, so it is 80 tables in total. It should be a month-end job only, refreshed once a month, and the logic is just to truncate and load all the tables.

What would be the best way to handle this?

Employee

Re: to load and some transformation on 40 tables

Hi,

 

If your target is a data lake, you could also evaluate the option of using Big Data batch jobs. Personally, I think truncate-and-load for such a huge volume using a Standard job will only be feasible if your Remote Engines have high computational capacity.

 

Right now, you are trying to load 80 tables for a monthly report at the end of the month. Why don't you send the data every day as a delta process to reduce the load at month end? If you try to load 80 high-volume tables in parallel at the end of the month, you may not meet the SLA.

 

Warm Regards,
Nikhil Thampi


Six Stars

Re: to load and some transformation on 40 tables

My target is Oracle. The tables which have timestamps could be handled with a delta load, but that needs CDC enabled in the database, and as far as I know it is very tough to get the user permissions required to read from the database that way.

Is there a way to use Spark batch jobs to read from and write to Oracle? I understand that running Talend jobs at month end for 80 tables is a lot. Instead of going for CDC jobs, is using Spark with Talend a good approach?

 

Employee

Re: to load and some transformation on 40 tables

Hi,

 

Why don't you extract the delta records based on a source timestamp column, to keep it simple?
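The timestamp-based extraction can be sketched as a simple query builder. A minimal sketch, assuming a hypothetical `ORDERS` table with a `LAST_UPDATED` column (in a real job the last-run timestamp would come from a context variable or a control table, and would be bound as a query parameter rather than interpolated):

```python
def delta_query(table, ts_column, last_run_ts):
    """Build an Oracle delta-extraction query that pulls only rows
    changed since the previous successful run."""
    return (
        f"SELECT * FROM {table} "
        f"WHERE {ts_column} > "
        f"TO_TIMESTAMP('{last_run_ts}', 'YYYY-MM-DD HH24:MI:SS')"
    )

# Example: fetch only rows touched since the last load finished.
q = delta_query("ORDERS", "LAST_UPDATED", "2019-08-31 23:59:59")
print(q)
```

The resulting string is what you would place in the query field of the database input component, with the literal replaced by the job's last-run context variable.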

 

There are Oracle input and output components available in Spark Batch jobs too, but you will not have the Bulk components in Spark batch jobs.

 

Again, I would try to break it into daily delta batches to keep it simple and manageable.

 

Warm Regards,
Nikhil Thampi


Six Stars

Re: to load and some transformation on 40 tables

The problem is that some tables do not have any timestamp columns. It would be hard to maintain some tables with CDC and others with monthly batch load jobs.
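One way to keep such a mixed setup manageable is to record the load strategy per table in a single configuration, so one parent job can route each table instead of maintaining two separate pipelines. A minimal sketch with hypothetical table names and strategies:

```python
# Hypothetical per-table strategies: tables with a usable timestamp column
# get a daily delta load; the rest fall back to monthly truncate-and-load.
LOAD_STRATEGY = {
    "ORDERS":    "delta",         # has a LAST_UPDATED timestamp
    "CUSTOMERS": "delta",
    "LOOKUP_A":  "full_monthly",  # no timestamp column
    "LOOKUP_B":  "full_monthly",
}

def tables_for_run(run_type):
    """Return the tables the parent job should launch for this run type."""
    wanted = "delta" if run_type == "daily" else "full_monthly"
    return sorted(t for t, s in LOAD_STRATEGY.items() if s == wanted)

print(tables_for_run("daily"))    # ['CUSTOMERS', 'ORDERS']
print(tables_for_run("monthly"))  # ['LOOKUP_A', 'LOOKUP_B']
```

In Talend terms the same idea would be a metadata/control table driving which child jobs a run iterates over, rather than hard-coding two job lists.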

Six Stars

Re: to load and some transformation on 40 tables

I do not see any parallelization component in Talend Open Studio. Does it come only with the commercial license? Is there any other component that can be used?

Please let me know.

Employee

Re: to load and some transformation on 40 tables

@gpinumalla 

 

The parallelization component is only available in the enterprise version of Talend.

 

Warm Regards,
Nikhil Thampi


Six Stars

Re: to load and some transformation on 40 tables

Thanks for the information, @nikhilthampi and @TRF.
