Connecting jobs and limiting processed data to that loaded in first job. Managing state?

Highlighted
Six Stars

Connecting jobs and limiting processed data to that loaded in first job. Managing state?

I've got a very large ETL project, with many jobs that load data into a database's table.  However subsequent jobs are intended to work only on the rows loaded by the first job, what is the best way to keep this state?


Or if you believe there is a better way to tie specific jobs to more generic jobs together, please let me know. 


Accepted Solutions
Sixteen Stars TRF
Sixteen Stars

Re: Connecting jobs and limiting processed data to that loaded in first job. Managing state?

If you have a natural timestamp corresponding to the createdDate for your tables, you just have to get the current datetime when the 1st job starts, then pass this value to subsequent jobs using a context variable to allow these jobs to select new records (createdDate >= context.yourTimestanp).

You can also decide to create your own column to store the createdDate in the desired tables and pass the value with the insert request or create a trigger on the database for these tables to populate this column.

An other solution could be to have a boolen column on these tables with True as the default value. Only records with True for this column should be selected by jobs and a final job will have to set the value to False when all the steps are finished.

Many variants of these suggestions could be imagined.


TRF

View solution in original post


All Replies
Sixteen Stars TRF
Sixteen Stars

Re: Connecting jobs and limiting processed data to that loaded in first job. Managing state?

If you have a natural timestamp corresponding to the createdDate for your tables, you just have to get the current datetime when the 1st job starts, then pass this value to subsequent jobs using a context variable to allow these jobs to select new records (createdDate >= context.yourTimestanp).

You can also decide to create your own column to store the createdDate in the desired tables and pass the value with the insert request or create a trigger on the database for these tables to populate this column.

An other solution could be to have a boolen column on these tables with True as the default value. Only records with True for this column should be selected by jobs and a final job will have to set the value to False when all the steps are finished.

Many variants of these suggestions could be imagined.


TRF

View solution in original post

2019 GARNER MAGIC QUADRANT FOR DATA INTEGRATION TOOL

Talend named a Leader.

Get your copy

OPEN STUDIO FOR DATA INTEGRATION

Kickstart your first data integration and ETL projects.

Download now

What’s New for Talend Summer ’19

Watch the recorded webinar!

Watch Now

Best Practices for Using Context Variables with Talend – Part 2

Part 2 of a series on Context Variables

Blog

Best Practices for Using Context Variables with Talend – Part 1

Learn how to do cool things with Context Variables

Blog

Migrate Data from one Database to another with one Job using the Dynamic Schema

Find out how to migrate from one database to another using the Dynamic schema

Blog