Five Stars

ETL Logging functionality using Joblet

Hi All,

 

I'm new to Talend; I was working with the Informatica ETL tool before, so please bear with me.

 

My purpose is to log metrics (Start Time, End Time, Records Loaded, Source Table, Target Table, etc.) and load them into a centralized table in a database.

I want to use a Joblet for this because I don't want to code the same ETL logging functionality every time I create a new job. I also think a Joblet is easier to maintain, because I won't have to go back into each job if a modification is needed.

I tried doing this with a Joblet, but I'm facing hurdles. For example, I want to capture the number of records loaded into a target table.

I'm using a PostJob with tMSSqlRow to execute a stored procedure that updates/inserts records into this ETL table.

But my solution does not work, because the table name changes for each job, and so does the name of the output component:

tMSSqlOutput_1 in one job can be tMSSqlOutput_2 in another.
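To make the problem concrete, here is the kind of lookup my Joblet would need (globalMap is simulated outside Talend here; NB_LINE is the per-component row count Talend stores, and the component names are examples from my jobs):

```java
import java.util.HashMap;
import java.util.Map;

public class GlobalMapKeyDemo {
    public static void main(String[] args) {
        // Talend exposes per-component statistics through globalMap; simulated here.
        Map<String, Object> globalMap = new HashMap<>();

        // In Job A the output component is tMSSqlOutput_1 ...
        globalMap.put("tMSSqlOutput_1_NB_LINE", 1500);

        // ... but in Job B the same role is played by tMSSqlOutput_2,
        // so a Joblet hard-coded to the Job A key gets nothing back:
        Object rowsJobA = globalMap.get("tMSSqlOutput_1_NB_LINE"); // 1500
        Object rowsJobB = globalMap.get("tMSSqlOutput_2_NB_LINE"); // null

        System.out.println("Job A key -> " + rowsJobA);
        System.out.println("Job B key -> " + rowsJobB);
    }
}
```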

Can you please let me know how I should approach this?

 

Thanks
2 ACCEPTED SOLUTIONS

Employee

Re: ETL Logging functionality using Joblet

Use the Activity Monitoring Console (AMC). Create the three tables AMC needs to log the flowmeter, logcatcher, and statcatcher metrics. Turn on statistics for each component you want to monitor (be warned, though, that collecting these statistics slows down job execution). Monitor the rows using the flow catcher.

When you run the job, everything is written to the AMC tables, and you can then query it there.

Twelve Stars

Re: ETL Logging functionality using Joblet

@RaglandA I'm afraid you will need to set some values in your job in order for them to be used by your Joblet. Your table names will need to be supplied, for a start. Let's say you have your logging functionality set up in your Joblet (a perfectly acceptable solution): how do you know how many tables you may be reading from or writing to? You could be reading from just 1, or you could be reading from and writing to 20 tables.

 

As an alternative, I would suggest following what @iburtally said. Use the AMC. I have written a few tutorials on this (and am soon to write another with a better way of using it). Here is one of my older tutorials (https://www.rilhia.com/tutorials/talend-activity-monitoring-console-amc). You can see the sort of data you can get in that tutorial (and how to configure it with other tutorials linked to that). There is an issue with the AMC in that it can be quite expensive when it comes to processing time and database usage. You can get round this with a bit of a trick.

 

1) Configure your AMC tables

2) Create a Joblet that looks like the following screenshot:

[Screenshot: AMC.jpg]

What I am doing here is linking the tLogCatcher, tStatCatcher, and tFlowMeterCatcher to tFlowToIterate components. I am then using RunIf links to decide whether to load this data into the AMC DB. The conditions in the RunIf links are based on boolean context variables. This allows you to dynamically switch the logging on and off without redesigning your job (a piece of functionality that is not natively supported by Talend).

3) Drop this joblet into your jobs.

 

The above will log everything the AMC supports.
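As a sketch of the switch, a RunIf condition is just a boolean Java expression over a context variable. The variable name below is an example of my own, and Talend's context object is simulated by a plain class so this runs outside Talend:

```java
public class RunIfConditionDemo {
    // Stand-in for Talend's generated context class; the variable
    // name "amcLoggingEnabled" is an assumption, not a Talend default.
    static class Context {
        boolean amcLoggingEnabled = true;
    }

    public static void main(String[] args) {
        Context context = new Context();

        // In the RunIf link's settings the condition would simply be:
        //   context.amcLoggingEnabled
        if (context.amcLoggingEnabled) {
            System.out.println("loading caught rows into the AMC DB");
        } else {
            System.out.println("AMC logging switched off; skipping load");
        }
    }
}
```

Flipping the context variable at run time (e.g. via a context group per environment) turns the AMC load on or off with no job redesign.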

 

 

I also think what @TRF said is sound. You will need to configure your logging for job-specific things (i.e. table names, filenames, etc.) within your job. You can create reusable functionality for this, maybe some sort of child job or even a Joblet, but you will need to pass it some parameters for it to do what you want. As a hint, I would pay special attention to Talend PIDs (discussed in my tutorials). These will help you link the data for your job-specific logging with your AMC data.

 

This might seem like a lot of work, but once you have a framework configured it will save you a lot of time moving forwards.
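To sketch the PID idea (the pid value and the column names here are invented for illustration; in a real setup both your custom log table and the AMC tables store the executing job's pid):

```java
import java.util.HashMap;
import java.util.Map;

public class PidLinkDemo {
    public static void main(String[] args) {
        // Talend's generated job code exposes a pid string; simulated here.
        String pid = "xKz42E";

        // Row written by your job-specific logging (columns are examples):
        Map<String, String> customLog = new HashMap<>();
        customLog.put("pid", pid);
        customLog.put("target_table", "dbo.Customers");

        // Row written by the AMC catchers for the same execution:
        Map<String, String> amcStat = new HashMap<>();
        amcStat.put("pid", pid);
        amcStat.put("duration", "5230");

        // Because both rows carry the same pid, they can be joined later
        // (e.g. in SQL) to combine job-specific data with AMC metrics.
        if (customLog.get("pid").equals(amcStat.get("pid"))) {
            System.out.println(customLog.get("target_table")
                    + " load took " + amcStat.get("duration") + " ms");
        }
    }
}
```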

Rilhia Solutions
5 REPLIES
Twelve Stars TRF

Re: ETL Logging functionality using Joblet

As far as I know, you need to use global variables to "pass" values to the Joblet.
Populate the variables as early as possible, for example right after the tMSSqlOutput components, and use them in the Joblet.
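A minimal sketch of that idea (the generic LOG_* key names are just examples, and globalMap is simulated by a plain HashMap so this runs outside Talend): the job copies its job-specific values into agreed generic keys, and the Joblet reads only those keys.

```java
import java.util.HashMap;
import java.util.Map;

public class GenericLogKeysDemo {
    public static void main(String[] args) {
        // Stand-in for Talend's globalMap.
        Map<String, Object> globalMap = new HashMap<>();

        // What Talend itself stores after tMSSqlOutput_1 finishes:
        globalMap.put("tMSSqlOutput_1_NB_LINE", 1500);

        // tSetGlobalVar step (the only job-specific part): copy the
        // job-specific values into generic keys the Joblet agrees on.
        globalMap.put("LOG_TARGET_TABLE", "dbo.Customers");
        globalMap.put("LOG_ROWS_LOADED", globalMap.get("tMSSqlOutput_1_NB_LINE"));

        // Inside the Joblet (identical in every job):
        String table = (String) globalMap.get("LOG_TARGET_TABLE");
        Integer rows = (Integer) globalMap.get("LOG_ROWS_LOADED");
        System.out.println(table + " -> " + rows + " rows loaded");
    }
}
```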

TRF

Five Stars

Re: ETL Logging functionality using Joblet

@TRF

 

So what you are saying is to use a tSetGlobalVar to store values in global variables and then use them in the Joblet.

But that doesn't fully serve my purpose, because I would still need to write the variable-assignment code in each individual job.

Five Stars

Re: ETL Logging functionality using Joblet

Thanks @rhall_2_0

 

Thanks everyone for your help on this 🙂