Both a Joblet and the tRunJob component encourage code reuse and refactoring, help improve development efficiency, and ease maintenance. However, you may wonder what the difference is between them, and when you should use one or the other. This article explains the differences between a Joblet and the tRunJob component from a technical point of view, as well as from a usage angle.
Although tRunJob is a generic component available in the core product Talend Open Studio for Data Integration, the Joblet is an advanced feature that is only available in Talend Enterprise subscription products. Therefore, this article applies mostly to users of Talend Enterprise subscription products.
Talend Studio uses a Java code generator, and each Job is translated to a Java class. From a technical point of view, there are two differences:
Because a Joblet and the tRunJob component differ in how their code is generated and in what they can do, the choice between them depends on business requirements. The following explanation describes the circumstances that could lead you to choose one or the other.
The Joblet code is automatically included in the main Job code at run time, thus using fewer resources and improving performance. A Joblet is typically used to meet the following needs:
Output or print static messages. If, for example, you want to trace the Job execution and print a static message for each step, create a Joblet and use a tJava to print this message at the beginning of the Job execution:
System.out.println("The job starts to run");
Process data from different sources in the same way. If, for example, you are reading data from both a file and a database in a Job and need to apply the same processing to both flows, create a Joblet and use it in each flow:
File Input Component – Row Main – Joblet – Row Main – Target | OnSubjobOK | Database Input Component – Row Main – Joblet – Row Main – Target
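As a rough plain-Java analogy of the flow above (not Talend-generated code), the Joblet behaves like one shared processing step applied to each input flow. The class name, the method name, and the trim-and-upper-case processing are all hypothetical and chosen only for illustration:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Plain-Java analogy only; every name here is hypothetical.
class JobletReuseSketch {

    // Plays the role of the Joblet: one processing step defined once.
    // The actual processing (trim + upper-case) is an assumption.
    static List<String> sharedProcessing(List<String> rows) {
        List<String> out = new ArrayList<>();
        for (String row : rows) {
            out.add(row.trim().toUpperCase(Locale.ROOT));
        }
        return out;
    }

    public static void main(String[] args) {
        // Stand-ins for the rows read by the file and database input components.
        List<String> fileRows = Arrays.asList(" alice ", "bob");
        List<String> databaseRows = Arrays.asList("carol ", " dave");

        // The same step is reused in both flows, one after the other.
        System.out.println(sharedProcessing(fileRows));
        System.out.println(sharedProcessing(databaseRows));
    }
}
```

A change to the shared step then applies to both flows at once, which is the maintenance benefit described above.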
The tRunJob component helps you master complex Job systems in real projects. The tRunJob component is typically used to meet the following needs:
This component can be used as a standalone Job and helps clarify a complex Job by avoiding having too many sub-jobs in one Job. You can create different Jobs for processing different business requirements, and then create a main Job to run the child Jobs called with the tRunJob component. For example, assuming you are building a data warehouse for retail, you populate the fact tables such as users, products, orders, and dimension tables in different Jobs, and create a main Job to run the child Jobs one by one.
tRunJob_1 (populate the product fact table) | OnSubjobOK | tRunJob_2 (populate the order fact table) | OnSubjobOK | tRunJob_3 (populateSalesByProductByMonth) | ...
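The chaining above can be pictured with a minimal plain-Java sketch. It is an analogy rather than Talend-generated code: it assumes each child Job reports an exit code, and that a failing child prevents the next OnSubjobOK link from firing. The class, the method, and the child names in `main` are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.IntSupplier;

// Plain-Java analogy only; every name here is hypothetical.
class MainJobSketch {

    // Runs the children in insertion order and stops at the first
    // non-zero exit code. Returns how many children completed.
    static int runInOrder(Map<String, IntSupplier> childJobs) {
        int completed = 0;
        for (Map.Entry<String, IntSupplier> child : childJobs.entrySet()) {
            int exitCode = child.getValue().getAsInt();
            if (exitCode != 0) {
                System.out.println(child.getKey() + " failed with exit code " + exitCode);
                break;
            }
            System.out.println(child.getKey() + " finished");
            completed++;
        }
        return completed;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the order in which the children were added.
        Map<String, IntSupplier> childJobs = new LinkedHashMap<>();
        childJobs.put("populateProductFact", () -> 0);
        childJobs.put("populateOrderFact", () -> 0);
        childJobs.put("populateSalesByProductByMonth", () -> 0);
        System.out.println(runInOrder(childJobs) + " child Job(s) completed");
    }
}
```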
The tRunJob component is the only solution for a case you often face in real projects: you read data from a data source and then process it in a component, but some problematic records can cause the Job execution to fail. The Job throws a Java exception and stops running. You need to capture the Java exception with a tLogCatcher component, log it to your database or file, and let the Job continue with the next record. For example, a table stores the email information as below:
The requirement is to read the email addresses from the table and send an email to each person with a tSendMail. However, because this table may contain invalid email addresses, a Job that contains all the components will stop as soon as an invalid address is sent to tSendMail. To meet this requirement, design the Jobs as follows:
mainJob: tMysqlInput_1: reads the emails from the table. tFlowToIterate_1: iterates over each email. tRunJob_1: calls the child Job.
In the Basic settings tab of tRunJob_1, clear the Die on child error check box so that the main Job does not stop even if an error occurs in the child Job. In the Context Param table, pass the current email from the main Job to the child Job. For more information, read the article Passing a value from a parent Job to a child Job.
child Job: tSendMail_1: sends an email to each person. tLogCatcher_1: catches the Java exception and logs it into a table.
On the Basic settings tab of tSendMail_1, in the To field, enter the context variable that stores the current email passed from the main Job.
Select the Die on error check box. This option makes the child Job throw a Java exception that will be captured by the tLogCatcher component when an email address is invalid.
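The behavior of this design can be approximated in plain Java. The sketch below is an analogy, not Talend-generated code. It assumes the child Job signals failure through an exit code, and all class and method names, the sample addresses, and the simple `@` validity check are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java analogy of the design above, not Talend-generated code.
// Every class and method name here is hypothetical.
class EmailIterationSketch {

    // Stands in for tSendMail_1 with "Die on error" selected:
    // an invalid address raises an exception instead of being ignored.
    static void sendMail(String email) {
        if (email == null || !email.contains("@")) {
            throw new IllegalArgumentException("Invalid email address: " + email);
        }
        System.out.println("Mail sent to " + email);
    }

    // Stands in for the child Job: the catch block plays the role of
    // tLogCatcher_1, and the return value plays the role of the exit code.
    static int childJob(String email, List<String> errorLog) {
        try {
            sendMail(email);
            return 0;
        } catch (RuntimeException e) {
            errorLog.add(e.getMessage());
            return 1;
        }
    }

    // Stands in for mainJob with "Die on child error" cleared:
    // a failing child is counted, and the loop moves on to the next email.
    static int mainJob(List<String> emails, List<String> errorLog) {
        int failures = 0;
        for (String email : emails) {
            if (childJob(email, errorLog) != 0) {
                failures++;
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        List<String> errorLog = new ArrayList<>();
        List<String> emails = Arrays.asList("a@example.com", "not-an-email", "b@example.com");
        int failures = mainJob(emails, errorLog);
        System.out.println(failures + " failure(s), log: " + errorLog);
    }
}
```

In this sketch the invalid address is logged and skipped, and the third address is still processed, which mirrors the effect of clearing Die on child error in the main Job.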