Differences between a Joblet and the tRunJob component

Overview

Both the Joblet and the tRunJob component encourage code reuse and refactoring, improve development efficiency, and ease maintenance. However, you may wonder what the difference between them is, and when you should use one or the other. This article explains the differences between a Joblet and the tRunJob component from a technical point of view as well as from a usage angle.

Environment

Although tRunJob is a generic component available in the core product Talend Open Studio for Data Integration, Joblets are an advanced feature that is only available in Talend Enterprise subscription products. Therefore, this article applies mostly to users of Talend Enterprise subscription products.

Description

Difference

Talend Studio uses a Java code generator: each Job is translated into a Java class. From a technical point of view, there are two differences:

  • The tRunJob component executes a child Job, which is a separate Java class. The main Job instantiates the child Job and executes it using the runJob method. A Joblet, by contrast, is just a GUI extraction and refactoring of some components. It creates a reusable transformation, but the generated code of the Joblet is still part of the Java class of the main Job.
  • The child Job run by the tRunJob component is a different unit of execution and has its own context variables. A child Job called with the tRunJob component in the main Job can't access the context variables of the main Job. However, a Joblet can access the context variables of the main Job, as it is part of the main Job.
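The two differences can be illustrated with a minimal Java sketch. This is not the code that Talend Studio generates: the class names MainJob and ChildJob, the map-based context, and the simplified runJob signature are all assumptions made for illustration. It only shows the idea that inlined Joblet logic reads the main Job's context directly, while a child Job sees only the values that are explicitly passed to it.

```java
import java.util.HashMap;
import java.util.Map;

public class ContextScopeSketch {

    /** Hypothetical stand-in for the Java class generated for a child Job. */
    static class ChildJob {
        // The child Job owns its context; it starts empty, not as a copy of the parent's.
        private final Map<String, String> context = new HashMap<>();

        /** Simplified stand-in for runJob: only explicitly passed parameters are visible. */
        String runJob(Map<String, String> contextParams) {
            context.putAll(contextParams);
            return context.getOrDefault("email", "<not set>");
        }
    }

    /** Hypothetical stand-in for the Java class generated for the main Job. */
    static class MainJob {
        private final Map<String, String> context = new HashMap<>();

        MainJob(String email) {
            context.put("email", email);
        }

        // Joblet style: the logic lives inside the main Job's class,
        // so it reads the main Job's context directly.
        String runJobletCode() {
            return context.getOrDefault("email", "<not set>");
        }

        // tRunJob style: instantiate the separate child class and run it,
        // optionally passing a context parameter.
        String runChild(boolean passEmail) {
            Map<String, String> params = new HashMap<>();
            if (passEmail) {
                params.put("email", context.get("email"));
            }
            return new ChildJob().runJob(params);
        }
    }

    public static void main(String[] args) {
        MainJob mainJob = new MainJob("email1@talend.com");
        System.out.println("Joblet sees:                " + mainJob.runJobletCode());
        System.out.println("Child without parameter:    " + mainJob.runChild(false));
        System.out.println("Child with passed parameter: " + mainJob.runChild(true));
    }
}
```

In this sketch the child only receives the email when the parent passes it, which mirrors the role of the Context Param table of the tRunJob component.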

Usage

Because a Joblet and the tRunJob component differ in how they refactor code and in how they function, the choice between them depends on business requirements. The following sections describe circumstances that could lead you to choose one or the other.

Joblet

The Joblet code is automatically included in the main Job code when the code is generated, thus using fewer resources and improving performance. A Joblet is usually used to meet the following needs:

  • Output or print static messages. To trace the Job execution, you may want to print a static message at each step. For example, create a Joblet that uses a tJava component to print this message at the beginning of the Job execution:

    System.out.println("The job starts to run");
  • Load the values of context variables from a file or a database. If one or more Jobs load the values of context variables from a file or a database, you should usually create a dedicated Joblet to accomplish this task.
  • Manage custom logs with a tLogCatcher component or a tStatCatcher component as the first component in the Jobs.
  • Create a reusable transformation regardless of the type of input and output data source.
    For example, if a Job reads data from both a file and a database and must process both data sets in the same way, create a Joblet and use it in both flows:

    File Input Component – Row Main – Joblet – Row Main – Target
                |
         OnSubjobOK
                |
    Database Input Component – Row Main – Joblet – Row Main – Target
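The idea of a transformation that does not depend on the type of data source can be sketched in plain Java. The class name, the normalize method, and the sample rows below are hypothetical; the method plays the role of the Joblet, and the two lists stand in for rows coming from the file input and the database input.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class ReusableTransformSketch {

    // Hypothetical Joblet-like step: trims and lower-cases every row it receives.
    // It knows nothing about where the rows came from.
    static List<String> normalize(List<String> rows) {
        List<String> out = new ArrayList<>();
        for (String row : rows) {
            out.add(row.trim().toLowerCase(Locale.ROOT));
        }
        return out;
    }

    public static void main(String[] args) {
        // Stand-ins for rows read from a file and from a database table.
        List<String> fromFile = Arrays.asList("  Alice ", "BOB");
        List<String> fromDatabase = Arrays.asList("Carol  ");

        // The same transformation is applied to both flows.
        System.out.println(normalize(fromFile));
        System.out.println(normalize(fromDatabase));
    }
}
```

Because the transformation is written once and applied to both flows, a change to the processing rule only has to be made in one place.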

tRunJob

The tRunJob component helps you manage complex Job systems in real projects. It is usually used to meet the following needs:

  • A child Job called by the tRunJob component can also run as a standalone Job, and splitting the work this way clarifies a complex Job by avoiding too many subJobs in one Job. You can create different Jobs for different business requirements, and then create a main Job that runs these child Jobs through tRunJob components. For example, assuming you are building a data warehouse for retail, you populate the fact tables (such as users, product, and orders) and the dimension tables in different Jobs, and create a main Job to run the child Jobs one by one.

    tRunJob_1 (populate the product fact table)
         |
    OnSubjobOK
         |
    tRunJob_2 (populate the order fact table)
         |
    OnSubjobOK
         | 
    tRunJob_3 (populateSalesByProductByMonth)
         |
        ... 

  • The tRunJob component is the only solution for the following case, which you often face in real projects: you read data from a data source and then process the data in a component, but some problematic data may cause the Job execution to fail. The Job throws a Java exception and stops running. You need to capture the Java exception with a tLogCatcher component, log it to your database or file, and make the Job continue with the next record. For example, a table stores email addresses as below:

    email
    email1@talend.com
    email2@talend.com
    email3@talend.com
    ...

    The requirement is to read the email addresses from the table and send an email to each person with a tSendMail component. However, because the table may contain invalid email addresses, the Job stops as soon as an invalid address is sent to the tSendMail component if you put all the components in one Job. To meet this requirement, design the Jobs as follows:

    mainJob:

    tMysqlInput_1: reads emails from the table.

    tFlowToIterate_1: iterates each email.

    tRunJob_1: calls the child Job.


    In the Basic settings tab of the tRunJob_1, clear the Die on child error check box so that the main Job does not stop even if an error occurs in the child Job. In the Context Param table, pass the current email from the main Job to the child Job. For more information, read the article about Passing a value from a parent Job to a child Job.

    child Job:

    tSendMail_1: sends an email to each person.

    tLogCatcher_1: catches the Java exception and logs it into a table.

    In the Basic settings tab of the tSendMail_1, in the To field, enter the context variable that stores the current email passed from the main Job.

    Select the Die on error check box. This option makes the child Job throw a Java exception that will be captured by the tLogCatcher component when an email address is invalid.
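The control flow of this design can be approximated with a short Java sketch. It is only an analogy, not Talend-generated code: the method names sendMail, runChildJob, and runMainJob, the exit-code convention, the in-memory error log, and the simple "@" check for validity are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ContinueOnChildErrorSketch {

    // Stand-in for tSendMail with Die on error selected:
    // an invalid address raises an exception instead of being skipped silently.
    static void sendMail(String email) {
        if (email == null || !email.contains("@")) {
            throw new IllegalArgumentException("Invalid email address: " + email);
        }
        // A real Job would send the message here.
    }

    // Stand-in for the child Job: the catch block plays the role of tLogCatcher,
    // recording the exception before the child ends with a non-zero exit code.
    static int runChildJob(String email, List<String> errorLog) {
        try {
            sendMail(email);
            return 0;
        } catch (RuntimeException e) {
            errorLog.add(e.getMessage());
            return 1;
        }
    }

    // Stand-in for the main Job: iterate over the emails and call the child once per email.
    // With Die on child error cleared, a failing child does not stop the loop.
    static List<String> runMainJob(List<String> emails) {
        List<String> errorLog = new ArrayList<>();
        for (String email : emails) {
            runChildJob(email, errorLog); // exit code deliberately ignored
        }
        return errorLog;
    }

    public static void main(String[] args) {
        List<String> emails = Arrays.asList("email1@talend.com", "not-an-email", "email3@talend.com");
        List<String> errors = runMainJob(emails);
        System.out.println("Logged errors: " + errors);
    }
}
```

The address after the invalid one is still processed, which is the behavior that the main Job and child Job design above is meant to achieve.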

 
Version History
Revision #: 1 of 1
Last update: 04-17-2017 11:01 PM