How to use Context Variables to seamlessly deploy Big Data jobs to another environment

Overview

This article explains how to use a new feature in Talend 6.3 that simplifies the SDLC process when building Big Data Jobs.

 

Context groups were available for Big Data metadata in earlier versions, but their values were not replaced when the context group changed. This has been resolved in Talend 6.3. The process below shows how to centralize Hadoop metadata by using context variables and groups.
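
In practice, each value stored in a context group is exposed to the Job as a variable. A component field that would otherwise hold a hard-coded value such as "hdfs://dev-namenode:8020" holds a Java expression instead (the variable name here is illustrative):

    context.NameNodeUri

Switching the active context swaps in the value defined for that environment without touching the Job itself.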

 

Prerequisites

  1. Set up Kerberos and get a ticket:

    • Install the Kerberos client from the MIT site.
    • Update the security policies.
    • Configure the krb5.ini file.
    • Add Big Data nodes to the hosts file on the local system.
    • Get a ticket:

      ticket.png

    For details about these steps, see these instructions: How to use Kerberos in Talend Studio with Big Data v6.x.
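
    As a quick reference, a minimal krb5.ini sketch is shown below; the realm and KDC host are placeholders for your environment:

      [libdefaults]
          default_realm = EXAMPLE.COM

      [realms]
          EXAMPLE.COM = {
              kdc = kdc.example.com
              admin_server = kdc.example.com
          }

    With the file in place, request a ticket from a command prompt and verify it (the principal is hypothetical):

      kinit myuser@EXAMPLE.COM
      klist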

 

Configuring Hadoop

  1. In the Studio Repository, under Metadata, right-click Hadoop Cluster, then click Create Hadoop Cluster. Enter a name, then click Next.
  2. The Import Wizard opens. In the import options, select Enter manually Hadoop Services, then click Finish.

    import.png

     

  3. At the bottom of the Cluster Connection screen, click Export as context.

    exportContext.png

     

  4. On the Create/Reuse a context group screen, select Create a new repository context, then click Next.

    create_reuse.png

     

  5. On the next screen, give the context group a name. Click Next, then Finish.

    context_name.png

     

  6. Click Finish on the New Hadoop Cluster Connection on repository screen. The cluster you just created will appear under Hadoop Cluster metadata. You will also see that a context group was created.
  7. Double-click the context group to open it. Now you can start adding the cluster information such as NameNodeUri and ResourceManager, as shown below:

    cluster_info.png

     

  8. You can also add contexts for new cluster environments such as QA and PROD. Click the plus [+] in the upper-right corner.
  9. The Configure Contexts window opens. Click New at the bottom, give the context a name, then click OK.

    context_name_new.png 

     

  10. Now you can add values for the new cluster environment (example values for both contexts follow this list):

    values.png

     

  11. To see how Talend bundles the Hadoop configuration libraries for each context, open the Hadoop connection in Metadata, then click the ellipsis [...] to the right of Use custom Hadoop configurations.

    custom_hadoop.png

     

  12. There will be one library for each context.

    context_library.png
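
To make steps 7 and 10 concrete, hypothetical values for the two contexts might look like this (host names and ports are examples only; substitute your own cluster endpoints):

    Variable           Default                        PROD
    NameNodeUri        hdfs://dev-namenode:8020       hdfs://prod-namenode:8020
    ResourceManager    dev-resourcemanager:8032       prod-resourcemanager:8032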

     

 

Build a Job

  1. Create a Standard Job as shown below. Add the tHDFSConnection and tHDFSOutput Big Data components and connect them.

    build_job.png

     

  2. For the tHDFSConnection component, select Repository as the Property Type, then select the metadata connection that you created above. Notice that all the context variables are added to the configuration automatically (see the sketch after this list).

    tHDFSConnection.png

     

  3. For the tHDFSOutput component, select Use an existing connection, then select the tHDFSConnection component from the Component drop-down list.
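
Under the hood, the code that Studio generates for this Job reads those values from the active context. The standalone Java sketch below approximates what tHDFSConnection and tHDFSOutput do at run time; the URI, path, and file contents are hypothetical, and the real generated code is considerably more involved:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ContextAwareHdfsWrite {
        public static void main(String[] args) throws Exception {
            // In a generated Talend Job these values come from the active
            // context (for example, context.NameNodeUri); they are
            // hard-coded here only to keep the sketch self-contained.
            String nameNodeUri = "hdfs://dev-namenode:8020";   // hypothetical
            String outputPath  = "/user/talend/demo/out.txt";  // hypothetical

            // tHDFSConnection: open a connection to the cluster that the
            // context points at.
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(URI.create(nameNodeUri), conf);

            // tHDFSOutput: write a record to HDFS through that connection.
            try (FSDataOutputStream out = fs.create(new Path(outputPath))) {
                out.writeUTF("hello from the Default context");
            }
            fs.close();
        }
    }

Because only the context values change between environments, pointing this logic at QA or PROD requires no change to the Job itself.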

 

Run the Job

  1. On the Run tab of Studio, select the cluster you want the Job to run on by selecting a context from the drop-down list. These are the contexts you created while configuring Hadoop.

    run_drop_down.png

     

  2. Similarly, you can select the context when creating an Execution task or an Artifact task in TAC, as shown below:

    exec_task.png
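
    Outside Studio and TAC, a Job exported with Build Job ships with launcher scripts that accept the same choice on the command line. The script name below is a placeholder for whatever your build produces:

      MyHDFSJob_run.sh --context=PROD

    Individual variables can also be overridden at launch time, for example --context_param NameNodeUri=hdfs://prod-namenode:8020.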

     
