How to schedule a Talend Job with Kubernetes

This article explains how to package a Talend Job as a container image, schedule it as a Kubernetes CronJob, and use Kubernetes as a Job orchestrator.

Sources for this project are available in the attached Zip files.

Talend Job

Configuration

  1. Create a Talend Job to generate random data and display it in the output console.

    Job-design.png

  2. Configure the tRowGenerator component as follows:

    • Identifier: A random value for each row

    • Firstname: A generated first name

    • Lastname: A generated last name

    • City: A generated city

  3. Specify the number of rows to generate with the context variable numRow, as shown in Figure 2. Because the row count is a context variable, you can set it from Kubernetes at run time instead of hard-coding it in the Job.

    RowGenerator.png

    Context.png

You will find the Job Logrow 0.1 in the tlnd-job.zip file attached to this article.

Build

Before you can build a container image, you need to build the Job package. To build a Job, follow these steps:

  1. Right-click your Job, then select Build Job.

    Build-job.png
    Figure 4: Build Job

  2. Keep the default configuration and click Finish.

    build-job-menu.png
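
Before containerizing the Job, you may want to run the built package locally and confirm that numRow can be overridden from the command line. The helper below is a minimal sketch: the function name is hypothetical, and the Logrow/Logrow_run.sh launcher path is an assumption about the layout of Logrow_0.1.zip, so check it against your own archive. Talend launchers accept context overrides in the form --context_param name=value.

```shell
#!/bin/sh
# Hypothetical helper: unpack the built Job and run it locally.
# Assumption: the archive contains Logrow/Logrow_run.sh (check your own zip).
JOB_ARCHIVE="${JOB_ARCHIVE:-Logrow_0.1.zip}"
JOB_DIR="${JOB_DIR:-Logrow_0.1}"

run_job_locally() {
  rows="${1:-3}"                            # number of rows to generate
  unzip -o -q "$JOB_ARCHIVE" -d "$JOB_DIR"  # -o: overwrite without prompting
  # Override the numRow context variable for this run only
  sh "$JOB_DIR/Logrow/Logrow_run.sh" --context_param "numRow=${rows}"
}
```

For example, source the file and call run_job_locally 3 to generate three rows.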

Configuring the Dockerfile

Now that you have a zip file containing your batch process, you need to configure a Dockerfile before you can build the Docker image. For this example, the Dockerfile and all the other files mentioned below are in the tlnd-k8s.zip file attached to this article.

The Dockerfile is composed of three sections:

  • Arguments
  • Java download
  • Job configuration

Arguments

This section contains all the arguments needed to configure the build.

dockerfile-args.png

Java download

This section uses a multi-stage build, with a build stage dedicated to downloading a JRE. The download uses the parameters defined in the Arguments section.

dockerfile-java.png

Job configuration

This section of the Dockerfile builds the container image for your Job. It is split into the following parts:

  • Arguments: bring some of the parameters defined in the first section of the Dockerfile into this build stage

    dockerfile-job-args.png

  • Labels: add metadata that defines and documents your image

    dockerfile-labels.png

  • Environment variables: pass information to the running process and configure it

    dockerfile-env.png

    In this example, the variable NUMROW allows you to configure the context variable numRow.

  • Installation: installs the JVM and your Job in the /opt/talend folder

    dockerfile-install.png

  • Run user: the process runs as the user talend rather than as root

    dockerfile-user.png

  • Run command: the CMD runs the Job each time the container starts, using the environment variable NUMROW to override the numRow context variable

    dockerfile-cmd.png
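
To make the structure above concrete, the script below writes a minimal sketch of such a Dockerfile to Dockerfile.sketch, so it does not overwrite the Dockerfile from tlnd-k8s.zip. It is not the attached Dockerfile: the ARG names, the debian:bookworm-slim base image, and the Logrow/Logrow_run.sh launcher path are assumptions. It also swaps the JRE download stage for a stage that copies the JRE out of the eclipse-temurin:8-jre image, which keeps the example free of download URLs.

```shell
#!/bin/sh
# Writes an illustrative Dockerfile sketch; NOT the Dockerfile from tlnd-k8s.zip.
# Hypothetical: ARG names, base images, and the Logrow/Logrow_run.sh path.
cat > Dockerfile.sketch <<'EOF'
# --- Arguments: build parameters, overridable with --build-arg ---
ARG JAVA_IMAGE=eclipse-temurin:8-jre
ARG JOB_NAME=Logrow
ARG JOB_VERSION=0.1

# --- Java stage: copies a JRE from an image instead of downloading one ---
FROM ${JAVA_IMAGE} AS java

# --- Job configuration ---
FROM debian:bookworm-slim
# Arguments: global ARGs must be redeclared to be visible in this stage
ARG JOB_NAME
ARG JOB_VERSION
# Labels: document the image
LABEL org.opencontainers.image.title="${JOB_NAME}" \
      org.opencontainers.image.version="${JOB_VERSION}"
# Environment variables: NUMROW is the default row count for the Job
ENV JAVA_HOME=/opt/talend/java \
    PATH=/opt/talend/java/bin:$PATH \
    JOB_NAME=${JOB_NAME} \
    NUMROW=10
# Installation: JRE and Job under /opt/talend, owned by the talend user
COPY --from=java /opt/java/openjdk /opt/talend/java
COPY ${JOB_NAME}_${JOB_VERSION}.zip /tmp/job.zip
RUN apt-get update \
 && apt-get install -y --no-install-recommends unzip \
 && unzip -q /tmp/job.zip -d /opt/talend \
 && rm -rf /tmp/job.zip /var/lib/apt/lists/* \
 && useradd --system --home-dir /opt/talend talend \
 && chown -R talend /opt/talend
# Run user: do not run as root
USER talend
# Run command: shell form, so ${NUMROW} is expanded each time the container starts
CMD sh /opt/talend/${JOB_NAME}/${JOB_NAME}_run.sh --context_param numRow=${NUMROW}
EOF
```

With Logrow_0.1.zip in the same folder, a build could then look like docker build -f Dockerfile.sketch -t username/logrow:0.1.0 . where -f selects which Dockerfile to use. The shell form of CMD matters here: Docker does not substitute variables in CMD itself, so the expansion of NUMROW is left to /bin/sh at container start.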

Once your Dockerfile is configured, copy the Logrow_0.1.zip file you generated earlier to the same folder, as shown below.

docker-folder.png

Building the Docker image

  1. To build the image, run the following command in the folder where the Dockerfile and the zip file are located:

    docker build -t username/logrow:0.1.0 .

    Replace username with your Docker Hub username.

  2. At the end of the build process, you should see something like:

    Successfully built 6ef71caf6a90
    Successfully tagged username/logrow:0.1.0

  3. Test your image by running the following command:

    docker run --rm -i -e NUMROW=3 username/logrow:0.1.0

    docker-run3.png

    docker-run1.png

Pushing your Docker image

You need to push your image to a registry. This example uses Docker Hub, a public registry. Log in with docker login, then run the command below, replacing username with your own Docker Hub username; otherwise, the push is rejected.

docker push username/logrow:0.1.0

The second option, and the most common in many companies, is to use a private registry.
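
A minimal sketch of pushing the same image to a private registry, assuming a registry at the hypothetical host registry.example.com and a hypothetical talend/ repository path. The image is retagged with the registry host as a prefix, which is how Docker decides where to push. If the registry requires authentication, the cluster will also need an image pull secret to pull the image.

```shell
#!/bin/sh
# Hypothetical registry host and repository path; replace with your own.
SOURCE_IMAGE="username/logrow:0.1.0"
REGISTRY="${REGISTRY:-registry.example.com}"
TARGET_IMAGE="${REGISTRY}/talend/logrow:0.1.0"

push_to_private_registry() {
  docker login "$REGISTRY"                    # prompts for credentials
  docker tag "$SOURCE_IMAGE" "$TARGET_IMAGE"  # the host prefix selects the registry
  docker push "$TARGET_IMAGE"
}
```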

Configuring Minikube and Helm

Installation

This example simulates a Kubernetes cluster with Minikube, which runs a single-node cluster in a VirtualBox virtual machine. To install Minikube, see Install Minikube in the Kubernetes documentation.

Helm is a package manager for Kubernetes applications. It is useful when you want to deploy several related Kubernetes objects and their configuration as a single package. To install Helm, see Installing Helm in the Helm documentation.

Initialization

To prepare the environment, you need to initialize Minikube and Helm.

Minikube

To initialize Minikube, run the following commands:

minikube start
minikube dashboard

You should see the Kubernetes dashboard:

kube-dashboard.png
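
If you prefer the command line to the dashboard, a quick check that kubectl is talking to the Minikube cluster might look like the sketch below. The function name is hypothetical; --driver=virtualbox matches the VirtualBox setup described above (older Minikube releases used --vm-driver instead).

```shell
#!/bin/sh
# Hypothetical helper: start Minikube on VirtualBox and check the cluster.
check_minikube() {
  minikube start --driver=virtualbox
  # With the default profile, the current context is named "minikube"
  kubectl config current-context
  kubectl get nodes    # the single node should be listed as Ready
}
```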

Helm

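To initialize Helm, run the following command. This step applies to Helm 2 only: it installs Tiller, the Helm server-side component, into your cluster. Helm 3 removed Tiller and the helm init command, so skip this step if you use Helm 3.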

helm init

After the command completes, you should see output similar to the following:

$HELM_HOME has been configured at /Users/username/.helm.
Tiller (the Helm server-side component) has been installed into your Kubernetes Cluster.
Happy Helming!

Deploying your Job

You are ready to deploy your application. In this example, you are deploying a Kubernetes CronJob.

A CronJob runs a Job on a schedule. In Talend Administration Center terms, it is the equivalent of a Job scheduled in the Job Conductor.

This application is composed of two Kubernetes objects:

  • ConfigMap: a key/value object that contains the numRow context value

    This way, you can change the configuration of a Job without rebuilding the image or redeploying the chart.

  • CronJob: contains your Talend Job container specifications

Understanding a Helm chart

A Helm chart is a set of files that, once deployed, becomes a release. In this example, you deploy a chart named demo-job.

helm-folder.png

The demo-job folder contains the following:

  • Chart.yaml: represents the definition of a chart

    helm-chart.png

  • values.yaml: contains values for all variables used to configure your templates

     helm-values.png
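
A minimal sketch of these two files, written into a hypothetical sketch/demo-job folder so the attached chart is not overwritten. The values keys (image, schedule, numRow) and the schedule are assumptions chosen for illustration, not the contents of the attached values.yaml. apiVersion: v1 is the Chart.yaml format used by Helm 2 and still accepted by Helm 3.

```shell
#!/bin/sh
# Writes an illustrative chart skeleton; names and values are hypothetical.
CHART_DIR="${CHART_DIR:-sketch/demo-job}"
mkdir -p "$CHART_DIR/templates"

cat > "$CHART_DIR/Chart.yaml" <<'EOF'
apiVersion: v1
name: demo-job
version: 0.1.0
description: Talend Logrow Job scheduled as a Kubernetes CronJob
EOF

cat > "$CHART_DIR/values.yaml" <<'EOF'
# Container image pushed earlier (replace username)
image: username/logrow:0.1.0
# Standard five-field cron syntax: every 5 minutes
schedule: "*/5 * * * *"
# Value exposed to the Job as the numRow context variable
numRow: 10
EOF
```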

The templates folder contains the configuration files for your Kubernetes objects:

  • configMap.yaml: creates a ConfigMap object that defines the value of the numRow variable. In the data section, the key numrow holds the value that is mapped to the NUMROW environment variable of the container.

    helm-configmap.png

  • cronJob.yaml: contains the definition of the CronJob, such as the container image to run and the mapping of the NUMROW environment variable to the numrow key of the ConfigMap.

    helm-cronjob.png
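
The two templates might look like the sketch below, which uses hypothetical value names (image, schedule, numRow) and a hypothetical sketch/demo-job folder. In this sketch, object names follow the pattern {{ .Release.Name }}-{{ .Chart.Name }}, which yields my-release-demo-job as in the deployment output later in this article. That output shows the batch/v1beta1 CronJob API, which was removed in Kubernetes 1.25, so the sketch uses batch/v1, available since Kubernetes 1.21.

```shell
#!/bin/sh
# Writes illustrative Helm templates; value names are hypothetical.
CHART_DIR="${CHART_DIR:-sketch/demo-job}"
mkdir -p "$CHART_DIR/templates"

cat > "$CHART_DIR/templates/configMap.yaml" <<'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ .Release.Name }}-{{ .Chart.Name }}
data:
  # ConfigMap values must be strings, hence quote
  numrow: {{ .Values.numRow | quote }}
EOF

cat > "$CHART_DIR/templates/cronJob.yaml" <<'EOF'
apiVersion: batch/v1
kind: CronJob
metadata:
  name: {{ .Release.Name }}-{{ .Chart.Name }}
spec:
  schedule: {{ .Values.schedule | quote }}
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: {{ .Chart.Name }}
              image: {{ .Values.image | quote }}
              env:
                # Map the container's NUMROW variable to the ConfigMap key
                - name: NUMROW
                  valueFrom:
                    configMapKeyRef:
                      name: {{ .Release.Name }}-{{ .Chart.Name }}
                      key: numrow
EOF
```

Together with a values.yaml that defines those keys, running helm template my-release ./sketch/demo-job (Helm 3 syntax) renders the templates locally without touching the cluster, which is a quick way to check the result.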

Deploying your Helm chart

  1. To deploy your chart, run the following command from inside the helm folder:

    helm install --name my-release --namespace talend ./demo-job

    In this command:

    • --name my-release: sets the name of the release (replace my-release with the release name you want)

    • --namespace talend: deploys the release into the talend namespace, which Helm 2 creates if it does not exist (replace as necessary)

    • ./demo-job: the path to the chart folder (replace demo-job with the name of your chart folder)

    Result:

    NAME:   my-release
    LAST DEPLOYED: Mon Mar 26 16:32:57 2018
    NAMESPACE: talend
    STATUS: DEPLOYED
    
    RESOURCES:
    ==> v1/ConfigMap
    NAME                 DATA  AGE
    my-release-demo-job  1     0s
    ==> v1beta1/CronJob
    NAME                 KIND
    my-release-demo-job  CronJob.v1beta1.batch
    

  2. Verify that your objects were correctly deployed:

    kube-configMap.png

    kube-cronJob.png
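
The install command above uses Helm 2 syntax. With Helm 3, --name no longer exists: the release name is a positional argument, and the namespace is created only if you pass --create-namespace (Helm 3.2 and later). The sketch below wraps the Helm 3 equivalent and a command-line version of the verification step in hypothetical helper functions.

```shell
#!/bin/sh
# Hypothetical helpers; defaults match the names used in this article.
RELEASE="${RELEASE:-my-release}"
NAMESPACE="${NAMESPACE:-talend}"
CHART_DIR="${CHART_DIR:-./demo-job}"

# Helm 3 equivalent of: helm install --name my-release --namespace talend ./demo-job
install_release() {
  helm install "$RELEASE" "$CHART_DIR" --namespace "$NAMESPACE" --create-namespace
}

# Command-line alternative to checking the objects in the dashboard
verify_release() {
  kubectl get configmap,cronjob --namespace "$NAMESPACE"
}
```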

One thing to understand when you deploy a CronJob:

  • A CronJob is a scheduled Kubernetes object: at each scheduled time, it creates a Job, and for each Job, Kubernetes creates a pod. The pod is where the container runs and where you can find its logs.

    kube-jobs.png

    kube-pods.png

    kube-logs.png
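
The same objects can be inspected with kubectl. The sketch below assumes the talend namespace and the my-release-demo-job CronJob from the deployment output above; the function names and the manual-run Job name are hypothetical. kubectl create job --from=cronjob/NAME starts a run immediately instead of waiting for the schedule, and kubectl logs job/NAME prints the logs of a pod that belongs to that Job.

```shell
#!/bin/sh
# Hypothetical helpers; names match the release deployed above.
NAMESPACE="${NAMESPACE:-talend}"
CRONJOB="${CRONJOB:-my-release-demo-job}"

# Trigger one run now, outside the schedule ($1: name of the new Job)
run_now() {
  kubectl create job "${1:-manual-run}" --from="cronjob/${CRONJOB}" --namespace "$NAMESPACE"
}

# List the Jobs created by the CronJob and their pods
list_runs() {
  kubectl get jobs,pods --namespace "$NAMESPACE"
}

# Print the Talend output of one run ($1: a Job name from list_runs)
show_logs() {
  kubectl logs "job/$1" --namespace "$NAMESPACE"
}
```

For example, call run_now, wait for its pod to complete, then call show_logs manual-run to see the generated rows.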

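If you want to change the number of rows generated, edit the my-release-demo-job ConfigMap and change the value of numrow. The new value applies from the next run, because each run starts a new container that reads the value when it starts.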
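
Instead of editing the ConfigMap in the dashboard, you can patch it with kubectl. This is a sketch that assumes the my-release-demo-job ConfigMap in the talend namespace; the function names are hypothetical. A JSON merge patch replaces only the numrow key and leaves the rest of the ConfigMap untouched.

```shell
#!/bin/sh
# Hypothetical helpers; names match the release deployed above.
NAMESPACE="${NAMESPACE:-talend}"
CONFIGMAP="${CONFIGMAP:-my-release-demo-job}"

# Build a merge patch for the numrow key (ConfigMap values are strings)
numrow_patch() {
  printf '{"data":{"numrow":"%s"}}' "$1"
}

# Apply the new row count; pods started after this pick it up
set_numrow() {
  kubectl patch configmap "$CONFIGMAP" --namespace "$NAMESPACE" \
    --type merge -p "$(numrow_patch "$1")"
}
```

For example, set_numrow 5 makes the following runs generate five rows.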

Version history
Revision #: 21 of 21
Last update: 02-25-2019 01:10 AM
Updated by: