This article explains how to schedule a Talend Job as a Kubernetes Job and how to use Kubernetes as a Job orchestrator.
The sources for this project are available in the attached zip files.
Create a Talend Job to generate random data and display it in the output console.
Configure the tRowGenerator component as follows:
Identifier: A random value for each row
Firstname: A generated first name
Lastname: A generated last name
City: A generated city
Specify the number of rows to generate using the context variable numRow, as shown in Figure 2. This helps you configure Jobs in Kubernetes.
You will find the Job Logrow 0.1 in the tlnd-job.zip file attached to this article.
Before you can build a container image, you need to build the Job package. To build a Job, follow these steps:
Right-click your Job, then select Build Job.
Keep the default configuration and click Finish.
Now that you have a zip file containing your batch process, you need to configure a Dockerfile before you can build the Docker image. For this example, the Dockerfile and all other mentioned files are in the tlnd-k8s.zip file attached to this article.
The Dockerfile is composed of three sections:
The first section contains all the arguments needed to configure the build.
The second section uses a multi-stage build, with a stage dedicated to downloading a JRE. The download uses parameters from the arguments section.
The third section builds the container image for your Job. It can be split into multiple parts:
Arguments: map some of the parameters defined in the first section of the Dockerfile
Labels: allow you to define and document your image
Environment variables: provide information and help to configure the running process
In this example, the variable NUMROW allows you to configure the context variable numRow.
Installation: installs the JVM and your Job in the /opt/talend folder
Run User: the process runs as the user talend
Run command: the CMD runs the Job at each startup, using the environment variable NUMROW to override the context variable numRow
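The attached Dockerfile is not reproduced in this article. As a rough illustration only, the three sections described above might be laid out as in the sketch below, which writes an example to a scratch file with a shell heredoc. The base images, the JRE_URL argument, the default NUMROW value, and the layout of the zip file (a Logrow folder containing Logrow_run.sh) are assumptions made for the sketch, not the contents of tlnd-k8s.zip.

```shell
# Write an illustrative Dockerfile to a scratch file (a real Dockerfile is not touched).
cat > Dockerfile.sketch <<'EOF'
# --- Section 1: build arguments (names and defaults are assumptions) ---
ARG JOB_NAME=Logrow
ARG JOB_VERSION=0.1
# Hypothetical: URL of a JRE .tar.gz archive, passed with --build-arg
ARG JRE_URL

# --- Section 2: dedicated stage that downloads the JRE ---
FROM debian:stable-slim AS jre
ARG JRE_URL
# Assumes the archive has a single top-level directory
RUN apt-get update \
 && apt-get install -y --no-install-recommends ca-certificates curl \
 && mkdir -p /opt/jre \
 && curl -fsSL "$JRE_URL" | tar -xz -C /opt/jre --strip-components=1

# --- Section 3: image for the Job ---
FROM debian:stable-slim
# Arguments: re-declare the global arguments used in this stage
ARG JOB_NAME
ARG JOB_VERSION
# Labels: document the image
LABEL description="Talend Job ${JOB_NAME} ${JOB_VERSION}"
# Environment variables: NUMROW feeds the numRow context variable
ENV NUMROW=10 \
    JAVA_HOME=/opt/talend/jre \
    PATH=/opt/talend/jre/bin:$PATH
# Installation: the JVM and the Job go under /opt/talend
COPY --from=jre /opt/jre /opt/talend/jre
COPY ${JOB_NAME}_${JOB_VERSION}.zip /tmp/job.zip
RUN apt-get update \
 && apt-get install -y --no-install-recommends unzip \
 && unzip -q /tmp/job.zip -d /opt/talend \
 && rm -rf /tmp/job.zip /var/lib/apt/lists/* \
 && useradd --system talend
# Run user
USER talend
# Run command: pass NUMROW to the Job as a context parameter
CMD ["/bin/sh", "-c", "sh /opt/talend/Logrow/Logrow_run.sh --context_param numRow=${NUMROW}"]
EOF
```

The shell form inside CMD is what lets ${NUMROW} be expanded when the container starts, so the value can be changed at run time without rebuilding the image.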
Once your Dockerfile is configured, copy the Logrow_0.1.zip file you generated earlier to the same folder, as shown below.
To build the image, run the following command in the folder where the Dockerfile and the zip file are located:
docker build -t username/logrow:0.1.0 .
Replace username with your Docker Hub username.
At the end of the build process, you should see something like:
Successfully built 6ef71caf6a90
Successfully tagged username/logrow:0.1.0
Test your image by running the following command, which sets NUMROW so that the Job generates three rows:
docker run --rm -i -e NUMROW=3 username/logrow:0.1.0
You need to push your image to a registry. The first option, used in this example, is the public Docker Hub. Replace username in the command below with your own username, or you will not be able to push your image.
docker push username/logrow:0.1.0
The second option, and the most common in many companies, is to use a private registry.
This example simulates a Kubernetes cluster with Minikube, which runs a single-node cluster hosted on a VirtualBox machine. To install Minikube, see Install Minikube in the Kubernetes documentation.
Helm is a package manager for Kubernetes applications. It is very useful when you want to deploy multiple configurations as a single package. To install Helm, see Installing Helm in the Helm documentation.
To prepare the environment, you need to initialize Minikube and Helm.
To initialize Minikube, type the following commands:
minikube start
minikube dashboard
You should see the Kubernetes dashboard:
To initialize Helm, type the following command:
helm init
After the command runs, you should see output like this:
$HELM_HOME has been configured at /Users/username/.helm.
Tiller (the Helm server-side component) has been installed into your Kubernetes Cluster.
Happy Helming!
You are ready to deploy your application. In this example, you are deploying a Kubernetes CronJob.
A CronJob is a Job that runs on a schedule. In Talend terms, it is comparable to a Job scheduled in the Job Conductor of Talend Administration Center.
This application is composed of two Kubernetes objects:
ConfigMap: a key/value object that contains the numRow context value
This way, you can change the configuration of a Job without having to redeploy it.
CronJob: contains your Talend Job container specifications
A Helm chart is composed of multiple files that, once deployed, represent a release. In this example, you will deploy a package named demo-job.
The demo-job folder contains the following:
Chart.yaml: represents the definition of a chart
values.yaml: contains values for all variables used to configure your templates
In the folder templates, you have the configuration files for your Kubernetes objects:
configMap.yaml: creates a ConfigMap object that defines the value of the numRow variable. In the data section, use the key numrow to map to the container NUMROW environment variable.
cronJob.yaml: contains the definition of the cron job deployment, such as the container to use and the mapping of the NUMROW environment variable to the configMap variable.
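The attached templates are not reproduced here. The sketch below writes plain (non-templated) manifests to a scratch file to show how the two objects could fit together: the ConfigMap holds the numrow key, and the CronJob container reads it into NUMROW through a configMapKeyRef. The object name, the schedule, the container name, and the numrow value are placeholders chosen for illustration. The batch/v1beta1 API version follows the v1beta1/CronJob reported in the release output later in this article; current Kubernetes versions use batch/v1.

```shell
# Write illustrative manifests to a scratch file; nothing is applied to a cluster.
cat > demo-job-sketch.yaml <<'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
  name: my-release-demo-job
data:
  # Key read by the CronJob below; ConfigMap values are strings.
  numrow: "10"
---
# Use batch/v1 on Kubernetes 1.21 or later.
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: my-release-demo-job
spec:
  # Assumed schedule: every minute.
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: logrow
              image: username/logrow:0.1.0
              env:
                # Map the ConfigMap key numrow to the NUMROW environment variable.
                - name: NUMROW
                  valueFrom:
                    configMapKeyRef:
                      name: my-release-demo-job
                      key: numrow
EOF
```

In a Helm chart, the hard-coded name, image, and numrow value would typically be filled in from values.yaml and the release name.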
To deploy your chart, run the following command from inside the helm folder:
helm install --name my-release --namespace talend ./demo-job
This command contains:
--name my-release: configures the name of the release (replace my-release with desired release name)
--namespace talend: configures a new namespace called talend (replace as necessary)
./demo-job: replace demo-job with the name of your package folder
The command prints a release summary similar to the following:
NAME: my-release
LAST DEPLOYED: Mon Mar 26 16:32:57 2018
NAMESPACE: talend
STATUS: DEPLOYED
RESOURCES:
==> v1/ConfigMap
NAME                 DATA  AGE
my-release-demo-job  1     0s
==> v1beta1/CronJob
NAME                 KIND
my-release-demo-job  CronJob.v1beta1.batch
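The install command in this article uses Helm 2 syntax (the --name flag, with Tiller running in the cluster). On Helm 3, which has no Tiller and takes the release name as a positional argument, the equivalent would look roughly like the sketch below. The variable names are illustrative, and the guard only keeps the snippet from failing on a machine without helm or without the chart folder.

```shell
RELEASE=my-release
NAMESPACE=talend
CHART=./demo-job

# Helm 3 form: the release name is positional, and the namespace is
# created only when --create-namespace is given (Helm 3.2 or later).
if command -v helm >/dev/null 2>&1 && [ -d "$CHART" ]; then
  helm install "$RELEASE" "$CHART" --namespace "$NAMESPACE" --create-namespace
else
  echo "helm or $CHART not found; skipping install"
fi
```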
Verify that your objects were correctly deployed:
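One way to check from the command line, assuming kubectl is configured for the Minikube cluster and the release was installed in the talend namespace:

```shell
NAMESPACE=talend

if command -v kubectl >/dev/null 2>&1; then
  # List the ConfigMap and the CronJob created by the release.
  kubectl get configmap,cronjob --namespace "$NAMESPACE"
else
  echo "kubectl not found; skipping"
fi
```

Both objects should be listed under the name my-release-demo-job.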
One thing to understand when you deploy a CronJob:
The CronJob itself only holds the schedule and the Job definition. At each scheduled time, Kubernetes creates a Job, and for each execution of a Job it creates a pod. The pod is where the container runs, and where you will find the logs.
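As a sketch of how to follow that chain from the command line, the commands below list the Jobs and pods in the namespace and print the logs of the most recently created pod. It assumes kubectl points at the Minikube cluster and that the talend namespace contains only the pods of this release.

```shell
NAMESPACE=talend

if command -v kubectl >/dev/null 2>&1; then
  # List the Jobs created by the CronJob and the pods created for them.
  kubectl get jobs,pods --namespace "$NAMESPACE"

  # Pick the most recently created pod (output has the form pod/<name>).
  latest_pod=$(kubectl get pods --namespace "$NAMESPACE" \
    --sort-by=.metadata.creationTimestamp -o name | tail -n 1)

  # Print its logs, which contain the rows generated by the Talend Job.
  if [ -n "$latest_pod" ]; then
    kubectl logs "$latest_pod" --namespace "$NAMESPACE"
  fi
else
  echo "kubectl not found; skipping"
fi
```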
If you want to change the number of rows generated, edit the ConfigMap my-release-demo-job and change the value of numrow.
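The same change can be made without opening an editor. The sketch below applies a JSON merge patch to the ConfigMap; the new value of 20 is only an example, and the namespace is assumed to be talend.

```shell
NAMESPACE=talend
CONFIGMAP=my-release-demo-job
NEW_NUMROW=20   # illustrative value

# JSON merge patch that replaces only the numrow key.
PATCH=$(printf '{"data":{"numrow":"%s"}}' "$NEW_NUMROW")

if command -v kubectl >/dev/null 2>&1; then
  kubectl patch configmap "$CONFIGMAP" --namespace "$NAMESPACE" --type merge -p "$PATCH"
else
  echo "kubectl not found; patch would be: $PATCH"
fi
```

When the value is injected as an environment variable, it is resolved when a pod starts, so a pod that is already running keeps the old value and the next scheduled run uses the new one.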