Article contributed by Artha Solutions (https://www.thinkartha.com/)
Author: Madhav Nalla
June 19, 2020
Enterprise deployment work is notorious for being hidebound and slow to react to change. As more organizations adopt Docker and container services, it becomes easier to fold the Talend deployment life cycle into those existing services, creating a unified deployment platform shared across the applications in an organization.
This article is a quick start guide to generating Talend Jobs as Docker images using a Docker service running on a remote host.
To build a better understanding of how to handle Docker images, it also compares running a Job through .sh/.bat scripts with running it as a Docker image.
Talend Studio needs to connect to a Docker service to be able to generate a Docker image.
The Docker service can run on the same machine as Talend Studio or on a remote host. Exposing the Docker daemon over TCP, as in our example, is needed only when Talend Studio and Docker run on different hosts; if they share a machine, you can skip this step.
In version 7.1, Talend introduced the fabric8 Maven plugin to generate Docker images directly from Talend Studio.
Using Talend Studio, we can either build a Docker image into the Docker host's local image repository, or build the image and publish it to any registry of our choice.
Let us look at both options:
Right-click on the Job and navigate to the Build Job option:
Under build type, select Docker Image:
Under Docker Options, select Local if Docker and Studio are installed on the same host, or Remote if your Docker service runs on a different host from Talend Studio. In our example, we enabled the Docker daemon for remote builds over TCP on port 2375:
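Before choosing Remote, it can be worth confirming that the Studio machine can actually reach the Docker daemon. The sketch below is illustrative and not part of Talend: it assumes the daemon listens on plain TCP port 2375 (the conventional unencrypted Docker port, as in our example) and calls the Docker Engine API's /_ping endpoint, which a healthy daemon answers with OK. The function names are hypothetical.

```python
import http.client
import urllib.request


# Hypothetical helper names; not part of Talend or Docker tooling.
def docker_ping_url(host: str, port: int = 2375) -> str:
    """Docker Engine API health-check URL for a daemon exposed over plain TCP."""
    return f"http://{host}:{port}/_ping"


def is_docker_reachable(host: str, port: int = 2375, timeout: float = 3.0) -> bool:
    """Return True if the daemon at host:port answers GET /_ping with 'OK'."""
    try:
        with urllib.request.urlopen(docker_ping_url(host, port), timeout=timeout) as resp:
            return resp.status == 200 and resp.read().strip() == b"OK"
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError, timeouts and refused connections are all OSError subclasses.
        return False
```

Run from the Studio machine, is_docker_reachable("host2") (with "host2" standing in for your Docker host) should return True before you select the Remote option.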
Log in to the Docker host, in our example host 2, and execute the command docker images. You should be able to view the image we just built:
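Instead of logging in to the Docker host, the docker CLI on another machine can be pointed at the remote daemon through the DOCKER_HOST environment variable. A minimal sketch, assuming the docker CLI is installed locally and the daemon is exposed on TCP port 2375 as above; the helper names and "host2" are placeholders.

```python
import os
import subprocess


# Hypothetical helper names; not part of Talend or Docker tooling.
def remote_docker_env(host: str, port: int = 2375) -> dict:
    """Copy of the current environment with DOCKER_HOST pointing at a remote daemon."""
    env = dict(os.environ)
    env["DOCKER_HOST"] = f"tcp://{host}:{port}"
    return env


def list_images_argv(repository: str = "") -> list:
    """argv for docker images, optionally filtered by repository, one repo:tag per line."""
    argv = ["docker", "images", "--format", "{{.Repository}}:{{.Tag}}"]
    if repository:
        argv.append(repository)
    return argv


def list_remote_images(host: str, port: int = 2375, repository: str = "") -> list:
    """Run docker images against the remote daemon and return 'repo:tag' strings."""
    result = subprocess.run(
        list_images_argv(repository),
        env=remote_docker_env(host, port),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the daemon cannot be reached
    )
    return result.stdout.splitlines()
```

For example, list_remote_images("host2", repository="madhav_tmc/tlogrow") should list the same image that docker images shows when run directly on host 2.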
Talend Studio can also build a Docker image and publish it to a registry, from which Kubernetes or other container services can pull it. In our example, I set up an AWS ECR registry.
Right-click on the Job name and navigate to the Publish option.
Select the Export Type Docker Image:
Under Docker Options, provide the Docker host and port as described above, then enter the registry details and the Docker image name:
Image Name = Repository name
Image Tag = Jobname_Version
Username = AWS access key ID
Password = AWS secret access key
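These values combine into the full reference that Kubernetes or another container service would pull. A sketch, assuming the standard private ECR registry hostname format (account-id.dkr.ecr.region.amazonaws.com) and the Jobname_Version tag convention from our example; the helper names, account ID, region and version used here are hypothetical.

```python
import re

# Docker tag grammar: letters, digits, '_', '.', '-', at most 128 characters,
# and the first character may not be '.' or '-'.
TAG_PATTERN = re.compile(r"[A-Za-z0-9_][A-Za-z0-9_.-]{0,127}")


# Hypothetical helper names; not part of Talend or AWS tooling.
def ecr_registry(account_id: str, region: str) -> str:
    """Hostname of a private Amazon ECR registry."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com"


def job_image_tag(job_name: str, version: str) -> str:
    """Image tag following the Jobname_Version convention used in this example."""
    tag = f"{job_name}_{version}"
    if not TAG_PATTERN.fullmatch(tag):
        raise ValueError(f"invalid Docker tag: {tag!r}")
    return tag


def ecr_image_ref(account_id: str, region: str, repository: str,
                  job_name: str, version: str) -> str:
    """Full registry/repository:tag reference for a published Job image."""
    return f"{ecr_registry(account_id, region)}/{repository}:{job_image_tag(job_name, version)}"


if __name__ == "__main__":
    # Hypothetical account, region, repository and Job version.
    print(ecr_image_ref("123456789012", "us-east-1", "tlogrow", "tlogrow", "0.1"))
```

Validating the tag up front catches a Job name or version that Docker would reject before the publish step runs.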
Once this is done, navigate to AWS ECR; you should be able to search for and find the image:
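The same check can be scripted with the AWS CLI's ecr describe-images command, which fails with a non-zero exit code when the tag does not exist. A sketch, assuming the AWS CLI is installed and configured with credentials that can read the repository; the helper names and the repository, tag and region values are hypothetical.

```python
import json
import subprocess


# Hypothetical helper names; not part of Talend or AWS tooling.
def describe_image_argv(repository: str, tag: str, region: str) -> list:
    """argv for the AWS CLI call that looks up one image tag in an ECR repository."""
    return [
        "aws", "ecr", "describe-images",
        "--region", region,
        "--repository-name", repository,
        "--image-ids", f"imageTag={tag}",
        "--output", "json",
    ]


def image_pushed_at(repository: str, tag: str, region: str) -> str:
    """Return the image's imagePushedAt value; raises CalledProcessError if the tag is missing."""
    result = subprocess.run(
        describe_image_argv(repository, tag, region),
        capture_output=True,
        text=True,
        check=True,
    )
    details = json.loads(result.stdout)["imageDetails"]
    return str(details[0]["imagePushedAt"])
```

A deployment script could call image_pushed_at("tlogrow", "tlogrow_0.1", "us-east-1") after publishing to confirm the image landed before any container service tries to pull it.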
With Talend, we are all accustomed to running Jobs through .sh or .bat scripts, so to show how running a Docker image compares, let's cover two aspects in detail below: passing runtime parameters and mounting volumes.
To run the Docker image that is in your Docker repository (Talend Build Job as Docker):
List all the Docker Images by running the command docker images:
Now I want to run the image madhav_tmc/tlogrow with tag latest, which uses a tWarn component to print a message. Part of the message comes from the context variable param.
Run the Docker image by passing a value to the context variable param at runtime:
docker run madhav_tmc/tlogrow:latest --context_param param="Hello TalendDocker"
Below in the log, we can see the value passed to the Docker image at runtime:
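The --context_param arguments a Talend .sh launcher accepts are simply appended after the image name, because docker run forwards everything after the image name into the container. A minimal sketch of a wrapper that builds this command from a dictionary of context values; the function names are hypothetical, and shlex.join is used only to print a copy-pasteable shell line.

```python
import shlex
import subprocess


# Hypothetical helper names; not part of Talend or Docker tooling.
def docker_run_argv(image: str, context_params: dict) -> list:
    """Build a docker run argv that passes Talend context values to the Job.

    Each entry becomes '--context_param name=value', the same form a Talend
    .sh/.bat launcher accepts.
    """
    argv = ["docker", "run", image]
    for name, value in context_params.items():
        argv += ["--context_param", f"{name}={value}"]
    return argv


def run_job(image: str, context_params: dict) -> int:
    """Launch the Job container and return docker run's exit code (no shell, so no quoting)."""
    return subprocess.run(docker_run_argv(image, context_params)).returncode


if __name__ == "__main__":
    argv = docker_run_argv("madhav_tmc/tlogrow:latest", {"param": "Hello TalendDocker"})
    # Dry run: print the shell-quoted equivalent instead of launching the container.
    print(shlex.join(argv))
```

Because the argv list bypasses the shell, a value with spaces such as "Hello TalendDocker" needs no extra quoting; docker run exits with the container's exit code, so a scheduler can check it just as it would a launcher script's exit status.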
When a Docker image is run, Docker creates a container from it: an isolated environment that behaves somewhat like a lightweight VM but shares the host machine's kernel.
Files a Job writes inside the container go to the container's own filesystem, which the host does not see and which is discarded when the container is removed. So for a Talend Job (Docker image) to read or write files on the host machine where the image runs, we need to mount the host directory we want to use onto the path the Job uses inside the container.
For example, the following Talend Job writes an output file to the location "/mnt01/" + context.outfile:
We run this Docker image on the host machine where Docker is installed. If we run it without a volume mount in the docker run command, the Job finishes successfully, but no output file appears on the host.
The volume mount option takes the form:
-v <host path>:<container path>
In our case, we ran df -l on the machine where we intend to run the Docker image and chose one of the listed mount paths, /dev:
So the option becomes:
-v /dev:/mnt01
If the host path and the container path are the same, use the same value on both sides of the colon.
The final Docker run command will look like this:
docker run -v /dev:/mnt01 madhav_tmc/tlogrow:latest --context_param outfile="Talendoutdir/outfile.txt"
This creates an output file at /dev/Talendoutdir/outfile.txt on the machine where we ran the Docker image.
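The two sides of -v act as a prefix substitution: a file the Job writes under the container path lands at the same relative location under the host path. A small sketch that parses a -v host:container[:options] bind-mount spec and resolves where a container file appears on the host. It rejects a spec with an empty side, which is what the shell produces from a stray space such as -v /dev: /mnt01 (the spec becomes "/dev:" and /mnt01 turns into a separate argument). The helper names are hypothetical, and only absolute POSIX paths are handled.

```python
from pathlib import PurePosixPath


# Hypothetical helper names; not part of Talend or Docker tooling.
def parse_volume(spec: str) -> tuple:
    """Split a 'host:container[:options]' bind-mount spec into (host, container) paths."""
    parts = spec.split(":")
    if len(parts) not in (2, 3) or not all(parts[:2]):
        raise ValueError(f"expected HOST:CONTAINER[:OPTIONS], got {spec!r}")
    host, container = (PurePosixPath(p) for p in parts[:2])
    # A relative host part would make Docker create a named volume instead of a bind mount.
    if not (host.is_absolute() and container.is_absolute()):
        raise ValueError(f"bind-mount paths must be absolute: {spec!r}")
    return host, container


def host_path_for(container_file: str, spec: str) -> str:
    """Where a file written inside the container appears on the host for this mount.

    Raises ValueError if the file is not under the mounted container path.
    """
    host, container = parse_volume(spec)
    relative = PurePosixPath(container_file).relative_to(container)
    return str(host / relative)


if __name__ == "__main__":
    # The Job writes "/mnt01/" + context.outfile; outfile is passed via --context_param.
    outfile = "Talendoutdir/outfile.txt"
    print(host_path_for("/mnt01/" + outfile, "/dev:/mnt01"))  # /dev/Talendoutdir/outfile.txt
```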
With Talend, most implementations so far have either been tightly coupled to the Talend ecosystem or used third-party tools to execute shell or batch scripts. This article covered the concepts most frequently used when executing .sh or .bat scripts and how they carry over to Docker images.
Because Talend integrates seamlessly with Docker, it is easy to use any docker run option at runtime to make the images more dynamic.
For more information, write to: Solutions@thinkartha.com