
Quick Start Guide: Talend and Docker


Article contributed by Artha Solutions (https://www.thinkartha.com/)

Author: Madhav Nalla

June 19, 2020

 

Enterprise deployment work is notorious for being hidebound and slow to react to change. With many organizations adopting Docker and container services, it is now easier to fold the Talend deployment life cycle into those existing services, creating a more unified deployment platform that can be shared across applications within an organization.

 

This article is intended as a quick start guide on how to generate Talend Jobs as Docker images using a Docker service that is on a remote host.

 

To make Docker images easier to work with, several of the topics below are explained by comparing sh/bat scripts with Docker images.

 

Setting up your Docker for remote build

Talend Studio needs to connect to a Docker service to be able to generate a Docker image.

 

The Docker service can run on the machine where Talend Studio is installed or on a remote host. The remote setup described here is needed only when Talend Studio and Docker run on different hosts; skip it if both are on the same machine.

 

Docker Remote Build

remoteBuild.png
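
As a rough sketch of what the remote setup involves, the snippet below builds the tcp:// endpoint that the Remote option in Studio expects and wraps a simple connectivity check. The helper names docker_endpoint and check_remote_docker are hypothetical, and the dockerd command in the comment assumes the daemon is started by hand; on systemd-managed hosts the TCP listener is usually configured in the service unit or in daemon.json instead. Port 2375 is unencrypted and unauthenticated, so this is only appropriate on a trusted network.

```shell
#!/bin/sh
# Hypothetical helper: build the endpoint string for a remote Docker daemon.
# $1 = Docker host name or IP, $2 = TCP port (defaults to 2375, as in this article)
docker_endpoint() {
    printf 'tcp://%s:%s\n' "$1" "${2:-2375}"
}

# Hypothetical helper: ask the remote daemon for its version to confirm that
# the Talend Studio machine can reach it.
check_remote_docker() {
    docker -H "$(docker_endpoint "$1" "$2")" version
}

# On the Docker host (assumption: the daemon is started manually):
#   dockerd -H unix:///var/run/docker.sock -H tcp://0.0.0.0:2375
# On the Talend Studio machine:
#   check_remote_docker dockerhostIP
```

If the check prints both a Client and a Server section, the same endpoint string should work in Studio's Docker Options.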

 

Building a Docker Image from Talend Studio v7.1 or Greater

In v7.1, Talend introduced the Fabric8 Maven plugin to generate a Docker image directly from Talend Studio.

 

Using Talend Studio, we can either build a Docker image that is stored in the local Docker repository, or build and publish a Docker image to any registry of our choice.

 

Let us look at both options:

 

Build the Docker Image from Talend Studio

  1. Right-click on the Job and navigate to the Build Job option:

    buildJob.png

     

  2. Under build type, select Docker Image:

    dockerImage.png

     

  3. Choose the appropriate context and log4j level.
  4. Under Docker Options, select Local if Docker and Studio are installed on the same host, or select Remote if your Docker service is running on a different host from the one where Talend Studio is installed. In our example, we enabled Docker for a remote build via TCP on port 2375:

    tcp://dockerhostIP:2375

    remoteBuild2.png

     

  5. Once this is done, your Docker image is built and stored in the Docker repository, in our example on host 2.
  6. Log in to the Docker host, in our example host 2, and execute the command docker images. You should be able to view the image we just built:

    dockerImage2.png

     

 

Build and Publish the Docker Image to the Registry from Talend Studio

Talend Studio can also build a Docker image and publish it to any registry from which Kubernetes or other container services can pull it. In our example, we use an AWS ECR registry.

  1. Right-click on the Job name and navigate to the Publish option.

    publish.png

     

  2. Select the Export Type Docker Image:

    dockerImage3.png

     

  3. Under Docker Options, provide the Docker host and port details as discussed in the previous topics. Give the necessary details of the registry and Docker image name:

    Image Name = Repository Name
    Image Tag = Jobname_Version
    Username = AccessKeyId (AWS)
    Password = Secret (AWS)

    pushJobImage.png

     

  4. Once this is done, navigate to AWS ECR and you should be able to search for and find the image:

    awsEcr.png
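
To pull the published image onto another Docker host, the commands typically look like the sketch below. The account ID, region, repository, and tag are placeholder values, and the helper names ecr_registry, ecr_image_uri, and pull_from_ecr are hypothetical. It assumes AWS CLI v2 is installed and configured with credentials that can read the repository.

```shell
#!/bin/sh
# Hypothetical helper: compose the registry host for an ECR account and region.
# $1 = AWS account ID, $2 = region
ecr_registry() {
    printf '%s.dkr.ecr.%s.amazonaws.com\n' "$1" "$2"
}

# Hypothetical helper: compose the full image reference.
# $1 = account ID, $2 = region, $3 = repository (Image Name), $4 = tag (Image Tag)
ecr_image_uri() {
    printf '%s/%s:%s\n' "$(ecr_registry "$1" "$2")" "$3" "$4"
}

# Hypothetical helper: log in to ECR, then pull the image if the login succeeded.
pull_from_ecr() {
    aws ecr get-login-password --region "$2" |
        docker login --username AWS --password-stdin "$(ecr_registry "$1" "$2")" &&
        docker pull "$(ecr_image_uri "$1" "$2" "$3" "$4")"
}

# Example with placeholder values:
#   pull_from_ecr 123456789012 us-east-1 talend-jobs tlogrow_0.1
```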

     

 

Running Docker Images vs Shell or Bat scripts

With Talend, we are all accustomed to running Jobs through .sh or .bat scripts. To make it easier to understand how to run Docker images, the sections below cover the equivalent tasks in detail, such as passing runtime parameters and mounting volumes.

 

Passing Run Time Parameters to a Docker Image

To run a Docker image that is in your Docker repository (a Talend Job built as a Docker image):

  1. List all the Docker Images by running the command docker images:

    dockerImage2.png

     

  2. Now I want to run the image madhav_tmc/tlogrow, Tag latest, which uses a tWarn component to print a message. Part of the message will be from the context variable param.

    tWarn.png

     

  3. Run the Docker image by passing a value to the context variable param at runtime:

    docker run madhav_tmc/tlogrow:latest --context_param param="Hello TalendDocker"

    Below in the log, we can see the value passed to the Docker image at runtime:

    valuePassed.png
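
When a Job has several context variables, the --context_param option can be repeated, just as with the generated .sh script. The sketch below is one way to wrap that, assuming the image passes its arguments straight through to the Job as in the command above. The function name run_job and the DOCKER_CMD override (set it to echo for a dry run) are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: run a Talend Job image with any number of context values.
# $1 = image reference, remaining arguments = key=value pairs
run_job() {
    image=$1
    shift
    # Rewrite the argument list so every key=value is preceded by
    # --context_param; values containing spaces stay intact.
    n=$#
    while [ "$n" -gt 0 ]; do
        set -- "$@" --context_param "$1"
        shift
        n=$((n - 1))
    done
    # DOCKER_CMD is a hypothetical override, for example DOCKER_CMD=echo
    ${DOCKER_CMD:-docker} run "$image" "$@"
}

# Example:
#   run_job madhav_tmc/tlogrow:latest 'param=Hello TalendDocker'
```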

     

 

Passing Volume Mounts to the docker run Command at Runtime

When a Docker image is run, Docker creates a container: an isolated environment with its own file system, which behaves much like a lightweight VM on the host machine.

 

Because a container's file system is isolated from the host, and the container stops once the Job execution finishes, a Talend Job (Docker image) can read or write files on the host only if the paths used inside the Job are mounted to the host directories we actually want the Job to use.

 

For example, see the following Talend Job, which writes an out file to location “/mnt01/” +context.outfile:

contextOutfile.png

 

This Docker image is to be run on the host machine where my Docker is installed. If we run the image without passing the volume mount in the docker run command, the Job finishes successfully but no out file appears on the host, because the file is written inside the container's own file system.

 

The volume mount option takes the following form, with no spaces around the colon:

-v <Actual Physical Drive>:<Container Drive>

 

In our case:

  • Actual Physical Drive is “/dev” (the physical path where we intend the Talend Job to write the out file)
  • Container Drive is “/mnt01” (the drive used in the Talend Job becomes Container Drive)

 

Run df -l on the machine where we intend to run the Docker image, and choose any of the mount paths, in our case /dev:

dev.png

 

So now we have:

-v /dev:/mnt01

 

If the host path and the container path are the same, we can use the same value on both sides, for example:

-v /dev:/dev

 

The final Docker run command will look like this:

docker run -v /dev:/mnt01 madhav_tmc/tlogrow:latest --context_param outfile="Talendoutdir/outfile.txt"

 

This creates an out file on the machine where we ran the Docker image, at /dev/Talendoutdir/outfile.txt.
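
The path translation can be worked through with a small sketch. The helper name host_path_for is hypothetical, and it only handles the two-part host:container form of -v used in this article. In practice a dedicated directory is a common choice for the host side; /data/talend_out in the comment is an assumed example, not a path from the article.

```shell
#!/bin/sh
# Hypothetical helper: given a -v mapping "host_dir:container_dir" and a path
# inside the container, print where that file lands on the host.
host_path_for() {
    host_dir=${1%%:*}        # text before the first colon
    container_dir=${1#*:}    # text after the first colon
    case $2 in
        "$container_dir"/*)
            rest=${2#"$container_dir"}   # path relative to the mount point
            printf '%s%s\n' "$host_dir" "$rest"
            ;;
        *)
            # Paths outside the mount stay in the container's own file system
            echo "$2 is not under $container_dir" >&2
            return 1
            ;;
    esac
}

# Example from this article:
#   host_path_for /dev:/mnt01 /mnt01/Talendoutdir/outfile.txt
# Variant with a dedicated host directory (assumed path):
#   docker run -v /data/talend_out:/mnt01 madhav_tmc/tlogrow:latest --context_param outfile="Talendoutdir/outfile.txt"
```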

 

Conclusion

With Talend, most implementations so far have been tightly coupled to the Talend ecosystem or have relied on third-party tools to execute shell or bat scripts. This document covers the concepts most frequently used when executing .sh or .bat scripts and shows how they carry over to Docker images.

 

Because Talend integrates directly with Docker, all the docker run options are available at runtime, which makes the images more dynamic.

 

For more information, write to: Solutions@thinkartha.com
