Generic Dockerfiles for Talend Jobs

Overview

This article explains how to create a generic Dockerfile that fits a standard use of Talend Jobs.

 

You will start with a Talend ETL Job. A Job is a standard Java application that needs a few things at run time, such as parameters or access to a file system.

 

Sample Job

The following very simple Job demonstrates how to build a standard Dockerfile for a Talend Job.

sample_job.jpg

 

As you can see, there is nothing complex here.

 

Context

This Job is configured with two parameters, also known as context variables:

  • nbrows: Number of rows generated by the component tRowGenerator
  • folder: Folder where your output file will be generated

 

Context Panel

context.jpg

 

tRowGenerator

trowgenerator.jpg

 

tFileOutputDelimited

Sans titre.png

 

So, you have two variables to configure at run time:

  • nbrows is a plain value, so in a Docker container you can pass it as an environment variable or as a command argument.
  • folder is also a plain value, but it points to a storage folder. Containers are ephemeral: if the Job writes its output file inside the container without further configuration, the file is lost when the container is removed. For this folder, you have to configure a volume.

 

  1. Build your Job. Keep the default configuration and the default archive name.

    Export.jpg

     

  2. Once generated, if you unzip it, you can see this file list:

    tree.jpg

     

  3. There are a few elements and parameters you have to consider when building your Dockerfile.

    1. GenericJob_run.sh

      A shell script that allows you to run your Job. In this script, you have to pay attention to a few parameters for your Dockerfile:

      • --context=Default

        Selects the Default context. You can select a different context, but this example keeps the default.

      • "$@"

        Forwards any additional command-line arguments to the Job.

        You have two ways of passing parameters:

        • Using arguments to override context parameters.
        • Mounting a file as a volume to replace the Default.properties file. In this example, you override the context parameters with arguments and mount a file only for the log4j configuration.
    2. log4j.xml

      The log4j configuration file. It lets you change the logging level of your Job, along with other logging settings.
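To make the role of "$@" concrete, here is a minimal sketch. It does not reproduce the generated GenericJob_run.sh; fake_job_run is a hypothetical stand-in that prints the arguments the Job would receive instead of starting Java.

```shell
#!/bin/sh
# Hypothetical stand-in for GenericJob_run.sh. The real script starts Java;
# this one only prints the arguments the Job would receive, one per line.
fake_job_run() {
    for arg in --context=Default "$@"; do
        printf '%s\n' "$arg"
    done
}

# Arguments given on the command line are appended after --context=Default.
fake_job_run --context_param nbrows=10 --context_param folder=/data/
```

Because "$@" is quoted, each argument reaches the Job unchanged, including arguments that contain spaces.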

 

Dockerfile

This is one example of a very generic Dockerfile you can create.

The following example uses mgainhao as the username. Replace mgainhao with your own username.

 

FROM mgainhao/java:1.8

ARG talend_job=GenericJob
ARG talend_version=0.1

LABEL maintainer="mgainhao@talend.com" \
    talend.job=${talend_job} \
    talend.version=${talend_version}

ENV TALEND_JOB=${talend_job}
ENV TALEND_VERSION=${talend_version}
ENV ARGS=""

WORKDIR /opt/talend

COPY ${TALEND_JOB}_${TALEND_VERSION}.zip .

### Install Talend Job
RUN yum install -y unzip && \
    unzip ${TALEND_JOB}_${TALEND_VERSION}.zip && \
    rm -rf ${TALEND_JOB}_${TALEND_VERSION}.zip && \
    chmod +x ${TALEND_JOB}/${TALEND_JOB}_run.sh

VOLUME /data

CMD ["/bin/sh","-c","${TALEND_JOB}/${TALEND_JOB}_run.sh ${ARGS}"]

 

FROM:

This example uses a Java image as a base image, but you can create your own (see the attached zip file).

FROM mgainhao/java:1.8

 

ARG:

These arguments define the Job name and its version. The values shown here are defaults; you can override them at build time with --build-arg.

ARG talend_job=GenericJob
ARG talend_version=0.1

 

LABEL: (optional)

A label adds more information to your image.

LABEL maintainer="mgainhao@talend.com" \
    talend.job=${talend_job} \
    talend.version=${talend_version}

 

ENV:

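These three environment variables help you install and run the container image: TALEND_JOB and TALEND_VERSION identify the archive and the run script, and ARGS carries extra arguments for the Job at run time.

ENV TALEND_JOB=${talend_job}
ENV TALEND_VERSION=${talend_version}
ENV ARGS=""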

 

WORKDIR:

This sets the working directory in which the Job will be installed.

WORKDIR /opt/talend

 

COPY:

This copies the Job archive from your local directory (the build context) into the image.

COPY ${TALEND_JOB}_${TALEND_VERSION}.zip .

 

RUN:

This installs unzip, extracts the Job archive, deletes the archive, and makes the run script executable.

### Install Talend Job
RUN yum install -y unzip && \
    unzip ${TALEND_JOB}_${TALEND_VERSION}.zip && \
    rm -rf ${TALEND_JOB}_${TALEND_VERSION}.zip && \
    chmod +x ${TALEND_JOB}/${TALEND_JOB}_run.sh

 

VOLUME:

This declares /data as a volume mount point for any data the Job needs to store.

VOLUME /data

 

CMD:

This specifies the command that runs when the container starts. It goes through /bin/sh -c so that the shell expands the TALEND_JOB and ARGS environment variables.

CMD ["/bin/sh","-c","${TALEND_JOB}/${TALEND_JOB}_run.sh ${ARGS}"]
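The expansion performed by that CMD can be tried outside Docker. The sketch below is only an approximation: show_container_command is a hypothetical helper, and echo stands in for actually executing the run script.

```shell
#!/bin/sh
# Simulate the environment the container would start with.
# 'echo' stands in for executing the run script, so the output is the
# command line the inner shell would build.
show_container_command() {
    TALEND_JOB=$1 ARGS=$2 /bin/sh -c 'echo ${TALEND_JOB}/${TALEND_JOB}_run.sh ${ARGS}'
}

show_container_command GenericJob "--context_param nbrows=10 --context_param folder=/data/"
```

Because ${ARGS} is unquoted, the shell splits it on whitespace. That is what turns one environment variable into several arguments, but it also means a context value containing spaces cannot be passed this way.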

 

Building the Job Image

To build your image, run the following command:

docker build -t mgainhao/genericjob:0.1 --build-arg talend_job=GenericJob --build-arg talend_version=0.1 .
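If you build images for several Jobs, the command above can be generated from the Job name and version. The following sketch assumes a hypothetical helper named print_build_command; it only prints the command, so you can review it before running it. The image name is lowercased because Docker repository names must be lowercase.

```shell
#!/bin/sh
# Hypothetical helper: prints (does not run) the docker build command
# for a given Job name, Job version, and username.
print_build_command() {
    job=$1
    version=$2
    user=$3
    # Docker repository names must be lowercase.
    image=$(printf '%s' "$job" | tr '[:upper:]' '[:lower:]')
    printf 'docker build -t %s/%s:%s --build-arg talend_job=%s --build-arg talend_version=%s .\n' \
        "$user" "$image" "$version" "$job" "$version"
}

print_build_command GenericJob 0.1 mgainhao
```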

 

Running the container

To run a container, use the following command:

 

docker run --rm -ti -v /Users/mgainhao:/data -v /Users/mgainhao/log4j.xml:/opt/talend/GenericJob/log4j.xml -e ARGS="--context_param nbrows=10 --context_param folder=/data/" mgainhao/genericjob:0.1

 

  • This part of the command allows you to change the log4j configuration with your own file:

    -v /Users/mgainhao/log4j.xml:/opt/talend/GenericJob/log4j.xml

     

  • This part of the command allows you to change the context configuration of your Job:

    -e ARGS="--context_param nbrows=10 --context_param folder=/data/"
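When a Job has many context variables, the ARGS string becomes tedious to write by hand. As a sketch, the hypothetical helper context_args below builds it from key=value pairs; it is not part of Talend or Docker.

```shell
#!/bin/sh
# Hypothetical helper: turns key=value pairs into a string of
# --context_param options suitable for the ARGS environment variable.
context_args() {
    out=""
    for pair in "$@"; do
        # Add a separating space only when $out already has content.
        out="${out:+$out }--context_param $pair"
    done
    printf '%s\n' "$out"
}

context_args nbrows=10 folder=/data/

# Possible use with the run command from this article (requires Docker):
# docker run --rm -ti -v /Users/mgainhao:/data \
#     -e ARGS="$(context_args nbrows=10 folder=/data/)" mgainhao/genericjob:0.1
```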

 

Conclusion

Talend ETL Jobs are plain Java programs, so they fit well in containers. This example showed a generic Dockerfile to build a Job image, but in many cases you will want to customize it.

 

Version history
Revision 17 of 17, last updated 09-29-2018 12:11 AM
Comments

Very good article, with an easy-to-understand example :-)

 

One minor addition: if you are using Ubuntu, you may get a permission error when the container tries to write a file to the host. You can avoid it by running the following command on the host system, which sets SELinux to permissive mode.

 

su -c "setenforce 0" 

Thanks Michael for this article :-)

 

Warm Regards,

 

Nikhil Thampi