Generic Dockerfiles for Talend Jobs

Overview

This article explains how to create a generic Dockerfile that covers the standard way of running Talend Jobs.

 

You will start with a Talend ETL Job. A Job is a standard Java application that needs a few things at run time, such as parameters and access to a file system.

 

Sample Job

The very simple Job below is used to demonstrate how to build a standard Dockerfile for a Talend Job.

sample_job.jpg

 

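As you can see, there is nothing complex here: a tRowGenerator component generates rows, and a tFileOutputDelimited component writes them to a file.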

 

Context

This Job is configured with two parameters, also known as context variables:

  • nbrows: Number of rows generated by the component tRowGenerator
  • folder: Folder where your output file will be generated

 

Context Panel

context.jpg

 

tRowGenerator

trowgenerator.jpg

 

tFileOutputDelimited

Sans titre.png

 

So, two variables have to be set at run time:

  • nbrows is a plain value, so for a Docker container you can pass it as an environment variable or as a command argument.
  • folder is also a plain value, but it points to a storage location. Containers are ephemeral, so a file written inside the container without further configuration is lost when the container is removed. For this folder, you have to configure a volume.

 

  1. Build your Job. Keep the default configuration and the default archive name.

    Export.jpg

     

  2. Once the archive is generated, unzip it to see the following file list:

    tree.jpg

     

  3. There are a few elements and parameters you have to consider when building your Dockerfile.

    1. GenericJob_run.sh

      A shell script that allows you to run your Job. In this script, you have to pay attention to a few parameters for your Dockerfile:

      • --context=Default

        Selects the Default context. You can select another context, but this example keeps the default.

      • "$@"

        Forwards any additional command-line arguments to the Job.

        There are two ways of passing parameters:

        • Using arguments to override context parameters. The docker run command at the end of this article uses this approach, through the ARGS environment variable.
        • Using a volume file to replace the Default.properties file.
    2. log4j.xml

      The log4j configuration file, which lets you change the logging level of your Job, among other settings.
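
To make the role of "$@" concrete, here is a minimal sketch. The run_job function is a hypothetical stand-in for GenericJob_run.sh (the generated script itself is not reproduced in this article): instead of launching Java, it prints the arguments it would hand to the Job, one per line.

```shell
# Hypothetical stand-in for GenericJob_run.sh: prints the arguments
# instead of starting the Java application.
run_job() {
    # --context=Default comes first, followed by every argument of the caller.
    # The quoted "$@" keeps each argument intact, even if it contains spaces.
    printf '%s\n' --context=Default "$@"
}

run_job --context_param nbrows=10 --context_param "folder=/data/my files/"
```

The call prints five lines: the default context, then the four arguments exactly as they were given, with the folder value kept as a single argument.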

 

Dockerfile

This is one example of a very generic Dockerfile you can create.

The following example uses mgainhao as the username; replace it with your own.

 

FROM mgainhao/java:1.8

ARG talend_job=GenericJob
ARG talend_version=0.1

LABEL maintainer="mgainhao@talend.com" \
    talend.job=${talend_job} \
    talend.version=${talend_version}

ENV TALEND_JOB=${talend_job}
ENV TALEND_VERSION=${talend_version}
ENV ARGS=""

WORKDIR /opt/talend

COPY ${TALEND_JOB}_${TALEND_VERSION}.zip .

### Install Talend Job
RUN yum install -y unzip && \
    unzip ${TALEND_JOB}_${TALEND_VERSION}.zip && \
    rm -rf ${TALEND_JOB}_${TALEND_VERSION}.zip && \
    chmod +x ${TALEND_JOB}/${TALEND_JOB}_run.sh

VOLUME /data

CMD ["/bin/sh","-c","${TALEND_JOB}/${TALEND_JOB}_run.sh ${ARGS}"]

 

FROM:

This example uses a Java image as a base image, but you can create your own (see the attached zip file).

FROM mgainhao/java:1.8

 

ARG:

These build arguments define the Job name and its version. The values shown here are defaults; you can override them at build time with --build-arg.

ARG talend_job=GenericJob
ARG talend_version=0.1

 

LABEL: (optional)

Labels attach metadata to your image: here, the maintainer, the Job name, and the Job version.

LABEL maintainer="mgainhao@talend.com" \
    talend.job=${talend_job} \
    talend.version=${talend_version}

 

ENV:

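These three environment variables are used to install the Job in the image and to run it in the container. TALEND_JOB and TALEND_VERSION identify the archive and the run script; ARGS holds the extra arguments passed to the Job and is empty by default.

ENV TALEND_JOB=${talend_job}
ENV TALEND_VERSION=${talend_version}
ENV ARGS=""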

 

WORKDIR:

This sets the working directory, which is where the Job will be installed.

WORKDIR /opt/talend

 

COPY:

This copies the Job archive from your local build directory into the image.

COPY ${TALEND_JOB}_${TALEND_VERSION}.zip .

 

RUN:

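This installs unzip, extracts the Job archive, deletes the archive, and makes the run script executable. Because it calls yum, it assumes an RPM-based base image.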

### Install Talend Job
RUN yum install -y unzip && \
    unzip ${TALEND_JOB}_${TALEND_VERSION}.zip && \
    rm -rf ${TALEND_JOB}_${TALEND_VERSION}.zip && \
    chmod +x ${TALEND_JOB}/${TALEND_JOB}_run.sh

 

VOLUME:

This declares /data as a volume mount point, where the Job can store data that must be kept.

VOLUME /data

 

CMD:

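This specifies the default command of the container. The command goes through /bin/sh -c so that the shell expands the TALEND_JOB and ARGS environment variables when the container starts.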

CMD ["/bin/sh","-c","${TALEND_JOB}/${TALEND_JOB}_run.sh ${ARGS}"]
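
The effect of the /bin/sh -c wrapper can be previewed without Docker. In the sketch below, preview_cmd is a hypothetical helper, not part of the Job or of Docker: it sets the two variables the way the image would, and puts echo in front of the command string so that the resolved command line is printed rather than executed.

```shell
# Hypothetical helper: shows what the CMD string resolves to
# for a given Job name ($1) and ARGS value ($2).
preview_cmd() {
    # The single quotes leave the expansion to the inner shell,
    # just as the exec-form CMD hands the raw string to /bin/sh -c.
    TALEND_JOB="$1" ARGS="$2" \
        sh -c 'echo ${TALEND_JOB}/${TALEND_JOB}_run.sh ${ARGS}'
}

preview_cmd GenericJob "--context_param nbrows=10 --context_param folder=/data/"
```

Because ${ARGS} is not quoted in the command string, the shell splits its value on whitespace. Each word becomes a separate argument of the run script, which is what the Job expects, but it also means that a value containing spaces (a folder name, for example) cannot be passed this way.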

 

Building the Job Image

To build your image, run the following command:

docker build -t mgainhao/genericjob:0.1 --build-arg talend_job=GenericJob --build-arg talend_version=0.1 .
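
If you build images for several Jobs, the command can be derived from the Job name and version. The following is only a sketch: build_command is a hypothetical helper that prints the command instead of running it, and it assumes the archive keeps its default name and sits in the current directory.

```shell
# Hypothetical helper: prints the docker build command for a Job.
# Arguments: Job name, Job version, registry username.
build_command() {
    job="$1"; version="$2"; user="$3"
    # Image repository names must be lowercase: GenericJob becomes genericjob.
    repo=$(printf '%s' "$job" | tr '[:upper:]' '[:lower:]')
    printf 'docker build -t %s/%s:%s --build-arg talend_job=%s --build-arg talend_version=%s .\n' \
        "$user" "$repo" "$version" "$job" "$version"
}

build_command GenericJob 0.1 mgainhao
```

For GenericJob, version 0.1, and the username mgainhao, the helper prints the same command as above. Review the printed command, then pipe it to sh to run it.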

 

Running the Container

To run a container, use the following command:

 

docker run --rm -ti -v /Users/mgainhao:/data -v /Users/mgainhao/log4j.xml:/opt/talend/GenericJob/log4j.xml -e ARGS="--context_param nbrows=10 --context_param folder=/data/" mgainhao/genericjob:0.1

 

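  • This part of the command mounts a host folder on /data, so the generated file remains available on the host after the container is removed:

    -v /Users/mgainhao:/data

  • This part of the command replaces the log4j configuration with your own file: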

    -v /Users/mgainhao/log4j.xml:/opt/talend/GenericJob/log4j.xml

     

  • This part of the command overrides the context parameters of your Job:

    -e ARGS="--context_param nbrows=10 --context_param folder=/data/"
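
The ARGS value follows a fixed pattern, one --context_param name=value pair per context variable, so it can be assembled from the two values. This is a sketch under that assumption; context_args is a hypothetical helper name.

```shell
# Hypothetical helper: builds the ARGS value for the two context
# variables of the sample Job (nbrows, then folder).
context_args() {
    echo "--context_param nbrows=$1 --context_param folder=$2"
}

context_args 10 /data/

# Possible use with the command above:
#   docker run --rm -ti -v /Users/mgainhao:/data \
#     -e ARGS="$(context_args 10 /data/)" mgainhao/genericjob:0.1
```

The folder value must be the path inside the container (/data/ here), not the path on the host.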

 

Conclusion

Talend ETL Jobs are plain Java programs, so they fit nicely into containers. This example showed a generic Dockerfile to build an image for a Job, but in many cases, you will want to customize it.

 

Version history
Revision 16 of 16, last updated 03-20-2018 01:02 PM