Log management and monitoring in Talend Cloud

Introduction

Logs are an important aspect of any production system. They help you pinpoint the root cause of issues in an application, and gathering all this data into a single dashboard can substantially improve productivity. Some of the benefits of log management and monitoring are:

  • Centralized log data

  • Improved system performance

  • Time-efficient monitoring

  • Efficient issue troubleshooting

  • Assessment of application health

  • Diagnosis and identification of runtime errors

This document describes the ways to manage and monitor logs in Talend Cloud, for example, by using Elasticsearch, Logstash, and Kibana (ELK) or Talend Activity Monitoring Console (similar to Talend Administration Center on-premises).

 

Methods of logging

Talend Cloud offers two types of logs:

  • Execution logs: the task and plan logs provide details about the execution, such as messages, severity, timestamps, and versions. These logs can be used to analyze and debug tasks or plans.

  • Engine logs: these can be configured to provide technical details at the INFO, WARN, ERROR, or FATAL log level.

These logs are accessible in several ways. This article shows two examples: redirecting the logs with Filebeat and with Java Message Service (JMS). However, you can redirect the Remote Engine logs to the tool of your choice, such as Splunk or Datadog.

 

Viewing and searching the Job logs from Talend Management Console

  1. Go to OPERATIONS.

  2. Filter on the environment/workspace/operator/time period at the top of the page.

    image001.jpg

     

  3. Select the corresponding status of the execution to retrieve the logs for the following:

    • Current

    • Failed

    • Rejected

    • Terminated

    • Successful

  4. Expand the task execution to view details and possible actions.

    image002.jpg

     

  5. Click VIEW LOGS.

    image003.jpg

     

  6. Download the logs from Talend Management Console.

    1. Go to OPERATIONS.

    2. Filter on the environment/workspace/operator/time.

    3. Select the task execution to download and VIEW LOGS.

    4. Select DOWNLOAD > All to download all the logs independent of the filter selection.

      image004.jpg

       

    5. Select DOWNLOAD > Filter results to only download the results displayed for the current filters.

      image005.jpg

       

    6. To download a single log, click the Download icon above the log content.

      image006.jpg

       

Logs in Talend Remote Engines

  • Every time a task/plan is executed from Talend Management Console, a check happens on the Remote Engine to see if the executables are available.

  • If the binary exists, it is executed; otherwise, the Remote Engine extracts the binary from the Repository. The binaries are stored in <remote engine installation directory>\TalendJobServersFiles\archiveJobs.

    image007.jpg

     

  • The execution logs are stored on the Remote Engine server in <remote engine installation directory>\TalendJobServersFiles\jobexecutions\logs.

    image008.jpg

     

  • Each Job execution log folder has two files:

    • StdOutErr: this file contains log details corresponding to the console. The details from this file are retrieved and displayed in Talend Management Console logs.

    • Resuming: this file contains details of the events such as JOB_STARTED, SYSTEM_LOGS, and JOB_ENDED.

      image009.jpg

       

  • Talend Remote Engine runs on Apache Karaf containers. You can use the Karaf logs to get details about your infrastructure, such as the Remote Engine, the Remote Engine Cluster, and the Remote Engine connection to Talend Cloud. These logs are located under <remote engine installation directory>\data\log.

    image010.jpg
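
The folder layout described above can be inspected with a short script. The following is a minimal Python sketch, not part of the Talend tooling. The function name summarize_execution_logs is hypothetical, and the sketch assumes that the two files in each execution folder can be recognized by the substrings "stdouterr" and "resuming" in their names (case-insensitive). Check this against your own Remote Engine before relying on it.

```python
from pathlib import Path


def summarize_execution_logs(logs_root):
    """Map each execution folder name to the log files found in it.

    Assumption: file names contain 'stdouterr' or 'resuming'
    (case-insensitive); adjust the markers to match your engine.
    """
    summary = {}
    for folder in sorted(Path(logs_root).iterdir()):
        if not folder.is_dir():
            continue  # ignore stray files next to the execution folders
        found = {"stdouterr": None, "resuming": None}
        for file in sorted(folder.iterdir()):
            name = file.name.lower()
            for marker in found:
                if marker in name and found[marker] is None:
                    found[marker] = file.name
        summary[folder.name] = found
    return summary


if __name__ == "__main__":
    # Hypothetical installation path; replace it with your own.
    root = r"C:\TalendRemoteEngine\TalendJobServersFiles\jobexecutions\logs"
    if Path(root).is_dir():
        for execution, files in summarize_execution_logs(root).items():
            print(execution, files)
```

An execution folder where one of the two entries is None is a quick hint that the run did not produce the expected pair of files, or that the naming assumption does not hold on your installation.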

     

Saving execution logs to an Amazon S3 bucket

  1. Currently, this feature is only available for Talend Cloud on AWS.

  2. For more information on how to configure the bucket, see Saving execution logs to an external Amazon S3 bucket in the Talend Help Center.

    Note: Saving logs to external AWS S3 buckets is the best way to gather and analyze Job logs, especially if you are using cloud engines.

  3. Go to CONFIGURATION > MANAGEMENT CONSOLE.

  4. Enable EXPORT LOGS.

    image011.jpg

     

  5. On the Management Console Export Logs page, click the Cloud Formation template link to download the Talend Cloud AWS CloudFormation template.

    image012.jpg

     

  6. Open your AWS account in a new tab, then start the Create Stack wizard on the AWS CloudFormation Console. In the Specify template section, select Upload a template to Amazon S3, then select the template provided by Talend Cloud.

    image013.jpg

     

  7. Define the External ID, S3 bucket name, and S3 prefix.

    image014.jpg

     

  8. Click Create. The stack is created. Copy the Role ARN key value from the Outputs tab.

    image015.jpg

     

  9. Enter the details (Role ARN, External Id, and Bucket Name) from the steps above into the Talend Management Console.

    image016.jpg

     

  10. Verify that the logs are saved to the S3 bucket.

    image017.jpg

     

Redirecting the logs using Filebeat

Filebeat is a lightweight collector for forwarding and centralizing log data. Installed as an agent on your servers, Filebeat monitors the log files or locations that you specify, collects log events, and forwards them to either Elasticsearch or Logstash for indexing.

 

Filebeat can be used to redirect the Remote Engine logs to Elasticsearch.

 

There are numerous ways to redirect the logs. The previous section showed you how to redirect the logs to an S3 bucket. This section shows another option, in which Filebeat is configured to redirect the logs to an on-premises Elasticsearch. However, you can set up Filebeat or another collector, such as App Metrics or Telegraf, according to your needs.

 

  1. Download Filebeat from https://www.elastic.co/fr/downloads/beats.

    image018.jpg

     

    For more information, see the Getting Started with Filebeat documentation.

  2. Edit the filebeat.yml configuration file so that the input points to the Remote Engine execution log files.

    filebeat.inputs:
    - type: log
      # Change to true to enable this input configuration.
      enabled: true
      # Paths that should be crawled and fetched. Glob based paths.
      paths:
        #- /var/log/*.log
        - C:\TalendRemoteEngine-240\TalendJobServersFiles\jobexecutions\logs\*\*

    output.elasticsearch:
      # Array of hosts to connect to.
      hosts: ["localhost:9200"]

    setup.kibana:
      # Kibana Host
      # Scheme and port can be left out and will be set to the default (http and 5601)
      # In case you specify an additional path, the scheme is required: http://localhost:5601/path
      # IPv6 addresses should always be defined as: https://[2001:db8::1]:5601
      host: "localhost:5601"

  3. Install Filebeat by running the following commands:

    image019.jpg

     

  4. To set up an index in Elasticsearch, use the following commands to create a template, set up a dashboard, and start Filebeat as a service.

    image020.jpg

     

  5. Review the commands and their results in the Filebeat_commands file attached to this article.

    image021.png

     

  6. Start Filebeat so that the logs are sent to your Elasticsearch engine.

    image022.jpg
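
Once Filebeat is running, you may want to confirm that log lines are actually reaching Elasticsearch. The following is a minimal Python sketch using only the standard library. It assumes the setup from the configuration above (Elasticsearch on localhost:9200 without authentication), the default Filebeat index pattern filebeat-*, and the default message field that holds the raw log line. The function names are hypothetical.

```python
import json
import urllib.request


def build_search_request(es_url, text, index="filebeat-*", size=5):
    """Build (without sending) a search request for log lines matching `text`."""
    body = {"size": size, "query": {"match": {"message": text}}}
    return urllib.request.Request(
        url=f"{es_url.rstrip('/')}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search_logs(es_url, text):
    """Send the request and return the matching log lines.

    Assumption: the cluster is reachable without authentication.
    """
    request = build_search_request(es_url, text)
    with urllib.request.urlopen(request, timeout=10) as response:
        hits = json.load(response)["hits"]["hits"]
    return [hit["_source"].get("message", "") for hit in hits]
```

For example, search_logs("http://localhost:9200", "ERROR") would return up to five indexed lines that contain the word ERROR. A secured cluster needs credentials and HTTPS, which this sketch leaves out.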

     

Redirecting the logs to JMS

  1. JMS is an API that provides the facility to create, send, and read messages. It provides loosely coupled, reliable, and asynchronous communication. JMS is also known as a messaging service.

  2. Remote Engine logs can be routed to JMS. Once the logs are in JMS, they can be read and processed into another system like ELK.

  3. Edit the remote-engine-installation/etc/org.talend.eventlogging.sender.jms.cfg file located in your Remote Engine installation folder.

    Your configuration should look similar to this:

    image023.jpg

     

  4. Update the destination.jms.url parameter to your JMS broker, in this case, localhost.

    image024.jpg

     

  5. Log in to the ActiveMQ admin console. You should see a new queue, in this case, logging.server.

    image025.jpg

     

  6. Open the logs to validate that the logs are the same as those on the Remote Engine.

    image026.jpg
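
If you manage several Remote Engines, the edit of the destination.jms.url parameter can be scripted. The following is a minimal Python sketch, assuming that the .cfg file uses plain key=value lines, with # for comments. The helper name set_property is hypothetical, and the broker URL tcp://localhost:61616 in the usage note is only an example based on the usual ActiveMQ OpenWire port. Use the URL of your own broker.

```python
def set_property(cfg_text, key, value):
    """Return cfg_text with `key` set to `value`, appending it if absent.

    Assumption: the file uses 'key=value' lines. Commented-out lines
    (starting with '#') and lines without '=' are left untouched.
    """
    lines = cfg_text.splitlines()
    replaced = False
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith("#") or "=" not in stripped:
            continue
        name = stripped.split("=", 1)[0].strip()
        if name == key:
            lines[i] = f"{key}={value}"
            replaced = True
    if not replaced:
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"
```

A typical use is to read the file, call set_property(text, "destination.jms.url", "tcp://localhost:61616"), and write the result back after making a backup copy of the original file.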

     

Studio Logs

Talend Studio can be configured to store Job execution logs, statistics, performance information, and detailed technical logs. These logs are optional and must be enabled explicitly. They can be stored in delimited files or database tables.

 

The Job executions are monitored using three files or tables:

  • Collection of logs

  • Component statistics

  • Data flow volume

  1. To store this data, you need to create three files or database tables, respectively, using the schema of the tLogCatcher, tStatCatcher, and tFlowMeterCatcher components (available in the Palette of your Talend Studio).

    Note: For more information, see Creating files or database tables in the Talend Help Center.

  2. To collect these logs, enable the Stats & Logs settings at the project-level, or use the tStatCatcher, tLogCatcher, and tFlowMeterCatcher components in individual Jobs.

    1. To enable at the project-level, select File > Edit Project Properties > Project Settings. Expand the Job Settings node, then select Stats & Logs. Configure it either to a file or DB.

      image027.jpg

       

    2. To enable at the Job-level, select the Job tab, then click Stats & Logs.

      image028.jpg

       

  • You can use the tStatCatcher, tLogCatcher, and tFlowMeterCatcher components as needed in the Job and link them to the relevant output (file or DB). After the data is available in the file or DB, you can use the Talend Activity Monitoring Console to analyze the data.

  • You can also create your own dashboards or ingest the metrics and detailed stats/logs into your own tool. A relevant example is described in the Activity Monitoring Console using a visualization tool section of this article.

    image029.jpg
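
Ingesting the collected logs into your own tool usually starts with reading the delimited file. The following is a minimal Python sketch that counts log rows per Job and component. It assumes a semicolon-delimited file without a header row, with columns in the order of the tLogCatcher schema as listed in LOG_COLUMNS. Verify both the delimiter and the column order against your own Stats & Logs configuration. The function name and the sample rows are invented for illustration.

```python
import csv
import io
from collections import Counter

# Assumed column order, based on the tLogCatcher schema; verify in your Studio.
LOG_COLUMNS = ["moment", "pid", "root_pid", "father_pid", "project", "job",
               "context", "priority", "type", "origin", "message", "code"]


def count_messages_by_component(lines, delimiter=";"):
    """Count log rows per (job, origin) pair; origin is the component name."""
    reader = csv.DictReader(lines, fieldnames=LOG_COLUMNS, delimiter=delimiter)
    counts = Counter()
    for row in reader:
        counts[(row["job"], row["origin"])] += 1
    return counts


# Invented sample rows, for illustration only.
SAMPLE = """\
2020-05-13 08:37:00;p1;p1;p1;DEMO;load_customers;Default;6;Java Exception;tFileInputDelimited_1;File not found;1
2020-05-13 08:38:00;p2;p2;p2;DEMO;load_customers;Default;6;Java Exception;tFileInputDelimited_1;File not found;1
2020-05-13 08:39:00;p3;p3;p3;DEMO;load_orders;Default;5;tWarn;tWarn_1;Slow query;42
"""

if __name__ == "__main__":
    for (job, component), n in count_messages_by_component(io.StringIO(SAMPLE)).items():
        print(job, component, n)
```

To read a real file, pass an open file object instead of the in-memory sample, for example open(path, newline="").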

     

Methods of monitoring

Now that you have seen the various ways to store Talend logs, take a look at the different monitoring methods. You can monitor the Talend execution, engine, and project logs using one or more of the following:

 

Kibana dashboard

  • A Kibana dashboard is a collection of visualizations, searches, and maps, typically in real-time. Dashboards provide at-a-glance insights into your data and enable you to drill-down into the details.

  • After you have the logs stored in your Elasticsearch engine, utilize the indexes to create a dashboard in Kibana.

    image030.jpg

     

  • You can create as many templates as needed to monitor the Jobs, status, time details, and more.

 

Activity Monitoring Console using a visualization tool

  • Talend Activity Monitoring Console is an add-on tool integrated into Studio for monitoring Talend Jobs and projects.

  • After Studio is configured, use the Activity Monitoring Console (as described in the Studio Logs section of this article), to visualize the data.

  • The Activity Monitoring Console can only connect to the configured DB/File. It cannot connect to a Remote Engine or any other type of log. It therefore shows only the metrics, logs, and stats that are generated and loaded into the DB/File.

  • Talend Activity Monitoring Console helps Talend product administrators and users improve process performance through a convenient graphical interface and supervising tool.

  • Because the logs are stored in a DB or file, you can easily plug them into an external visualization tool such as Tableau or MicroStrategy.

  • The following example shows two reports:

    • Error message by the components in a Job:

      image031.jpg

       

    • Success/Failure for each Job in 2019 along with the execution details:

      image032.jpg
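
A success/failure report like the one above can be approximated directly from the statistics file before any visualization tool is involved. The following is a minimal Python sketch. It assumes a semicolon-delimited file without a header row, columns in the order of the tStatCatcher schema as listed in STAT_COLUMNS, and rows with message_type "end" whose message is "success" or "failure". Verify these assumptions against your own data. If component-level statistics are enabled, you may also need to filter on the origin column. The function name and sample rows are invented for illustration.

```python
import csv
import io
from collections import Counter, defaultdict

# Assumed column order, based on the tStatCatcher schema; verify in your Studio.
STAT_COLUMNS = ["moment", "pid", "father_pid", "root_pid", "system_pid",
                "project", "job", "job_repository_id", "job_version",
                "context", "origin", "message_type", "message", "duration"]


def job_outcomes(lines, delimiter=";"):
    """Return {job: {outcome: count}} computed from the 'end' rows."""
    outcomes = defaultdict(Counter)
    reader = csv.DictReader(lines, fieldnames=STAT_COLUMNS, delimiter=delimiter)
    for row in reader:
        if row["message_type"] == "end":
            outcomes[row["job"]][row["message"]] += 1
    return {job: dict(counter) for job, counter in outcomes.items()}


# Invented sample rows, for illustration only.
SAMPLE_STATS = """\
2019-03-01 10:00:00;a1;a1;a1;100;DEMO;load_customers;_id1;0.1;Default;;begin;;
2019-03-01 10:00:05;a1;a1;a1;100;DEMO;load_customers;_id1;0.1;Default;;end;success;5000
2019-03-02 10:00:05;a2;a2;a2;101;DEMO;load_customers;_id1;0.1;Default;;end;failure;1200
"""

if __name__ == "__main__":
    print(job_outcomes(io.StringIO(SAMPLE_STATS)))
```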

       

Activity Monitoring Console in Studio

  • The Activity Monitoring Console dashboards can be accessed using Studio.

    image033.jpg

     

  • You can access the reports dashboard in the Activity Monitoring Console perspective.

    image034.jpg

     

    For more information, see Accessing the monitoring console from the Studio available in the Talend Help Center.

  • The following views can be created in the DB.

    • Jobs view

    • History and Detailed history views

    • Meter log view

    • Main chart view

    • Job Volume view

    • Logged Events view

    • Error report view

    • Threshold Charts view

 

Appendices

 

ELK with Talend Cloud

This blog is available at https://www.talend.com/blog/2019/09/09/elk-with-talend-cloud/

 

Introduction to Talend Activity Monitoring Console

Available in the Talend Help Center at https://help.talend.com/reader/aDkYRyut3Bh1Oo4YpBffEg/KtKEN~LTk82alp6_uaMrsg

 

Redirecting Talend Cloud logs to a JMS

Available in the Talend Community Knowledge Base (KB) at https://community.talend.com/t5/Design-and-Development/Redirecting-Talend-Cloud-logs-to-a-JMS/ta-p/1...

 

Conclusion

In summary, the various logging and monitoring capabilities offered by Talend are:

  • Viewing logs in Talend Management Console
  • Accessing the Remote Engine logs and pushing them to ELK
  • Routing Remote Engine logs to Logstash/Elasticsearch using Filebeat
  • Accessing Job logs from the AWS S3 bucket and pushing them to ELK (very useful if cloud engines are being used)
  • Enabling statistics on Job execution (tStatCatcher, tLogCatcher, tFlowMeterCatcher) at project-level or Job-level then saving it to DB/Files (this data can be utilized using the Activity Monitoring Console in Talend Studio)
  • Enabling data from DB/File for the Activity Monitoring Console to be visualized using external tools such as Tableau and MicroStrategy
  • Routing the Remote Engine logs using JMS

Version history
Revision 11 of 11, last updated 05-13-2020 08:37 AM