Talend Cloud best practices

Introduction

This article provides best practices that Talend suggests you follow when using Talend Cloud.

 

Use Remote Engines for Virtual Private Cloud environments

Follow Git best practices for Talend Cloud

Test, run, or debug Jobs on Remote Engines directly from Studio

Design the Job in Studio for orchestration, logging, and restartability

Use Talend Cloud for notifications

Leverage Talend Cloud API to start, stop, and get the status of executions

Upgrade Studio when Talend Cloud is upgraded

Create a Shared workspace—same as the project—per environment for all the promotions

Be consistent with workspace and artifact names across your organization

 

Use Remote Engines for Virtual Private Cloud environments

As a best practice, use Remote Engines rather than Cloud Engines for Virtual Private Cloud (VPC) environments. Whichever VPC you use, make sure the instance you designate to host the Remote Engine has enough capacity for the workloads it will run.

 

Follow Git best practices for Talend Cloud

Use centralized workflows, create branches, and use tags as necessary. For more information about Git best practices, see the following resources:

 

Test, run, or debug Jobs on Remote Engines directly from Studio

You can test, run, or debug Jobs on Remote Engines directly from Studio. Before Talend 7.0, pointing Studio at the JobServer embedded in a Remote Engine required manually configuring remote execution in your Studio preferences. As of Talend 7.0, Remote Engines declared as debugging engines are added to Studio automatically. For more information on configuring this capability, see the Running or debugging a design on a Remote Engine from Talend Studio page of the Talend Cloud Data Integration Studio User Guide.

 

Design the Job in Studio for orchestration, logging, and restartability

Talend Cloud provides basic orchestration: execution plans in the cloud can be started and stopped, and you can query their status. Use subJobs to orchestrate pipelines. For logging, you can load Cloud logs to an S3 bucket and set up an ELK (Elasticsearch, Logstash, and Kibana) stack. In Studio, you can use components such as tStatCatcher to write to an error table. As with on-premises deployments, handle errors and logging in a central, reusable way. Overall, design Jobs in Studio for restartability.
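The restartability idea can be illustrated outside of Studio with a short sketch. This is not Talend code: it is a minimal Python example, assuming a pipeline made of named steps and a hypothetical JSON checkpoint file, that shows how a rerun can skip the steps a failed run already completed.

```python
import json
import os


def run_with_checkpoint(steps, checkpoint_path):
    """Run (name, callable) steps in order, skipping steps a previous run completed.

    The checkpoint file (a hypothetical format) is a JSON list of completed step names.
    Returns the names of the steps executed during this call.
    """
    completed = []
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path, "r", encoding="utf-8") as handle:
            completed = json.load(handle)

    executed = []
    for name, action in steps:
        if name in completed:
            continue  # already finished in an earlier run
        action()  # if this raises, the checkpoint still lists the last good step
        completed.append(name)
        executed.append(name)
        with open(checkpoint_path, "w", encoding="utf-8") as handle:
            json.dump(completed, handle)

    # Every step succeeded: remove the checkpoint so the next run starts fresh.
    if os.path.exists(checkpoint_path):
        os.remove(checkpoint_path)
    return executed
```

If the second of two steps fails, calling the function again with the same checkpoint path runs only that second step. In a Talend Job, the equivalent state would typically live in a control table or context variable rather than a local file.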

 

Use Talend Cloud for notifications

To set notifications, navigate to Settings > Administration. Talend recommends using the predefined notification types.

 

Leverage Talend Cloud API to start, stop, and get the status of executions

You can use the Talend Cloud Public API (with a tool such as Swagger) to execute flows, get their status, and terminate Jobs. The Talend Summer ’18 release brings continuous delivery to Cloud development by making it possible to publish Cloud Jobs directly from Talend Studio using a Maven plug-in. This feature, which requires Talend Studio 7.0.1, lets you automate and orchestrate the full integration process by building, testing, and pushing Jobs to all Talend Cloud environments. Consult the Talend documentation for more details.
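As a rough illustration of the start, status, and stop calls, the following Python sketch builds the three HTTP requests with the standard library. The host, the /executions path, the bearer-token header, and the "executable" payload key are all assumptions made for illustration; take the real base URL, paths, authentication scheme, and request bodies from the Talend Cloud API documentation for your region and version.

```python
import json
import urllib.request

# Assumptions: placeholder host and path, not the real Talend Cloud endpoints.
BASE_URL = "https://api.example.invalid"
EXECUTIONS_PATH = "/executions"


def build_request(method, path, token, payload=None):
    """Build (but do not send) an authenticated JSON request."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    request = urllib.request.Request(BASE_URL + path, data=data, method=method)
    request.add_header("Authorization", "Bearer " + token)  # assumed auth scheme
    request.add_header("Accept", "application/json")
    if data is not None:
        request.add_header("Content-Type", "application/json")
    return request


def start_execution(task_id, token):
    # Assumed payload shape: the identifier of the task to run.
    return build_request("POST", EXECUTIONS_PATH, token, {"executable": task_id})


def get_status(execution_id, token):
    return build_request("GET", f"{EXECUTIONS_PATH}/{execution_id}", token)


def stop_execution(execution_id, token):
    return build_request("DELETE", f"{EXECUTIONS_PATH}/{execution_id}", token)


def send(request):
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping request construction separate from send() makes it possible to inspect or log each call before it reaches the service, which helps when wiring these calls into an external scheduler.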

 

Upgrade Studio when Talend Cloud is upgraded

It is always a best practice to upgrade Talend Studio whenever Talend Cloud is upgraded. Talend Studio is backward compatible only to a certain point; check the Talend Open Studio for Data Integration Getting Started Guide for details.

Note: Talend Cloud is no longer supported by Studio 6.2. If you are using that version, upgrade Studio.

 

Create a Shared workspace—same as the project—per environment for all the promotions

As a best practice, use both a Shared and a Personal workspace, and assign one Remote Engine to each workspace.

  • As the name implies, a Personal workspace should be used purely by you.

  • Development teams should use Shared workspaces so that code can be centralized and shared. Ensure that the Shared workspace name is the same in all environments.

 

Be consistent with workspace and artifact names across your organization

As a final best practice, be consistent with the names of workspaces and artifacts across your organization. This is a simple, common best practice that should be adopted in all cases for all software applications. For example, a component name might look like this: component name_direction from/to_function. Choose your own standard, but be consistent. For more information on naming conventions, see Best Practice: Conventions for Object Naming.
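One way to keep names consistent is to check them automatically. The sketch below assumes a hypothetical convention modeled on the example above, with three lowercase parts separated by underscores (component, then "from" or "to", then function). The pattern and the function name are illustrative only; adapt them to whatever standard your organization chooses.

```python
import re

# Hypothetical convention: <component>_<from|to>_<function>, lowercase letters and digits.
NAME_PATTERN = re.compile(
    r"^(?P<component>[a-z0-9]+)_(?P<direction>from|to)_(?P<function>[a-z0-9]+)$"
)


def parse_name(name):
    """Return the parts of a conforming name as a dict, or None if it does not match."""
    match = NAME_PATTERN.match(name)
    return match.groupdict() if match else None
```

For example, parse_name("salesforce_to_staging") returns the three parts, while a name such as "LoadCustomers" returns None and can be flagged for renaming.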

Version history
Revision 21 of 21, last updated 08-16-2018 03:43 PM