Talend Remote Engine on AWS Terraform module implementing Auto Scaling Group

Overview

The Terraform module uses an AWS Auto Scaling Group to dynamically scale the number of Talend Remote Engine client EC2 instances. The instances are based on either the AWS Marketplace Talend Cloud Remote Engine for AWS AMI or a personal Remote Engine installation AMI, and each one is paired with its corresponding Talend Cloud Remote Engine, which is created at the same time.

IAM.png

 

Features

Each Remote Engine client EC2 instance is configured with a dedicated EC2 User Data script that invokes the Talend Cloud REST API to create the corresponding Talend Cloud Engine and cluster when the EC2 instance is instantiated for the first time.

 

A second script is triggered when the EC2 instance is terminated to unpair and delete the corresponding cloud Remote Engine.

If a Talend Cloud cluster name (input variable talend_cloud.cluster_name) is set, the cluster is used (or created, if it doesn't exist) to group the created Remote Engines.

 

If a cluster already exists, the Talend Cloud workspace and environment of the cluster are used to create the Remote Engine.

 

If a cluster name is not provided, or the named cluster doesn't exist, the input variable talend_cloud.workspace_name must be set to create the Talend Cloud Remote Engine (and the cluster, if one is named).

 

The number of EC2 AMI Remote Engine client instances initially instantiated is based on the requested desired capacity (input variable remote_engine.desired), then scaled in or out based on the EC2 CPU and Memory thresholds defined in the input variables group (scaling_thresholds).
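
As an illustration of how a CPU threshold such as scaling_thresholds.max_cpu can drive a scale-out, the following is a minimal sketch that pairs a simple scaling policy with a CloudWatch alarm. It is not the module's actual code: the variables asg_name and max_cpu, the resource names, the 60-second period, and the 300-second cooldown are assumptions chosen for the example. Note that EC2 does not publish memory metrics natively, so a memory-based alarm would need a metric published from the instance itself (for example by the CloudWatch agent).

```hcl
# Hypothetical inputs for this sketch (not the module's own variable names).
variable "asg_name" {
  type        = string
  description = "Name of an existing Auto Scaling Group"
}

variable "max_cpu" {
  type        = number
  default     = 80
  description = "Average CPU percentage at or above which one instance is added"
}

# Simple scaling policy: add one instance each time the alarm fires.
resource "aws_autoscaling_policy" "scale_out_cpu" {
  name                   = "re-scale-out-cpu"
  autoscaling_group_name = var.asg_name
  adjustment_type        = "ChangeInCapacity"
  scaling_adjustment     = 1
  cooldown               = 300
}

# Alarm on the group's average CPU; it triggers the policy above.
resource "aws_cloudwatch_metric_alarm" "max_cpu" {
  alarm_name          = "re-max-cpu-threshold"
  namespace           = "AWS/EC2"
  metric_name         = "CPUUtilization"
  statistic           = "Average"
  period              = 60
  evaluation_periods  = 1
  comparison_operator = "GreaterThanOrEqualToThreshold"
  threshold           = var.max_cpu

  dimensions = {
    AutoScalingGroupName = var.asg_name
  }

  alarm_actions = [aws_autoscaling_policy.scale_out_cpu.arn]
}
```

A scale-in counterpart would mirror this with scaling_adjustment = -1 and a LessThanOrEqualToThreshold alarm on the minimum CPU threshold.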

 

Requirements

  • Terraform version >= 0.12
  • An existing AWS VPC (input variable vpc_id)
  • A Talend Cloud Personal Access Token for the Talend Cloud REST API calls (input variable talend_cloud.api_pat). See Generating a Personal Access Token in the Talend Cloud API Designer User Guide.
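
The Terraform version requirement can also be enforced in the root configuration. A minimal sketch, assuming the constraint is declared next to the module call (the module itself may already declare it):

```hcl
terraform {
  # Fail early if Terraform is older than the version the module requires.
  required_version = ">= 0.12"
}
```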

 

Usage

 

With required parameters

provider "aws" {
  ...
}

module "talend_re" {
  source = "./modules/talend-remote-engine-aws-autoscaling"

  vpc_id           = "<vpc id>"

  talend_cloud = {
    region         = "<Talend Cloud Region: 'US', 'EU' or 'AP'>"
    api_pat        = "<Talend Cloud User Personal Access Token>"

    cluster_name   = "<Talend Cloud cluster name>"
    re_name_prefix = "<Talend Cloud Engine name prefix>"
    workspace_name = "<Talend Cloud workspace>"
  }

  remote_engine = {
    name_prefix = "<Remote Engine EC2 instance name prefix>"
    min         = <Minimum Number of RE instances the autoscaling group must preserve>
    desired     = <Initial desired capacity of Remote Engine instances>
    max         = <Maximum Number of RE instances the autoscaling group cannot exceed>
  }
}

 

With all the parameters

provider "aws" {
  ...
}

module "talend_re" {
  source = "./modules/talend-remote-engine-aws-autoscaling"

  vpc_id            = "vpc-123456"
  additional_sg_ids = ["sg-12345", "sg-67890"]
 
  ssh_cidr_blocks   = ["172.10.0.0/24"]
  ssh_key_name      = ""
  ssh_key_file_path = var.ssh_key_file_path

  talend_cloud_api_version = "1.3"

  talend_cloud = {
    region         = "US"
    api_pat        = var.api_pat

    cluster_name   = "cluster_asg"
    re_name_prefix = "re_asg"
    workspace_name = "Default"
  }

  talend_remote_engine_ami_version = "2.5.0"

  remote_engine_ami_id        = ""
  remote_engine_instance_type = "t2.xlarge"

  root_block_device = {
    volume_type           = "gp2"
    volume_size           = "50"
    delete_on_termination = true
  }

  remote_engine = {
    name_prefix = "re_asg"
    min         = 1
    desired     = 2
    max         = 4
  }

  scaling_thresholds = {
    min_cpu         = 10
    max_cpu         = 75

    min_memory      = 10
    max_memory      = 80
  }
}
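
The example above reads var.api_pat and var.ssh_key_file_path from the root configuration. A minimal sketch of the matching declarations follows; the descriptions are assumptions added for readability and are not part of the module.

```hcl
# Root-level inputs referenced by the module call.
variable "api_pat" {
  type        = string
  description = "Talend Cloud User Personal Access Token"
}

variable "ssh_key_file_path" {
  type        = string
  description = "Path to the SSH public key file used to create a new key pair"
}
```

To keep the token out of version control, it can be supplied through the TF_VAR_api_pat environment variable, which Terraform maps to var.api_pat.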

 

Variables

  • additional_sg_ids - (Optional) List of Security Group IDs to add to the dedicated one created for the Remote Engine (see AWS Security Groups below). Default: []

  • remote_engine_ami_id - (Optional) AMI ID of a Remote Engine EC2 Image to use instead of the Official Talend AMI. It cannot be used with talend_remote_engine_ami_version. Default: ""

  • remote_engine_install_folder - (Optional) Installation folder of the Remote Engine. Used primarily for the custom AMI remote_engine_ami_id. Default: /opt/talend/ipaas/remote-engine-client

  • remote_engine_instance_type - (Optional) EC2 instance type for the Remote Engine. Default: t2.medium

  • remote_engine - Remote Engine EC2 instance configuration:

    • name_prefix - (Optional) Name prefix used to tag the created EC2 instances

    • min - (Required) Minimum number of Remote Engine EC2 instances the AWS Auto Scaling Group must maintain

    • max - (Required) Maximum number of Remote Engine EC2 instances the AWS Auto Scaling Group cannot exceed

    • desired - (Required) Number of Remote Engine EC2 instances the AWS Auto Scaling Group instantiates initially

  • root_block_device - (Optional) Configuration of the root EBS volume:

    • volume_type - (Optional) EC2 volume type of the Remote Engine. Default: gp2

    • volume_size - (Optional) EC2 volume size (in GB) of the Remote Engine. Default: 30

    • delete_on_termination - (Optional) Whether the volume should be destroyed on instance termination. Default: true

  • scaling_thresholds - (Optional) Configuration of the auto-scaling threshold metrics:

    • min_cpu - (Optional) EC2 instance CPU minimum usage threshold (in percentage) to trigger a scale-in (decrease of the number of EC2 instances). Default: 20

    • max_cpu - (Optional) EC2 instance CPU maximum usage threshold (in percentage) to trigger a scale-out (increase of the number of EC2 instances). Default: 80

    • min_memory - (Optional) EC2 instance Memory minimum usage threshold (in percentage) to trigger a scale-in (decrease of the number of EC2 instances). Default: 20

    • max_memory - (Optional) EC2 instance Memory maximum usage threshold (in percentage) to trigger a scale-out (increase of the number of EC2 instances). Default: 80

  • ssh_cidr_blocks - (Optional) List of CIDR blocks allowed SSH access to the Remote Engine instances. Default: []

  • ssh_key_name - (Optional) Existing SSH key pair name to use. It cannot be used with ssh_key_file_path.

  • ssh_key_file_path - (Optional) SSH public key file path to upload to create a new key pair. It cannot be used with ssh_key_name.

  • subnet_ids - (Optional) List of Subnet IDs where the Remote Engine instances are deployed. If not provided, all the available subnets of the provided VPC are used. Default: []

  • talend_cloud_api_version - (Optional) Talend Cloud API Version. Default: 1.3

  • talend_cloud - Configuration parameters for Talend Cloud:

    • region - (Required) Talend Cloud region. Possible values: 'US', 'EU', 'AP'.

    • api_pat - (Required) Talend Cloud User Personal Access Token for API calls. For more information, see Generating a Personal Access Token in the Talend Help Center.

    • cluster_name - (Optional) Talend Remote Engine Cluster name. If it doesn't exist and workspace_name is provided, the cluster is created. For more information, see Creating Remote Engine Clusters in the Talend Help Center.

    • re_name_prefix - (Optional) Talend Remote Engine name prefix. The EC2 instance ID is appended to the prefix. The generated name is unique because an EC2 instance ID is unique on AWS.

    • workspace_name - (Optional) Talend Cloud workspace. Must already exist in Talend Cloud. For more information, see Creating workspaces in the Talend Help Center.

    • studio_cidr_blocks - (Optional) CIDR blocks of the Studio workstations/clients. Allows Studio to connect to the Remote Engine by exposing the public IP of the Remote Engine EC2 instance. For more information, see Assigning Remote Engines to Clusters in the Talend Help Center.

  • talend_remote_engine_ami_version - (Optional) Version of the official Talend Remote Engine AMI. It cannot be used in conjunction with remote_engine_ami_id. Default: 2.5.0

  • vpc_id - (Required) VPC ID where the Remote Engine instance (or instances) is provisioned.

Outputs

  • re_ssh_key_name: SSH key name, if ssh_key_file_path is set.
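
To surface this value from the root configuration, the module output can be re-exported. A minimal sketch, assuming the module is instantiated as talend_re as in the usage examples; the root output name is an arbitrary choice:

```hcl
# Re-export the module's key pair name so it appears in `terraform output`.
output "talend_re_ssh_key_name" {
  value = module.talend_re.re_ssh_key_name
}
```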

 

Examples

provider "aws" {
  ...
}

module "talend_re" {
  source = "./modules/talend-remote-engine-aws-autoscaling"

  vpc_id           = "vpc-0bcacdc740e504209"

  talend_cloud = {
    region         = "US"
    api_pat        = var.api_pat

    cluster_name   = "cluster_asg"
    re_name_prefix = "re_asg"
    workspace_name = "redha_asg"
  }

  remote_engine = {
    name_prefix = "re_asg"
    min         = 1
    desired     = 2
    max         = 4
  }
}

  • The Remote Engine client EC2 instances are distributed among all the subnets of the existing VPC, vpc-0bcacdc740e504209, because the input variable subnet_ids is not set.

    vpc.png

    In this case, for resiliency, each subnet is located in a different AWS Availability Zone.

    subnets.png

     

  • No SSH access to the EC2 instances as ssh_cidr_blocks is empty.
  • The associated Talend Cloud Remote Engine is located in the US region (https://portal.us.cloud.talend.com).
  • The Remote Engines are grouped in the Talend Remote Engine Cluster cluster_asg. The workspace and environment are inferred from the existing cluster definition.
  • The Remote Engine names are prefixed with re_asg. The name is the concatenation of the prefix and the EC2 instance ID. In AWS, an EC2 Instance ID is unique, making the Remote Engine name unique as well.

    cloud_re.png

     

  • The Talend Cloud Remote Engine description contains AWS EC2 metadata that helps locate the corresponding EC2 instance on AWS.

    cloud_re_def.png

    Description: AWS Metadata: public IP: 18.237.51.226 - vpcId: vpc-0bcacdc740e504209 - subnetId: subnet-0c2ff4d83355aa808 - instanceId: i-099f76b45f427d9f5

     

  • The Auto Scaling group is configured with a Desired capacity of 2, a Min of 1, and a Max of 4.

    asg.png

    A list of Scaling Policies is set based on the default scaling_thresholds, for example, Max CPU threshold of 80 percent.

    scaling_policies.png

    Their respective CloudWatch Alarms are:

    cloudwatch.png

     

 

Auto Scaling Group in Action

  • The desired capacity is set to 2, and the min is set to 1.

  • The low-CPU scaling threshold (scaling_thresholds.min_cpu) is left at its default of 20%.

  • No workload or Tasks have been deployed to the Remote Engines for more than 60 seconds (an arbitrarily chosen limit).

  • After one minute, the min-cpu-threshold CloudWatch Alarm is triggered:

    alarm_min_cpu.png

    alarm_low_cpu.png

     

  • The associated Scaling Policy is activated, and it terminates one of the running Remote Engine instances.

    activity_scale_in.png

     

  • Because the minimum number of instances is set to 1, prolonged inactivity does not terminate the last instance.

  • On Talend Cloud, when the instance i-0494c5741e0cbe26c is scheduled for termination, it unpairs and deletes the corresponding Talend Cloud Remote Engine before exiting, leaving the cluster with the remaining Remote Engine.

    cluster_after_scale_in.png

     

 

Talend Studio support

Talend Cloud allows you to make a specific number of Remote Engines available to Talend Studio for the developers to test their Jobs. For more information, see Executing Artifacts on a Remote Engine from Talend Studio.

 

To be visible to Studio, the Remote Engines must not be part of a Remote Engine Cluster, meaning talend_cloud.cluster_name must be empty and the talend_cloud.studio_cidr_blocks must be set.

 

If those conditions are met, the public IP Address of the Remote Engine EC2 is set automatically by the Terraform module.

re_for_studio.png

 

The Remote Engines are then visible to Talend Studio:

studo_re_list.png

 

Important: The official Talend Remote Engine AMI version 2.5.0 does not support Talend Studio access. The required ports (8003, 8004, and 8891; see below) are not accessible from outside the EC2 instance. A custom AMI with a manual installation and configuration must be built for this purpose. This AMI can be used with this module by setting the input variable remote_engine_ami_id.
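
Putting these conditions together, a Studio-facing deployment could be declared along the following lines. This is a sketch under stated assumptions rather than a tested configuration: the module instance name, the name prefixes, the capacity values, and the bracketed placeholders are illustrative. How talend_remote_engine_ami_version must be set alongside a custom AMI is not covered here.

```hcl
variable "api_pat" {
  type        = string
  description = "Talend Cloud User Personal Access Token"
}

module "talend_re_studio" {
  source = "./modules/talend-remote-engine-aws-autoscaling"

  vpc_id = "<vpc id>"

  # Custom AMI: the official 2.5.0 AMI does not expose the Studio ports.
  remote_engine_ami_id = "<custom Remote Engine AMI ID>"

  talend_cloud = {
    region  = "US"
    api_pat = var.api_pat

    # Must be empty: clustered Remote Engines are not visible to Studio.
    cluster_name   = ""
    re_name_prefix = "re_studio"
    workspace_name = "<Talend Cloud workspace>"

    # Workstations allowed to reach ports 8003, 8004, and 8891.
    studio_cidr_blocks = ["<Studio workstation CIDR block>"]
  }

  remote_engine = {
    name_prefix = "re_studio"
    min         = 1
    desired     = 1
    max         = 2
  }
}
```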

 

AWS Security Groups

The module creates a dedicated security group for the Remote Engine EC2 instances with the following egress and ingress rules. Additional external Security Groups can be attached to the Remote Engine EC2 instances by listing them in the input variable additional_sg_ids.

 

Egress rule

The rule limits outbound connections to port 443 (HTTPS), which is used by the following Talend URLs:

URL                        Port   Usage
update.talend.com          443    For downloading additional packages such as Bonita BPM Integration, Talend Metadata Bridge and upgrades from Talend Studio tools
talend-update.talend.com   443    For downloading libraries in Talend Studio (mainly for components)
www.talend.com             443    For testing and sending usage statistics from Talend Studio
talendforge.org            443    For user actions, such as clicking Community links, and others
help.talend.com            443    For user actions, such as clicking on help links, and others

 

Ingress rules

 

SSH Access

An optional security group rule on port 22 is created if the input variable ssh_cidr_blocks is set. The rule limits SSH access to the listed CIDR blocks.

 

Studio Access

If the input variable talend_cloud.studio_cidr_blocks is set, additional security rules are created to allow Talend Studio to interact with the Remote Engine for deploying and executing Talend Jobs.

Port   Protocol   CIDR blocks          Usage
8003   tcp        studio_cidr_blocks   Command Port
8004   tcp        studio_cidr_blocks   File Transfer Port
8891   tcp        studio_cidr_blocks   Monitoring Port
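
The two optional ingress rule sets described above can be sketched as conditional aws_security_group_rule resources. This is an illustration only, not the module's source: the variable security_group_id, the flat variable names, and the resource names are assumptions made for the example.

```hcl
# Hypothetical inputs for this sketch.
variable "security_group_id" {
  type        = string
  description = "ID of the security group attached to the Remote Engine instances"
}

variable "ssh_cidr_blocks" {
  type    = list(string)
  default = []
}

variable "studio_cidr_blocks" {
  type    = list(string)
  default = []
}

locals {
  # Command, file transfer, and monitoring ports used by Talend Studio.
  studio_ports = [8003, 8004, 8891]
}

# SSH rule, created only when at least one CIDR block is provided.
resource "aws_security_group_rule" "ssh" {
  count = length(var.ssh_cidr_blocks) > 0 ? 1 : 0

  type              = "ingress"
  from_port         = 22
  to_port           = 22
  protocol          = "tcp"
  cidr_blocks       = var.ssh_cidr_blocks
  security_group_id = var.security_group_id
}

# One rule per Studio port, created only when studio_cidr_blocks is set.
resource "aws_security_group_rule" "studio" {
  count = length(var.studio_cidr_blocks) > 0 ? length(local.studio_ports) : 0

  type              = "ingress"
  from_port         = local.studio_ports[count.index]
  to_port           = local.studio_ports[count.index]
  protocol          = "tcp"
  cidr_blocks       = var.studio_cidr_blocks
  security_group_id = var.security_group_id
}
```

Using count keeps the sketch compatible with Terraform 0.12, and leaving either list empty creates no rule for that access path.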

 

Troubleshooting

 

A Talend Cloud Engine is not created for the EC2 instance

To help diagnose the root cause of the issue, you can access the user data script execution log of the EC2 instance using the AWS Console Get System Log menu.

system_log.png

 

Scroll to the end of the log and search for the lines starting with re-initialization.

 

Or use the AWS CLI command:

aws ec2 get-console-output --instance-id $instance_id | jq -r .Output | grep re-initialization

re-initialization: ===========================================
re-initialization: === Talend Remote Engine Initialization ===
re-initialization: ===========================================
re-initialization: >> Talend Cloud API Parameters:
re-initialization:    - cloud_region     = us
re-initialization:    - cloud_api_version= 1.3
re-initialization:    - cloud_api_url    = https://api.us.cloud.talend.com/tmc/v1.3
re-initialization:    - cloud_pairing_url= https://pair.us.cloud.talend.com
re-initialization:    - cloud_api_pat    = 0Zzm****0o43
re-initialization:
re-initialization: >> Script Parameters:
re-initialization:    - cluster_name  =
re-initialization:    - re_name_prefix= "tf_re"
re-initialization:    - ws_name       = "Redha_WS1"
re-initialization:    =================================
re-initialization:
re-initialization: >> Installing jq
re-initialization:    - Done
re-initialization:
re-initialization: >> Search for workspace 'Redha_WS1':
re-initialization:    - Found
re-initialization:
re-initialization: >> Create Remote engine 'tf_re_i-076b0642d9d24a916'
re-initialization:    - Remote engine created successfully.
re-initialization:
re-initialization: >> Remote engine:
re-initialization:    - name     = 'tf_re_i-076b0642d9d24a916'
re-initialization:    - id       = '5dec5de5a8302932c367c8a8'
re-initialization:    - ip
re-initialization:      - public = '18.237.88.211'
re-initialization:      - private= '172.20.0.116'
re-initialization:    - key      = '5AC4125D882D652E3DA33AEC65F011283A4D22979F85A6C7F398414DF095C5B8'
re-initialization:    - desc     = 'AWS Metadata: public IP: 18.237.88.211 - vpcId: vpc-07c83fdab0aaa0134 - subnetId: subnet-01ff816cbff190036 - instanceId: i-076b0642d9d24a916'
re-initialization:
re-initialization: >> Pre-authorized file '/opt/talend/ipaas/remote-engine-client/etc/preauthorized.key.cfg' updated:
re-initialization: remote.engine.pre.authorized.key = 5AC4125D882D652E3DA33AEC65F011283A4D22979F85A6C7F398414DF095C5B8
re-initialization: remote.engine.name = tf_re_i-076b0642d9d24a916
re-initialization: remote.engine.description =
re-initialization:
re-initialization: Created symlink from /etc/systemd/system/multi-user.target.wants/talend-re-termination.service to /etc/systemd/system/talend-re-termination.service.
re-initialization: >> End: Exec time: 5 sec

 

Important: The System log is available only after the instance is fully initialized, and it may take some time before it is accessible after that.

 

A Talend Cloud Engine is not deleted when the EC2 instance is terminated

Here as well, the EC2 System log can be used for investigation. After the termination, AWS doesn't remove the terminated instance right away. During this grace period, the System log can be accessed from the AWS Console, as mentioned above, or by using the following AWS CLI command:

aws ec2 get-console-output --instance-id $instance_id | jq -r .Output | grep re-termination

re-termination: ========================================
re-termination: === Talend Remote Engine Termination ===
re-termination: ========================================
re-termination: >> Talend Cloud API Parameters:
re-termination:    - cloud_region     = us
re-termination:    - cloud_api_version= 1.3
re-termination:    - cloud_api_url    = https://api.us.cloud.talend.com/tmc/v1.3
re-termination:    - cloud_pairing_url= https://pair.us.cloud.talend.com
re-termination:    - cloud_api_pat    = 0Zzm****0o43
re-termination:
re-termination: >> Script Parameters:
re-termination:    - cluster_name  =
re-termination:    - cluster_id    =
re-termination:    - re_name       = "tf_re_i-076b0642d9d24a916"
re-termination:    - re_id         = "5dec5de5a8302932c367c8a8"
re-termination:    =================================
re-termination:
re-termination: >> Deleting re id '5dec5de5a8302932c367c8a8'
re-termination:    - Remote engine deleted successfully.

Version history
Revision #: 12 of 12
Last update: 02-10-2020 10:50 AM

Comments
Employee

A note on the POST API for creating Remote Engines: POST /runtimes/remote-engines

The runProfiles parameter accepts one of three values, to create a Remote Engine for DI/Big Data Jobs, Microservices, or Talend Runtime respectively:

runProfiles: [JOB_SERVER, MICROSERVICE, TALEND_RUNTIME]

If it is empty, JOB_SERVER is set by default.

One Star

What about this Terraform code? Can we check it out?

"./modules/talend-remote-engine-aws-autoscaling"