Talend Cloud and AWS Systems Manager Parameter Store for Talend context variables

Overview

AWS Systems Manager (SM) is an AWS service that you can use to view and control your infrastructure on AWS. It offers automation documents that simplify common maintenance and deployment tasks for AWS resources.

AWS SM groups a collection of capabilities: some related to automation, such as infrastructure maintenance and deployment of AWS resources, and others related to application management and configuration. One of these capabilities is Parameter Store.

AWSystemManager.png

 

AWS Systems Manager Parameter Store

AWS Systems Manager (SM) Parameter Store provides secure, hierarchical storage for configuration data management and secrets management.

 

It allows you to store data such as passwords, database connection strings, and license codes as parameter values.

paramstore_empty.png

 

AWS SM Parameter Store benefits

Parameter Store offers the following benefits and features for Talend Jobs.

  • Secure, highly scalable, hosted service with no servers to manage, unlike a dedicated database set up to store Job context variables.

  • Control access at granular levels: specify who can access a specific parameter or set of parameters (for example, a DB connection) at the user or group level. Because parameter names can contain nested paths, IAM policies can use those paths to define ACL-like access constraints. This is important for controlling access to Production environment parameters.

  • Audit access: track the last user who created or updated a specific parameter value.

  • Encryption of data at rest and in transit: parameter values can be stored as plaintext (unencrypted data) or ciphertext (encrypted data). Encrypted values rely on AWS Key Management Service (KMS) behind the scenes. As a result, Talend context variables of type Password can be stored and retrieved securely without implementing a dedicated encryption/decryption process.

Another benefit of AWS SM Parameter Store is its low usage cost.

 

AWS SM Parameter Store pricing

AWS SM Parameter Store offers two tiers of parameters: standard and advanced.

 

Standard parameters are available at no additional charge. Their values are limited to 4 KB, which should cover the majority of Talend Job use cases.

 

Advanced parameters allow values up to 8 KB. They are charged based on the number of advanced parameters stored each month and on the number of API interactions.

ParameterStore_pricing2.png

 

Pricing example

Assume you have 5,000 parameters, of which 500 are advanced. Assume that you have enabled higher throughput limits and interact with each parameter 24 times per day, which equates to 5,000 * 24 * 30 = 3,600,000 interactions per 30-day month. Because you have enabled higher throughput, your API interactions are charged for both standard and advanced parameters. Your monthly bill is the sum of the cost of the advanced parameters and the cost of the API interactions:

  • Cost of 500 advanced parameters = 500 * $0.05 per advanced parameter = $25
  • Cost of 3.6M API interactions = 3.6M * $0.05 per 10,000 interactions = $18
  • Total monthly cost = $25 + $18 = $43
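The arithmetic in this pricing example can be reproduced with a short Java sketch. It uses BigDecimal to avoid floating-point rounding; the rates are copied from the example above rather than retrieved from AWS, so treat them as illustrative values, not current prices. The class and method names are hypothetical.

```java
import java.math.BigDecimal;

// Reproduces the pricing example. Rates are copied from the example
// (illustrative USD values), not retrieved from AWS.
final class ParameterStorePricing {
    static final BigDecimal PER_ADVANCED_PARAMETER = new BigDecimal("0.05"); // per month
    static final BigDecimal PER_10K_INTERACTIONS = new BigDecimal("0.05");
    static final BigDecimal TEN_THOUSAND = BigDecimal.valueOf(10_000);

    // Interactions per month = parameters * interactions per day * days.
    static long monthlyInteractions(long parameters, long perDay, long days) {
        return parameters * perDay * days;
    }

    static BigDecimal advancedParameterCost(long advancedParameters) {
        return PER_ADVANCED_PARAMETER.multiply(BigDecimal.valueOf(advancedParameters));
    }

    // Priced per 10,000 interactions; dividing by 10,000 always yields an exact decimal.
    static BigDecimal interactionCost(long interactions) {
        return PER_10K_INTERACTIONS.multiply(BigDecimal.valueOf(interactions)).divide(TEN_THOUSAND);
    }

    static BigDecimal monthlyTotal(long advancedParameters, long interactions) {
        return advancedParameterCost(advancedParameters).add(interactionCost(interactions));
    }
}
```

With the example's figures, monthlyInteractions(5_000, 24, 30) returns 3600000 and monthlyTotal(500, 3_600_000) evaluates to 43.00.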

For more information on pricing, see the AWS Systems Manager pricing web site.

 

About parameters

A Parameter Store parameter is any piece of configuration data, such as a password or connection string, that is saved in the Store. You can centrally and securely reference this data in a Talend Job.

 

Parameter Store supports three types of parameters:

  • String
  • StringList
  • SecureString
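Most Talend context variables map to the String type (or SecureString for Password variables). For the StringList type, Parameter Store keeps the whole list as one comma-separated value, and individual items cannot escape commas. The following is a minimal sketch of converting between that representation and a Java list; the class and method names are illustrative only.

```java
import java.util.Arrays;
import java.util.List;

// Converts between a StringList parameter value and a Java list.
// A StringList is stored as one comma-separated string and items cannot
// escape commas, so a value that needs a comma should use String instead.
final class StringListValues {
    static List<String> split(String value) {
        if (value == null || value.isEmpty()) {
            return List.of();
        }
        // Limit -1 keeps trailing empty items instead of silently dropping them.
        return Arrays.asList(value.split(",", -1));
    }

    static String join(List<String> items) {
        for (String item : items) {
            if (item.contains(",")) {
                throw new IllegalArgumentException("StringList item contains a comma: " + item);
            }
        }
        return String.join(",", items);
    }
}
```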

 

Organizing parameters into hierarchies

In Talend, context variables are stored as a list of key-value pairs independent of the physical storage (Job, file, or database). Managing numerous parameters as a flat list is time-consuming and prone to errors. It can also be difficult to identify the correct parameter for a Talend Project or Job. This means you might accidentally use the wrong parameter, or you might create multiple parameters that use the same configuration data.

 

Parameter Store allows you to use parameter hierarchies to help organize and manage parameters. A hierarchy is a parameter name that includes a path that you define by using forward slashes (/).

 

The following example uses three hierarchy levels in the name:

/Dev/PROJECT1/max_rows
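A hierarchical name can be handled like a path. The sketch below (hypothetical helper names) splits a name into its levels and checks whether it sits under a given path. A trailing slash is added before the prefix comparison so that /Dev/PROJECT10 is not mistaken for a child of /Dev/PROJECT1.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative helpers for hierarchical parameter names (hypothetical names).
final class ParameterHierarchy {
    // "/Dev/PROJECT1/max_rows" -> [Dev, PROJECT1, max_rows]
    static List<String> levels(String name) {
        if (!name.startsWith("/")) {
            return List.of(name); // flat, non-hierarchical name
        }
        return Arrays.asList(name.substring(1).split("/"));
    }

    // True if the parameter is somewhere below the given path, e.g. "/Dev/PROJECT1".
    // Comparing against "path/" keeps "/Dev/PROJECT10/x" out of "/Dev/PROJECT1".
    static boolean isUnder(String name, String path) {
        String prefix = path.endsWith("/") ? path : path + "/";
        return name.startsWith(prefix);
    }
}
```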

 

AWS SM Parameter Store with Talend Job

Parameter Store can be accessed from the AWS Console, the AWS CLI, or the AWS SDKs, including the SDK for Java. Talend Studio leverages the AWS Java SDK to connect to numerous Amazon services, but not yet to AWS Systems Manager.

AmazonPallete.png

 

Implementation of AWS SM Parameter Store connector

This initial implementation solely uses the current capabilities of Studio, such as Routines and Joblets.

A future version will leverage the Talend Component Development Kit (CDK) to build a dedicated connector for AWS System Manager.

 

Routine

The connector was developed in Java using the AWS SDK and exported as an uber JAR (a single JAR with all its dependencies embedded in it).

 

The AWSSSMParameterStore-1.0.0.jar file (attached to this article) is imported into the Studio local Maven Repository and then used as a dependency in the AwsSSMParameterStore Talend routine.

add_routine_dep2.png

 

StudioRoutineWithDep.png

 

The routine exposes a set of high-level Parameter Store functions to Talend Jobs:

package routines;

import java.util.Map;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

import com.talend.ps.engineering.AWSSMParameterStore;

public class AwsSSMParameterStore {
	private static final Log LOG = LogFactory.getLog(AwsSSMParameterStore.class);
	private static AWSSMParameterStore paramsStore;

	/*
	 * init
	 *
	 * Create an AWSSMParameterStore client based on the credential parameters.
	 * Follows the "Default Credential Provider Chain".
	 * See https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/credentials.html
	 *
	 * Parameters:
	 * 	accessKey : (Optional) AWS Access Key
	 * 	secretKey : (Optional) AWS Secret Key
	 * 	region    : (Optional) AWS Region
	 *
	 * Return:
	 *  boolean : false if the combination of parameters is invalid
	 */
	public static boolean init(String accessKey, String secretKey, String region) { ...
	}

	/*
	 * loadParameters
	 *
	 * Retrieve, recursively, all the parameters whose name starts with the given path
	 *
	 * Parameters:
	 * 	path : Parameter path prefix for the parameters
	 *
	 * Return:
	 *  Map of name/value pairs; each key is the last segment of the parameter name
	 */
	public static Map<String, String> loadParameters(String path){ ...
	}

	/*
	 * saveParameter
	 *
	 * Save a single parameter (name and value) to the Parameter Store
	 *
	 * Parameters:
	 * 	name        : Name of the parameter
	 * 	value       : Value of the parameter
	 * 	encrypt     : If true, store the value encrypted in the Parameter Store
	 *
	 * Return:
	 *  boolean : false if the save failed
	 */
	public static boolean saveParameter(String name, Object value, boolean encrypt) { ...
	}

}

The init function creates the connector to AWS SSM using the AWS Default Credential Provider Chain.
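The article does not spell out what counts as an "invalid combination of parameters" for init. One plausible rule, sketched here purely as an assumption: the access key and secret key must be supplied together, and when both are empty the client falls back to the Default Credential Provider Chain. The class and enum names are hypothetical and not part of the routine.

```java
// Hypothetical sketch of init's credential check; the real routine's rules are
// not documented in this article. Assumption: both keys together, or neither.
final class InitArguments {
    enum CredentialSource { STATIC_KEYS, DEFAULT_CHAIN, INVALID }

    static boolean isBlank(String s) {
        return s == null || s.trim().isEmpty();
    }

    static CredentialSource credentialSource(String accessKey, String secretKey) {
        boolean hasAccess = !isBlank(accessKey);
        boolean hasSecret = !isBlank(secretKey);
        if (hasAccess && hasSecret) {
            return CredentialSource.STATIC_KEYS;   // use the supplied keys
        }
        if (!hasAccess && !hasSecret) {
            return CredentialSource.DEFAULT_CHAIN; // let the SDK find credentials
        }
        return CredentialSource.INVALID;           // only one of the two keys was given
    }

    // Mirrors the documented return value of init: false for an invalid combination.
    static boolean isValid(String accessKey, String secretKey) {
        return credentialSource(accessKey, secretKey) != CredentialSource.INVALID;
    }
}
```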

 

The loadParameters function connects to the Parameter Store and retrieves a set/hierarchy of parameters prefixed with a specific path (see the naming convention for the parameters below).

 

The result is returned as a Map of key-value pairs.

 

Important: In the returned Map, the key represents only the last part of the parameter name path. If the parameter name is: /Dev/PROJECT1/max_rows, the returned Map key for this parameter is max_rows.
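The body of loadParameters is not shown in this article. The following hypothetical sketch illustrates two things such a function has to handle: GetParametersByPath returns results in pages linked by a NextToken, and the returned keys are shortened to the last path segment. To keep the sketch runnable without the AWS SDK, a page is modelled by an illustrative Page class and fetched through a caller-supplied function.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch, not the routine's implementation. Each page is an
// illustrative Page object returned by a caller-supplied function (token -> page).
final class LoadParametersSketch {
    static final class Page {
        final Map<String, String> parameters; // full parameter name -> value
        final String nextToken;               // null on the last page

        Page(Map<String, String> parameters, String nextToken) {
            this.parameters = parameters;
            this.nextToken = nextToken;
        }
    }

    // "/Dev/PROJECT1/max_rows" -> "max_rows"
    static String shortName(String fullName) {
        return fullName.substring(fullName.lastIndexOf('/') + 1);
    }

    static Map<String, String> loadParameters(Function<String, Page> fetchPage) {
        Map<String, String> result = new LinkedHashMap<>();
        String token = null; // null requests the first page
        do {
            Page page = fetchPage.apply(token);
            for (Map.Entry<String, String> entry : page.parameters.entrySet()) {
                // Later entries with the same short name overwrite earlier ones.
                result.put(shortName(entry.getKey()), entry.getValue());
            }
            token = page.nextToken;
        } while (token != null);
        return result;
    }
}
```

Because only the last segment is kept, parameters such as /talend/dev/PROJECT1/a/max_rows and /talend/dev/PROJECT1/b/max_rows would map to the same key; in this sketch the one returned later silently wins. Keeping variable names directly under the project level, as in the naming convention described below, avoids such collisions.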

 

The saveParameter function allows you to save a context parameter name and value (derived from a context variable) to the Parameter Store.

 

Joblets

Two Joblets were developed to connect to the AWS Parameter Store through the routine. One initializes the context variables of a Job using the parameters from the AWS Parameter Store. The other is a utility that stores the context variables of a Job in the Parameter Store.

 

Joblet: SaveContextVariableToAwsSSMParameterStore

joblet_save_ctx_vars.png

 

The Joblet uses a tContextDump component to generate the context variables dataset with the standard key-value pair schema.

 

The tJavaFlex component is used to connect to the Parameter Store and save the context variables as parameters with a specific naming convention.

 

Parameter hierarchies naming convention for Talend context variables

For context variables, the chosen convention uses an optional root prefix, /talend/, to avoid potential collisions with existing parameter names.

 

The prefix is followed by a string representing the runtime environment, for example, dev, qa, or prod. This mimics the concept of context environments found in Job Contexts:

contextEnv.png

 

Next comes the name of the Talend Project (extracted from the Job definition) and, finally, the name of the variable.

 

Parameter naming convention:

/talend/<environment name>/<talend project name>/<context variable name>

 

Example: the Job job1, with a context variable ctx_var1, in the Talend Project PROJECT1.

 

The name of the parameter for the ctx_var1 variable in a development environment (identified by dev) is:

/talend/dev/PROJECT1/ctx_var1

 

For a production environment, prod, the name is:

/talend/prod/PROJECT1/ctx_var1
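The naming convention can be expressed as a small helper. This is a sketch with hypothetical names, not the Joblet's actual code: it joins the optional prefix, environment, project, and variable name while tolerating stray slashes in the inputs, and it also builds a project-level path of the kind passed to loadParameters.

```java
// Sketch of /<prefix>/<environment>/<project>/<variable> (hypothetical helper).
final class ContextParameterNames {
    // The prefix is optional: pass null or "" to omit it.
    static String parameterName(String prefix, String environment, String project, String variable) {
        StringBuilder name = new StringBuilder();
        for (String part : new String[] {prefix, environment, project, variable}) {
            if (part == null) {
                continue;
            }
            String trimmed = stripSlashes(part);
            if (!trimmed.isEmpty()) {
                name.append('/').append(trimmed);
            }
        }
        return name.toString();
    }

    // Path covering every variable of one project in one environment.
    static String projectPath(String prefix, String environment, String project) {
        return parameterName(prefix, environment, project, null);
    }

    // Removes leading and trailing '/' characters, e.g. "/talend/" -> "talend".
    private static String stripSlashes(String s) {
        int start = 0;
        int end = s.length();
        while (start < end && s.charAt(start) == '/') {
            start++;
        }
        while (end > start && s.charAt(end - 1) == '/') {
            end--;
        }
        return s.substring(start, end);
    }
}
```

For example, parameterName("/talend/", "dev", "PROJECT1", "ctx_var1") returns /talend/dev/PROJECT1/ctx_var1, matching the dev example above.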

 

One option is to use the Job name as well in the hierarchy of the parameter name:

/talend/prod/PROJECT1/job1/ctx_var1

 

However, because Talend Metadata connections, Context Groups, and other artifacts are shared across multiple Jobs, including the Job name would result in multiple copies of the same context variable in the Parameter Store.

 

Moreover, if a value in the Context Group changes, it would need to be updated in every parameter for this context variable, which defeats the purpose of the Context Group.

 

Joblet context variables

The Joblet uses a dedicated context group specific to the interaction with the Parameter Store.

  • AWS Access & Secret keys to connect to AWS. As mentioned earlier, the routine leverages the AWS Default Credential Provider Chain. If these variables are not initialized, the SDK looks for environment variables, the ~/.aws/credentials file (in the user directory on Windows), or EC2 instance roles to infer the right credentials.

  • AWS region of the AWS SM Parameter Store.

  • Parameter Store prefix and environment used in the parameter path as described above in the naming convention.

Joblet: LoadContextVariablesFromAwsSSMParmeterStore

The second Joblet reads parameters from the Parameter Store and updates the Job context variables.

joblet_load_ctx_vars.png

 

The Joblet uses a tJavaFlex component to connect to SSM Parameter Store, leveraging the AwsSSMParameterStore.loadParameters routine function described above. It retrieves all the parameters based on the prefix path (see the defined naming convention above).

 

The tContextLoad component uses the key-value dataset output by tJavaFlex to overwrite the default values of the context variables.
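Conceptually, this step overlays the loaded key-value pairs on the default context values. The sketch below is an approximation written for illustration, not tContextLoad's implementation: matching keys replace defaults, and keys with no matching context variable are collected so they can be reported. The class names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative approximation of the overwrite step (not tContextLoad's code).
final class ContextOverlay {
    static final class Result {
        final Map<String, String> context;
        final List<String> unknownKeys;

        Result(Map<String, String> context, List<String> unknownKeys) {
            this.context = context;
            this.unknownKeys = unknownKeys;
        }
    }

    static Result overlay(Map<String, String> defaults, Map<String, String> loaded) {
        Map<String, String> context = new LinkedHashMap<>(defaults); // defaults stay untouched
        List<String> unknownKeys = new ArrayList<>();
        for (Map.Entry<String, String> entry : loaded.entrySet()) {
            if (context.containsKey(entry.getKey())) {
                context.put(entry.getKey(), entry.getValue()); // loaded value wins
            } else {
                unknownKeys.add(entry.getKey()); // no matching context variable
            }
        }
        return new Result(context, unknownKeys);
    }
}
```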

 

Joblet context variables

The load Joblet uses the same context group as the save counterpart.

 

Sample Talend Job

The sample Talend Job generates a simple dataset of people (first name, last name, and age) using tRowGenerator, applies some transformations, and splits the rows by age into two distinct datasets: one for adults (age > 18) and one for teenagers.
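The routing rule of the sample Job can be illustrated in plain Java: rows with age > 18 go to the adults dataset and every other row to the teenagers dataset. This is only an illustration of the rule, not the Job's generated code; the Person class and its fields are hypothetical stand-ins for the tRowGenerator schema.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration of the sample Job's age split (not its generated code).
final class AgeSplit {
    static final class Person {
        final String firstName;
        final String lastName;
        final int age;

        Person(String firstName, String lastName, int age) {
            this.firstName = firstName;
            this.lastName = lastName;
            this.age = age;
        }
    }

    static final class Split {
        final List<Person> adults = new ArrayList<>();
        final List<Person> teenagers = new ArrayList<>();
    }

    // age > 18 goes to adults; every other row goes to teenagers.
    static Split split(List<Person> people) {
        Split result = new Split();
        for (Person p : people) {
            if (p.age > 18) {
                result.adults.add(p);
            } else {
                result.teenagers.add(p);
            }
        }
        return result;
    }
}
```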

 

The two datasets are then inserted into a MySQL database in their respective tables.

Job1.png

 

The Job contains a mix of context variables: some come from the context group defined for the MySQL Metadata connection, and others are specific to the Job (max_rows, table_adults, and table_teenagers).

 

Create Parameter Store entries for the context variables

The first step is to create all the parameters in the Parameter Store for the Job context variables. This can be done using the AWS console or through the AWS CLI, but those methods can be time-consuming and error-prone.

 

Instead, use the dedicated SaveContextVariableToAwsSSMParameterStore Joblet.

CreateParamStoreParams.png

 

Drag and drop the Joblet into the Job canvas; there is no need to connect it to the rest of the Job components. When the Job runs, the Joblet lists all the context variables, connects to AWS SM Parameter Store, creates the associated parameters, and stops the Job.

 

After the Job is executed, the Systems Manager Parameter Store web console should list the newly created parameters.

parameterstore_list.png

 

In the AWS console, the first column is not resizable; to see the full name of a parameter, hide some of the other columns.

SSMPararmeters.png

 

You can also click a specific parameter to see the details.

parameterstore_detail.png

 

For context variables defined with a Password type, the associated parameter is created as SecureString, which allows the value to be encrypted at rest in the store.
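The choice of parameter type for each context variable can be sketched as a simple mapping. Assumptions: the Talend type arrives as a plain string such as "Password" or "String" (how the Joblet actually detects Password variables is not described in this article), and the describe helper is a hypothetical addition showing how secret values could be masked in log output.

```java
// Hypothetical mapping from a Talend context variable type to a Parameter Store type.
// Assumes the Talend type is given as a plain string such as "Password" or "String".
final class ParameterTypeMapping {
    static String parameterTypeFor(String talendType) {
        return "Password".equalsIgnoreCase(talendType) ? "SecureString" : "String";
    }

    // Value that would be passed as the encrypt flag of saveParameter(name, value, encrypt).
    static boolean shouldEncrypt(String talendType) {
        return "SecureString".equals(parameterTypeFor(talendType));
    }

    // Log-safe description: SecureString values are masked.
    static String describe(String name, String value, String talendType) {
        String type = parameterTypeFor(talendType);
        String shown = "SecureString".equals(type) ? "****" : value;
        return name + " (" + type + ") = " + shown;
    }
}
```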

 

parameterstore_password_detail.png

 

Regarding security, IAM access control can be leveraged to restrict access to a specific set of parameters, such as the production parameters under /talend/prod/, to a specific Operations team, while developers are granted access only to the parameters of the dev environment. In the following example policy, the first statement allows decrypting SecureString parameters (kms:Decrypt) and listing parameters (ssm:DescribeParameters); the second statement grants access only to the parameters under /talend/dev/:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "kms:Decrypt",
                "ssm:DescribeParameters"
            ],
            "Resource": "*"
        },
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": [
                "ssm:PutParameter",
                "ssm:LabelParameterVersion",
                "ssm:DeleteParameter",
                "ssm:GetParameterHistory",
                "ssm:GetParametersByPath",
                "ssm:GetParameters",
                "ssm:GetParameter",
                "ssm:DeleteParameters"
            ],
            "Resource": "arn:aws:ssm:AWS-Region:AWS-AccountId:parameter/talend/dev/*"
        }
    ]
}
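To see how the Resource element of this policy relates to parameter names: a parameter named /talend/dev/PROJECT1/max_rows has the ARN arn:aws:ssm:<region>:<account>:parameter/talend/dev/PROJECT1/max_rows, so it matches the parameter/talend/dev/* pattern, while /talend/prod/... parameters do not. The sketch below checks this with a simplified wildcard matcher; it handles only * and ? and ignores the rest of IAM policy evaluation (conditions, explicit denies, policy variables). The region and account values in the usage are placeholders.

```java
import java.util.regex.Pattern;

// Simplified illustration of ARN wildcard matching; not a full IAM evaluator.
final class ResourcePatternCheck {
    // Hierarchical names already start with '/', so they follow "parameter" directly.
    static String parameterArn(String region, String account, String parameterName) {
        String name = parameterName.startsWith("/") ? parameterName : "/" + parameterName;
        return "arn:aws:ssm:" + region + ":" + account + ":parameter" + name;
    }

    // '*' matches any run of characters (including '/'), '?' matches one character.
    static boolean matches(String resourcePattern, String arn) {
        StringBuilder regex = new StringBuilder();
        for (char c : resourcePattern.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.matches(regex.toString(), arn);
    }
}
```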

 

Talend Cloud Job

In the context of a Talend Cloud Job/Task, the context variables don't need to be exported as connections or resources for Talend Cloud as they are initialized from the AWS Parameter Store.

 

You only need to create a connection for the AWS SM Parameter Store credentials and configuration parameters.

 

Custom connection for AWS SM Parameter Store

The context group for AWS SM Parameter Store is externalized as a Talend Cloud custom connection because Talend Cloud does not yet have a native connector for AWS Systems Manager.

assm_contextgroup.png

 

TalendCloudCustomConnection.png

 

Talend Cloud Task

In Studio, you create a new Talend Cloud task by publishing the Job artifact to the cloud.

studio_publish_to_cloud.png

 

TalendCloudArtifact.png

You'll then add the custom connection for AWS SM.

cloud_task_with_connection.png

 

The additional context variables are exposed as advanced parameters, including the database connection parameters that are initialized from the Parameter Store.

TalendCloudTask.png

 

A successful task execution on a cloud or Remote Engine means that the Job was able to connect to AWS SM, retrieve the parameters based on the naming convention set above, and initialize the corresponding context variables, allowing the Job to connect to the MySQL database and create the requested tables.

cloud_task_execution.png

Version history: revision 11 of 11, last updated 02-14-2020 03:15 PM