The Talend white paper Self-Service Talend Migration shows you how to migrate your on-premises Talend installation to Talend Cloud. Highlights include ensuring success through planning and assessment, auditing your Talend projects, and avoiding potential pitfalls.
The best practice for upgrading Studio and Remote Engines depends on whether you are already using Talend Cloud extensively.
If you have already developed many Jobs, the best practice is to leave them as they are, at least in the short term. Talend Cloud is backward compatible with most Studio versions (currently back to 6.5), so Jobs built in an older supported version of Studio will still work.
The same applies to Remote Engines. For example, you can still use Remote Engine 2.5 even though the latest version is 2.8. However, Talend recommends that you upgrade. Good times to do so are when the project is stable or when you are starting a new project.
If you haven’t developed a lot of Jobs and are at release 6.5 or newer, or if you really want to use a new feature, you're welcome to upgrade Studio and Remote Engines.
For more information, see Talend Support Statements.
Upgrading a Git/GitHub repository is a one-way operation: once upgraded, it cannot be reverted to the old version. Before migrating, duplicate (copy or clone) your Git/GitHub repository to a new repository. If any issues arise, you can then fall back to the original project and your on-premises Talend installation.
Use the repository copy to create a new project in Talend Management Console (TMC). The first time Studio connects to the project, it downloads the repository from its Git/GitHub URL and upgrades it automatically. Individual Jobs may need a tweak or two; for example, any deprecated components should be replaced.
To upgrade a Remote Engine, install the new version and pair it. The new Remote Engine can be installed on the same compute resource as the existing one. However, if you changed any default settings on the existing Remote Engine (such as memory or parallel executions), you need to apply the same changes to the new one.
When using UNC paths in Talend Cloud, observe the following:
When hard-coding a UNC path in a file component, use a quoted string with escaped backslashes: "\\\\networkdrive\\folderA\\folderB\\file.txt"
When using a context variable of type Directory, the path format is \\networkdrive\folderA\folderB\file.txt (backslashes do not need escaping)
When using a connection_ context variable, set the value of the corresponding connection in the cloud environment to \\\\networkdrive\\folderA\\folderB\\file.txt (no quotes)
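The escaping differences come from how the value is read. Because Talend Jobs compile to Java, a path hard-coded in a component field is a Java string literal, in which each backslash must be written twice. The sketch below uses a hypothetical helper class, UncPathForms (not part of any Talend API), to show that the escaped literal evaluates to the plain UNC path at run time, and how to produce the doubled-backslash form from a plain path.

```java
final class UncPathForms {
    // As typed into a component field: a Java string literal, so every
    // backslash of the UNC path is written twice.
    static final String HARD_CODED = "\\\\networkdrive\\folderA\\folderB\\file.txt";

    // Doubles every backslash in a plain UNC path, producing the escaped form
    // used for hard-coded literals and connection_ values.
    static String escapeBackslashes(String plainPath) {
        return plainPath.replace("\\", "\\\\");
    }

    public static void main(String[] args) {
        // At run time the literal holds the plain path, the same text used
        // for a Directory-type context variable.
        System.out.println(HARD_CODED);
        // prints: \\networkdrive\folderA\folderB\file.txt

        // Escaping the plain path gives back the doubled-backslash form.
        System.out.println(escapeBackslashes(HARD_CODED));
        // prints: \\\\networkdrive\\folderA\\folderB\\file.txt
    }
}
```

Apply the helper to plain paths only; running it on a value that is already escaped doubles the backslashes a second time.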
To ensure a smooth migration from on-premises to Talend Cloud, Studio context variables used for connections to external data systems should follow the Talend Cloud connection parameter naming standards. These variables can then be stored in context metadata and shared across multiple Talend Jobs.
Context variables used for connections should follow this pattern:
connection_<conntype>_<param>
Here, connection_ is a fixed string, <conntype> is the type of connection and can be any value, and <param> is a parameter name that matches one of the connection component's parameters (for example, userid, password, or server).
For more information, see the list of Talend Cloud out-of-the-box supported connections.
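Checking existing Studio context variables against this pattern before migration can be automated. The sketch below is a hypothetical helper, ConnectionContextNames, that builds and validates names of the form connection_<conntype>_<param>. It assumes that both <conntype> and <param> are alphanumeric with no underscores, so that a name splits unambiguously; the values mysql and server are illustrative only.

```java
import java.util.regex.Pattern;

final class ConnectionContextNames {
    // Assumption: <conntype> and <param> are alphanumeric with no underscores,
    // so a name splits unambiguously into its two variable parts.
    static final Pattern CONNECTION_VAR =
            Pattern.compile("^connection_([A-Za-z0-9]+)_([A-Za-z0-9]+)$");

    // Builds a context variable name following connection_<conntype>_<param>.
    static String build(String connType, String param) {
        String name = "connection_" + connType + "_" + param;
        if (!CONNECTION_VAR.matcher(name).matches()) {
            throw new IllegalArgumentException("Not a valid connection variable name: " + name);
        }
        return name;
    }

    // Returns true when an existing context variable already follows the pattern.
    static boolean follows(String name) {
        return CONNECTION_VAR.matcher(name).matches();
    }

    public static void main(String[] args) {
        System.out.println(build("mysql", "server"));             // prints: connection_mysql_server
        System.out.println(follows("connection_mysql_password")); // prints: true
        System.out.println(follows("mysqlPassword"));             // prints: false (rename before migrating)
    }
}
```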
Jobs that run in Talend Administration Center (TAC) require string values in context variables to be enclosed in double quotes. In Talend Cloud, however, string values must not be enclosed in double quotes. The following example shows the two sets of values for the context variable fileName. One conforms to Jobs run in TAC, the other to Jobs run in Talend Cloud.
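If you have many TAC context values to convert, removing the surrounding quotes can be scripted. The sketch below is a hypothetical helper, ContextValueConverter (not a Talend API), that strips one pair of surrounding double quotes and leaves unquoted values unchanged. It assumes the values contain no inner escape sequences that also need converting, and the file path is illustrative only.

```java
final class ContextValueConverter {
    // Removes one pair of surrounding double quotes from a TAC-style string value.
    // Values without surrounding quotes are returned unchanged.
    // Assumption: inner quotes or escape sequences need no further processing.
    static String toCloudValue(String tacValue) {
        if (tacValue.length() >= 2 && tacValue.startsWith("\"") && tacValue.endsWith("\"")) {
            return tacValue.substring(1, tacValue.length() - 1);
        }
        return tacValue;
    }

    public static void main(String[] args) {
        // Hypothetical fileName value as entered for a Job run in TAC.
        String tacValue = "\"C:/data/input.csv\"";
        System.out.println("fileName = " + toCloudValue(tacValue));
        // prints: fileName = C:/data/input.csv
    }
}
```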
Double-quote errors in the Talend Cloud execution log are similar to this:
If your on-premises project is several releases behind the latest Talend Cloud version, you may want to compile all your Jobs to uncover any migration issues. If your project contains many Jobs, Talend recommends that you consider setting up Continuous Integration for Talend Cloud to automate compilation. Record any compile failures and take corrective action.
Common causes of compile failures in an on-premises migration include:
Incompatible Java versions
Old database drivers
Externally sourced components
Talend recommends that you analyze your on-premises Talend projects and Jobs to anticipate migration challenges and reduce risk.
Remote Engine and Talend Runtime servers can be sized much like Remote JobServers. However, take the following into consideration:
Memory and CPU requirements of the Jobs you're going to run
All Jobs run as Java processes, so memory use can vary widely from one Job to another
Simultaneous Job execution requirements, such as how many Jobs must run at the same time
Based on these factors, estimate the initial sizing as: (highest Job RAM requirement × maximum number of simultaneous Jobs) plus a safety margin, or fudge factor, of 25%, 33%, or 50% of that product.
Add to this the operating system overhead and the memory requirements of the Talend Cloud Remote Engine and, if installed, Talend Runtime.
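The sizing rule can be expressed as a small calculation. The sketch below is a hypothetical helper, RemoteEngineSizing; the overhead figures passed in main (2 GB for the operating system, 1 GB for the Remote Engine, no Talend Runtime) are placeholder assumptions rather than Talend-published numbers, so substitute measurements from your own environment.

```java
final class RemoteEngineSizing {
    // initial RAM = (highest Job RAM x max simultaneous Jobs) x (1 + fudge factor)
    //             + OS overhead + Remote Engine memory + Talend Runtime memory (0 if not installed)
    static double initialRamGb(double highestJobRamGb, int maxConcurrentJobs, double fudgeFactor,
                               double osOverheadGb, double remoteEngineGb, double runtimeGb) {
        double jobsGb = highestJobRamGb * maxConcurrentJobs * (1.0 + fudgeFactor);
        return jobsGb + osOverheadGb + remoteEngineGb + runtimeGb;
    }

    public static void main(String[] args) {
        // Hypothetical inputs: 2 GB per Job, 4 Jobs at once, 25% fudge factor,
        // 2 GB for the OS, 1 GB for the Remote Engine, no Talend Runtime.
        double ram = initialRamGb(2.0, 4, 0.25, 2.0, 1.0, 0.0);
        System.out.println("Initial RAM (GB): " + ram);
        // prints: Initial RAM (GB): 13.0
    }
}
```

With these inputs the Job workload alone needs 10 GB (2 × 4 × 1.25), and the fixed overheads bring the estimate to 13 GB.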
An easier alternative, and the recommended best practice, is to resize your compute resource as your needs change. Because compute resources are practically a commodity, changing their size after the initial go-live is easy.
Start with a 4 CPU and 16 GB RAM configuration.
Monitor your Jobs.
Scale the compute node up or down from there as your load requires.