Language translation using Microsoft Translator Text API and Talend

Overview

This article shows how Talend integrates with Microsoft Translator Text API, a cloud-based translation service, to translate text from one language into a specified target language.

 

Translator Text API calls are made over HTTPS, so requests and responses are encrypted in transit with SSL/TLS. The service is also covered by compliance offerings such as CSA STAR and FedRAMP, and supports GDPR requirements.
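
To make the HTTPS call more concrete, here is a minimal Java sketch of how a single translation request could be assembled and sent. It is not the attached Translate.java routine (which relies on the OkHttp library). The class name TranslatorRequestSketch and its method names are hypothetical, and the sketch assumes version 3.0 of the API on the global endpoint, with the subscription key passed in the Ocp-Apim-Subscription-Key header.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not the routine shipped with this article.
final class TranslatorRequestSketch {

    // Assumption: global endpoint of Translator Text API v3.
    static final String ENDPOINT = "https://api.cognitive.microsofttranslator.com/translate";

    // Builds the request URL for a given target language code such as "en".
    static String buildUrl(String targetLanguage) {
        try {
            return ENDPOINT + "?api-version=3.0&to=" + URLEncoder.encode(targetLanguage, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    // Builds the JSON body: an array holding one object with a "Text" property.
    static String buildBody(String text) {
        StringBuilder sb = new StringBuilder("[{\"Text\":\"");
        for (char c : text.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append("\"}]").toString();
    }

    // Sends the request over HTTPS and returns the raw JSON response as a string.
    static String send(String subscriptionKey, String targetLanguage, String text) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(buildUrl(targetLanguage)).toURL().openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Ocp-Apim-Subscription-Key", subscriptionKey);
        conn.setRequestProperty("Content-Type", "application/json; charset=UTF-8");
        conn.setDoOutput(true);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(buildBody(text).getBytes(StandardCharsets.UTF_8));
        }
        try (InputStream in = conn.getInputStream()) {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
            return new String(buf.toByteArray(), StandardCharsets.UTF_8);
        }
    }
}
```

Because the URL uses the https scheme, the Java runtime negotiates the encrypted connection itself; no extra code is needed for the encryption described above. Calling send requires a valid subscription key and network access.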

 

Talend recommends reviewing the list of Microsoft Translator supported languages.

 

Sources for the project are available in the attached Zip file.

 

Prerequisites

 

Job Design

 

Setting up routine and JAR dependencies

  1. Download and extract the Microsoft_Azure_Language_Translation.zip file (attached to this article).

  2. Open Talend Studio.

  3. In the Repository, expand Code, right-click Routines, then select Create routine.

    1.png

     

  4. In the pop-up window, enter Translate in the Name text box. Fill in the Purpose and Description text boxes. Click Finish.

    2.png

     

  5. Replace the predefined template code with the contents of the Translate.java file (attached to this article), then press CTRL+S to save the routine. Close the routine.

    3.png

     

  6. Right-click the Translate routine, then select Edit Routine Libraries. The Import External Library window opens.

    4.png

     

  7. Click New to add a new external JAR file. The Module window opens.

    editroutinelibnew.PNG

     

  8. Select Artifact repository (local m2/nexus), then select Install a new module. Click [...] and browse to the okhttp-2.5.0.jar file located in the jar_files folder.

    6.png

     

  9. Click Detect the module install status, then click OK.

  10. Repeat steps 6 through 8 to add the okio-1.6.0.jar file. Click Finish.

    7.png

     

Importing the Job

  1. In the Repository view, navigate to Job Designs > Standard. Right-click Standard, then select Import items.

    8.png

     

  2. Select the Select archive file option, then click Browse and navigate to the Microsoft_Azure_Lang_Translation.zip file.

  3. Click Select All, then click Finish.

    9.png

     

Configuring the Job

  1. In the Repository view, right-click the Microsoft_Azure_Lang_Translation Job, then select Setup Routine dependencies.

    15.png

     

  2. Click the green + sign to add a routine. In the Select Routines window, select the Translate routine. Click OK.

    16.png

     

  3. Click OK in the Setup Routine dependencies window.

    17.png

     

  4. Double-click the Microsoft_Azure_Lang_Translation Job to open it.

    10.png

     

  5. Double-click the InputFile component to open the Basic settings view. In the File name/Stream field, enter the path to the language.csv file, or click the [...] button and browse to the file. The language.csv file is the input file for this Job; it contains eight lines of text, each in a different language, that will be translated into the target language.

    11.png

     

  6. Double-click the LangCodeLookupFile component to open the Basic settings view. In the File name/Stream field, enter the path to the Language_mapping.txt file, or click the [...] button and browse to the file. Microsoft Translator API always returns a two-character code for any source language detected. This file is used to map that two-character code to the full language name.

    12.png

     

  7. In the Job, switch to the Contexts view. In the Value field for the subscription_key context variable, enter the subscription key value you received when you registered for the Microsoft Translator Text API service.

  8. By default, the target_language context variable is “en” (English). This variable determines the output target language used in the translation of the language.csv file. If you want to change it to another language, open the Language_mapping.txt file and get the corresponding two-character code for the language of your choice, then change the target_language context variable value.

    13.png

     
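
The role of the Language_mapping.txt lookup in step 6 can be sketched in plain Java. In the Job itself the lookup is handled by the LangCodeLookupFile component; the sketch below only illustrates the idea. The class name LanguageLookupSketch, its methods, and the assumed line layout (a language code, a delimiter, then the language name) are hypothetical, so check the actual file for its real format.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical illustration of a code-to-name lookup; the file layout is an assumption.
final class LanguageLookupSketch {

    // Parses lines such as "fr;French" into a map from language code to full name.
    // Lines that do not contain the delimiter are skipped.
    static Map<String, String> parse(List<String> lines, String delimiter) {
        Map<String, String> names = new LinkedHashMap<>();
        for (String line : lines) {
            String[] parts = line.split(Pattern.quote(delimiter), 2);
            if (parts.length == 2) {
                names.put(parts[0].trim().toLowerCase(Locale.ROOT), parts[1].trim());
            }
        }
        return names;
    }

    // Returns the full language name for a code, or the code itself when it is unknown.
    static String fullName(Map<String, String> names, String code) {
        return names.getOrDefault(code.toLowerCase(Locale.ROOT), code);
    }
}
```

With a map built this way, a detected code such as "fr" resolves to its full name, and the same file can be read in the other direction to find the code to enter in the target_language context variable.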

Running the Job

  1. Run the Job.

  2. Review the output file and verify that the text was translated as expected.

    14.png

     
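
When checking the output, it can help to know what the service returns for each line. Assuming version 3.0 of the API with automatic language detection, each response is a JSON array whose first element carries a detectedLanguage object and a translations array. The sketch below pulls out the two values of interest with regular expressions so that it stays free of external libraries; the class name TranslationResponseSketch and its methods are hypothetical, the extracted text is not JSON-unescaped, and a real Job would more likely use a JSON parser.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration; assumes the Translator Text API v3 response shape.
final class TranslationResponseSketch {

    // Matches "language":"fr" inside the detectedLanguage object.
    private static final Pattern LANGUAGE =
            Pattern.compile("\"language\"\\s*:\\s*\"([^\"]+)\"");

    // Matches "text":"..." inside the translations array, allowing escaped characters.
    private static final Pattern TEXT =
            Pattern.compile("\"text\"\\s*:\\s*\"((?:[^\"\\\\]|\\\\.)*)\"");

    // Returns the detected source language code, if present.
    static Optional<String> detectedLanguage(String json) {
        Matcher m = LANGUAGE.matcher(json);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    // Returns the first translated text, still in its JSON-escaped form, if present.
    static Optional<String> translatedText(String json) {
        Matcher m = TEXT.matcher(json);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }
}
```

The detected code returned here is the value that the language mapping file turns into a full language name.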

Conclusion

This article showed how Talend integrates with Microsoft Translator Text API to translate text from multiple languages into a target language such as English.

Version history
Revision 17 of 17, last updated 08-02-2019 08:47 AM