Configuration of tSqoopImport in CommandLine and Java API modes

Talend Version (Required)       6.3.1

Summary

How to configure tSqoopImport in CommandLine and Java API modes to transfer data from a relational database to HDFS.
Additional Versions Cloudera 5.7, Talend 6.3.1.20161216_1026
Product (Required) Talend Data Fabric
Component (Required) tSqoopImport
Problem Description How to make tSqoopImport work in CommandLine and Java API modes.
Problem root cause N/A
Solution or Workaround

Server-side preparation

  1. Copy the JDBC driver JARs for your source database into the /var/lib/sqoop folder on the Sqoop host.

    sqoop.png

  2. On the Hadoop server, run a Sqoop import from the command line to confirm that Sqoop is installed and configured correctly and that it can connect to the relational database holding the data you are copying. Substitute your own connection values in the commands below.

    Import from Sybase into a target HDFS directory:

    sqoop import --connect "jdbc:sybase:Tds:xxx.xxx.xx.xxx:8000/MYSYBASE?" --username "name" --password "myPassword" --target-dir /user/cloudera/T3/importDir --table "name" --split-by id --num-mappers 1 --driver com.sybase.jdbc4.jdbc.SybDriver

    Import from MySQL into HDFS (without --target-dir, Sqoop writes to a directory named after the table under your HDFS home directory):

    sqoop import --connect "jdbc:mysql://localhost:3306/test" --username "root" --password "123456" --table "person" --delete-target-dir

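    Note: add --append to add the imported rows to an existing HDFS target directory. To replace the existing data instead, use --delete-target-dir, which deletes the target directory before the import runs. Do not combine the two options.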
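
The two server-side steps can be scripted roughly as follows. This is a minimal bash sketch and is not part of the original procedure. Several details are illustrative assumptions: the driver JAR file names (jconn4.jar for the Sybase jConnect driver, mysql-connector-java.jar for MySQL), the fact that the JARs sit in the current directory, and the DRY_RUN switch. The script uses sqoop list-tables as a lighter connectivity check than a full import. It passes -P so that Sqoop prompts for the password instead of exposing it on the command line. With DRY_RUN=1 (the default) the script only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch of the server-side preparation (not the article's own script).
# Jar names, hosts and credentials are placeholders/assumptions.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"      # 1 = only print the commands, 0 = execute them
SQOOP_LIB="/var/lib/sqoop"   # driver folder named in step 1
# Hypothetical driver file names; use the JARs shipped with your drivers
DRIVER_JARS=("jconn4.jar" "mysql-connector-java.jar")

run() {
  # Print the command in dry-run mode, otherwise execute it
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Step 1: copy the JDBC drivers to the folder Sqoop loads them from
for jar in "${DRIVER_JARS[@]}"; do
  run sudo cp "$jar" "$SQOOP_LIB/"
done

# Step 2 (lightweight check): list the tables visible through each connection.
# -P prompts for the password instead of putting it on the command line.
run sqoop list-tables \
  --connect "jdbc:sybase:Tds:xxx.xxx.xx.xxx:8000/MYSYBASE" \
  --driver com.sybase.jdbc4.jdbc.SybDriver \
  --username "name" -P

run sqoop list-tables \
  --connect "jdbc:mysql://localhost:3306/test" \
  --username "root" -P
```

If list-tables returns the expected tables, the full import commands above should connect with the same settings.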

Studio-side configuration

CommandLine mode

  1. To use CommandLine mode, select it on the tSqoopImport Component tab:

    p2.png

  2. On the Advanced Settings tab, add the --driver and --connection-manager arguments:

    p3.png
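
For reference, the CommandLine-mode settings correspond roughly to a direct Sqoop invocation like the one sketched here. The argument values in the Advanced Settings screenshot are not reproduced, so some values here are assumptions. One is the --connection-manager value org.apache.sqoop.manager.GenericJdbcManager, Sqoop's generic JDBC connection manager, which is typically paired with --driver for databases such as Sybase that have no dedicated manager. The other is the hypothetical --password-file path. The remaining values come from the Sybase import command in the server-side steps. By default the script only prints the command; set DRY_RUN=0 to run it.

```shell
#!/usr/bin/env bash
# Hypothetical command-line equivalent of the CommandLine-mode settings.
# The connection manager class and password file path are assumptions,
# not values read from the screenshot.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the command, 0 = run it

# --password-file points to a hypothetical HDFS file readable only by you
args=(
  import
  --connect "jdbc:sybase:Tds:xxx.xxx.xx.xxx:8000/MYSYBASE"
  --driver "com.sybase.jdbc4.jdbc.SybDriver"
  --connection-manager "org.apache.sqoop.manager.GenericJdbcManager"
  --username "name"
  --password-file "/user/cloudera/sybase.password"
  --table "name"
  --split-by id
  --num-mappers 1
  --target-dir /user/cloudera/T3/importDir
)

if [ "$DRY_RUN" = "1" ]; then
  # Show what would be passed to the sqoop CLI
  printf 'sqoop %s\n' "${args[*]}"
else
  sqoop "${args[@]}"
fi
```

Keeping the arguments in a bash array preserves quoting, so values containing spaces or special characters reach Sqoop intact.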

Java API mode

  1. To use Java API mode, select it instead of CommandLine mode on the tSqoopImport Component tab, and set the Driver JAR and Class name as shown below:

    p4.png

  2. In the Additional Arguments field of the tSqoopImport Advanced Settings tab, add the argument jdbc.driver.class with the value com.sybase.jdbc4.jdbc.SybDriver.

    p6.png

  3. Optionally, to improve performance in Java API mode, set MapReduce memory parameters in the Job Memory Parameters fields:

    p7.png
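
Outside the Studio, the same kind of tuning can be passed to a Sqoop import as Hadoop generic options. The sketch below illustrates this under stated assumptions and does not reproduce the screenshot. It uses the MapReduce 2 property names mapreduce.map.memory.mb and mapreduce.map.java.opts with an example container size of 2048 MB. The heap is set to about 80% of the container, which is a common rule of thumb and not a Talend recommendation. The generic -D options must come immediately after the tool name, before any Sqoop-specific arguments.

```shell
#!/usr/bin/env bash
# Sketch: MapReduce memory settings for a Sqoop import, passed as Hadoop
# generic options. Sizes are example values (assumptions); tune per cluster.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"                      # 1 = only print, 0 = run
MAP_MEMORY_MB=2048                           # YARN container size per mapper
MAP_HEAP_MB=$(( MAP_MEMORY_MB * 80 / 100 ))  # JVM heap, leaving ~20% headroom

# Generic -D options must directly follow the tool name ("import").
# A Sqoop import is map-only, so only map-side settings matter here.
args=(
  import
  -D "mapreduce.map.memory.mb=${MAP_MEMORY_MB}"
  -D "mapreduce.map.java.opts=-Xmx${MAP_HEAP_MB}m"
  --connect "jdbc:mysql://localhost:3306/test"
  --username "root" -P
  --table "person"
  --delete-target-dir
)

if [ "$DRY_RUN" = "1" ]; then
  printf 'sqoop %s\n' "${args[*]}"
else
  sqoop "${args[@]}"
fi
```

Keeping the heap below the container size leaves room for non-heap JVM memory, so YARN is less likely to kill the mapper for exceeding its limit.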

JIRA ticket number TBD-5314
Version history: revision 8 of 8, last updated 10-16-2017 05:45 PM