Four Stars

Sqoop Export (tsqoopexport) and HCatalog don't integrate

I'm using TOS for Big Data 6.2.1. I need to export records (the result of a query, or new/updated records) from Hive to MySQL. My Hive tables were created in ORC format with Zlib compression. I tried tSqoopExport, but it seems limited when it comes to HCatalog. Here is the equivalent command, which works from the shell (note that it exports an entire table):
sqoop export \
--connect jdbc:mysql://db_host:3306/employeesdb \
--driver com.mysql.jdbc.Driver \
--username admin \
--password admin \
--table ms_employeestable \
--enclosed-by '\"' \
--hcatalog-database employees_db \
--hcatalog-table employees_table;
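Since the command above exports a whole table, one possible way to export only a query result is to materialize the query into a staging table first and point --hcatalog-table at that. The following is only a sketch under assumptions: the staging table name employees_export_stage and the updated_at column are hypothetical, the delimiter option is left out for brevity, and the script only prints the commands unless DRY_RUN is set to 0.

```shell
#!/bin/sh
# Sketch: export only a query result by staging it in a Hive table first.
# The staging table and the filter column are hypothetical names.

DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

run() {
    # Print the command; execute it only when DRY_RUN is 0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

stage_and_export() {
    query="$1"                       # HiveQL SELECT producing the rows to export
    stage="employees_export_stage"   # hypothetical staging table

    # 1. Materialize the query result as an ORC table in Hive.
    run hive -e "DROP TABLE IF EXISTS employees_db.${stage}; CREATE TABLE employees_db.${stage} STORED AS ORC AS ${query}"

    # 2. Export the staging table through HCatalog, as in the command above.
    run sqoop export \
        --connect jdbc:mysql://db_host:3306/employeesdb \
        --driver com.mysql.jdbc.Driver \
        --username admin \
        --password admin \
        --table ms_employeestable \
        --hcatalog-database employees_db \
        --hcatalog-table "${stage}"
}

# Example call; updated_at is an assumed column, not one from the real table.
stage_and_export "SELECT * FROM employees_db.employees_table WHERE updated_at >= '2016-01-01'"
```

This still has to run on a host where both hive and sqoop are available, so it does not by itself solve the tSqoopExport limitation.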
I'm using the 'Use Java API' option in tSqoopExport because I'm running the command against a remote HDFS cluster. In the Advanced settings tab (under Components of tSqoopExport) I cannot find any argument that addresses HCatalog. I'm going by the tSqoopExport documentation for Talend 6.2.1, and HCatalog is not listed.
Is there any workaround you would recommend? 
Thanks in advance.
4 REPLIES
Six Stars

Re: Sqoop Export (tsqoopexport) and HCatalog don't integrate

Hi, Apriore.
I have the same problem. Talend has tHCatalogInput, which is supposed to do that.
When I run it on an RC Hive table, I don't get the text, just gibberish.
Any ideas?
Besides that, in the tSqoopExport Advanced settings tab, you can check "use speed parallel" and then "use additional params" and enter "--hcatalog-database employees_db" as free text. However, it will be placed before the --connect string, so I don't know whether it will work.
Thanks,
Four Stars

Re: Sqoop Export (tsqoopexport) and HCatalog don't integrate

Please note that I said I had to select the 'Use Java API' option, not 'Use Commandline'. With the latter you have to deploy and run the Job on the host where Sqoop is installed, which is not my case (nor a common one). If you select 'Use Java API' you need to specify a 'Java API mode', so the additional parameter you suggested, '--hcatalog-database employees_db', would not work.
Hopefully HCatalog support in tSqoopExport (a very common use case) will be added in the next release. I also tried workarounds using the tHCatalogInput component, but with no success.
Four Stars

Re: Sqoop Export (tsqoopexport) and HCatalog don't integrate

Below is the error I get when I run tSqoopExport in CommandLine mode:
Exception in component tSqoopExport_1
java.io.IOException: Cannot run program "sqoop": CreateProcess error=2, The system cannot find the file specified
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
at java.lang.Runtime.exec(Runtime.java:620)
at java.lang.Runtime.exec(Runtime.java:485)
at bigdata_demos.sqoop_mysql_to_hive_0_1.sqoop_mysql_to_hive.tSqoopExport_1Process(sqoop_mysql_to_hive.java:521)
at bigdata_demos.sqoop_mysql_to_hive_0_1.sqoop_mysql_to_hive.tHDFSConnection_1Process(sqoop_mysql_to_hive.java:356)
at bigdata_demos.sqoop_mysql_to_hive_0_1.sqoop_mysql_to_hive.runJobInTOS(sqoop_mysql_to_hive.java:819)
at bigdata_demos.sqoop_mysql_to_hive_0_1.sqoop_mysql_to_hive.main(sqoop_mysql_to_hive.java:676)
Caused by: java.io.IOException: CreateProcess error=2, The system cannot find the file specified
at java.lang.ProcessImpl.create(Native Method)
at java.lang.ProcessImpl.<init>(ProcessImpl.java:386)
at java.lang.ProcessImpl.start(ProcessImpl.java:137)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
... 6 more
Basically, the system cannot find the sqoop executable, probably because the job is not running on the host in my Hadoop cluster where Sqoop is installed.
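"CreateProcess error=2" is what Java reports on Windows when the program it tries to launch cannot be found, which suggests the job was started on a Windows machine rather than on a cluster node. A quick way to confirm whether a given host can run CommandLine mode at all is to check that sqoop resolves on its PATH. A minimal sketch (the function name check_tool is just an illustration):

```shell
#!/bin/sh
# check_tool NAME: succeed only if NAME resolves to a command on this host.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1 found at: $(command -v "$1")"
    else
        echo "$1 not found on PATH of $(uname -n)" >&2
        return 1
    fi
}

# CommandLine mode of tSqoopExport needs this check to pass on the host
# where the job actually runs.
check_tool sqoop || echo "Run the job on a host where Sqoop is installed."
```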
Four Stars

Re: Sqoop Export (tsqoopexport) and HCatalog don't integrate

The solution to the error 'Cannot run program "sqoop": CreateProcess error=2, The system cannot find the file specified' is to run the job through a JobServer, which lets you execute the job on a remote server where Sqoop is installed. Under Run > Target Exec you should be able to run your job remotely. The only problem is that Talend Open Studio shows the message "The target execution tab is only available with the JobServer package.", meaning this requires the subscription version (there, go to Window > Preferences > Talend > Run/Debug and configure 'Remote'). Or you implement your own JobServer...
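One do-it-yourself route that might work without the JobServer package is to build the job in the Studio, copy the archive to a host where Sqoop is installed, and launch it there over SSH. The sketch below rests on assumptions: that the built archive contains a <job>/<job>_run.sh launcher, and that the archive name, job name and user@host are placeholders for your own values. It only prints the commands unless DRY_RUN is set to 0.

```shell
#!/bin/sh
# Sketch: copy a built Talend job to a remote host and run it there.
# Archive layout (<job>/<job>_run.sh) and all names are assumptions.

DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

run() {
    # Print the command; execute it only when DRY_RUN is 0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

deploy_and_run() {
    archive="$1"   # zip produced by building the job in the Studio
    job="$2"       # job name, used for the folder and launcher script
    host="$3"      # user@host of a machine where sqoop is installed

    # Copy the archive to the remote host.
    run scp "$archive" "${host}:/tmp/"

    # Unpack it there and start the generated launcher script.
    run ssh "$host" "cd /tmp && unzip -o $(basename "$archive") -d ${job}_build && sh ${job}_build/${job}/${job}_run.sh"
}

# Placeholder values; replace with your own archive, job name and host.
deploy_and_run sqoop_mysql_to_hive_0.1.zip sqoop_mysql_to_hive hadoop@edge-node
```

Because the job then runs on the remote host itself, the 'Use Commandline' mode can find the sqoop executable there.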