Six Stars

Bigdata -Vertica integration.[HDFS to Vertica]

I have the below use case and need a suggestion for an optimized design:
I have a source file of 30-50 GB available in HDFS and I want to load this file into Vertica. What is the best way to load the data from HDFS to Vertica?
I tried tHiveInput --> tMap --> tVerticaOutput, but it is very slow; the throughput is 12 rows/sec.
I also tried the tELTHive and tELTVertica components - the data is not loaded and there are no errors. (I understand ELT components normally work only within the same DB type, but since Talend shows an option to connect these components I tried it; it does not work.)
Waiting for your suggestions on the best approach.
5 REPLIES
Moderator

Re: Bigdata -Vertica integration.[HDFS to Vertica]

Hi,
You can use the TalendHelpCenter:tSqoopExport component to call Sqoop to transfer data from the Hadoop Distributed File System (HDFS) to a relational database management system (RDBMS).
For more information, please refer to the component reference.
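Under the hood, tSqoopExport essentially calls sqoop export for you. As a rough sketch, the equivalent call from a shell looks something like the following (the JDBC URL, table name, HDFS path and credentials are only placeholders):

    # export a delimited directory from HDFS into an RDBMS table over JDBC
    sqoop export \
      --connect "jdbc:<rdbms>://dbhost:5433/dbname" \
      --username dbuser --password '****' \
      --table target_table \
      --export-dir /user/hive/warehouse/source_table \
      --input-fields-terminated-by ',' \
      --num-mappers 4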
Best regards
Sabrina
--
Don't forget to give kudos when a reply is helpful and click Accept the solution when you think you're good with it.
Six Stars

Re: Bigdata -Vertica integration.[HDFS to Vertica]

Hi Sabrina,
Yes, I have tried that option, but I got the below error:
ERROR]: org.apache.sqoop.tool.BaseSqoopTool - Got error creating database manager: java.io.IOException: No manager for connect string: jdbc:vertica://host:5433/dbname
Note: I have hidden the host and dbname here.
For the Vertica DB I don't see any JDBC option in the metadata, so I used the built-in option and entered the JDBC connect string there.
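(For reference: as far as I understand, this error means Sqoop has no dedicated connection manager for the jdbc:vertica connect string. One workaround I have seen suggested, assuming the Vertica JDBC driver jar has been copied into Sqoop's lib directory, is to pass the driver class explicitly so Sqoop falls back to its generic JDBC manager:

    # force the generic JDBC manager by naming the driver class explicitly
    sqoop export \
      --connect "jdbc:vertica://host:5433/dbname" \
      --driver com.vertica.jdbc.Driver \
      --username dbuser --password '****' \
      --table target_table \
      --export-dir /path/in/hdfs

In a Talend job the same extra arguments can usually be supplied through the tSqoopExport advanced settings.)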
One Star

Re: Bigdata -Vertica integration.[HDFS to Vertica]

Hi Team,
I have a scenario where I have to create a job to extract files from FTP (zip files, .csv.tar.gz) and load them into Hive tables.
Some logic also has to be added to this flow. Could you please suggest the best flow for such a Talend Big Data job?
tFTPConnection --> (OnSubjobOk) --> tFileList --> (Iterate) --> tUnarchive (FTP path)
tFTPList --> (Iterate) --> tHDFSPut --> (Iterate) --> tHiveLoad
Also, is it necessary to give the Hadoop properties every time while establishing an HDFS connection? What is their use?
Moderator

Re: Bigdata -Vertica integration.[HDFS to Vertica]

Hi,
So far, Talend does not support transferring the data directly from the FTP server into Hive on the fly.
Based on your job requirement, you have to get your files from FTP (zip files, .csv.tar.gz) onto the local machine first and then put them into the Hive table.
You can use tHiveCreateTable to create a table in Hive if it doesn't exist yet, and then use tHiveLoad to load the local delimited file into your Hive table.
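Roughly speaking, those two components boil down to HiveQL like the following, shown here through the hive CLI (database, table, columns and file path are only illustrative):

    # create the target table if needed, then load the local delimited file into it
    hive -e "
    CREATE TABLE IF NOT EXISTS staging_db.orders (
      order_id   INT,
      order_date STRING,
      amount     DOUBLE
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

    LOAD DATA LOCAL INPATH '/tmp/landing/orders.csv' INTO TABLE staging_db.orders;
    "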
Best regards
Sabrina
--
Don't forget to give kudos when a reply is helpful and click Accept the solution when you think you're good with it.
Six Stars

Re: Bigdata -Vertica integration.[HDFS to Vertica]

Hi Sabrina and Sam,
For this same scenario, I designed the job as below. The main reason is that the input file is really big (300 GB to 500 GB), so I don't want to download it locally and then load it into Hive.
tFTPList --> tSSH: tFTPList connects to the FTP server, and tSSH executes the HDFS commands to place the file in HDFS.
Once the file is available in HDFS, I used the Hive create table and Hive load components to load the data.
The performance is also good.
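To give an idea, the commands the tSSH step runs could look roughly like this (FTP host, credentials, paths and table name are placeholders; the point is to stream the file straight into HDFS instead of keeping a full local copy):

    # stream the file from the FTP server directly into HDFS
    curl -s "ftp://ftpuser:****@ftphost/export/bigfile.csv" | hdfs dfs -put - /landing/bigfile.csv

    # LOAD DATA without LOCAL moves the file from HDFS into the Hive table
    hive -e "LOAD DATA INPATH '/landing/bigfile.csv' INTO TABLE staging_db.big_table;"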
Hope this will help, Sam.
For connections, I always use a connection component (e.g. tHiveConnection or tVerticaConnection) once and reuse it for all the other components; this is one of the best practices from the Talend community.