One Star

Help on tHiveOutput

In Big Data Batch Spark jobs (but not MapReduce jobs) I can see the tHiveOutput component, but it is not documented in the Help.
I have a use case to insert into a number of partitioned Hive tables in Parquet format, and I would like to understand this component's behaviour to see whether it is appropriate for my needs.
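For context, this is the kind of target table I mean; just a simplified sketch in Spark/HiveQL, with made-up database, table and column names:

import org.apache.spark.sql.SparkSession

// Sketch only: a partitioned, Parquet-backed Hive table of the kind described above
// (all names are illustrative, not real tables).
val spark = SparkSession.builder()
  .appName("hive-ddl-sketch")
  .enableHiveSupport()
  .getOrCreate()

spark.sql("""
  CREATE TABLE IF NOT EXISTS sales_db.orders (
    order_id    BIGINT,
    customer_id BIGINT,
    amount      DECIMAL(12,2)
  )
  PARTITIONED BY (order_date STRING)
  STORED AS PARQUET
""")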
4 REPLIES
Moderator

Re: Help on tHiveOutput

Hi,
The component reference for tHiveOutput is not available yet.
We can send it to you by email (as a PDF file) if you need it.
Best regards
Sabrina
--
Don't forget to give kudos when a reply is helpful and click Accept the solution when you think you're good with it.
Moderator

Re: Help on tHiveOutput

Hi,
We have sent you an email with the tHiveOutput component reference (PDF file). Could you please check it?
Best regards
Sabrina
--
Don't forget to give kudos when a reply is helpful and click Accept the solution when you think you're good with it.
One Star

Re: Help on tHiveOutput

Many thanks. I have a few questions regarding the component, if that's OK.

1. Why is this component only available in Spark Big Data jobs and not in MapReduce jobs?

2. It's good to see that it has a Parquet option (which is what my target table uses). Does that include Snappy compression?

3. Does the component support partitioned Hive tables? I.e. will it write records into files under the correct HDFS directory structure, according to the "partitioned by" clause in the DDL of the Hive tables?

4. Does the component support bucketed Hive tables? I.e. will it correctly distribute records across the buckets, according to the "clustered by" clause in the DDL of the Hive tables?

We are looking to use these features in the design of our Hive tables, so I'm hoping Talend will give me a more elegant and efficient way to transform and load them than hand-written Hive SQL INSERT INTO statements (the kind of hand-written approach I mean is sketched below).
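For reference, this is roughly the hand-written Spark equivalent I would otherwise write myself; it is only a sketch with made-up names, not anything generated by Talend, and it shows the Snappy/partitioning behaviour I'm asking about in questions 2 and 3:

import org.apache.spark.sql.{SaveMode, SparkSession}

// Sketch of the hand-written Spark/Hive insert I am comparing tHiveOutput against;
// all database/table names are illustrative.
val spark = SparkSession.builder()
  .appName("hive-insert-sketch")
  .enableHiveSupport()
  .config("spark.sql.parquet.compression.codec", "snappy")   // Parquet with Snappy compression
  .config("hive.exec.dynamic.partition", "true")             // allow dynamic partition inserts
  .config("hive.exec.dynamic.partition.mode", "nonstrict")
  .getOrCreate()

// Source data; the partition column (order_date) must be the last column so that
// insertInto can map it onto the target table's PARTITIONED BY clause.
val staged = spark.table("sales_db.orders_staging")

// insertInto writes through the existing Hive table definition, so the
// PARTITIONED BY clause in the DDL decides the HDFS directory layout.
staged.write
  .mode(SaveMode.Append)
  .insertInto("sales_db.orders")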
One Star

Re: Help on tHiveOutput

Hi Team,
I have installed the Talend Sandbox and am trying to understand the job designs and components. I have some questions about Big Data Batch job design.
1. I am not seeing the tHiveOutput and tHiveInput components in Big Data Batch jobs. If I want to read data from Hive tables, do I need to use the tJDBCInput component only? (The distinction I mean is sketched in plain Spark below.)
2. I am not seeing Partitioners/Collectors in Big Data Batch jobs.
3. Is a Big Data Batch job converted into Java and then executed on the Hadoop cluster? And is Spark job code converted into Scala and then executed on the Hadoop/Spark cluster? Could you please confirm?
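To illustrate what I mean in question 1, here is the difference between the two kinds of read in plain Spark; this is only a sketch, and the table names, host and JDBC URL are placeholders:

import org.apache.spark.sql.SparkSession

// Sketch only: reading a Hive table directly through the metastore versus
// through a JDBC connection (names and URL are placeholders).
val spark = SparkSession.builder()
  .appName("hive-read-sketch")
  .enableHiveSupport()
  .getOrCreate()

// Direct read through the Hive metastore.
val viaMetastore = spark.table("sales_db.orders")

// Read through HiveServer2 over JDBC, roughly what a JDBC-based input
// component would do.
val viaJdbc = spark.read
  .format("jdbc")
  .option("url", "jdbc:hive2://hiveserver2-host:10000/sales_db")
  .option("dbtable", "orders")
  .option("driver", "org.apache.hive.jdbc.HiveDriver")
  .load()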