One Star

SCD implementation in hive/hbase using Talend

Hi friends, I am looking into whether slowly changing dimensions (SCD) can be implemented on a Hadoop cluster (Hive/HBase), and for any documentation on how to do it. I didn't find any material that helps. Can anyone guide me on how to implement SCD in Hive/HBase using Talend as my integration tool? Thanks in advance.

12 REPLIES
One Star

Re: SCD implementation in hive/hbase using Talend

I'm actually working on a use case for doing SCD on Hive. It leverages the ELTHive components. I'll share once I'm done in a day or so...
One Star

Re: SCD implementation in hive/hbase using Talend

Hi, thanks for the response. That would be great; I will be waiting for it. I am working on my part by coding it by hand, but I want to make use of Talend, so I would really appreciate it if you could share it once you are done. Regards.
One Star

Re: SCD implementation in hive/hbase using Talend

SCD in Hadoop can be implemented using the Hive ELT components. My implementation has three parts:
- Change Data Capture (using tELTHiveMap, tELTHiveInput(s) and tELTHiveOutput). Left join the SCD table and the staging table (which contains today's records) on the keys; if the record exists, compare the columns and write the appropriate code (Insert, Update/Insert). I wrote the results to a Delta table.
- Apply the inserts (new rows).
- Apply the Type II updates (the SCD step). Insert Overwrite the rows that need to be obsoleted, then add the new versions of the updated rows.
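
The three parts above can be sketched in plain Python to make the logic easier to follow. This is only an illustration, not the Talend job or the HiveQL it generates: the row layout (`DimRow`, `city`, `valid_from`, `valid_to`, `is_current`), the function names and the sample data are all assumptions made for the example.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional


# Hypothetical row layout: one business key, one tracked attribute,
# plus typical Type II housekeeping columns.
@dataclass(frozen=True)
class DimRow:
    key: str
    city: str
    valid_from: date
    valid_to: Optional[date] = None  # None means "still current"
    is_current: bool = True


def capture_changes(dim, staging):
    """Part 1 (CDC): match staging records to the current dimension rows
    on the key and tag each one as INSERT or UPDATE. Unchanged records
    are left out of the delta."""
    current = {row.key: row for row in dim if row.is_current}
    delta = []
    for key, city in staging.items():
        existing = current.get(key)
        if existing is None:
            delta.append(("INSERT", key, city))   # key not in the dimension yet
        elif existing.city != city:
            delta.append(("UPDATE", key, city))   # key exists, attribute changed
    return delta


def apply_delta(dim, delta, load_date):
    """Parts 2 and 3: rebuild the whole table, similar in spirit to an
    INSERT OVERWRITE. Current rows with an UPDATE are closed out, then
    a fresh current row is added for every INSERT and UPDATE."""
    changed = {key for action, key, _ in delta if action == "UPDATE"}
    result = []
    for row in dim:
        if row.is_current and row.key in changed:
            # Obsolete the old version instead of modifying it in place.
            result.append(replace(row, valid_to=load_date, is_current=False))
        else:
            result.append(row)
    for _action, key, city in delta:
        result.append(DimRow(key, city, valid_from=load_date))
    return result


# Sample data (made up): c1 changes city, c2 is unchanged, c3 is new.
dim = [
    DimRow("c1", "Paris", date(2020, 1, 1)),
    DimRow("c2", "Rome", date(2020, 1, 1)),
]
staging = {"c1": "Lyon", "c2": "Rome", "c3": "Oslo"}

delta = capture_changes(dim, staging)
new_dim = apply_delta(dim, delta, date(2024, 1, 1))
for row in new_dim:
    print(row)
```

The table is rebuilt rather than updated in place because the description relies on Insert Overwrite; in Hive the same idea would be expressed as joins and overwrites over the dimension, staging and Delta tables.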
One Star

Re: SCD implementation in hive/hbase using Talend

Code is attached
Hive_ELT_%281%29.zip.zip
One Star

Re: SCD implementation in hive/hbase using Talend

Hi sdhurjati,

Thanks for your response. I am trying to download the code, but it says page not found. Could you please re-upload the zip file? I would be grateful. I am trying out the process you explained in the meantime and will wait for the upload. Thanks a lot.
One Star

Re: SCD implementation in hive/hbase using Talend

Hi,
I have one more question: are you using the Enterprise edition or Open Studio for Big Data? Please let me know. Thank you.
One Star

Re: SCD implementation in hive/hbase using Talend

Code is attached
Hive_ELT_%281%29.zip.zip

I get the same error for this URL: (1).zip.zip
File / page not found
Please upload it again.
Thanks.
One Star

Re: SCD implementation in hive/hbase using Talend

Can the person who posted the image post images of each component and some workflow logic?
One Star

Re: SCD implementation in hive/hbase using Talend

Hi, can someone please post the required documentation? The attached file is not working. Please.
One Star

Re: SCD implementation in hive/hbase using Talend

Can you please share the code? It would be of great help. Thanks in advance.
One Star

Re: SCD implementation in hive/hbase using Talend

Hi,
Reviving this discussion; I need some input, please.
How does Talend-generated code perform on a Hadoop cluster (~200 nodes) processing 100 M rows with approximately 10 columns? I googled and couldn't find much detail on performance metrics for Talend 6 or earlier versions. I am aware that Talend is just a code generator, but does anyone have performance metrics for SCD Type 1 (on Hive) and SCD Type 2 (on HBase) processing?
Please share any details or references you have.
Thanks,
Ugandhar
One Star

Re: SCD implementation in hive/hbase using Talend

Hi, I am trying to implement SCD Type 2 through Hive. Could someone please help me understand this description from the previous post?
"SCD in Hadoop can be implemented using the Hive ELT components. My implementation has three parts:
- Change Data Capture (using tELTHiveMap, tELTHiveInput(s) and tELTHiveOutput). Left join the SCD table and the staging table (which contains today's records) on the keys; if the record exists, compare the columns and write the appropriate code (Insert, Update/Insert). I wrote the results to a Delta table.
- Apply the inserts (new rows).
- Apply the Type II updates (the SCD step). Insert Overwrite the rows that need to be obsoleted, then add the new versions of the updated rows.
"