Four Stars

Compare Two Tables field by field in different databases (Netezza & Hive)

Hi,

We are migrating data from a Netezza database to a Hive database. After the migration we need to compare every table in Netezza with its counterpart in Hive to make sure no data is missing. Can we use Talend Open Studio for Big Data to create a job that compares two tables in different databases? If so, could you please describe the steps to create such a job? Thanks in advance.

 

Regards,

ntrayudu.

  • Data Integration
3 REPLIES
Ten Stars

Re: Compare Two Tables field by field in different databases (Netezza & Hive)

A quick and easy way to spot differences between tables is to hash all of the columns in each row of both tables and compare the hashes. If the tables share a common unique key, output each row as essentially two columns: the key field and a hash of the remaining columns concatenated together. This lets you quickly find the rows that differ, and you can then look at those rows in more detail to identify which columns are different.
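As a rough illustration of the comparison step, the sketch below assumes each table has already been reduced to a map of key to row hash. The class name `HashCompare` and the method `diff` are made up for this example and are not part of Talend. It also holds everything in memory, which would not suit very large tables; there the same logic would more likely be expressed as a join on the key.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical helper: compares two key -> row-hash maps.
final class HashCompare {

    // Returns one line per key that is missing on either side
    // or whose row hashes differ. Assumes no null hash values.
    static List<String> diff(Map<String, String> source, Map<String, String> target) {
        List<String> report = new ArrayList<>();

        // Union of keys from both sides, sorted for stable output.
        TreeSet<String> keys = new TreeSet<>(source.keySet());
        keys.addAll(target.keySet());

        for (String key : keys) {
            String s = source.get(key);
            String t = target.get(key);
            if (s == null) {
                report.add(key + ": missing in source");
            } else if (t == null) {
                report.add(key + ": missing in target");
            } else if (!s.equals(t)) {
                report.add(key + ": hash mismatch");
            }
        }
        return report;
    }
}
```

Only the keys reported as mismatched need a column-by-column check afterwards.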

Rilhia Solutions
Four Stars

Re: Compare Two Tables field by field in different databases (Netezza & Hive)

I have no idea how to create a hash. Could you please guide me on how to create a hash for each row of a table and how to use this approach in Talend?
Note: some of our tables have around 300 crore (about 3 billion) rows.
What is the best way to do this kind of validation for tables of this size?
Ten Stars

Re: Compare Two Tables field by field in different databases (Netezza & Hive)

Talend is a tool that essentially generates Java for you, so it is easy to introduce your own (or other people's) Java classes and methods. Take a look here for hashing: http://www.codejava.net/coding/how-to-calculate-md5-and-sha-hash-values-in-java. There are other sources online for this.

To convert the rows, simply read them in as normal, concatenate the columns in a tMap (or tJavaFlex, tJavaRow, etc.) and apply your hashing technique to the result there. After that it is simply a case of comparing hash strings.
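A minimal sketch of such a hashing method is shown here, using the standard `java.security.MessageDigest` class. The class name `RowHasher`, the method `hashRow`, the separator and the null placeholder are all assumptions made for this example. The separator keeps ("ab", "") from hashing the same as ("a", "b"), and the placeholder keeps a null from hashing the same as an empty string.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: hashes the concatenated column values of one row.
final class RowHasher {

    // Assumed not to occur inside the data itself.
    private static final char SEPARATOR = '\u0001';
    private static final String NULL_TOKEN = "\u0000";

    // Concatenates the column values with a separator and returns
    // the MD5 digest of the result as a 32-character hex string.
    static String hashRow(Object... columns) {
        StringBuilder row = new StringBuilder();
        for (Object column : columns) {
            row.append(column == null ? NULL_TOKEN : column.toString());
            row.append(SEPARATOR);
        }
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(row.toString().getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform.
            throw new IllegalStateException("MD5 not available", e);
        }
    }
}
```

In a tMap output expression this might be called as `RowHasher.hashRow(row1.col_a, row1.col_b)`, where `row1`, `col_a` and `col_b` are placeholder names. Values such as decimals and timestamps may be rendered differently by the two databases, so they would probably need to be normalized to a common format before hashing.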

Rilhia Solutions