We are migrating data from a Netezza database to a Hive database. After the migration we need to compare every table in Netezza with its counterpart in Hive to make sure no data has been lost. Using Talend Open Studio for Big Data, is it possible to create a job that compares two tables in different databases? If so, could you please describe the steps to create such a job? Thanks in advance.
A quick and easy way to spot differences between tables is to hash all of the columns in each row and compare the hashes. If both tables share a unique key, output each row as essentially two columns: the key field and a hash of the remaining columns concatenated together. This lets you quickly find the rows that differ, and you can then look at just those rows in more detail to identify which columns are different.
Talend is a tool that essentially produces Java for you, so it is easy to bring in your own (or other people's) Java classes and methods. For hashing, take a look here: http://www.codejava.net/coding/how-to-calculate-md5-and-sha-hash-values-in-java. There are other sources online for this as well.
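As a rough illustration of the kind of helper you could add, here is a minimal sketch using the standard java.security.MessageDigest class. The class name RowHasher, the method hashRow and the choice of separator and NULL marker are all my own assumptions, not anything Talend provides, and MD5 is used only as an example algorithm.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper class (name and design are assumptions, not part of Talend).
class RowHasher {

    // Joins the column values with a separator and returns the MD5 hash as 32 hex characters.
    static String hashRow(Object... columns) {
        StringBuilder row = new StringBuilder();
        for (Object column : columns) {
            // Assumed convention: NULL becomes a marker character so it differs from "".
            row.append(column == null ? "\u0000" : column.toString());
            // Unit separator after every column keeps ("ab","c") distinct from ("a","bc").
            row.append('\u001F');
        }
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(row.toString().getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required on every Java platform, so this should not happen.
            throw new IllegalStateException("MD5 not available", e);
        }
    }
}
```

A call would look something like RowHasher.hashRow(row.name, row.amount, row.created), where the column names are placeholders. The values have to be rendered to text the same way on the Netezza and Hive sides (dates, decimals, trailing spaces), otherwise identical data will produce different hashes.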
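To convert the rows, simply read them in as normal, concatenate the columns in a tMap (or tJavaFlex, tJavaRow, etc.) and apply your hashing technique to them there. Put a separator between the columns when concatenating, so that values such as "ab","c" and "a","bc" do not end up with the same hash, and make sure both sides format each value the same way. After that it is simply a case of comparing the hash strings for each key.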
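The comparison step can be sketched in plain Java as below. This assumes each side has already been reduced to a map of key to hash string; the class name HashCompare, the method diff and the message labels are hypothetical and only for illustration. In an actual Talend job you would more likely join the two flows on the key inside the job rather than load everything into memory, so treat this as a description of the logic, not a recommended implementation for large tables.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper class (name and output format are assumptions).
class HashCompare {

    // Returns one message per key that is missing on one side or whose hash differs.
    static List<String> diff(Map<String, String> source, Map<String, String> target) {
        List<String> problems = new ArrayList<>();
        // TreeMap copies give a sorted, repeatable order for the report.
        for (Map.Entry<String, String> entry : new TreeMap<>(source).entrySet()) {
            String key = entry.getKey();
            String targetHash = target.get(key);
            if (targetHash == null) {
                problems.add("MISSING_IN_TARGET " + key);
            } else if (!targetHash.equals(entry.getValue())) {
                problems.add("MISMATCH " + key);
            }
        }
        // Second pass catches keys that exist only in the target table.
        for (String key : new TreeMap<>(target).keySet()) {
            if (!source.containsKey(key)) {
                problems.add("MISSING_IN_SOURCE " + key);
            }
        }
        return problems;
    }
}
```

Here source would hold the key and hash pairs read from Netezza and target those read from Hive. An empty result means every key exists on both sides with the same hash; any MISMATCH key is a row worth inspecting column by column.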