Can someone please advise the best job flow. I want to compare the result of 2 queries from a different databases (one is SQL server the other is Oracle), there is 20million rows returned from each query.
I am thinking of something like this as it would merge the results of both queries and remove the duplicates but I don't know what components in Talend to use to achieve this (if there is better performance from using a CRC row - pls incorporate that in the job flow): select cust_id, cust_name, address from sqlserv_table UNION select c_id, customer_name, cust_address from oracle_table Note: there is no primary_key on the tables so (i guess all columns together/or the CRC could be used to form the PK) Finally I would like to be able to output which rows are present in the sqlserv_table but missing from the oracle_table and vice_versa (to 2 separate files).
Hi, You can use the tMap component. It allows you to make a join between 2 tables according to differents columns, this way you are able to catch which lines are not in the other table. . In my example sqlserv_table is the entry, oracle_table is the lookup. I'm catching the rows which are in the sqlserv_table but NOT in the oracle_table. Just change the table if you want to the inverse (find the line which are in oracle and not in sqlserve). Tell me if you got through some problems.