Seven Stars

Extract Difference Data from two files

Hi All,

 

Is it possible to extract the difference data from tFileCompare ?

 

One approach to compare is LookUp (but i have different set of files to compare and each set has different headers, so for lookup i need to modify the file schema everytime), is there any component to compare two files and get the difference data as output directly ?

 

Thanks

  • Data Integration
1 ACCEPTED SOLUTION

Accepted Solutions
Eleven Stars

Re: Extract Difference Data from two files

You can concatenate and hash in your Oracle query (which would help make it more dynamic). This way your job does not have to be aware of the schema, you just need to pass it an appropriate SQL query with the concatenation and hashing built in.
Rilhia Solutions
14 REPLIES
Eleven Stars

Re: Extract Difference Data from two files

Are you looking for a comparison character by character? So, for example, in the following examples, the differences would be as shown below....

 

aaaaaaaaaaaaaaaabbbbbbbbbbbbbbcccccccccccccccdddddddddddd
eeeeeeeeeeeeeeffffffffffffffffffffggggggggggggggggghhhhhhhhhhhhiiiiiiii
1aaaaaaaaaaaaaaaabbb2bbbbbbbbbbbcccccccccccccccdddddddddddd
eeeeeeeeeeeeeeffffffffffffffffffffggggggggg3gggggggghhhhhhhhhhhhiiiiiiii

If that is what you want, there is nothing "out of the box" and it might be quite tricky to build this using standard components. You could try the Talend Exchange or look for a Java API to handle this and call it from Talend.

Rilhia Solutions
Seven Stars

Re: Extract Difference Data from two files

No, i want to do row by row comparison with columns values check
E.g
File 1
A|B|C
1|1|1

File 2
A|B|C
1|1|1
2|2|2

So if i compare these files, the oputput file should be
A|B|C
2|2|2

Moderator

Re: Extract Difference Data from two files

Hello,

Where are your input files from? Tables? Are you looking for redundancy analysis in talend data quality prodcut?

Best regards

Sabrina

--
Don't forget to give kudos when a reply is helpful and click Accept the solution when you think you're good with it.
Seven Stars

Re: Extract Difference Data from two files

Yes, the inputs are from tables with same scehma but from two different databases (Oracle 9 and Oracle 12 respectively).
however i cannot connect both the databases in talend due to some concerns, but i can extract data files individually.
i need these data files to be compared so as to validate if all the data for few selected tables has migrated correctly from oracle 9 to Oracle 12
Seven Stars

Re: Extract Difference Data from two files

@xdshi,
Can you please tell me if below design is possible in Talend.
1) i have two database connections available in application (METADATA)
2) Both the databases have same tables with similar schémas (There are 30 or so tables )
3) I want to compare the data from the two corresponding tables
4) Can i design a job where i can send the name of the tables one after another and the job does data comparison and creates an output file with difference data rows
5) In this way i can avoid the need to create a new job for new table comaprison (for 30 tables i hve to create 30 jobs)
Eleven Stars

Re: Extract Difference Data from two files

If the tables that are supplying the data have the same schema, you don't need to worry about the headers at all. Just join your two files using a tMap and ensure that every column that should be the same is joined. Then have two outputs; 1 for the matches and one for rows from the main that do not match

Rilhia Solutions
Seven Stars

Re: Extract Difference Data from two files

Thanks Rhall,
but now i have two tables directly and not file data, i know that we can join two oracle tables also and perform the operation but my job has to handle 30 set of tables.
is it possible to design one job that can be reused by every table set ?
The DB's are like these below
DB1 DB2
Table1-Schema1 Table1-Schema1
Table2-Schema2 Table2-Schema2
Table3-Schema3 Table3-Schema3
So i have to campare Table1 from DB1 and DB2, create an output file with uncommon data
after table1 the process should continue for Table2 and respectively
Eleven Stars

Re: Extract Difference Data from two files

Yes, you can create one job for all tables.....but it will only tell you about rows that are exactly the same....and it will be complicated to build if you are new to this.

 

1) Input your data from your tables with ALL of the columns concatenated and hashed. Output this as a String (Varchar). You will need a primary key on the table to be output as well. So your data from each table will be ....

 

Key
ConcatenatedHash

 

2) In your tMap join on your ConcatenatedHash column. Remember that the Main flow will be the only flow where ALL rows are guaranteed to be tested. If you require both sides to be tested you will have to reverse the lookup in another tMap.

 

3) When you identify matches, you can link back to your unconcatenated data using the Key.

 

Rilhia Solutions
Seven Stars

Re: Extract Difference Data from two files

Yeah thats a bit of complex process, as per the requirement i was hpoing to implement the below oracle statement in Talend

select * from DB1-table1
MINUS
select * from DB2-table1

its okay if this is not possible via talend
Thanks
Eleven Stars

Re: Extract Difference Data from two files

That is not possible. When you do that in Oracle it is fully aware of the schemas (although they have to be same....which you were saying is not necessarily the case here). You can do that with Talend in your database component (if both tables are in the same database), but this is not the case....or is it? If it is the case, just replace the table names with context variables. However if it is not the case, the solution I gave you is not complicated at all. I believe you can make it completely dynamic if you have enough permissions on your database user.

Rilhia Solutions
Seven Stars

Re: Extract Difference Data from two files

No, the tables are in different databases.
Will try to implement the solution suggested by you.
Thanks
Seven Stars

Re: Extract Difference Data from two files

@rhall_2_0,
1) Input your data from your tables with ALL of the columns concatenated and hashed. Output this as a String (Varchar). You will need a primary key on the table to be output as well. So your data from each table will be ....
Key
ConcatenatedHash
-- Is there any component which does the concatenation and hashing of the columns ? or how should i do it using talend

Eleven Stars

Re: Extract Difference Data from two files

You can concatenate and hash in your Oracle query (which would help make it more dynamic). This way your job does not have to be aware of the schema, you just need to pass it an appropriate SQL query with the concatenation and hashing built in.
Rilhia Solutions
Seven Stars

Re: Extract Difference Data from two files

Thanks Rhall,
This worked perfectly fine for me,
the design to concatenate was like
-tOracleInput component to find columns for the respective tables - > tDenoramalize component to combine all the columns with separator ||'|'|| and use the same in the select queries