
How to compare files using TMap

I'm following this post where a Talend user was able to compare 2 files using tMap, but I can't replicate his success. Any direction is much appreciated.
http://www.talendforge.org/forum/viewtopic.php?id=12358
File to Compare contents (Main)
col1, col2, col3, col4
1,2,3,9
5,6,7,8
Reference File contents (lookup)
col1, col2, col3, col4
1,2,3,4
5,6,7,8
In tMap, I dragged each reference column onto the like-named column of the main flow:
row1.col1 -- row2.col1
row1.col2 -- row2.col2
row1.col3 -- row2.col3
row1.col4 -- row2.col4
This drew purple lines and "key" icons, as shown in the screenshot.
I then clicked the tMap Settings button of row2 and set Join Model to "INNER JOIN".
(I did not set any columns to KEY in the schema editor for either row, because it seemed like the INNER JOIN should take care of that.)
I then added two tFileOutputDelimited components (MATCH and DIFFS).
I dragged all columns of row1 (lookup) to MATCH and all columns of row2 (main) to DIFFS.
In the tMap settings of the DIFFS output, I set "Catch lookup inner join reject" to "true".
My output in the match file has the data I expect:
5,6,7,8
But the DIFFS file only contains separators:
,,,
Whereas I expected this row of data:
1,2,3,9
I've worked through the tMap example in TalendOpenStudio_Components_RG_41b_EN.pdf and the tutorial here: http://www.talendforge.org/tutorials/tutorial.php?language=english&idTuto=8. Both of these worked flawlessly, of course, but I haven't been able to extract the necessary parts to build my own file compare.
Thanks in advance for any direction/suggestions.
Mark
Employee

Re: How to compare files using TMap

Hi Mark,
Instead of using the data from row2 in your second output, use the data from row1. You should see it then :-)

Re: How to compare files using TMap

That works; however, I don't understand why. If the difference is coming from ROW2 (the second file), why link the LOOKUP file (ROW1) to DIFFS?
Anyway, it works...
Thanks much.
Mark
Employee

Re: How to compare files using TMap

Because when you catch inner join rejects, they are rejects: each one is a row from row1 (the main flow) that couldn't find a match in row2 (the lookup flow, which is why the join model is set on row2). Since there is no matching row2 record, its columns are empty.
In that case, how do you expect to get data from row2?
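To make the reply above concrete, here is a minimal plain-Java sketch of the same logic. It is an illustration, not the code Talend generates; InnerJoinReject, parse and render are hypothetical names. It assumes row1 is the main flow and row2 the lookup, joined on all four columns, and it fills DIFFS both ways: from row1 (the fix) and from row2 (the original mapping).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java sketch (not Talend-generated code) of the tMap in this thread:
// main flow row1 inner-joined to lookup flow row2 on col1..col4,
// with unmatched main rows caught as inner join rejects.
public class InnerJoinReject {

    // What each output would receive.
    static final class Outputs {
        final List<String> match = new ArrayList<>();
        final List<String> diffsFromRow1 = new ArrayList<>(); // reject mapped from row1 (the fix)
        final List<String> diffsFromRow2 = new ArrayList<>(); // reject mapped from row2 (original job)
    }

    // Hypothetical helper: split comma-separated lines into rows.
    static List<String[]> parse(String... lines) {
        List<String[]> rows = new ArrayList<>();
        for (String line : lines) {
            rows.add(line.split(","));
        }
        return rows;
    }

    // Renders a row as delimited text; a missing (null) row becomes empty fields.
    static String render(String[] row, int width) {
        if (row == null) {
            return String.join(",", Collections.nCopies(width, ""));
        }
        return String.join(",", row);
    }

    static Outputs compare(List<String[]> main, List<String[]> lookup) {
        // The lookup flow is held in memory, keyed on all joined columns.
        Map<String, String[]> lookupByKey = new HashMap<>();
        for (String[] r2 : lookup) {
            lookupByKey.put(String.join(",", r2), r2);
        }
        Outputs out = new Outputs();
        for (String[] r1 : main) {
            String[] r2 = lookupByKey.get(String.join(",", r1)); // null: no match
            if (r2 != null) {
                out.match.add(render(r1, r1.length));
            } else {
                // Inner join reject: only row1 has data; r2 is null here.
                out.diffsFromRow1.add(render(r1, r1.length));
                out.diffsFromRow2.add(render(r2, r1.length));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Outputs out = compare(parse("1,2,3,9", "5,6,7,8"), parse("1,2,3,4", "5,6,7,8"));
        System.out.println("MATCH: " + out.match);                // MATCH: [5,6,7,8]
        System.out.println("DIFFS (row1): " + out.diffsFromRow1); // DIFFS (row1): [1,2,3,9]
        System.out.println("DIFFS (row2): " + out.diffsFromRow2); // DIFFS (row2): [,,,]
    }
}
```

For the rejected row 1,2,3,9 there is no row2 record, so the row2 version of DIFFS comes out as ,,, which mirrors what Mark saw in his DIFFS file.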

Re: How to compare files using TMap

Thanks for the detailed example on file comparison. I am new to Talend DI and want to compare files for my testing requirements. It was very helpful.
In the example, the output shows the entire row in which there is a difference. But in my case the files are huge, so the requirement is to pick out the specific row and column that doesn't match, not the entire row.
As per the example here,
File to Compare contents (Main)
col1, col2, col3, col4
1,2,3,9
5,6,7,8
Reference File contents (lookup)
col1, col2, col3, col4
1,2,3,4
5,6,7,8
The output will show the entire row
1,2,3,9
whereas I want to see only 9 here. Is that possible? Please advise.

Re: How to compare files using TMap

Hi Pulak,
In tMap, if you join on col4 only, you can get the reject. The only condition is that you map the single column col4 to the reject output, not all of the input columns.
What is your scenario? Can you please post a screenshot?
Vaibhav
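Vaibhav's suggestion can be sketched in plain Java as a lookup on a single column whose rejects carry only that column. This is an illustration, not Talend code; SingleColumnReject, parse and rejectsOnColumn are hypothetical names, and column index 3 (zero-based) stands for col4.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch of a join on one column only, sending just that column to the
// reject output. Column index 3 is col4 (zero-based).
public class SingleColumnReject {

    // Hypothetical helper: split comma-separated lines into rows.
    static List<String[]> parse(String... lines) {
        List<String[]> rows = new ArrayList<>();
        for (String line : lines) {
            rows.add(line.split(","));
        }
        return rows;
    }

    // Returns the value of column `col` for every main row whose value
    // does not appear in that column of any lookup row.
    static List<String> rejectsOnColumn(List<String[]> main, List<String[]> lookup, int col) {
        Set<String> lookupValues = new HashSet<>();
        for (String[] r2 : lookup) {
            lookupValues.add(r2[col]);
        }
        List<String> rejects = new ArrayList<>();
        for (String[] r1 : main) {
            if (!lookupValues.contains(r1[col])) {
                rejects.add(r1[col]); // only the joined column, not the whole row
            }
        }
        return rejects;
    }

    public static void main(String[] args) {
        List<String> rejects = rejectsOnColumn(
                parse("1,2,3,9", "5,6,7,8"), parse("1,2,3,4", "5,6,7,8"), 3);
        System.out.println(rejects); // [9]
    }
}
```

Because the join looks only at col4, a value is flagged only when it occurs nowhere in the lookup's col4, regardless of which row it sits in, and differences in the other columns go unnoticed. That is the limitation raised in the next post.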

Re: How to compare files using TMap

Hi Vaibhav,
Thanks for your answer. The requirement here is that I don't want to see the entire row; rather, I want to see the specific column where the difference is. In large files the data may differ in any of the columns.
So my question is: can tMap point to the exact cell where the difference is, rather than showing the entire row?
To give an example,
File1,
1234
5678
1111
4444
File2,
1234
5978
1221
3444
So can I get data in my output file only for the differences, like this:
row1: okay
row2,col2: 9
row3,col2: 2, row3,col3: 2
row4,col1: 3
In large files you never know where the difference is. My file contains millions of records, so if I get the entire row, it is difficult to match things up.
Thanks,
Pulak

Re: How to compare files using TMap

Hi,
Please see the KB article on converting columns to rows: https://help.talend.com/search/all?query=Converting+columns+to+rows&content-lang=en
Once these key-value pairs are generated, compare them with each other and send the rejects to the output file.
I think you get the idea. I hope this helps.
Best regards
Vaibhav
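The columns-to-rows idea above can be sketched in plain Java rather than Talend components. CellDiff, digits, unpivot and compare are hypothetical names, and splitting Pulak's sample lines into one column per character is an assumption made to match his example. Each cell becomes a key-value pair keyed by its "rowN,colM" position, the two sets of pairs are joined on that key, and only mismatching File2 cells are reported, grouped per row:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of the columns-to-rows approach: unpivot each file into
// (position -> value) pairs, join the pairs on position, keep mismatches.
public class CellDiff {

    // Hypothetical parser for Pulak's example: each character is one column.
    static List<String[]> digits(String... lines) {
        List<String[]> rows = new ArrayList<>();
        for (String line : lines) {
            String[] cols = new String[line.length()];
            for (int i = 0; i < line.length(); i++) {
                cols[i] = String.valueOf(line.charAt(i));
            }
            rows.add(cols);
        }
        return rows;
    }

    // Columns to rows: one key-value pair per cell, keyed "rowN,colM" (1-based).
    static Map<String, String> unpivot(List<String[]> rows) {
        Map<String, String> pairs = new LinkedHashMap<>();
        for (int r = 0; r < rows.size(); r++) {
            String[] cols = rows.get(r);
            for (int c = 0; c < cols.length; c++) {
                pairs.put("row" + (r + 1) + ",col" + (c + 1), cols[c]);
            }
        }
        return pairs;
    }

    // Joins the compared file's pairs to the reference pairs on the position key
    // and reports, per row, either "okay" or the differing cells of the compared file.
    static List<String> compare(List<String[]> reference, List<String[]> toCompare) {
        Map<String, String> ref = unpivot(reference);
        Map<String, List<String>> diffsByRow = new LinkedHashMap<>();
        for (Map.Entry<String, String> cell : unpivot(toCompare).entrySet()) {
            String key = cell.getKey();
            String row = key.substring(0, key.indexOf(','));
            List<String> diffs = diffsByRow.computeIfAbsent(row, k -> new ArrayList<>());
            // A missing reference cell (null) also counts as a difference.
            if (!cell.getValue().equals(ref.get(key))) {
                diffs.add(key + ": " + cell.getValue());
            }
        }
        List<String> report = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : diffsByRow.entrySet()) {
            List<String> diffs = e.getValue();
            report.add(diffs.isEmpty() ? e.getKey() + ": okay" : String.join(", ", diffs));
        }
        return report;
    }

    public static void main(String[] args) {
        List<String> report = compare(
                digits("1234", "5678", "1111", "4444"),
                digits("1234", "5978", "1221", "3444"));
        report.forEach(System.out::println);
    }
}
```

With File1 as the reference and File2 as the file to compare, this prints the four lines Pulak listed (row1: okay, row2,col2: 9, and so on). Keying on the row number assumes both files keep their records in the same order; if records can be inserted or removed, keying on a business key column is the safer choice. Rows that exist only in the reference file are not reported by this sketch, and with millions of records the in-memory map of reference cells can get large.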