Six Stars

How to compare two data sources, column by column.

Hi 

 

I'm currently comparing two data sources using the tMap component.

 

One example is where I compare a .TXT file against a Sybase table by using the inner join in the tMap, to see the differences I set the tMap's output to Catch lookup inner join reject as "true" and this works quite well.

 

However since this is a join, as output I get a whole record(s) which doesn't appear in the main flow based on my keys and I'm more interested in pin pointing the actual column(s) that are incorrect. - I'm not sure if it's because of my tMap configurations, so attached are both the flow and tMap configurations.

 


Sybase table compared to .TXT fileSybase table compared to .TXT filetMap editor configurationstMap editor configurations

 

If my question isn't clear here please let me know so I can clarify.

 

Thanks

Tags (2)
1 ACCEPTED SOLUTION

Accepted Solutions
Twelve Stars

Re: How to compare two data sources, column by column.

This will not work. You ONLY want to compare records that should be the same. To do this you must join on a key column that is consistent between the datasets and will not change. Once you have your JOIN key sorted (use an inner join), then you can use tMap variables (the box between the inputs and outputs) to compare your columns. This way you can isolate precisely what has changed.

 

If you care going to be comparing many rows, it may be more efficient to compare the whole row (using a hashing algorithm on all values), then select rows where they do not match, then use the comparison technique I described above. This will be quite a processor expensive process unless you use some heuristics to cut down unnecessary comparisons. 

Rilhia Solutions
9 REPLIES
Twelve Stars

Re: How to compare two data sources, column by column.

This will not work. You ONLY want to compare records that should be the same. To do this you must join on a key column that is consistent between the datasets and will not change. Once you have your JOIN key sorted (use an inner join), then you can use tMap variables (the box between the inputs and outputs) to compare your columns. This way you can isolate precisely what has changed.

 

If you care going to be comparing many rows, it may be more efficient to compare the whole row (using a hashing algorithm on all values), then select rows where they do not match, then use the comparison technique I described above. This will be quite a processor expensive process unless you use some heuristics to cut down unnecessary comparisons. 

Rilhia Solutions
Six Stars

Re: How to compare two data sources, column by column.


rhall_2_0 wrote:

This will not work. You ONLY want to compare records that should be the same. To do this you must join on a key column that is consistent between the datasets and will not change. Once you have your JOIN key sorted (use an inner join), then you can use tMap variables (the box between the inputs and outputs) to compare your columns. This way you can isolate precisely what has changed.

 

If you care going to be comparing many rows, it may be more efficient to compare the whole row (using a hashing algorithm on all values), then select rows where they do not match, then use the comparison technique I described above. This will be quite a processor expensive process unless you use some heuristics to cut down unnecessary comparisons. 


Thanks @rhall_2_0

I haven't looked into the tMap variables section, I'll try reading some more on this, sounds like what I'm looking for.

Will advise if I win, thanks again.

Six Stars

Re: How to compare two data sources, column by column.

you have a possibility to compare a field with its value. For this you must retrieve the tunpivot component in exchange.
So all the field names will be in a single column (pivot key), and the value in another (pivot value)
tell me if that's what you're looking for

Six Stars

Re: How to compare two data sources, column by column.


fab_74 wrote:

you have a possibility to compare a field with its value. For this you must retrieve the tunpivot component in exchange.
So all the field names will be in a single column (pivot key), and the value in another (pivot value)
tell me if that's what you're looking for


@fab_74

I'm not sure I follow how comparing the field with its value would help, please advise?

Six Stars

Re: How to compare two data sources, column by column.

I do not know if this answers your problem.
Attention with memory problems if many records

Six Stars

Re: How to compare two data sources, column by column.


fab_74 wrote:

I do not know if this answers your problem.
Attention with memory problems if many records


@fab_74

Thanks, however I was looking for something that'll compare column 1 with column 1 and if they don't match return a value.

I see the solution you provided still returns the whole record instead of just column 1. See below example of the differences report I'm looking to generate. I'm currently working on this using @rhall_2_0's suggestion, making progress but not there yet.

Differences_report.JPG

 

Six Stars

Re: How to compare two data sources, column by column.


accuracie wrote:

rhall_2_0 wrote:

This will not work. You ONLY want to compare records that should be the same. To do this you must join on a key column that is consistent between the datasets and will not change. Once you have your JOIN key sorted (use an inner join), then you can use tMap variables (the box between the inputs and outputs) to compare your columns. This way you can isolate precisely what has changed.

 

If you care going to be comparing many rows, it may be more efficient to compare the whole row (using a hashing algorithm on all values), then select rows where they do not match, then use the comparison technique I described above. This will be quite a processor expensive process unless you use some heuristics to cut down unnecessary comparisons. 


Thanks @rhall_2_0

I haven't looked into the tMap variables section, I'll try reading some more on this, sounds like what I'm looking for.

Will advise if I win, thanks again.


@rhall_2_0

 

I looked at tMap variables and it solved my compare column by column problem. I created a variable with an if statement which looks like: !String.valueOf(row1.Column1).equals(String.valueOf(row2.Column1)) ? "Not equal" : "Equal"  

 

As much as this works it looks like it'll be a lot of work especially for tables with a lot of columns where I create this variable for each column. I know you've solved this problem but wouldn't mind a neater suggestion Man Happy

 

Thanks for your help.

Highlighted
Twelve Stars

Re: How to compare two data sources, column by column.

This will always be a lot of work as you will need to configure the comparing of each column, but you might be able to make it a little less tedious to configure. If you create a code routine (a Java method essentially) to carry out the compare logic, you can simply call the method and supply the variables to carry out the comparison. Then, in the future, if you need to slightly adjust the comparison logic, you can edit it in one place. 

Rilhia Solutions
Six Stars

Re: How to compare two data sources, column by column.


rhall_2_0 wrote:

This will always be a lot of work as you will need to configure the comparing of each column, but you might be able to make it a little less tedious to configure. If you create a code routine (a Java method essentially) to carry out the compare logic, you can simply call the method and supply the variables to carry out the comparison. Then, in the future, if you need to slightly adjust the comparison logic, you can edit it in one place. 


Makes perfect sense, will use this way...thanks a lot @rhall_2_0