How to filter in a tMap using values from two input sources

I know the usual way to do this is with an inner join, but then it seems impossible to add an expression to the right-hand side of the join.

What I want to do is compare names from two sources and make the comparison case-insensitive by converting both strings to lowercase first.

What I have tried is to add two input sources and then filter on the output, but I end up with an enormous amount of data per iteration. I guess this is because the system is doing some kind of cartesian product instead of just giving me the two input streams in parallel.

Please see the two screenshots for an example of what happens when I apply this method in the tMap.

Re: How to filter in a tMap using values from two input sources

Job screenshot: you can see that the first row of input data has generated over 2000 rows that do not match the filter in the tMap.

Re: How to filter in a tMap using values from two input sources

tMap screenshot:

Re: How to filter in a tMap using values from two input sources

With no join defined, you will get a cartesian product. You can lowercase your input columns before bringing them into the tMap, using a tReplace or tJavaRow, and then define the join column normally.
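To illustrate the difference, here is a minimal plain-Java sketch (not Talend-generated code). The class and method names are made up for the example. It contrasts the row count of an unkeyed pairing with an inner join on a lowercased key, which is roughly what happens once a tJavaRow has normalised the column (for instance with something like `output_row.name = input_row.name.toLowerCase();`, assuming a String column called `name`).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration class; not part of Talend.
class CaseInsensitiveJoin {

    // Normalise a join key, as a tJavaRow step placed before the tMap might do.
    static String normalise(String name) {
        return name == null ? null : name.toLowerCase(Locale.ROOT);
    }

    // With no join key, every main row is paired with every lookup row.
    static int cartesianPairs(List<String> mainRows, List<String> lookupRows) {
        return mainRows.size() * lookupRows.size();
    }

    // Inner join on the normalised key: only main rows with a match survive.
    // Null names are skipped. The result holds the lowercased values.
    static List<String> innerJoin(List<String> mainRows, List<String> lookupRows) {
        Map<String, String> index = new HashMap<>();
        for (String l : lookupRows) {
            if (l != null) {
                index.put(normalise(l), l);
            }
        }
        List<String> matches = new ArrayList<>();
        for (String m : mainRows) {
            if (m != null && index.containsKey(normalise(m))) {
                matches.add(normalise(m));
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> mainRows = List.of("Alice", "BOB", "carol");
        List<String> lookupRows = List.of("alice", "Bob", "dave");
        System.out.println("pairs without a join key: " + cartesianPairs(mainRows, lookupRows));
        System.out.println("rows after keyed inner join: " + innerJoin(mainRows, lookupRows));
    }
}
```

Note that the joined values come out lowercased here, because the column itself was normalised before the join.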

Re: How to filter in a tMap using values from two input sources

Thank you, breaking up the process using tReplace / tJavaRow seems to be the right way to go about this.

It is easy to forget about all the other little components when tMap is so useful. :)

Re: How to filter in a tMap using values from two input sources

One more problem remains: if I lowercase the incoming Main data flow for the comparison, it will still be lowercase when I insert it into the database afterwards. How can I avoid this? Can I get the original, unmodified data back after the tMap where the inner join happens?

Re: How to filter in a tMap using values from two input sources

You could add new columns to your row output specifically for the join, then drop them in a later tMap:

i.e.
input-->tMap_1-->tMap_2-->out

In tMap_1 you would duplicate your join columns, lowercasing the duplicates in your output table.

In tMap_2 you would do the join on the duplicated columns and pass through only the data you want to insert.
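The two-stage idea can be sketched in plain Java as follows. This is only an illustration under assumed names (`Row`, `addJoinKey`, `joinAndDropKey` are hypothetical, not Talend APIs): stage 1 plays the role of tMap_1 by adding a lowercased duplicate of the join column, and stage 2 plays the role of tMap_2 by joining on that duplicate and emitting only the original value.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration class; not part of Talend.
class JoinKeyPipeline {

    // A row carrying the original value plus a duplicate used only for joining.
    static final class Row {
        final String name;     // original value, left untouched
        final String joinKey;  // lowercased duplicate of name

        Row(String name, String joinKey) {
            this.name = name;
            this.joinKey = joinKey;
        }
    }

    // Stage 1 (the role of tMap_1): duplicate the join column and lowercase the copy.
    // Null names are skipped in this sketch.
    static List<Row> addJoinKey(List<String> names) {
        List<Row> rows = new ArrayList<>();
        for (String n : names) {
            if (n != null) {
                rows.add(new Row(n, n.toLowerCase(Locale.ROOT)));
            }
        }
        return rows;
    }

    // Stage 2 (the role of tMap_2): inner join on joinKey, then drop the key
    // and pass through only the original-case value from the main flow.
    static List<String> joinAndDropKey(List<Row> mainRows, List<Row> lookupRows) {
        Set<String> lookupKeys = new HashSet<>();
        for (Row r : lookupRows) {
            lookupKeys.add(r.joinKey);
        }
        List<String> out = new ArrayList<>();
        for (Row r : mainRows) {
            if (lookupKeys.contains(r.joinKey)) {
                out.add(r.name);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Row> mainRows = addJoinKey(List.of("Alice", "BOB", "carol"));
        List<Row> lookupRows = addJoinKey(List.of("alice", "Bob", "dave"));
        System.out.println(joinAndDropKey(mainRows, lookupRows));
    }
}
```

Because the match is made on `joinKey` only, the value that reaches the output keeps the casing it had in the Main flow.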

Re: How to filter in a tMap using values from two input sources

OK, that sounds sensible. You would keep it at two columns all the time, then? I guess that is key when working with an inner join: not letting the dataset grow out of proportion.
