Doing an inner join using a tMap component

Highlighted
Four Stars

Doing an inner join using a tMap component

I'm new to talend and looking to join two tables. I used the tMap component and it worked fine, but then I looked into it. I have two tables, lets say table A and table B. When I drag them into the designer they're recognized as MySqlInputs which is what I want. However, when I click the actual table components, I can see the query that they're running. Both tables are just running a SELECT fields FROM table query. 

 

From what I've read online you would take these two components and route them into the tMap. However, this doesnt really make sense to me. Why am I selecting ALL of the rows from both tables, then letting talend do the join? It would be significantly faster if I just told the actual database to do the join, and then play around with the results. Is there anyway to join two or more tables without having to SELECT every single record, then let talend handle the joining with a tMap component?

 


Accepted Solutions
Employee

Re: Doing an inner join using a tMap component

Hi,

 

     Your thoughts are absolutely correct and it is a common mistake done by many ETL developers. Any ETL tool including Talend is not a replacement for database table joins. So if you are pretty sure that your dataset is a join of two or more tables, feel free to write the SQL joining all the required tables and add the query inside tDBInput component. You can add one of the tables you are using for joining in the table name component of tDBInput. 

 

     The advantages of this methodology is that you are reducing the result set from source level itself. For example, you have a table with 1 million records and 30 columns. But you need only last day's records which might be approximately 10,000 records (after joining with other tables or by adding necessary filter conditions) and you are looking for only 5 columns out of 30. In this case, you should always apply the filter at the source to reduce the data traffic to later stages including to Talend.

 

      tMap is a component used to do joins when you do not have the opportunity to do it in previous level and it is also used to join data sets from multiple source systems (for example data flowing from a file and DB). I hope I have given answer to your query in detailed manner. Enjoy programming in Talend :-)

 

Warm Regards,
Nikhil Thampi

Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved :-)


All Replies
Six Stars

Re: Doing an inner join using a tMap component

Hi Meeoh,

You are absolutely correct you can perform a join at query level on top of a database outside the talend as well as with in talend using tMap. You should actually decide which one fits best for your requirement but low are some differentiators.

1) Prefer tMap if you source tables have higher volumes of data, as talend builds cache ". bin" files which will improver the performance. Only if you provide a temp space details in tMap configuration. For small volumes or reference data lookups use DB join by customising the query in the Mysqlinput component.

2)Using TMap join we can collect the inner join rejects from source which is not possible directly using DB join.

Thanks
Ashok
Employee

Re: Doing an inner join using a tMap component

Hi,

 

     Your thoughts are absolutely correct and it is a common mistake done by many ETL developers. Any ETL tool including Talend is not a replacement for database table joins. So if you are pretty sure that your dataset is a join of two or more tables, feel free to write the SQL joining all the required tables and add the query inside tDBInput component. You can add one of the tables you are using for joining in the table name component of tDBInput. 

 

     The advantages of this methodology is that you are reducing the result set from source level itself. For example, you have a table with 1 million records and 30 columns. But you need only last day's records which might be approximately 10,000 records (after joining with other tables or by adding necessary filter conditions) and you are looking for only 5 columns out of 30. In this case, you should always apply the filter at the source to reduce the data traffic to later stages including to Talend.

 

      tMap is a component used to do joins when you do not have the opportunity to do it in previous level and it is also used to join data sets from multiple source systems (for example data flowing from a file and DB). I hope I have given answer to your query in detailed manner. Enjoy programming in Talend :-)

 

Warm Regards,
Nikhil Thampi

Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved :-)

Four Stars

Re: Doing an inner join using a tMap component

Thanks for the detailed response Nikhil, this is exactly what I was looking for!

2019 GARNER MAGIC QUADRANT FOR DATA INTEGRATION TOOL

Talend named a Leader.

Get your copy

OPEN STUDIO FOR DATA INTEGRATION

Kickstart your first data integration and ETL projects.

Download now

What’s New for Talend Summer ’19

Watch the recorded webinar!

Watch Now

Best Practices for Using Context Variables with Talend – Part 4

Pick up some tips and tricks with Context Variables

Blog

How Media Organizations Achieved Success with Data Integration

Learn how media organizations have achieved success with Data Integration

Read

Tutorial

Introduction to Talend Open Studio for Data Integration.

Watch