tMap vs SQL Joins, what is faster

Hi,
If i have to process say 50 million rows in my ETL, which implementation would be better/recommended :
Using tMap for the transformations or writing a SQL.
In other words, are SQL Joins and filters always faster than talend components?
6 REPLIES
Moderator

Re: tMap vs SQL Joins, what is faster

Hi,
For your description, I think the performance depends on your work flow, your data and the transformations.Could you give us more information about your ETL so that we can design a proper job for you.
Best regards
Sabrina
--
Don't forget to give kudos when a reply is helpful and click Accept the solution when you think you're good with it.

Re: tMap vs SQL Joins, what is faster

Hi,
For your description, I think the performance depends on your work flow, your data and the transformations.Could you give us more information about your ETL so that we can design a proper job for you.
Best regards
Sabrina

Hi Sabrina,
Consider a use case where, to loading my fact tables i have to join them with 5 dimension tables (like geography, customer, product, etc.) and pull their ID's into my fact table. Now since the data is huge (~ 50 million records per day) so doing it from tMap (that too storing the data on disk to avoid Java Heap space issues) will be slower than writing an equivalent SQL for this use case and triggering it from tMSSqlRow.
Let me know if my understanding is correct, or can the tMap perform faster than the SQL Join query for this specific use case.
Thanks in advance.
Moderator

Re: tMap vs SQL Joins, what is faster

Hi,
Consider a use case where, to loading my fact tables i have to join them with 5 dimension tables (like geography, customer, product, etc.) and pull their ID's into my fact table. Now since the data is huge (~ 50 million records per day) so doing it from tMap (that too storing the data on disk to avoid Java Heap space issues) will be slower than writing an equivalent SQL for this use case and triggering it from tMSSqlRow.

You are right that tMap will be slower than writing a equivalent SQL for this use case. Because Sql query is executed on the database engine directly.
In addition that, we provide ELT component and it recommend that you can use ELT components to achieve your goal.
For more details, please refer to: What is ELT and ELT components
Btw, you can use your forum account to log in Talend Help Center.
Best regards
Sabrina
--
Don't forget to give kudos when a reply is helpful and click Accept the solution when you think you're good with it.
Seventeen Stars

Re: tMap vs SQL Joins, what is faster

The memory consumption in this case only depends on the size of the dimension tables, not of the size of your fact table.
If your experiences memory leaks here - are your see them before the job starting the select of the main flow data or while the reading of the dimensions?
You can check this by selecting your fact data an deactivate all dimension inputs. If your job continue losing memory the problem is the tMSSqlInput component.
Geography sound not so much, customer - ok that can be a huge table and products should also be moderate.
Normally joining in the tMap is the fastest way with a lot of enhancements like getting datasets which does not match the join rule (data quality check).

Re: tMap vs SQL Joins, what is faster

Hi,
Thanks to both of you for your inputs.
So, can we conclude that joining with tMap is faster as compared to writing an equivalent SQL provided that the dimension table sizes are small, and there are no memory leaks?
Moderator

Re: tMap vs SQL Joins, what is faster

Hi,
Generally speaking, you have the right thought, but one more question we don't have a clear limitation about the sizes of dimension table. How big is the dimension tables will cause a memory leak?
Best regards
Sabrina
--
Don't forget to give kudos when a reply is helpful and click Accept the solution when you think you're good with it.