Performance improvement - lookup with tmap

Seven Stars

Dear All,

 

I am using tMap to look up two different databases, and I am using an expression filter to narrow the lookup values. However, I am getting a throughput of just 7-8 rows per second.

I want to process around 1 million records.

Attached is the design. Is there anything more that can be done to improve performance?

Note: All indexes are in place.

 

Thanks

Vidya


Accepted Solutions
Community Manager

Re: Performance improvement - lookup with tmap

OK, your problem is the reload at each row. I suspect that your query being fired is looking through a lot of data and you are firing it a million times. That is guaranteed to be slow. From your diagram it looks like the main source of data and the lookup query are from the same database. If that is the case, do the lookup in the main query. There is absolutely no point joining in Talend if your data starts off in the same database. If it is not in the same database it might make sense to add the lookup data to your main data's database somehow.
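To illustrate the difference between firing a lookup for every row and doing the lookup in the main query, here is a minimal sketch in Python with an in-memory SQLite database standing in for the real one. The `orders` and `customers` tables and both function names are hypothetical; this is not Talend code, only the same two patterns side by side.

```python
import sqlite3

# Hypothetical schema: 'orders' plays the main flow, 'customers' the lookup.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO orders VALUES (10, 1), (11, 2), (12, 1);
""")

def lookup_per_row(conn):
    """Fire one lookup query per main row (the 'reload at each row' pattern)."""
    result, queries = [], 0
    main_rows = conn.execute(
        "SELECT id, customer_id FROM orders ORDER BY id"
    ).fetchall()
    for order_id, customer_id in main_rows:
        match = conn.execute(
            "SELECT name FROM customers WHERE id = ?", (customer_id,)
        ).fetchone()
        queries += 1
        result.append((order_id, match[0] if match else None))
    return result, queries

def lookup_in_main_query(conn):
    """Let the database do the join: one query, whatever the row count."""
    return conn.execute(
        """
        SELECT o.id, c.name
        FROM orders o
        LEFT JOIN customers c ON c.id = o.customer_id
        ORDER BY o.id
        """
    ).fetchall()

per_row_result, lookup_queries = lookup_per_row(conn)
joined_result = lookup_in_main_query(conn)
```

Both functions return the same rows, but the first one issues as many lookup queries as there are main rows, which is what becomes a million round trips in the job described here.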

 

You will not get round this with simple tweaks I'm afraid. 1 million queries is a lot of queries. You have to deal with the latency of building, sending and receiving the data for every single row in your main source.  
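To put numbers on that, here is a quick back-of-the-envelope calculation in Python using the figures from the original post (7-8 rows per second, about 1 million rows). The function names are only for illustration.

```python
def hours_to_process(total_rows, rows_per_second):
    """Total run time in hours at a constant throughput."""
    return total_rows / rows_per_second / 3600

def per_row_budget_ms(rows_per_second):
    """Average time spent on each row, in milliseconds."""
    return 1000 / rows_per_second

total_rows = 1_000_000
for rate in (7, 8):
    print(
        f"{rate} rows/s: {hours_to_process(total_rows, rate):.1f} hours in total, "
        f"{per_row_budget_ms(rate):.0f} ms per row"
    )
```

At those rates the job would run for roughly 35 to 40 hours, with well over 100 ms spent on every row. Most of that is the cost of one query round trip per row, which is why tuning individual settings cannot close the gap.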


All Replies
Community Manager

Re: Performance improvement - lookup with tmap

It looks like your Main row is quite slow. Can you test this by removing the other components and running with just a tLogRow? Also, can you show us your DB component configuration, both Basic and Advanced settings?

Eight Stars

Re: Performance improvement - lookup with tmap

There are quite a few things that can cause a job like this to be slow. You might try creating a test job with just the database connection and a tLogRow (no tMap) and see if it is significantly faster. If it isn't, then tMap isn't the issue.

 

If tMap is likely the issue, try rewriting your select query so you don't need to use an expression filter. You can include the variable from globalMap in the query statement; that way, the database's query engine does the work rather than tMap (which is necessarily going to be slower, because it processes one row at a time, similar to a cursor).
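As a rough illustration of moving the filter into the query, here is a small Python sketch against an in-memory SQLite database. The `rates` table, its columns, and the function names are made up for the example; in the Talend job the filter values would come from globalMap, whereas here bound parameters play that role.

```python
import sqlite3

# Hypothetical lookup table for the example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE rates (region TEXT, valid_from INTEGER, rate REAL);
    INSERT INTO rates VALUES
        ('EU', 2019, 0.19),
        ('EU', 2020, 0.16),
        ('US', 2020, 0.07);
""")

def filter_in_client(conn, region, year):
    """Fetch every lookup row, then filter one row at a time on the client."""
    rows = conn.execute("SELECT region, valid_from, rate FROM rates").fetchall()
    return [r for r in rows if r[0] == region and r[1] == year]

def filter_in_query(conn, region, year):
    """Push the same condition into the WHERE clause so the database filters."""
    return conn.execute(
        "SELECT region, valid_from, rate FROM rates "
        "WHERE region = ? AND valid_from = ?",
        (region, year),
    ).fetchall()
```

Both functions return the same rows, but the second one only transfers the rows that match, and the database can use its indexes to find them.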

 

Hope this helps.

Seven Stars

Re: Performance improvement - lookup with tmap

Hi Rhall,

 

Attached is the job with just tLogRow and the DB connection, along with the Basic and Advanced settings.

 

Thanks

 

Seven Stars

Re: Performance improvement - lookup with tmap

The DB connection with tLogRow is considerably faster.
I cannot use globalMap in the DB query, because the job takes input from one DB and uses it in another DB to limit the rows for the lookup. The filter constraint for the second DB changes with each row from the first DB.
I need to use tMap in this case.
Community Manager

Re: Performance improvement - lookup with tmap

Go back to your original job and switch on the "Use Cursor" tick box. I think you will see an improvement.

Seven Stars

Re: Performance improvement - lookup with tmap

Hi, ticking "Use Cursor" had no impact on performance; it's still the same.

 

Do you recommend any cursor size? I tried values in the range 100 to 10000.

Community Manager

Re: Performance improvement - lookup with tmap

How is your tMap configured? Can you show us a screenshot of this configuration please?

Seven Stars

Re: Performance improvement - lookup with tmap

Here is the tMap config and the DB query.

With a cursor size of 100, performance improved slightly, from 7 rows/s to 11 rows/s.

Can it be improved further?

Seven Stars

Re: Performance improvement - lookup with tmap

Thanks Rhall,
Yes, the job does use tables from the same DB as of now, but the lookup table is going to change in the future and will come from a different DB. I will look into the workaround.
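For the case where the lookup table lives in a different database, one possible shape of the workaround mentioned above (adding the lookup data to the main data's database) is sketched below in Python. Two in-memory SQLite connections stand in for the two databases; the table names, the `stg_customers` staging table, and the `stage_lookup` helper are all hypothetical.

```python
import sqlite3

# Two separate in-memory databases stand in for the lookup DB and the main DB.
lookup_db = sqlite3.connect(":memory:")
lookup_db.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob');
""")

main_db = sqlite3.connect(":memory:")
main_db.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO orders VALUES (10, 1), (11, 2), (12, 3);
""")

def stage_lookup(source, target):
    """Copy the lookup rows once into a temporary table in the main database."""
    rows = source.execute("SELECT id, name FROM customers").fetchall()
    target.execute(
        "CREATE TEMP TABLE stg_customers (id INTEGER PRIMARY KEY, name TEXT)"
    )
    target.executemany("INSERT INTO stg_customers VALUES (?, ?)", rows)
    return len(rows)

staged = stage_lookup(lookup_db, main_db)

# With the lookup data staged, the join is a single query in the main database.
joined = main_db.execute(
    """
    SELECT o.id, s.name
    FROM orders o
    LEFT JOIN stg_customers s ON s.id = o.customer_id
    ORDER BY o.id
    """
).fetchall()
```

The lookup data is read from the other database once, in bulk, instead of once per main row. Whether this is practical depends on how large the lookup table is and how fresh it needs to be.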
