Five Stars Icy
Five Stars

Improve performance when do a lookup

Hi team,

When i run a subjob, the main table has more than 200+ thousands records. I used "Lookup Model:Reload at each row" to filter Lookup table in tMap. Also added conditions in main table. However, the whole of job still will take about 20min.

 

Capture.PNG

I did some filers through globalmap in lookup table.

Capture.PNG

 

In tMap, implement "Left Out Join" and "Reload at each row" lookup model.

Capture.PNG

 

Complete to run the whole of job will take so long time(20min).

Do you have any idea to do the performance tunning?

BTW are there any good ways to use connection pool in Talend?

Thank you in advance!

  • Data Integration
1 ACCEPTED SOLUTION

Accepted Solutions
Five Stars Icy
Five Stars

Re: Improve performance when do a lookup

Now we resolve the performance issue is to add more filters in lookup table which is simplest. After i modified it that speed up 50%. It's more better than before.

Many thanks to your help Smiley Very Happy

16 REPLIES
Eleven Stars

Re: Improve performance when do a lookup

First of all. are you getting the data from the same database? If so, you should consider joining and filtering in the DB. There is little point bringing in more data than you need to only throw it away.

Rilhia Solutions
Eleven Stars TRF
Eleven Stars

Re: Improve performance when do a lookup

Hi,
Did you measure response time with these queries from outside of Talend?
Also (will not solve the problem) you should remove all fields from the select part for which you know the value (they are in the where clause). You'll have less data to transfer from the db server.

TRF
Five Stars Icy
Five Stars

Re: Improve performance when do a lookup

The data is from different database.

I think i have already added enough conditions to filter in the DB, at the same time, did some deals in the tMap, so i have no idea how to improve the performance through other ways!

Five Stars Icy
Five Stars

Re: Improve performance when do a lookup

The data is from different database.
I think i have already added enough conditions to filter in the DB, at the same time, did some deals in the tMap, so i have no idea how to improve the performance through other ways!
Eleven Stars

Re: Improve performance when do a lookup

OK, it looks like you are doing the sort of things I would try to speed things up. Another thing to test is where the bottleneck is. Can you disconnect your DB write/update components and just run the code where the join is happening. Does that massively speed things up? If so (as I have seen before) this could very well be a insert/update issue. Also, are you inserting or updating.....or both?

Rilhia Solutions
Five Stars Icy
Five Stars

Re: Improve performance when do a lookup

I just tested the joining part and disconnected the update/insert. I think the bottleneck is from the joining part.

Capture.PNG

Eleven Stars

Re: Improve performance when do a lookup

OK. Somewhere where you might get a slight improvement is by removing the joins on the columns you are filtering your lookup on. They are not needed. You will only be returning rows that hold the same values as the row that is in coming in in your main flow of data. Therefore you can remove those from the query and from the join. This shouldn't resolve this issue, but will be more efficient.


However, before doing that are you sure that loading the lookup on every row is the most efficient way of doing this? Sometimes it is without question, but if your main flow is 200,000 rows, that is 200,000 queries fired to the DB. I am assuming you have probably looked into this, but it is worth considering if you haven't....and easier to test before removing the joins I advised you to remove above.

 

 

Rilhia Solutions
Five Stars Icy
Five Stars

Re: Improve performance when do a lookup

I tried to remove the joins, however looks like there is no more efficient.

Capture.PNGCapture.PNG

And i think i have to load the lookup on every row because the lookup table has 428,634,831 rows. Or else it should be more slower.

Eleven Stars

Re: Improve performance when do a lookup

OK, do you  know how many rows are returned for every row that the lookup is fired for? In theory you should only return the rows that you need if you are going to fire 200,000 queries. If you are returning more rows and having to filter them in Java, maybe you can tailor your filtering in your query?

Another solution to this will require some playing around, but may just work. I suspect that you can filter preemptively find some values to filter your lookup query on, before you carry out your lookup once. To do this you will need to change the job to dump your main flow into a tHash component (store it in memory). You have some columns that you need to join on or filter by. Now, if you load the main flow preemptively you *may* be able to find some data to use to filter the lookup by. For example, lets say that your lookup data has a column called "alphabet" which holds every letter from a to z. However, your main flow only returns "a", "g" and "y". If you know this before running the lookup query, you can add an "IN" filter to the query and pass in your comma separated list of "a", "g" and "y" to the SQL query. This might remove 3/4 of yoru lookup data. If you are only loading this once and into memory, you will lose the latency of firing the query for every row and the number of rows to be checked each time is also massively reduced. 

 

Another thing I have just noticed, do you need to match ALL ROWS on the match model? If you don't then switch this off. It will mean that all data will need to be checked for every incoming row.

Rilhia Solutions
Five Stars Icy
Five Stars

Re: Improve performance when do a lookup

Just only one row is returned for every row that the lookup is fired for.

You are right the best practice is to find some values to filter my lookup query and carry out load once which is more efficient.

However, if I can not find some values to filter my lookup query. Can I use "Parallelization" to execute the job? Then use the connection pool to control the tMSSqlConnection. How do you think? Or do you have any idea?

Capture.PNG

Eleven Stars

Re: Improve performance when do a lookup

Using parallelization will help in this scenario, but it could also slow things down. The rule of thumb for the number of threads is the number of cores - 1. If you start with this and tweak (up and down) you will find an optimum. This also depends on how much other work your machine is doing.

Connection pooling will not help you here. The connection component will handle connections for every db component in your job....if they are hooked up to the connection component. Connection pooling would only really be of use in the situation where the job is running multiple times in parallel.  

Rilhia Solutions
Five Stars Icy
Five Stars

Re: Improve performance when do a lookup

Yeah, we'd like to run the job in multiple times in parallel.  

However, seems connection pooling can not be used in TBD 6.2.1. Right?

Eleven Stars

Re: Improve performance when do a lookup

I don't understand how connection pooling would help you. The Connection component maintains the connections for the job. If you have jobs running in parallel (ie multiple instance of a job NOT a section of the job as demonstrated in your screenshots), the Connection component in each instance will handle this. Connection pooling is only really a requirement for a webservice where thousands of instances of a service can be using connections concurrently. This is handled quite easily using Spring and has been supported for  quite a while now.

Rilhia Solutions
Five Stars Icy
Five Stars

Re: Improve performance when do a lookup

Now we resolve the performance issue is to add more filters in lookup table which is simplest. After i modified it that speed up 50%. It's more better than before.

Many thanks to your help Smiley Very Happy

Moderator

Re: Improve performance when do a lookup

Hello Icy,

Thanks for your feedback and posting your solution here.

Best regards

Sabrina

 

 

--
Don't forget to give kudos when a reply is helpful and click Accept the solution when you think you're good with it.
Three Stars

Re: Improve performance when do a lookup

Hi 

In your lookup the database server will need to calculate he plan execution each time the query is fired, because of Load at each row property in tMap.

 

The best way for lookup like this is to use a parametrized query, so the database server will calculate the plan execution for the query One time, and not for every query fire. So, to do this you'll need to replace the LKPFulfilPackageCharge Lookup table with tOracleRow and a tParseRecordSet.

 

In the tOracleRow don't concatenate filters in the query, but use parameterized query like "select columnA, columnB from Table01 where columnC = Smiley TongueColC".

 

That will a lot improve your Job performance.

Best regards.