Optimize Tmap


Hello,

Looking for suggestions on fixing a bottleneck in my transformation steps. I've attached an image of my design.

The arrow points to the bottleneck input, which has about 100 million records coming out of the SQL query.

I'm selecting Load Once against this SQL query.

The input text file data has only 168k rows coming in.

I have my JVM arguments set at -Xms8G and -Xmx16G.

 

[attachment: etl_bottleneck.png]

 

Any suggestions on how to avoid the "GC overhead limit exceeded" error and optimize this job?

 

Please and thank you!!


Re: Optimize Tmap

Have you tried changing the tMap lookup model?

If your big table has a primary key (or an indexed column) and that key is also present in the text file data, you can try switching the lookup to the "Reload at each row (cache)" model. After changing the model, click the green plus sign on the lookup table, set the "globalMap Key" to a name of your choice, then drag and drop the corresponding field from the main flow (the text file data).

It should look like this:

[attachment: tMSpos.png]
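For reference, here is a minimal sketch of the per-row lookup query this model ends up firing. The table name `yourBigTable`, the column names, and the globalMap key `myKey` are placeholders for illustration, not names from the original job:

```java
import java.util.HashMap;
import java.util.Map;

public class LookupQueryDemo {
    // Builds the lookup query the "Reload at each row (cache)" model would run
    // once per main-flow row. "myKey" is the hypothetical globalMap Key name
    // chosen in the tMap lookup table settings.
    static String buildLookupQuery(Map<String, Object> globalMap) {
        return "SELECT id, label FROM yourBigTable WHERE id = "
                + globalMap.get("myKey");
    }

    public static void main(String[] args) {
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("myKey", 42); // the tMap sets this from the main-flow field
        System.out.println(buildLookupQuery(globalMap));
        // SELECT id, label FROM yourBigTable WHERE id = 42
    }
}
```

With this model the 100M-row table is never loaded in memory; only the single matching row is fetched per main-flow record, which trades memory for query round-trips.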

If you still have memory issues, or the job takes too long to run because of the number of queries it generates, you can instead build a dynamic IN clause from the unique values in your text file data (something like "1,12,45,32,60,..."). Store the values in a global variable (use a tJavaFlex component to construct this variable).

Then on a separate subjob you will be able to query the big table to get only the required records with a SELECT statement looking like this:

"SELECT blablabla 
FROM yourBigTable WHERE yourKeyField IN (" + ((String)globalMap.get("yourVariable")) + ")"
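As a rough sketch of how that query string resolves at run time — assuming the unique value list was stored under the key "yourVariable", with the table and column names again being placeholders:

```java
import java.util.HashMap;
import java.util.Map;

public class DynamicInQueryDemo {
    // Assembles the second-subjob query around the comma-separated value list
    // built earlier; "yourVariable" and the table/column names are placeholders.
    static String buildQuery(Map<String, Object> globalMap) {
        return "SELECT yourKeyField, someColumn FROM yourBigTable"
                + " WHERE yourKeyField IN ("
                + (String) globalMap.get("yourVariable") + ")";
    }

    public static void main(String[] args) {
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("yourVariable", "1,12,45,32,60"); // produced by the tJavaFlex
        System.out.println(buildQuery(globalMap));
        // SELECT yourKeyField, someColumn FROM yourBigTable WHERE yourKeyField IN (1,12,45,32,60)
    }
}
```

This way the big table is filtered server-side down to (at most) the 168k keys present in the text file, instead of streaming 100M rows into the job.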

TRF

Re: Optimize Tmap

Thank you TRF for your response.

I did attempt some changes to the tMap lookup model, but I haven't tried the cache option yet.

Let me take your suggestions and see if I can implement them.  Will advise on my progress.

Thanks again!

Michelle


Re: Optimize Tmap

Hi TRF,

I'm running my process with the tMap modified to use "Reload at each row (cache)", but it's still sitting there processing, and I suspect the run time will end up the same as before the change.

Therefore, I would like to attempt your second suggestion of generating a dynamic IN clause, which I can envision working, but I have a few questions please:

1. Where do I generate the IN clause from my text file?

2. And the tJavaFlex component will store these unique IN clause values, correct?

3. Then I would create a subjob using tAS400Input to run the SELECT statement built from the tJavaFlex variable into the tMap I already have created?

 

Thank you very much for your help!

Michelle


Re: Optimize Tmap

Hi Michelle,

 

Using an "IN" clause could be a solution, provided you are sure you will not hit your RDBMS's query length limit.

Place a tUniqRow after your data source component, then a tJavaFlex in which you:

  • initialize a global variable (say, InClause) with the opening "(" in the start code
  • append the current value (possibly enclosed in single quotes) and a comma to InClause in the main code
  • remove the trailing comma from InClause and append ")" in the end code
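The three parts above can be sketched in plain Java (outside Talend) like this; the list stands in for the de-duplicated rows arriving from tUniqRow, and the single-quoting assumes a character key column:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class InClauseDemo {
    // Mirrors the three tJavaFlex code sections that build the IN clause.
    static String buildInClause(List<String> values) {
        // Start code: initialize the variable with the opening parenthesis
        StringBuilder inClause = new StringBuilder("(");
        // Main code: runs once per incoming row; append the quoted value and a comma
        for (String v : values) {
            inClause.append("'").append(v).append("',");
        }
        // End code: drop the trailing comma and close the parenthesis
        if (inClause.length() > 1) {
            inClause.setLength(inClause.length() - 1);
        }
        return inClause.append(")").toString();
    }

    public static void main(String[] args) {
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("InClause", buildInClause(Arrays.asList("1", "12", "45")));
        System.out.println(globalMap.get("InClause")); // ('1','12','45')
    }
}
```

Inside the actual tJavaFlex you would split this logic across the Start/Main/End code boxes and finish with `globalMap.put("InClause", ...)` in the end code, since the variable has to survive into the next subjob.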

On the next subjob you will be able to reuse this variable in your WHERE clause:

"SELECT blablabla
FROM yourTable
WHERE blablabla
AND yourColumn IN " + (String)globalMap.get("InClause")

TRF

Re: Optimize Tmap

Thanks TRF very much.

Let me give this a whirl and will advise.

Michelle


Re: Optimize Tmap

Hi again,

After attempting this, I realized I haven't done many of these things in Talend yet. Would you be so kind as to help clarify a few things?

[attachment: 2020-02-27_0842.png]


Re: Optimize Tmap

Feel free to reuse this example:

[attachment: tjavaflex.PNG]


TRF

Re: Optimize Tmap

Oh thank you very much for the example!

Will advise on solution.

Michelle
