Looking for suggestions on my bottleneck within my transformation steps. I attached an image of my design.
The arrow points to the bottleneck input, which has about 100 million records coming out of the SQL query.
I'm selecting the Load Once lookup model against this SQL query.
The input text file data has only 168k rows coming in.
I have my JVM arguments set at -Xms8G and -Xmx16G.
Suggestions on how to avoid the "GC overhead limit exceeded" error and optimize this job?
Please and thank you!!
Did you try to change the paradigm of the tMap lookup model?
If your big table has a primary key (or an indexed column) and you have it in the text file data, you may try switching to the "Reload at each row (cache)" model. After changing the model, click the green plus sign on the lookup table, set the "globalMap Key" to a name of your choice, then drag and drop the corresponding field from the main table (text file data).
It should look like this:
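With this model, the lookup input's query references the globalMap key set by the tMap, so each main row fetches only its matching records from the big table. A minimal sketch of the query string, assuming `yourKey` is the globalMap Key name you chose and `yourKeyField`/`yourBigTable` are placeholders (Talend's `globalMap` is simulated here with a plain HashMap so the sketch compiles standalone):

```java
import java.util.HashMap;
import java.util.Map;

public class ReloadAtEachRowQuery {
    // Builds the per-row lookup query the DB input component would run.
    static String buildLookupQuery(Map<String, Object> globalMap) {
        return "SELECT yourKeyField, otherFields FROM yourBigTable"
             + " WHERE yourKeyField = '" + (String) globalMap.get("yourKey") + "'";
    }

    public static void main(String[] args) {
        Map<String, Object> globalMap = new HashMap<>(); // stands in for Talend's globalMap
        globalMap.put("yourKey", "12345"); // set by the tMap for the current main row
        System.out.println(buildLookupQuery(globalMap));
    }
}
```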
If you still have memory issues, or the job takes too long to run because of the number of queries it will generate, you may try to generate a dynamic IN clause with unique values taken from your text file data (something like "1,12,45,32,60,..."). The values will be stored in a global variable (use a tJavaFlex component to construct this variable).
Then on a separate subjob you will be able to query the big table to get only the required records with a SELECT statement looking like this:
"SELECT yourFields FROM yourBigTable WHERE yourKeyField IN (" + ((String)globalMap.get("yourVariable")) + ")"
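The variable-building step could be sketched like this in plain Java (the method and names such as `buildInClause` are illustrative, not Talend-generated; inside a job the loop body would live in the tJavaFlex main section):

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.StringJoiner;

public class InClauseDemo {
    // Builds a deduplicated, comma-separated IN-clause body from the lookup keys.
    static String buildInClause(Iterable<Integer> keys) {
        Set<Integer> unique = new LinkedHashSet<>(); // drop duplicate keys, keep order
        for (Integer k : keys) unique.add(k);
        StringJoiner joiner = new StringJoiner(",");
        for (Integer k : unique) joiner.add(k.toString());
        return joiner.toString();
    }

    public static void main(String[] args) {
        String inClause = buildInClause(java.util.List.of(1, 12, 45, 12, 1));
        // The next subjob assembles its query the same way:
        String query = "SELECT yourFields FROM yourBigTable WHERE yourKeyField IN (" + inClause + ")";
        System.out.println(query);
    }
}
```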
Thank you TRF for your response.
I did attempt some mods to the paradigm of the tMap, but have not tried the cache option yet.
Let me take your suggestions and see if I can implement them. Will advise on my progress.
I'm running my process with the tMap modified to do a Reload at each row (cache); however, it's sitting there processing, and I suspect the processing time will be the same as it was without this change.
Therefore, I would like to attempt your 2nd suggestion of generating a dynamic IN clause, which I envision working, but I have a few questions please:
1. Where do I generate the 'IN clause' from my text file?
2. And the tJavaFlex component will store these unique 'IN clause' values in a variable, correct?
3. Then I would create a subjob using a tAS400Input to run the SELECT statement built by the tJavaFlex into the tMap I already have created?
Thank you very much for your help!
Having an "IN" clause could be a solution if you are sure the query length limit for your RDBMS will not be reached.
Place a tUniqRow after your data source component, then a tJavaFlex in which you will build the comma-separated list of unique key values and store it in a global variable (e.g. "InClause").
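The three tJavaFlex code sections could be filled in roughly as below. This is a standalone sketch, assuming the variable name `InClause` and numeric keys; the array of strings simulates the rows arriving from tUniqRow, and Talend's built-in `globalMap` is replaced by a plain HashMap so the sketch compiles on its own:

```java
import java.util.HashMap;
import java.util.Map;

public class TJavaFlexSketch {
    // Simulates one run of the tJavaFlex: Start / Main (per row) / End sections.
    static Map<String, Object> run(String[] uniqueKeys) {
        Map<String, Object> globalMap = new HashMap<>(); // stands in for Talend's globalMap

        // Start code: initialize the accumulator once.
        StringBuilder sb = new StringBuilder();

        // Main code: executed once per row coming out of tUniqRow.
        for (String key : uniqueKeys) {
            if (sb.length() > 0) sb.append(",");
            sb.append(key); // for character keys, wrap in quotes: "'" + key + "'"
        }

        // End code: publish the finished list for the next subjob.
        globalMap.put("InClause", sb.toString());
        return globalMap;
    }

    public static void main(String[] args) {
        Map<String, Object> globalMap = run(new String[]{"1", "12", "45"});
        System.out.println((String) globalMap.get("InClause"));
    }
}
```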
On the next subjob you will be able to reuse this variable in your WHERE clause:
"SELECT blablabla WHERE blablabla AND yourColumn IN (" + (String)globalMap.get("InClause") + ")"