I'm writing here because I need some advice.
I'm working on SQL jobs. My goal is simply to link multiple tMaps and tables and at the end only 1 tMap is writing on a database.
But the problem here is that I think it's too long.
For example :
As you can see, to write the result in the red line table, I'm already waiting for 1h30min which is very long.
And I can't find many documentation on this subject to improve this job. Is it possible for example to wait the process on the tMap to finish to send results to another or something like this ? Any advice would be appreciated for improving this process.
Thanks for reading
Are you lookup tables and your main source table in the same database? If so, just write your joins (lookups) into your source and use the one dbinput component. There is no point bringing all that data into Talend if you do not need to.
Thanks for answering,
Yes, my main source table and lookup tables are in the same database. The thing is that I need to link those tables together to have more information like ids, titles and stuff and create a new table at the end which is used by another person. This other person do some statistics reports. So it's like i'm filtering datas from those table to create another with only the information the other person needs. I hope i'm clear with this.
If those tables are all in the same database, you can do all that in a SQL query. It will be much quicker to join in the database (where I am assuming your keys, etc) and then use that query to bring the complete result set into Talend and then send that to your output table. I'm assuming you can write SQL here.
Ah OK, now this gets a little harder...but still doable. You need to understand the dynamics of your lookups. For example, if you have 100 inbound rows in your main source and 10000 lookup rows, it makes sense to carry out a lookup for every main row. So every row that comes in, you fire a lookup query. To do this you need to use the "Reload at each row" option in your tMap. This will send 100 queries to the lookup table, but you can provide a globalMap variable to use as the where clause in your lookup query so that your query will only return useful rows.
The best way to deal with problems like this is to "divide and conquer". By reducing the amount of data to match, you can massively improve performance. Obviously sending queries to the lookups for every row will bring about a bit of latency in firing the queries, so you need to experiment to see what is best.
I have demonstrated how to get your column data from the main query into the globalMap to use in a where clause with this tutorial (https://www.rilhia.com/quicktips/quick-tip-row-multiplication). I'm not actually sending the globalMap to another lookup db query (I am using a tJavaFlex for something different) but the theory is exactly the same. You need to follow this to set up the tMap to provide the globalMap variable and then use that variable in your db query's where clause.
Introduction to Talend Open Studio for Data Integration.
Practical steps to developing your data integration strategy.
Create systems and workflow to manage clean data ingestion and data transformation.