I am able to get the below to run. However, it just seems inefficient to have to build out separate joins using the same files and then run the last query against that data. Is there a better way to optimize this design?
This isn't really enough information, I'm afraid. It does look like there is probably a more efficient method for doing this (loading your file data into tHash components would be a start, since it will be quicker). Can you tell us a bit more about the purpose of the job and the data involved, and maybe show us some of the data that is being joined? Just a brief example demonstrating why you have done it the way shown in the image. Once we have that, it will be much easier to understand the problem.
I am doing an update to the B.tab file based on two parameters:
Assembly - based on a join to match Assembly and Assembly Site to B.tab
Component - based on a join to match Component and Component Site to B.tab
Based on this, I will update the Assembly Name and/or Component Name in the B.tab file wherever a row matches on either Assembly or Component.
I am performing an update, so I need to keep the total number of records the same and only update the fields on rows that join to Assembly and/or Component. Each match would update just that one row.
First of all, it sounds like you may need to read from some of the files more than once. As such, you should read each file once and load it into a tHash component (tHashOutput), then carry out any further reads from there (tHashInput).
The next thing to think about is using left outer joins in the tMap. Using left outer joins and tHash components, you can do all of what you are doing with possibly only 1 tMap, because you can join in both your assembly check and your component check at the same time. You can use the tMap variables to identify whether you have an assembly match, a component match, or both, and then have multiple outputs from the tMap to handle the logic for each of those scenarios.
I believe you can probably do this with 2 subjobs: 1 to load your data into the tHash components and 1 to carry out the logic in 1 tMap.
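To make the idea concrete, here is a rough plain-Java sketch of what the single tMap with two left outer joins would be doing. It is not Talend-generated code: the class name, the column order of the B.tab row, and the "|" key separator are all assumptions made up for illustration. The two maps stand in for the lookups loaded into tHash components.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TwoLookupUpdate {

    // Composite join key (item + site). The "|" separator is an assumption.
    static String key(String item, String site) {
        return item + "|" + site;
    }

    // Assumed B.tab row layout (hypothetical):
    // [0] assembly, [1] assemblySite, [2] component, [3] componentSite,
    // [4] assemblyName, [5] componentName
    static List<String[]> update(List<String[]> bRows,
                                 Map<String, String> assemblyNames,
                                 Map<String, String> componentNames) {
        List<String[]> out = new ArrayList<>();
        for (String[] row : bRows) {
            String[] updated = row.clone();
            // Left outer join: get() returns null when nothing matches,
            // but the main row is kept either way.
            String asmName = assemblyNames.get(key(row[0], row[1]));
            String cmpName = componentNames.get(key(row[2], row[3]));
            if (asmName != null) {
                updated[4] = asmName; // assembly match: update Assembly Name
            }
            if (cmpName != null) {
                updated[5] = cmpName; // component match: update Component Name
            }
            out.add(updated);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> assemblyNames = new HashMap<>();
        assemblyNames.put(key("A1", "S1"), "Widget");
        Map<String, String> componentNames = new HashMap<>();
        componentNames.put(key("C2", "S2"), "Bolt");

        List<String[]> bRows = Arrays.asList(
            new String[] {"A1", "S1", "C1", "S1", "old", "old"},
            new String[] {"A9", "S9", "C2", "S2", "old", "old"},
            new String[] {"A9", "S9", "C9", "S9", "old", "old"});

        for (String[] row : update(bRows, assemblyNames, componentNames)) {
            System.out.println(String.join("\t", row));
        }
    }
}
```

The output always has the same number of rows as the input, which is the "update only, never drop" behaviour described in the question. In the real job the two null checks would live in tMap variables or output expressions.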
I'm not sure I understand. I thought you could not perform two different joins on the same table in the tMap. Can you provide a picture of what you mean?
You are just joining 1 source of data to your main flow each time. There is no reason you cannot join 2 to 100 sources of data to your main flow (apart from it getting messy). A left outer join works the same as in SQL. Take a look here to see how they work in Talend (https://www.talendbyexample.com/talend-tmap-component-joins.html).
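As a rough illustration of why the number of lookups doesn't matter, here is a small plain-Java sketch (not Talend code; the class and method names are made up). Each lookup is joined to the main flow independently, and an unmatched lookup simply contributes a null instead of dropping the row, which is the left outer join behaviour.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ManyLookups {

    // For each main key, build one output row: the key itself followed by
    // one value per lookup. A null value means "no match in that lookup".
    static List<List<String>> joinAll(List<String> mainKeys,
                                      List<Map<String, String>> lookups) {
        List<List<String>> out = new ArrayList<>();
        for (String k : mainKeys) {
            List<String> row = new ArrayList<>();
            row.add(k);
            for (Map<String, String> lookup : lookups) {
                row.add(lookup.get(k)); // left outer: null when unmatched, row kept
            }
            out.add(row);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> first = new HashMap<>();
        first.put("k1", "from-first");
        Map<String, String> second = new HashMap<>();
        second.put("k2", "from-second");

        List<List<String>> rows =
            joinAll(Arrays.asList("k1", "k2", "k3"), Arrays.asList(first, second));
        for (List<String> row : rows) {
            System.out.println(row);
        }
    }
}
```

Adding a third or a hundredth lookup only adds another entry to the list; the main flow still produces exactly one row per input row.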