tFileInputXML -> tFlowToIterate -?> tJoin

Hi,

I have a number of patient IDs in an XML file. Since the file contains patient notes, extracting all IDs produces duplicates. For each unique patient I add a new entry in a lookup table that maps the patient's ID to a new, meaningless ID. There are probably a few approaches to doing this, and here's mine:

tFileInputXML -> tFlowToIterate -?> tJoin -> (got it from here)

Now I want to do a tJoin for each ID from the XML with the lookup table. Each rejected inner join means the patient ID is not yet in the lookup table so I'll add it downstream. There is probably something I am not yet grasping (I'm new to Talend) but I can't connect the tFlowToIterate to the tJoin.

Thanks for any hint,
Yves

Re: tFileInputXML -> tFlowToIterate -?> tJoin

You will need some way to extract the values from your flowToIterate and put them back in a flow. Take a look at the tFixedFlowInput component.

All of the column values from your XML file will be pushed into the global map by your tFlowToIterate. Take a look at the code tab to see exactly what they are.
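
As a rough illustration of the hand-off described above: by default tFlowToIterate stores each column of the current row in globalMap under a key of the form "<connection name>.<column name>", and a downstream component such as tFixedFlowInput can read it back with a cast. The sketch below imitates that with a plain HashMap outside of Talend. The connection name row1, the column name patientId, and the sample value are assumptions made up for this example; check the code tab for the real keys in your job.

```java
import java.util.HashMap;
import java.util.Map;

public class GlobalMapSketch {
    // Stand-in for Talend's globalMap; in a real job this object is provided by the generated code.
    static final Map<String, Object> globalMap = new HashMap<>();

    // Imitates what tFlowToIterate does for one column of the current row
    // under its default settings: key = "<connection>.<column>".
    static void publishRow(String connectionName, String columnName, Object value) {
        globalMap.put(connectionName + "." + columnName, value);
    }

    // The kind of expression you would type into a tFixedFlowInput column value.
    // "row1" and "patientId" are hypothetical names.
    static String currentPatientId() {
        return (String) globalMap.get("row1.patientId");
    }

    public static void main(String[] args) {
        publishRow("row1", "patientId", "P-1001");
        System.out.println(currentPatientId());
    }
}
```

Inside Talend only the expression in currentPatientId() is needed; the rest is scaffolding so the sketch runs on its own.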

Re: tFileInputXML -> tFlowToIterate -?> tJoin

Thanks, I'll give it a go.

Re: tFileInputXML -> tFlowToIterate -?> tJoin

Hi,

Your suggestion works but I'm not sure whether I really need to iterate since the result after tFileInputXML seems no different...

My purpose for using tJoin is to filter out duplicate patient IDs which already have an entry in the lookup table. Only for the inner join rejects I add a new entry that maps the ID to a new (meaningless) ID.

tMSSqlInput -[lookup]-> tJoin -[Inner join reject]-> tMSSQLOutput

Strangely, it's as if the lookup happens only once, regardless of the number of iterations.

Any hints?

Yves

Re: tFileInputXML -> tFlowToIterate -?> tJoin

Lookups are loaded once per subjob, so rows added to the lookup table later in the same subjob are not picked up.

Re: tFileInputXML -> tFlowToIterate -?> tJoin

Aah. Good. This probably saved me more than a couple of hours!

Re: tFileInputXML -> tFlowToIterate -?> tJoin

Would you agree it is better to remove duplicates using tUniq before doing the tJoins then? Having each tJoin done in a subjob for the sake of incorporating the latest additions to the lookup table would unnecessarily complicate things, no?
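
To make the dedupe-first idea concrete, here is a minimal sketch in plain Java of the logic the tUniq plus tJoin (inner join reject) combination would perform: collapse duplicate IDs, then keep only those not already in the lookup. It is not Talend-generated code; the class name, method name, and sample IDs are invented for illustration, and the lookup is assumed to fit in memory as a set.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class NewPatientIds {
    // Returns the IDs that still need an entry in the lookup table.
    // Step 1 (the tUniq role): drop duplicates, keeping first-seen order.
    // Step 2 (the inner-join-reject role): keep IDs absent from the lookup.
    static List<String> idsMissingFromLookup(List<String> extractedIds, Set<String> knownIds) {
        Set<String> unique = new LinkedHashSet<>(extractedIds);
        List<String> missing = new ArrayList<>();
        for (String id : unique) {
            if (!knownIds.contains(id)) {
                missing.add(id);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Hypothetical sample data: IDs as extracted from the XML, with repeats.
        List<String> extracted = Arrays.asList("P1", "P2", "P1", "P3", "P2");
        // Hypothetical current content of the lookup table, loaded once.
        Set<String> known = new HashSet<>(Arrays.asList("P2"));
        System.out.println(idsMissingFromLookup(extracted, known)); // prints [P1, P3]
    }
}
```

Because duplicates are removed before the comparison, the lookup only has to be read once, which matches the once-per-subjob loading behaviour mentioned earlier in the thread.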
