Multiple tHashOuts to 1 tHashIn

Seven Stars

I have a number of files in one folder with slightly different schemas. I want to standardise the schema before processing all rows as a single flow. I will also need to extend the design to include at least two more schema designs not shown here.

 

I'm having trouble designing the job to unite the flows into one while avoiding loops (tUnite doesn't work here). tHash components were suggested, but my design, shown below, only outputs the rows from one of the four tHashOutput components (whichever one is selected as linked in the tHashInput).

TalendMultipleTHash.png

 

 

What I've tried

  • Adding more tHashInputs and using tUnite to unite the flows, but that creates an error
  • Linking multiple tHashInputs to one and pairing that tHashInput with the tHashOutput, but I get a null pointer exception
  • Disabling "Append data", but then I get even fewer rows

While this reference guide shows two tHashOutputs, it's not clear to me why the configuration works or how I would extend it to include more tHashOutputs.

  1. How do I configure multiple tHashOutputs to feed one tHashInput?
  2. In the tHashOutput component:
    1. Does "Keys management" refer to the key defined in the schema (and can I have more than one key defined in a schema)?
    2. What does "link to tHashOutput" actually mean?
  3. In the tHashInput component:
    1. Can I link to more than one tHashOutput? The drop-down list only lets me select one.

Is anyone able to advise me?


Accepted Solutions
Sixteen Stars

Re: Multiple tHashOuts to 1 tHashIn

The way to do this is to create a mini subjob at the beginning of your job. In that subjob, add a tFixedFlowInput with the schema you require for your files and connect it to a tHashOutput. Set the tFixedFlowInput to produce 0 rows. This initialises the tHashOutput without adding any data.

Now, in all of your other tHashOutputs, tick "Link with a tHashOutput" and select the name of the pre-initialised tHashOutput.

When you want to read from the tHashOutputs, link your tHashInput to the pre-initialised one.

That is it.
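
As a rough mental model of why this works, the steps above can be sketched in plain Java. This is only a conceptual illustration, assuming that linked tHashOutputs append to one shared in-memory buffer owned by the pre-initialised component; it is not Talend's generated code, and the class name `SharedBufferSketch`, its methods, and the sample rows are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Conceptual model only: every name here is hypothetical, not a Talend class.
class SharedBufferSketch {
    // One in-memory buffer per "owning" tHashOutput, keyed by component name.
    private final Map<String, List<String>> buffers = new HashMap<>();

    // Models the zero-row initialiser subjob: creates an empty buffer.
    void initialise(String ownerName) {
        buffers.put(ownerName, new ArrayList<>());
    }

    // Looks up the owner's buffer and fails clearly if it was never created.
    private List<String> bufferFor(String ownerName) {
        List<String> buffer = buffers.get(ownerName);
        if (buffer == null) {
            throw new IllegalStateException("Buffer not initialised: " + ownerName);
        }
        return buffer;
    }

    // Models a tHashOutput with "Link with a tHashOutput" ticked:
    // rows are appended to the owner's buffer instead of a private one.
    void appendLinked(String ownerName, List<String> rows) {
        bufferFor(ownerName).addAll(rows);
    }

    // Models the single tHashInput linked to the owner: it sees every row.
    List<String> readAll(String ownerName) {
        return new ArrayList<>(bufferFor(ownerName));
    }

    // Two "file flows", already mapped to the common schema, feed one buffer.
    static int demoRowCount() {
        SharedBufferSketch job = new SharedBufferSketch();
        job.initialise("tHashOutput_1");
        job.appendLinked("tHashOutput_1", Arrays.asList("1;a", "2;b"));
        job.appendLinked("tHashOutput_1", Arrays.asList("3;c"));
        return job.readAll("tHashOutput_1").size();
    }

    public static void main(String[] args) {
        System.out.println(demoRowCount()); // prints 3
    }
}
```

In this model, adding another schema design only means adding another `appendLinked` call against the same owner, which mirrors adding one more linked tHashOutput in the job.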

 


All Replies
Seven Stars

Re: Multiple tHashOuts to 1 tHashIn

Thanks, that's what I needed. Nice trick as well.
It's sometimes not clear why or how a setting on a component can be used. For example, the Talend help text just confirms "link to a tHashOutput" but does not elaborate on why you might use or need it.
