Talend PIG component set to merge 2 flows into one

Five Stars

I have a requirement where I need to merge two different input flows into a single flow. Here is an example:
tPigLoad -> tPigMap (transformation) --+
                                       +--> merge flows -> tPigAggregate -> tPigStoreResult
tPigLoad -> tPigMap (transformation) --+
Note that the input schemas of the two tPigMap components are different, but their output schemas are the same.
I don't want to split the job by writing intermediate results to disk. Can anyone suggest how I can achieve this?
Seven Stars

Re: Talend PIG component set to merge 2 flows into one

If they are relations (not bags) and the schemas match, use a tPigCode component to UNION the two relations. If they are bags, there is a UDF to concatenate bags.
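To make the suggested approach concrete, here is a small in-memory Python analogy of the pipeline from the original question. It is not Talend or Pig code. Each "relation" is a list of tuples, and the field names, sample rows and the sum aggregation are hypothetical. Each flow is first projected to the shared output schema (the tPigMap step), the two relations are then unioned by plain concatenation with no key (in Pig Latin this corresponds to something like `U = UNION A2, B2;`), and the result is grouped and summed (the tPigAggregate step).

```python
from collections import defaultdict

# Hypothetical inputs with different schemas.
# Flow A: (customer_id, amount, note)
flow_a = [("c1", 10.0, "web"), ("c2", 5.0, "store")]
# Flow B: (qty, unit_price, customer_id)
flow_b = [(2, 2.5, "c1"), (1, 4.0, "c3")]

# Shared output schema produced by both hypothetical "tPigMap" steps.
SHARED_SCHEMA = ("customer_id", "amount")


def map_a(row):
    """Project flow A onto the shared schema."""
    customer_id, amount, _note = row
    return (customer_id, amount)


def map_b(row):
    """Project flow B onto the shared schema, deriving amount."""
    qty, unit_price, customer_id = row
    return (customer_id, qty * unit_price)


def union(*relations):
    """Concatenate relations that share a schema, like a UNION:
    no join key, duplicates kept. Only the arity is checked here."""
    merged = []
    for rel in relations:
        for row in rel:
            if len(row) != len(SHARED_SCHEMA):
                raise ValueError(f"row {row!r} does not match {SHARED_SCHEMA}")
            merged.append(row)
    return merged


def aggregate(relation):
    """Group by customer_id and sum amount (the tPigAggregate step)."""
    totals = defaultdict(float)
    for customer_id, amount in relation:
        totals[customer_id] += amount
    return dict(totals)


merged = union([map_a(r) for r in flow_a], [map_b(r) for r in flow_b])
result = aggregate(merged)
print(sorted(result.items()))
```

The point of the analogy is that the merge step needs matching output schemas but no key, which is why mapping both flows to the same schema first matters.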
Five Stars

Re: Talend PIG component set to merge 2 flows into one

Thanks for the reply.
The issue is that I cannot connect two inputs to tPigCode to use the UNION operator.
Any thoughts?
Seven Stars

Re: Talend PIG component set to merge 2 flows into one

Yeah, I see what you mean now. It's a little funky how they built these components. It looks like the PigServer is instantiated in the tPigLoad component, and the only way to allow multiple loads in one script is through tPigMap. I think you should be able to use the tPigMap simply as a way to combine the loads. You may need to map the flows inside it to make sure the generated script is correct. I'm not sure, since I haven't tested it, but I think you can do something like what is shown in the screenshot.
Five Stars

Re: Talend PIG component set to merge 2 flows into one

I have tried this option. When you connect two inputs to the tPigMap component, it is treated as a join, so we have to provide a key to join the two flows. However, joining the data will not serve my purpose.
The code in tPigCode is fine, but I feel the tPigMap before it is the issue.
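Why a join does not serve the purpose here can be shown with the same kind of in-memory analogy (plain Python with hypothetical data, not Talend code). An inner join needs a key and pairs matching records side by side, dropping keys that appear in only one input, whereas a union needs no key and simply stacks the records of both flows under the shared schema.

```python
# Two flows already mapped to the same hypothetical schema: (customer_id, amount)
flow_a = [("c1", 10.0), ("c2", 5.0)]
flow_b = [("c1", 5.0), ("c3", 4.0)]


def inner_join(left, right, key_index=0):
    """Pair every left row with every right row that has the same key,
    like an inner join: keys found in only one input are dropped."""
    joined = []
    for l_row in left:
        for r_row in right:
            if l_row[key_index] == r_row[key_index]:
                joined.append(l_row + r_row)
    return joined


def union(left, right):
    """Stack both inputs, like a union: no key, nothing dropped."""
    return list(left) + list(right)


joined = inner_join(flow_a, flow_b)
stacked = union(flow_a, flow_b)

print(joined)   # one wide row, for c1 only
print(stacked)  # all four rows, still two columns each
```

With the join, c2 and c3 disappear and the surviving row doubles in width; with the union, every record survives and the downstream aggregate sees one flow with a single schema.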
Seven Stars

Re: Talend PIG component set to merge 2 flows into one

Don't do a join; just map the flows straight across (or even try with no mapping at all). We are only using the tPigMap to allow two LOAD operations in one script, and we aren't actually using the relations created by the map.
Five Stars

Re: Talend PIG component set to merge 2 flows into one

Thanks a lot for the reply.
I tried with the default mappings and also with no mappings at all. The job does not compile, with the following error:
Failed to parse: can't look backwards more than one token in this stream.
The error goes away when we specify the join key.
Can you help me understand the configuration by attaching a sample job or a screenshot, as you did previously?
