Talend PIG component set to merge 2 flows into one

Five Stars

Talend PIG component set to merge 2 flows into one

I have a requirement to merge two different input flows into a single flow. Here is an example:
tPigLoad  ->  tPigMap (transformation)  \
                                          Merge flows  ->  tPigAggregate  ->  tPigStoreResult
tPigLoad  ->  tPigMap (transformation)  /
Note that the input schemas of the two tPigMap components are different, but their output schemas are the same.
I don't want to split the job by writing intermediate results to disk. Can anyone suggest how I can achieve this?
Six Stars

Re: Talend PIG component set to merge 2 flows into one

If they are relations rather than bags and the schemas match, use a tPigCode to UNION the two relations. If they are bags, there is a UDF to concatenate bags.
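
For reference, here is a rough sketch of what the UNION approach could look like in plain Pig Latin, outside of Talend. The paths, relation names, and fields (input_a, input_b, id, amount, qty, price) are all made up for illustration, and this is not the script Talend actually generates:

```pig
-- Hypothetical inputs with different schemas (paths and fields are assumptions).
a = LOAD 'input_a' USING PigStorage(',') AS (id:chararray, amount:double);
b = LOAD 'input_b' USING PigStorage(',') AS (id:chararray, qty:int, price:double);

-- Transform each input so that both relations end up with the same output schema.
a_out = FOREACH a GENERATE id, amount;
b_out = FOREACH b GENERATE id, (double)qty * price AS amount;

-- Merge the two flows without writing anything to disk in between.
merged = UNION a_out, b_out;

-- Aggregate the merged flow and store the result.
grouped = GROUP merged BY id;
totals = FOREACH grouped GENERATE group AS id, SUM(merged.amount) AS total;
STORE totals INTO 'output' USING PigStorage(',');
```

If the two outputs had the same fields in a different order, UNION ONSCHEMA would be the variant to look at, since it matches fields by name rather than by position.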
Five Stars

Re: Talend PIG component set to merge 2 flows into one

Thanks for the reply.
The issue is that I cannot connect two inputs to tPigCode in order to use the UNION operator.
Any thoughts?
Six Stars

Re: Talend PIG component set to merge 2 flows into one

Yeah, I see what you mean now. The way these components were designed is a little funky. It looks like the PigServer is instantiated in the tPigLoad component, and the only way to allow multiple loads is with the tPigMap. I think you should be able to use the tPigMap just as a way to combine loads. You may need to map the flows inside it to make sure the generated script is correct. I'm not sure, since I haven't tested it, but I think you can do something like what is shown in the screenshot.
Five Stars

Re: Talend PIG component set to merge 2 flows into one

I have tried this option. When you connect two inputs to the tPigMap component, it is treated as a join, so we have to provide a key to join the two flows. However, joining the data does not serve my purpose.
The code in tPigCode is fine, but I feel the tPigMap before it is the issue.
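
To make the difference concrete, here is a small Pig Latin sketch of JOIN versus UNION on two relations with the same schema. The file names, relation names, and sample rows are hypothetical and only illustrate why a join does not fit here:

```pig
-- Hypothetical inputs that already share the schema (id, amount).
--   left_in:  (k1, 1.0), (k2, 2.0)
--   right_in: (k1, 10.0), (k3, 30.0)
left_in  = LOAD 'left_in'  USING PigStorage(',') AS (id:chararray, amount:double);
right_in = LOAD 'right_in' USING PigStorage(',') AS (id:chararray, amount:double);

-- JOIN pairs rows on the key and widens the tuple. Only k1 matches,
-- so the result is the single row (k1, 1.0, k1, 10.0).
joined = JOIN left_in BY id, right_in BY id;

-- UNION needs no key and stacks the rows, which is what I am after.
-- The result has all four input rows, in no guaranteed order.
stacked = UNION left_in, right_in;
```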
Six Stars

Re: Talend PIG component set to merge 2 flows into one

Don't do a join; just map the flows straight across (or even try with no mapping at all). We are only using the tPigMap to allow two LOAD operations in one script, and aren't actually using the relations it creates.
Five Stars

Re: Talend PIG component set to merge 2 flows into one

Thanks a lot for the reply. 
I tried with the default mappings and even with no mappings. The job does not compile. The error is as follows:
Failed to parse: can't look backwards more than one token in this stream.
The error goes away when we specify the join key.
Can you help me understand the configuration by attaching a sample job or a screenshot, as you did previously?