Six Stars

Big Data Spark job - Inner join

Hi,

 

I have to create a Big Data Spark job and have two datasets:

Fact

Col1  Col2  Col3
101   ABC   123
102   All   234
103   XYZ   345

 

 Dim

Col1  Col2  Col3
1001  XYZ   345
1002  WED   234
1003  WER   123

 

I need to join the above tables on Col2 and Col3. Wherever Col2 is "All" in the Fact table (as in row 2), that field should be treated as matching all values, so that row is matched on Col3 alone. For the above datasets, the output would be:

 

Col1  Col2  Col3  Col1  Col2  Col3
1001  XYZ   345   103   XYZ   345
1002  WED   234   102   All   234

 

How should I handle this scenario? I'm confused.

  • Big Data
  • Data Integration
  • Talend Integration Cloud
3 REPLIES
Eight Stars

Re: Big Data Spark job - Inner join

In a database query, you would probably do this with a union: a two-step process whose results are then combined. One lookup joins on both Col2 and Col3, and the other joins on just Col3, using only the "All" records.
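
To make the union idea concrete, here is a minimal sketch using Python's built-in sqlite3 module with the sample rows from the question. The table names fact and dim and the lowercase column names are assumptions for illustration only. This is not the Talend or Spark job itself, just the same logic as a plain SQL query.

```python
import sqlite3

# In-memory database holding the sample data from the question.
# Table and column names (fact, dim, col1..col3) are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fact (col1 INTEGER, col2 TEXT, col3 INTEGER);
    CREATE TABLE dim  (col1 INTEGER, col2 TEXT, col3 INTEGER);
    INSERT INTO fact VALUES (101, 'ABC', 123), (102, 'All', 234), (103, 'XYZ', 345);
    INSERT INTO dim  VALUES (1001, 'XYZ', 345), (1002, 'WED', 234), (1003, 'WER', 123);
""")

query = """
    -- Step 1: ordinary fact rows must match on both Col2 and Col3.
    SELECT d.col1, d.col2, d.col3, f.col1, f.col2, f.col3
    FROM dim d JOIN fact f ON d.col2 = f.col2 AND d.col3 = f.col3
    WHERE f.col2 <> 'All'
    UNION ALL
    -- Step 2: fact rows with Col2 = 'All' match on Col3 only.
    SELECT d.col1, d.col2, d.col3, f.col1, f.col2, f.col3
    FROM dim d JOIN fact f ON d.col3 = f.col3
    WHERE f.col2 = 'All'
    ORDER BY 1
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Because the two branches filter on disjoint sets of fact rows, UNION ALL is enough here and no fact row is matched twice.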
Six Stars

Re: Big Data Spark job - Inner join

I'm not sure what you are saying. Can you explain it more?

Eight Stars

Re: Big Data Spark job - Inner join

Do two lookups: one on both Col2 and Col3, and a second on just Col3, but only for fact records where Col2 = "All".
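
As a rough row-level illustration of those two lookups, here is a small plain-Python sketch using the sample data from the question. It builds two lookup tables over Dim, one keyed on (Col2, Col3) and one keyed on Col3 only, and picks one per fact row. The names by_col2_col3, by_col3 and join_fact_to_dim are made up for this example, and a real Spark job would express the same thing with joins rather than dictionaries.

```python
from collections import defaultdict

# Sample rows from the question, as (Col1, Col2, Col3) tuples.
fact = [(101, "ABC", 123), (102, "All", 234), (103, "XYZ", 345)]
dim = [(1001, "XYZ", 345), (1002, "WED", 234), (1003, "WER", 123)]

# Two lookup tables over Dim: one on (Col2, Col3), one on Col3 alone.
by_col2_col3 = defaultdict(list)
by_col3 = defaultdict(list)
for d in dim:
    by_col2_col3[(d[1], d[2])].append(d)
    by_col3[d[2]].append(d)


def join_fact_to_dim(fact_rows):
    """Inner-join fact rows to Dim, treating Col2 == 'All' as a wildcard."""
    out = []
    for f in fact_rows:
        if f[1] == "All":
            # Second lookup: Col3 only, used just for the "All" records.
            matches = by_col3.get(f[2], [])
        else:
            # First lookup: both Col2 and Col3 must match.
            matches = by_col2_col3.get((f[1], f[2]), [])
        # Output layout follows the question: Dim columns, then Fact columns.
        out.extend(d + f for d in matches)
    return out


result = sorted(join_fact_to_dim(fact))
for row in result:
    print(row)
```

Fact row 101 drops out because no Dim row has Col2 = "ABC", which matches the two-row output expected in the question.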