Six Stars

Big Data Spark job - Inner join

Hi,

 

I have to create a Big Data Spark job and have two datasets:

Fact

Col1 Col2 Col3
101 ABC 123
102 All 234
103 XYZ 345

 

 Dim

Col1 Col2 Col3
1001 XYZ 345
1002 WED 234
1003 WER 123

 

I need to join the two tables above on Col2 and Col3. Wherever Col2 is "All" in the Fact table (as in row 2), that field should be treated as matching every value, so the row is joined on Col3 alone. For the datasets above, the output would be:

 

Col1 Col2 Col3 Col1 Col2 Col3
1001 XYZ 345 103 XYZ 345
1002 WED 234 102 All 234

 

How should I handle this scenario? I'm confused.

3 REPLIES
Ten Stars

Re: Big Data Spark job - Inner join

In a database query, you would probably do this with a union: a two-step process whose results are then combined. One lookup joins on both Col2 and Col3, and the other joins on just Col3, using only the "All" records.
Six Stars

Re: Big Data Spark job - Inner join

I'm not sure what you mean. Can you explain it in more detail?

Ten Stars

Re: Big Data Spark job - Inner join

Do two lookups: one on both Col2 and Col3, and a second on just Col3 that uses only the Fact records where Col2 = "All". Then combine the results of the two.
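
To make the two lookups concrete, here is a minimal plain-Python sketch run on the sample rows from the question. It only illustrates the logic and is not Spark code: the function name `join_with_wildcard`, the `WILDCARD` constant, and the tuple layout are assumptions made for the example. In a real job, each lookup would be its own join and the two outputs would be unioned.

```python
# Sample rows from the question, as (Col1, Col2, Col3) tuples.
fact = [(101, "ABC", 123), (102, "All", 234), (103, "XYZ", 345)]
dim = [(1001, "XYZ", 345), (1002, "WED", 234), (1003, "WER", 123)]

WILDCARD = "All"  # assumption: this exact string marks "match any Col2"


def join_with_wildcard(dim_rows, fact_rows):
    """Combine an exact (Col2, Col3) lookup with a Col3-only lookup on 'All' rows."""
    # Lookup 1: Fact rows keyed on (Col2, Col3). Wildcard rows are left out
    # so that no Fact row can be matched by both lookups.
    exact = {}
    for row in fact_rows:
        if row[1] != WILDCARD:
            exact.setdefault((row[1], row[2]), []).append(row)

    # Lookup 2: only the wildcard Fact rows, keyed on Col3 alone.
    wildcard = {}
    for row in fact_rows:
        if row[1] == WILDCARD:
            wildcard.setdefault(row[2], []).append(row)

    # Union step: for each Dim row, collect matches from both lookups.
    joined = []
    for d in dim_rows:
        for f in exact.get((d[1], d[2]), []):
            joined.append(d + f)
        for f in wildcard.get(d[2], []):
            joined.append(d + f)
    return joined


result = join_with_wildcard(dim, fact)
for row in result:
    print(*row)
```

On the sample data this yields the two rows listed in the question: Dim 1001 paired with Fact 103 through the exact lookup, and Dim 1002 paired with Fact 102 through the "All" lookup.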