which is the best way to perform lookup operation in spark batch job?

Four Stars

Hi,

 

I want to perform lookup operation (defined below) in spark batch job.

Lookup operation:

My main input flow file, let's say 'ABC', has columns U, V, W.

I have another (lookup) file, say 'DEF', with columns X, Y, Z.

My logic should check: if (X == "APPLE"), then take the 'Y' value and populate it into "V"; otherwise populate null.

This is explained in more detail in this link:

https://community.talend.com/t5/Design-and-Development/Calling-a-user-routine-to-lookup-a-file-is-ta...
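
As a rough illustration, assuming the match key is ABC.U against DEF.X (the post does not state the join key) and that both files are CSVs with headers, the rule could be sketched with the Spark DataFrame API like this (names and paths are placeholders):

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.when;

public class LookupJoinSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().appName("lookup-join-sketch").getOrCreate();

        // Main flow 'ABC' (columns U, V, W) and lookup file 'DEF' (columns X, Y, Z).
        Dataset<Row> abc = spark.read().option("header", "true").csv("abc.csv");
        Dataset<Row> def = spark.read().option("header", "true").csv("def.csv");

        // Left join on the assumed key, then apply the rule:
        // if the matched X equals "APPLE", V gets Y; otherwise V stays null
        // (when() without otherwise() yields null, which also covers unmatched rows).
        Dataset<Row> result = abc
                .join(def, abc.col("U").equalTo(def.col("X")), "left_outer")
                .withColumn("V", when(col("X").equalTo("APPLE"), col("Y")))
                .select("U", "V", "W");

        result.show();
        spark.stop();
    }
}
```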

 

In a Data Integration job I stored the lookup data in a HashMap and wrote a custom Java function to fetch values by key. Now, given Spark's distributed framework, what is the best way to achieve this lookup operation (HashMap, RDD, pair RDD, DataFrame, etc.)? Also, if possible, please elaborate on why it is the best option.
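
For what it's worth, the closest Spark analogue to the DI "HashMap + Java routine" pattern is a broadcast variable, assuming the lookup file is small enough to collect to the driver. A minimal sketch (file paths, the X -> Y key/value choice and the filter shown are illustrative assumptions, not from the original post):

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.broadcast.Broadcast;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class BroadcastLookupSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().appName("broadcast-lookup-sketch").getOrCreate();
        JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());

        // Collect the small lookup file 'DEF' into a plain HashMap on the driver: X -> Y.
        Map<String, String> lookup = new HashMap<>();
        for (Row r : spark.read().option("header", "true").csv("def.csv").collectAsList()) {
            lookup.put(r.getAs("X"), r.getAs("Y"));
        }

        // Broadcast it once: each executor receives a single read-only copy instead of
        // one copy per task, which is why this beats shipping a HashMap in every closure.
        Broadcast<Map<String, String>> bLookup = jsc.broadcast(lookup);

        Dataset<Row> abc = spark.read().option("header", "true").csv("abc.csv");

        // The broadcast map can then be used inside any map/filter/UDF on the main flow;
        // here it only counts matched rows to keep the sketch short.
        long matched = abc.javaRDD()
                .filter(row -> bLookup.value().containsKey(row.<String>getAs("U")))
                .count();
        System.out.println("Rows of ABC with a lookup match: " + matched);

        spark.stop();
    }
}
```

If the lookup file is too large to collect, a DataFrame join is usually the better choice, since Spark can distribute the join instead of holding the whole map in memory.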

 

Appreciate your suggestion/help. 

Moderator

Re: which is the best way to perform lookup operation in spark batch job?

Hello,

The tCacheIn and tCacheOut components are available in the Spark Batch and Spark Streaming Job frameworks.

Best regards

Sabrina

Four Stars

Re: which is the best way to perform lookup operation in spark batch job?

Hi,
Thank you for the reply.
I agree the tCache components are helpful in a connected lookup scenario.
But my use case needs an unconnected lookup. In a DI job I achieved this through a UDF using HashMaps.
How would I achieve the same thing in a Spark batch job?
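
One possible sketch of unconnected-lookup behaviour in a Spark Batch job is to wrap a broadcast HashMap in a registered Spark SQL UDF, so the lookup can be called from any expression, much like calling a user routine in a DI job. The UDF name lookupY, the file paths and the column choices below are illustrative assumptions:

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.broadcast.Broadcast;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.api.java.UDF1;
import org.apache.spark.sql.types.DataTypes;
import static org.apache.spark.sql.functions.callUDF;
import static org.apache.spark.sql.functions.col;

public class UnconnectedLookupSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().appName("unconnected-lookup-sketch").getOrCreate();
        JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());

        // Build and broadcast the lookup map (X -> Y) from the small lookup file 'DEF'.
        Map<String, String> lookup = new HashMap<>();
        for (Row r : spark.read().option("header", "true").csv("def.csv").collectAsList()) {
            lookup.put(r.getAs("X"), r.getAs("Y"));
        }
        Broadcast<Map<String, String>> bLookup = jsc.broadcast(lookup);

        // Register a UDF that returns Y for a given key, or null when there is no match,
        // mirroring the "populate null" behaviour of the routine-based lookup.
        spark.udf().register("lookupY",
                (UDF1<String, String>) key -> bLookup.value().get(key),
                DataTypes.StringType);

        // The UDF can now be called wherever an expression is allowed, not only in a join.
        Dataset<Row> abc = spark.read().option("header", "true").csv("abc.csv");
        Dataset<Row> result = abc.withColumn("V", callUDF("lookupY", col("U")));
        result.show();

        spark.stop();
    }
}
```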
