
Equivalent left join on RDD

I'm developing a Talend Big Data Streaming Job that reads data from a Kafka stream and joins it with another stream coming from a database.

Is it possible to use a Talend component to perform a Spark join, i.e. one that uses https://spark.apache.org/docs/1.2.1/api/java/org/apache/spark/api/java/JavaPairRDD.html#leftOuterJoi...?

The code I would expect it to generate is:

 

 rdd1.leftOuterJoin(rdd2);

where rdd1 and rdd2 are instances of JavaPairRDD.
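To make the expected semantics concrete, here is a plain-Java sketch (no Spark dependency) of what a left outer join on key-value pairs produces. Every left record is kept; it is paired with each matching right value, or with an empty Optional when the key has no match. The result shape mirrors JavaPairRDD.leftOuterJoin, which returns pairs of (key, (leftValue, Optional<rightValue>)); in Spark 1.x that Optional is Guava's com.google.common.base.Optional, while Spark 2.x uses org.apache.spark.api.java.Optional. The class name LeftOuterJoinSketch and the sample data are hypothetical, and unlike Spark this runs in memory without partitioning or shuffling.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical in-memory illustration of left-outer-join semantics on key-value pairs.
public class LeftOuterJoinSketch {

    // Builds one output row shaped like Spark's (key, (leftValue, Optional<rightValue>)).
    private static <K, V, W> Map.Entry<K, Map.Entry<V, Optional<W>>> row(K key, V left, Optional<W> right) {
        Map.Entry<V, Optional<W>> pair = new SimpleEntry<>(left, right);
        return new SimpleEntry<>(key, pair);
    }

    static <K, V, W> List<Map.Entry<K, Map.Entry<V, Optional<W>>>> leftOuterJoin(
            List<? extends Map.Entry<K, V>> left, List<? extends Map.Entry<K, W>> right) {
        // Index the right side by key; one key may have several right values.
        Map<K, List<W>> index = new HashMap<>();
        for (Map.Entry<K, W> e : right) {
            index.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
        }
        List<Map.Entry<K, Map.Entry<V, Optional<W>>>> out = new ArrayList<>();
        for (Map.Entry<K, V> e : left) {
            List<W> matches = index.get(e.getKey());
            if (matches == null) {
                // No match: the left record is kept with an empty Optional.
                out.add(row(e.getKey(), e.getValue(), Optional.<W>empty()));
            } else {
                // One output row per matching right value (assumes non-null right values).
                for (W w : matches) {
                    out.add(row(e.getKey(), e.getValue(), Optional.of(w)));
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical sample data: keyed stream records and keyed database lookup rows.
        List<Map.Entry<String, Integer>> stream = List.of(Map.entry("a", 1), Map.entry("b", 2));
        List<Map.Entry<String, String>> lookup = List.of(Map.entry("a", "x"), Map.entry("a", "y"));
        for (Map.Entry<String, Map.Entry<Integer, Optional<String>>> r : leftOuterJoin(stream, lookup)) {
            System.out.println(r.getKey() + " -> (" + r.getValue().getKey() + ", " + r.getValue().getValue() + ")");
        }
    }
}
```

With this sample data, key "a" appears twice (paired with x and with y) and key "b" is kept with an empty Optional, which is the behavior that distinguishes a left outer join from an inner join.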

 

To implement the left join, I've used a tMap with a tMySqlLookupInput, but this doesn't generate the Spark code I expected.


Re: Equivalent left join on RDD

I can't speak exactly to the streaming Job use case; I've only used a single input there. For batch Spark Jobs, I've seen Talend generate poor Spark code, which makes sense, since it can only infer so much about the data and its structure. What I've done is create a tJava off each source, one of them a dummy. Even though that tJava is a dummy, it initializes the data as an RDD (a DStream, in your case). Then, in the other tJava, you can reference that DStream or RDD. Doing this in a streaming Job may be tricky, but it's how I've written code against the Java API directly and worked with multiple sources in batch Jobs.