I'm developing a Talend Big Data Streaming Job that reads data from a Kafka stream and joins it with another stream coming from a database.
Is it possible to use a Talend component to perform a Spark join, i.e. the one that uses https://spark.apache.org/docs/1.2.1/api/java/org/apache/spark/api/java/JavaPairRDD.html#leftOuterJoi...?
The code it should generate is a left outer join between rdd1 and rdd2, where rdd1 and rdd2 are JavaPairRDD instances.
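For readers unfamiliar with the semantics being asked for, here is a minimal plain-Java sketch of what a left outer join produces. It is not Spark code and not what Talend generates: the class name `LeftJoinSketch`, the sample keys, and the use of `java.util.Map` in place of pair RDDs are all assumptions made for illustration. Real pair RDDs may also hold duplicate keys, which a map cannot.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Plain-Java illustration of left outer join semantics; hypothetical, not Spark code.
class LeftJoinSketch {

    // Keeps every key of `left`; the right-hand value is empty when `right` has no match.
    static <K, V, W> Map<K, Map.Entry<V, Optional<W>>> leftOuterJoin(
            Map<K, V> left, Map<K, W> right) {
        Map<K, Map.Entry<V, Optional<W>>> joined = new LinkedHashMap<>();
        for (Map.Entry<K, V> e : left.entrySet()) {
            Optional<W> match = Optional.ofNullable(right.get(e.getKey()));
            joined.put(e.getKey(), new SimpleEntry<>(e.getValue(), match));
        }
        return joined;
    }

    public static void main(String[] args) {
        // Stand-in for the Kafka side (sample data, assumed for illustration).
        Map<String, String> kafkaEvents = new LinkedHashMap<>();
        kafkaEvents.put("u1", "click");
        kafkaEvents.put("u2", "view");

        // Stand-in for the database side; "u2" deliberately has no match.
        Map<String, String> dbRows = new LinkedHashMap<>();
        dbRows.put("u1", "Alice");

        leftOuterJoin(kafkaEvents, dbRows).forEach((key, pair) ->
                System.out.println(key + " -> " + pair.getKey() + ", " + pair.getValue()));
    }
}
```

In Spark itself the same result shape comes from calling `leftOuterJoin` on one `JavaPairRDD` with the other as its argument: each left-hand value is paired with an optional right-hand value.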
To implement the left join I've used a tMap with a tMySqlLookupInput, but this doesn't generate the Spark code I expected.
I can't speak exactly to the streaming use case, since I've only used a single input there. For batch Spark jobs I've seen Talend generate poor Spark code, which makes sense, as it can only infer so much about the data and its structure. What I've done is create a tJava off each source, one of them a dummy. Even though it is a dummy, that tJava initializes the data as an RDD (a DStream in your case). Then in the other tJava you can reference that DStream or RDD. Doing this in a streaming job may be tricky, but it is how I've written Java API code against multiple sources in batch jobs.