I am new to Talend, but I have used similar ETL tools such as Pentaho.
I want to use Talend to import a large CSV file into the distributed storage of Spark (as an RDD) and/or Hadoop (HDFS).
I can do this with commands at the command line, but I would really like to use a GUI-based tool instead.
Can someone let me know whether Talend supports this? I could not find any simple tutorials on the topic.
Hope someone can help.
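For context, the command-line route I would like to replace with a GUI looks roughly like this (the paths below are just placeholders, not my real ones):

```shell
# Create a target directory in HDFS and copy the local CSV into it
# (placeholder paths; assumes the Hadoop client is installed and configured)
hdfs dfs -mkdir -p /data/imports
hdfs dfs -put /local/path/input.csv /data/imports/input.csv

# Verify the file landed in HDFS
hdfs dfs -ls /data/imports
```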
Talend provides a tHDFSOutput component, which writes the data flow it receives into a given Hadoop Distributed File System (HDFS).
Please take a look at the component reference: TalendHelpCenter: tHDFSOutput
Thanks for this.
Is this component in Talend Open Studio?
Do they have a similar component for Apache Spark?
The tHDFSOutput component is available when you are using one of the Talend solutions with Big Data, i.e. Talend Open Studio for Big Data.
Thanks for your reply.
I actually discovered, just a little while before your reply, that Talend Open Studio for Big Data has the components for Hadoop.
It also appears that there is a component for Spark, but that it is in the paid version rather than the free one.
Can you confirm that this is the case?
Batch processing (MapReduce, Spark), native Hadoop connectors, and real-time processing (Spark Streaming) are available in the Talend subscription version, not the open-source one.
Please take a look at the Big Data product page: http://www.talend.com/products/big-data/
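That said, if you just need to get the CSV into Spark without the subscription components, the open-source spark-shell can read it from HDFS directly. A rough sketch (the namenode host/port and path are placeholders):

```shell
# Start the open-source Spark REPL (assumes Spark is installed and on PATH)
spark-shell

# Then, at the scala> prompt (host/port and path are placeholders):
#   val lines = sc.textFile("hdfs://namenode:8020/data/imports/input.csv")
#   lines.count()                        // number of lines in the CSV
#   val fields = lines.map(_.split(","))  // split each row into columns (RDD)
```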