Four Stars

Import data (CSV, Excel, etc.) into Apache Spark

Hi,

I am new to Talend, but I have used similar ETL tools from Pentaho.

I want to use Talend to import, say, a large CSV file into Spark (as an RDD) and/or into Hadoop (HDFS).

I can do this with commands at the command line, but I would really prefer to use a GUI-based tool instead.

Can Talend do this? I could not find any simple tutorials on it.

I hope someone can help with this topic.

Paluee

5 REPLIES
Moderator

Re: Import data (CSV, Excel, etc.) into Apache Spark

Hello,

Talend has a tHDFSOutput component, which writes the data flows it receives into a given Hadoop Distributed File System (HDFS).

Please take a look at the component reference: TalendHelpCenter: tHDFSOutput

Best regards

Sabrina

--
Don't forget to give kudos when a reply is helpful and click Accept the solution when you think you're good with it.
Four Stars

Re: Import data (CSV, Excel, etc.) into Apache Spark

OK, thanks for this.

Is this component available in Talend Open Studio?

Is there a similar component for Apache Spark?

Moderator

Re: Import data (CSV, Excel, etc.) into Apache Spark

Hello,

The tHDFSOutput component is available when you are using one of the Talend solutions with Big Data, for example Talend Open Studio for Big Data.

Best regards

Sabrina

Four Stars

Re: Import data (CSV, Excel, etc.) into Apache Spark

Hi there,

Thanks for your reply.

I actually discovered, just a little while before your reply, that Talend Open Studio for Big Data has the components for Hadoop.

Next to that, it showed that there seems to be a component for Spark, but only in the paid version, not in the free one.

Can you confirm that this is the case?

Regards,

P

Moderator

Re: Import data (CSV, Excel, etc.) into Apache Spark

Hello,

Batch processing (MapReduce, Spark), native Hadoop connectors, and real-time processing (Spark Streaming) are available in the Talend subscription version, not in the open source version.

Please take a look at the Big Data product page: http://www.talend.com/products/big-data/

Best regards

Sabrina
