Talend Data Streams runtime architecture



Hi! 

 

I am new to Talend and Talend Data Streams and have read all the documentation online to do a POC. I have the following questions:

  1. Is there an enterprise version of Talend Data Streams? If not, when will it be released?
  2. Is Talend Data Streams based on Apache Beam?
  3. Is Talend Data Streams one of the Apache Beam runners, like Google Cloud Dataflow or Apache Flink? Or does it just give you an easy UI to create Apache Beam-based data pipelines, with an additional Apache Beam runner required to run them?
  4. Can Talend Job Servers be used to run Talend Data Streams?
Employee

Re: Talend Data Streams runtime architecture

Hi @rachitbahl

 

Welcome to Talend community!

 

Talend Data Streams is powered by Apache Beam and provides a GUI for building pipelines easily. You will be able to process data with different frameworks; the first couple of options available are Spark (through AWS EMR) and Google Cloud Dataflow. There is no need to install an additional Apache Beam runner. A dedicated Talend Data Streams remote engine is required, which is different from your current Talend Job Server.

 

There is a free version of Data Streams now available through the AWS Marketplace: https://aws.amazon.com/marketplace/pp/B07C4WYPFM  It is a free edition of the product, not the enterprise version. There is no software cost, only the EC2 instance cost, and you can turn the instance on and off based on your needs.

 

The enterprise version of Data Streams will be part of the Talend Cloud platform and will be released later this year, in Q4.

 

Hope this helps.

 

Employee

Re: Talend Data Streams runtime architecture

Hi rachitbahl,

Just to add some comments on top of Shiyi's post: Talend Data Streams is indeed powered by Apache Beam, which provides an unprecedented level of portability to data pipelines. We contribute actively to that top-level Apache project, with many Talend employees involved.

The underlying runners we use are the native Apache Beam runners. The SparkRunner is the first one we deliver (running Spark locally, or Spark on YARN with a distribution), along with the Google Cloud Dataflow Runner. More are to come in the future. :-)

Those Apache Beam runners are abstracted under our concept of a "Run Profile", which lets operations teams provide easy-to-use configurations and resource allocations that developers can use at runtime (and switch from one runner to another).
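To make the runner portability concrete, here is a minimal sketch of how the same Apache Beam pipeline can be pointed at different runners purely through pipeline options (the `pipeline.py` script name is hypothetical; `--runner` and the Dataflow flags are standard Beam Python SDK options). A Run Profile essentially packages this kind of per-engine configuration for you:

```shell
# Same Beam pipeline code, different execution engines -- only the
# pipeline options change (pipeline.py is a hypothetical Beam script).

# Local development: run on the in-process DirectRunner
python pipeline.py --runner=DirectRunner

# Run on a Spark cluster (e.g. on AWS EMR) via the Beam SparkRunner
python pipeline.py --runner=SparkRunner

# Run as a managed job on Google Cloud Dataflow
python pipeline.py \
    --runner=DataflowRunner \
    --project=my-gcp-project \
    --region=us-central1 \
    --temp_location=gs://my-bucket/tmp
```

Because Beam separates the pipeline definition from the execution engine, switching runners does not require changing the pipeline code itself, which is what makes an abstraction like the Run Profile possible.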

Regards,

Cyril.