I am new to Talend and Talend Data Streams and have read all the documentation online in preparation for a POC. I have the following questions-
Welcome to the Talend community!
Talend Data Streams is powered by Apache Beam and provides a GUI for building pipelines easily. You will be able to process data with different frameworks; the first couple of options available are Spark through AWS EMR and Google Cloud Dataflow. There is no need to install an additional Apache Beam runner. A dedicated Talend Data Streams remote engine is needed, which is separate from your current Talend JobServer.
There is a free version of Data Streams now available through AWS Marketplace: https://aws.amazon.com/marketplace/pp/B07C4WYPFM It is a free edition of the product, not the enterprise version. There is no software cost, only the EC2 instance cost, and you can turn the instance on and off based on your needs.
The enterprise version of Data Streams will be part of the Talend Cloud platform and will be released later this year, in Q4.
Hope this helps.
Just to add some comments on top of Shiyi's post: Talend Data Streams is indeed powered by Apache Beam, which provides an unprecedented level of portability to data pipelines.
We contribute pretty actively to that top-level Apache project, with many Talend employees involved.
The underlying runners that we use are the native Apache Beam runners. SparkRunner is the first one we deliver (running Spark local, or Spark on YARN with a distribution), along with the Google Cloud Dataflow runner. More will come in the future.
Those Apache Beam runners are abstracted under our concept of a "Run Profile". This lets operations teams provide easy-to-use configurations and resource allocations that developers can select at runtime, and switch between.
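To make the idea a bit more concrete, here is a minimal sketch of how a named profile could map to a Beam runner plus its options. This is not how Talend implements Run Profiles internally; the `RunProfile` class, the `PROFILES` table, the profile names, and the project and bucket values are all hypothetical. The flag names (`--runner`, `--sparkMaster`, `--project`, `--region`, `--tempLocation`) follow the Beam Java SDK pipeline options.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RunProfile:
    """Hypothetical stand-in for a named runner configuration."""
    name: str
    runner: str  # Beam runner name, e.g. "SparkRunner"
    options: Dict[str, str] = field(default_factory=dict)

    def to_args(self) -> List[str]:
        """Render the profile as Beam-style --key=value pipeline arguments."""
        args = [f"--runner={self.runner}"]
        # Sort the options so the output is deterministic.
        args += [f"--{key}={value}" for key, value in sorted(self.options.items())]
        return args


# Illustrative profiles that an operations team might define once.
# All values below are placeholders, not real resources.
PROFILES = {
    "spark-local": RunProfile(
        name="spark-local",
        runner="SparkRunner",
        options={"sparkMaster": "local[4]"},
    ),
    "dataflow": RunProfile(
        name="dataflow",
        runner="DataflowRunner",
        options={
            "project": "my-gcp-project",
            "region": "us-central1",
            "tempLocation": "gs://my-bucket/tmp",
        },
    ),
}


def pipeline_args(profile_name: str) -> List[str]:
    """Look up a profile by name and return its pipeline arguments."""
    return PROFILES[profile_name].to_args()


if __name__ == "__main__":
    # Switching runners only means picking a different profile name.
    for profile_name in ("spark-local", "dataflow"):
        print(profile_name, pipeline_args(profile_name))
```

The pipeline definition itself never mentions a runner; only the selected profile changes, which is the portability that Beam is meant to provide.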