Deployment options for big data streaming job





I have managed to build my first big data streaming job, which consumes a Kinesis stream. I have installed a jobserver on an AWS EMR cluster, and I can successfully deploy and run the job on that jobserver.


My only concern is that we would need an EMR cluster running 24/7 just for this one job. Are there other ways of deploying / "productionizing" a big data streaming job without running a whole cluster just for it?


Re: Deployment options for big data streaming job


That is the whole purpose of stream processing running on top of a big data cluster.

If you do not need that much computing power, you could check the following four options:

- Set the Spark configuration to run in local mode. This only requires the EC2 instance where the jobserver is deployed.
- Use Talend ESB / Camel
- Leverage the latest 7.0 feature: support for the Cloudera Altus distribution (which acts as Hadoop as a service).
- Leverage the new serverless distribution we shared on Talend Marketplace, based on the Qubole SaaS offering (also Hadoop as a service).

Let us know if this is what you are looking for.

Best regards

