I have managed to build my first big data streaming job, which consumes a Kinesis stream. I have installed a jobserver on an AWS EMR cluster and am able to successfully deploy and run the job on that jobserver.
My only concern is that we would need an EMR cluster running 24/7 just for this one job. Is there any other way of deploying / "productionizing" a big data streaming job without running a whole cluster just for it?
This is the whole point of a stream-processing job running on top of a big data cluster.
If you do not need that much compute power, please consider these 4 options:
- Set the Spark configuration to run locally. It will then only require a single EC2 instance where the jobserver is deployed.
- Use Talend ESB / Camel
- Leverage the latest 7.0 feature with the Cloudera Altus distribution (acting as Hadoop as a service)
- Leverage the new serverless distribution we shared on Talend Marketplace, based on the Qubole SaaS offering (also Hadoop as a service).
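For the first option, a minimal sketch of what the jobserver configuration might look like, assuming you use a spark-jobserver-style HOCON config file (the file name, core count, and memory value here are illustrative assumptions, not taken from your setup):

```
# local.conf (sketch) -- run Spark inside the jobserver JVM instead of on EMR
spark {
  # "local[*]" runs driver and executors in one process, using all cores
  # on the EC2 instance; use e.g. "local[4]" to cap the core count
  master = "local[*]"

  # memory available to the embedded Spark context (assumed value)
  jobserver-memory = 2g
}
```

With this, the streaming job runs entirely on the single EC2 instance hosting the jobserver, so throughput is limited to that instance's cores and memory; it is a fit only when the Kinesis stream volume is modest.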
Let us know if this is what you are looking for.