I am using Talend Big Data 6.3.1, Enterprise edition. In the repository we have two job design types: 1) Standard and 2) Big Data Batch.
We designed our jobs as Big Data Batch jobs, but we are not able to use the tContextLoad component.
Is it the case that this component cannot be used there? When I place it in the workspace, it shows as missing. Please see the screenshot.
We can use tContextLoad in a Standard job, but not in a Big Data Batch job. We are using Spark.
tContextLoad is available in Standard ETL only.
In Spark Batch and Spark Streaming, you need to think and design differently because these frameworks process data in parallel across many nodes. Use context variables only to initialise the job. That initialisation should happen as soon as the job begins, which means the variable values should be passed to the job from the TAC (Talend Administration Center). A Spark Batch or Spark Streaming job runs on many nodes and has many executor contexts, so you cannot rely on a context variable: it is no longer global to all of them. Once Spark logic starts executing in an executor context, any attempt to change a context variable is local to that executor context only, not to the whole job. That is why we do not provide these components in Spark Batch and Spark Streaming: to keep Talend developers from using an antipattern.
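To make the "local to that executor context" point concrete, here is a minimal plain-Java sketch. It does not use Spark or any Talend API; it only imitates what happens when a driver-side object is serialized and handed to an executor as its own copy. The class name, the `inputDir` key and the paths are hypothetical, chosen for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.HashMap;

// Simulation only: no Spark involved. A serialization round trip stands in
// for shipping driver-side state to an executor.
final class ExecutorContextDemo {

    // Serialize and deserialize the map, producing an independent copy,
    // the way an executor would receive its own copy of driver-side values.
    @SuppressWarnings("unchecked")
    static HashMap<String, String> shipToExecutor(HashMap<String, String> context) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(context);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (HashMap<String, String>) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("serialization round trip failed", e);
        }
    }

    // Returns the value the driver still sees after the simulated executor
    // has changed its own copy of the context.
    static String driverValueAfterExecutorUpdate() {
        HashMap<String, String> driverContext = new HashMap<>();
        driverContext.put("inputDir", "/data/in");          // hypothetical variable

        HashMap<String, String> executorContext = shipToExecutor(driverContext);
        executorContext.put("inputDir", "/data/changed");   // executor-local change

        return driverContext.get("inputDir");
    }

    public static void main(String[] args) {
        System.out.println(driverValueAfterExecutorUpdate()); // prints /data/in
    }
}
```

The change made on the copy never reaches the original map, which is the behaviour described above: a value set inside one executor context is invisible to the driver and to the other executors.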
You need to work out another way to read your variables. One option is to read them from a database and load the values into an RDD. You can then access the RDD from all your nodes; it is held in memory and is fast. You can then read from the RDD and apply the values to the globalMap, etc. We have not abstracted this logic yet, because it creates an RDD: since an RDD is immutable, each time you update a variable value you create another RDD.
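The cost of immutability mentioned above can be pictured with a small plain-Java analogy. This is not the Spark RDD API; `ImmutableSettings`, its `with` method and the `batchSize` key are hypothetical names used only to show that an "update" on an immutable collection yields a new object and leaves the old one as it was.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Analogy only: an immutable snapshot of settings, standing in for an RDD
// that holds variable values. Not part of Spark or Talend.
final class ImmutableSettings {
    private final Map<String, String> values;

    ImmutableSettings(Map<String, String> values) {
        // Defensive copy, then wrap so the snapshot can never be modified.
        this.values = Collections.unmodifiableMap(new HashMap<>(values));
    }

    String get(String key) {
        return values.get(key);
    }

    // "Updating" a value does not touch this snapshot; it builds a new one.
    ImmutableSettings with(String key, String value) {
        Map<String, String> copy = new HashMap<>(values);
        copy.put(key, value);
        return new ImmutableSettings(copy);
    }

    public static void main(String[] args) {
        Map<String, String> initial = new HashMap<>();
        initial.put("batchSize", "100");                   // hypothetical variable

        ImmutableSettings v1 = new ImmutableSettings(initial);
        ImmutableSettings v2 = v1.with("batchSize", "500");

        System.out.println(v1.get("batchSize")); // prints 100
        System.out.println(v2.get("batchSize")); // prints 500
    }
}
```

Every change produces another snapshot, just as every change to a value held in an RDD produces another RDD, so frequent updates are better avoided in favour of values that are set once when the job starts.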
Along the same lines, you cannot use the AMC (Activity Monitoring Console) in Big Data Batch or Big Data Streaming jobs.
Hope that helps.