I have built a Standard Job that needs to combine some intermediate results and store them in a database, so I used tHashOutput together with tHashInput. However, when I converted the Standard Job to a Spark Job, the tHashOutput/tHashInput components could not be loaded. Can anyone suggest which component(s) I can use to replace tHashOutput/tHashInput in the Spark framework?
Thanks in advance.
So far, the Cache components, tCacheIn and tCacheOut, are available in the Spark Batch and Spark Streaming Job frameworks.
I notice that, unlike tHashOutput, tCacheOut does not have the option to "Link with a tCacheOut". Is it possible to append intermediate results with the same schema from different subjobs by using tCacheOut and setting the Storage Level to 'Memory only'?
Thanks in advance.