I am creating a Big Data Spark job. I want to create a sub-job and reuse it in another job. In a standard job we can use tBufferOutput for this. Which component can we use to create a sub-job in a Spark job?
I also want to maintain logs for the Big Data Spark job. In a standard job I use tWarn -> tLogCatcher -> tLogRow -> tFileOutputDelimited. Which component can I use in a Big Data Spark job?
The DI tRunJob component can work with Spark batch jobs.
There is a "log4jLevel" option in the Advanced settings of the Run view, which outputs component-related logging information at runtime. Let us know if this works for you.
Thanks for the reply. Actually, I want to print customised messages to the log files. Is that possible with a Big Data Spark job? If yes, how?
I am trying to create a sub-job using tCacheOut and tCacheIn, but I am getting an error message like:
17/04/18 15:10:38 INFO SparkContext: Successfully stopped SparkContext
java.lang.NullPointerException at org.talend.bigdata.dataflow.spark.batch.hmap.Spark
java.io.FileNotFoundException: /etc/spark/conf/fairscheduler.xml (No such file or directory) at java.io.FileInputStream.open(Native Method)
but that config file is present at that location. PFA the job.
I also connected tCacheOut -> (OnComponentOk) tCacheIn, but it did not generate any output, not even the output folder.
There are no nulls in the data set, and I sync the schema with every component. I am really confused about what went wrong, and there is not much help available for tCacheIn/Out.
Please let me know if there is any issue with the workflow. Am I using these components correctly?
For your issue, could you please use the connection type "OnSubjobOk" instead and see if the problem still reproduces?
Let us know if that works for you.
I tried OnComponentOk; the job ran successfully, but it does not generate any output. Have you tried any simple workflow with tCacheIn/Out?
OnSubjobOk is not available to connect tCacheOut to tCacheIn.
Please refer to this article about the difference between OnSubjobOk and OnComponentOk.
What does your customised log look like? Does the log4jLevel option not meet your needs?
I do not know much about log4j. If I enable this option, where can I see these logs? How can I store them in a log or output file?
As far as the customised message is concerned, it would look like:
Execution started at <datetime>
File loaded at <datetime>
Filtration done, <n> rows flowed to the next level at <datetime>
and so on....
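Messages like the ones above can be built in a tJava (or tJavaRow) step. Here is a minimal plain-Java sketch of how such timestamped lines might be assembled; the class and method names are my own invention, and in a real Talend job you would pass the resulting string to the job's logger (e.g. `log.info(...)` when log4j is activated) rather than printing it.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical sketch of the custom log lines described above.
public class CustomLogSketch {
    private static final DateTimeFormatter FMT =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Append a timestamp to a message, matching the "<message> at <datetime>" shape.
    static String stamp(String message) {
        return message + " at " + LocalDateTime.now().format(FMT);
    }

    public static void main(String[] args) {
        System.out.println(stamp("Execution started"));
        System.out.println(stamp("File loaded"));
        long rowCount = 42; // e.g. a count taken from a tFlowMeter or the globalMap
        System.out.println(stamp("Filtration done, " + rowCount
                + " rows flowed to next level"));
    }
}
```

The row count here is a placeholder; in a job you would read the real value from whatever component tracks your flow.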
The log4jLevel feature allows you to change the output level at runtime for the log4j loggers activated in the components of the Job.
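To answer the question about where the logs go: with log4j 1.x you can route logger output to a file via an appender. A minimal illustrative configuration (the file path and appender name are assumptions, not Talend defaults) could look like:

```properties
# Route all INFO-and-above log4j output to a file (illustrative values)
log4j.rootLogger=INFO, file
log4j.appender.file=org.apache.log4j.FileAppender
log4j.appender.file.File=/tmp/spark_job.log
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %p %c - %m%n
```

Where exactly this configuration is edited depends on your Talend and Spark setup, so please check the documentation linked below for your version.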
For more information, please see: https://help.talend.com/display/TalendDataFabricSt
Please take a look at the custom tRedirectOutput component, which redirects all console output to a file.