Six Stars

Big Data Spark Job - Subjob and Logs creation

Hi,

 

I am creating a Big Data Spark job. I want to create one subjob and reuse it in another job. In a standard job, we can use tBufferOutput for this. Which component can we use to create a subjob in a Spark job?

 

I also want to maintain logs for the Big Data Spark job. In a standard job I use tWarn -> tLogCatcher -> tLogRow -> tFileOutputDelimited. Which components can I use in a Big Data Spark job?

 

Thanks.

Moderator

Re: Big Data Spark Job - Subjob and Logs creation

Hi,

The DI tRunJob component can work with Spark batch jobs.

There is a "Log4jLevel" option in the Advanced settings of the Run view, which outputs component-related logging information at runtime. Let us know if this works for you.
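For illustration, the effect of raising or lowering that level can be seen with plain JDK logging. This is a minimal sketch only: Talend's generated code actually uses log4j, and the logger name below is made up for the example, but the level-filtering behaviour is the same.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class LogLevelDemo {
    // Hypothetical logger name, for illustration only; a Talend job
    // configures its own log4j loggers internally.
    private static final Logger LOG = Logger.getLogger("JobDemo");

    public static void main(String[] args) {
        LOG.setLevel(Level.WARNING);        // comparable to setting Log4jLevel to WARN
        LOG.info("this message is filtered out");   // below the configured level
        LOG.warning("this message is printed");     // at or above the level
        System.out.println("INFO enabled: " + LOG.isLoggable(Level.INFO));
    }
}
```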

Best regards

Sabrina

1.png

Six Stars

Re: Big Data Spark Job - Subjob and Logs creation

Thanks for the reply. Actually, I want to print customised messages to the log files. Is that possible with a Big Data Spark job? If yes, how?

 

I am trying to create a subjob using tCacheOut and tCacheIn, but I am getting an error message like:

 

17/04/18 15:10:38 INFO SparkContext: Successfully stopped SparkContext
java.lang.NullPointerException
	at org.talend.bigdata.dataflow.spark.batch.hmap.SparkHMapTransform.build(SparkHMapTransform.java:52)
java.io.FileNotFoundException: /etc/spark/conf/fairscheduler.xml (No such file or directory)
	at java.io.FileInputStream.open(Native Method)

but that config file is present at that location. Please find the job attached.

job.jpg

 

I also connected tCacheOut -> (On Component Ok) tCacheIn, but it did not generate any output, not even the output folder.

 

There are no nulls in the data set, and I sync the schema with every component. I am really confused about what went wrong. There is not much help available for tCacheIn/tCacheOut.

 

Please let me know if there is any issue with the workflow. Am I using these components correctly?

Moderator

Re: Big Data Spark Job - Subjob and Logs creation

Hi,

For your issue, could you please use the connection type "OnSubjobOK" instead and see if the issue still reproduces?

Let us know if it is Ok with you.

Best regards

Sabrina

Six Stars

Re: Big Data Spark Job - Subjob and Logs creation

Hi,

 

I tried OnComponentOK; the job ran successfully, but it does not generate any output. Have you tried a simple workflow with tCacheIn/tCacheOut?

 

job.jpg

OnSubjobOK is not available to connect tCacheOut to tCacheIn.

 

Please help.

Moderator

Re: Big Data Spark Job - Subjob and Logs creation

Hi,

Please design your workflow like this:

tFileInput --> tCacheOutput

| OnSubjobOK

tCacheInput --> tFileOutput

Best regards

Sabrina

Six Stars

Re: Big Data Spark Job - Subjob and Logs creation

Thanks, that worked for me. How do we decide whether to use OnSubjobOK or OnComponentOK?

 

Can we create customised logs with a Big Data job?

Moderator

Re: Big Data Spark Job - Subjob and Logs creation

Hi,

Please refer to this article about the difference between OnSubjobOK and OnComponentOK:

https://help.talend.com/pages/viewpage.action?pageId=190513190

What does your customised log look like? Does the log4jLevel option not meet your needs?

Best regards

Sabrina

Six Stars

Re: Big Data Spark Job - Subjob and Logs creation

I do not know much about log4j. If I enable this option, where can I see these logs? How can I store them in a log or output file?

 

As far as the customised messages are concerned, they would look like:

Execution started at <datetime>

File loaded at <datetime>

Filtering done, <n> rows flowed to the next level at <datetime>

and so on....
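Messages of that shape can be built from a tJava-style code step. The sketch below is illustrative only: the `stamp` helper and the hard-coded row count are assumptions for the example, not Talend API (in a real job the count would come from something like a component's row-count global variable).

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class JobLog {
    private static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Hypothetical helper: appends the current timestamp to a message.
    static String stamp(String message) {
        return message + " at " + LocalDateTime.now().format(FMT);
    }

    public static void main(String[] args) {
        System.out.println(stamp("Execution started"));
        System.out.println(stamp("File loaded"));
        int n = 42; // placeholder for the job's actual row count
        System.out.println(stamp("Filtering done, " + n + " rows flowed to the next level"));
    }
}
```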

 

Thanks...

Moderator

Re: Big Data Spark Job - Subjob and Logs creation

Hi,

The log4jLevel feature allows you to change the output level at runtime for the log4j loggers activated in the components of the Job.

For more information, please see: https://help.talend.com/display/TalendDataFabricStudioUserGuide63EN/7.9.5+How+to+customize+log4j+out...
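For reference, a file appender is the usual way to send log4j output to a file. This fragment uses generic log4j 1.x syntax; the appender name and file path are illustrative, not Talend's exact defaults:

```properties
# Root level plus a file appender (names and path are examples only)
log4j.rootLogger=INFO, FILE
log4j.appender.FILE=org.apache.log4j.FileAppender
log4j.appender.FILE.File=/tmp/sparkjob.log
log4j.appender.FILE.layout=org.apache.log4j.PatternLayout
log4j.appender.FILE.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c - %m%n
```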

Best regards

Sabrina

Six Stars

Re: Big Data Spark Job - Subjob and Logs creation

Thanks. How can I save the log4j output to a file with a Big Data Spark job?

Moderator

Re: Big Data Spark Job - Subjob and Logs creation

Hi,

Please take a look at the custom tRedirectOutput component, which redirects all console output to a file.

https://exchange.talend.com/#marketplaceproductoverview:marketplace=marketplace%252F1&p=marketplace%...

Best regards

Sabrina