Six Stars

Using Custom Code in tJavarow

Hi,

 

Is it possible to write custom Java code in tJavaRow for Big Data jobs?

I am using tJavaRow inside a Spark job, but the custom code is not executed.

The following is the custom code:

context.Flag="YES";

System.out.println("###################### This output is from tJavaRow ###################");

The value is not assigned, and the message is not printed.

 

Thanks

 

 

10 REPLIES
Ten Stars

Re: Using Custom Code in tJavarow

I'm not sure about anything more complex, but as described, yes, it works.

The variables, however, are not assigned.

 

[Screenshots: Screen Shot 2017-06-07 at 10.06.00 AM.png, Screen Shot 2017-06-07 at 10.06.08 AM.png]

-----------
Six Stars

Re: Using Custom Code in tJavarow

Hi,

Is that a Talend Big Data Spark job?

If so, instead of running it locally, run it on a Spark cluster.

 

I am running my Spark job on the cluster.

If I put the custom code in a tJava instead of a tJavaRow, it gets executed, and I can see the output in the Spark application logs.

The custom code in tJavaRow, however, is not executed.

 

Thanks

Ten Stars

Re: Using Custom Code in tJavarow

Yes, I will test it, but you may be right that it will not work.

Let's wait and see what the Talend staff answer :-)

-----------
Six Stars

Re: Using Custom Code in tJavarow

Yeah, let's wait. :-)

Five Stars

Re: Using Custom Code in tJavarow

Hello,

 

Custom code components (tJava and tJavaRow) behave differently, and have to be used differently, depending on the type of job you are building. For instance, in Spark batch jobs you need to write Spark Java API code that works with the input and output RDDs (read the comments in the component when you first add it for help on how to do a test print on your input RDD, and try that instead of your System.out.println). In a Spark streaming job, you'll be working with RDDs in a DStream. tJava and tJavaRow also differ from each other: tJavaRow uses the Spark DataFrames API, while tJava works purely with RDDs.
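One possible reason an assignment such as context.Flag="YES" appears to be lost (an assumption based on general Spark behaviour, not something confirmed for Talend's generated code): Spark serializes a function together with the objects it references and ships them to the executors, so a change made there only affects the executor's copy. Below is a minimal plain-Java sketch of that effect. DemoContext and shipToExecutor are hypothetical names used for illustration; they are not Talend or Spark classes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class ContextCopyDemo {

    // Hypothetical stand-in for a job context object (not a Talend class).
    static class DemoContext implements Serializable {
        private static final long serialVersionUID = 1L;
        String flag = "NO";
    }

    // Serializes and deserializes the object, which is roughly what happens
    // when referenced objects travel from the driver to an executor.
    static DemoContext shipToExecutor(DemoContext driverSide) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(driverSide);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (DemoContext) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("serialization round trip failed", e);
        }
    }

    public static void main(String[] args) {
        DemoContext driverContext = new DemoContext();
        DemoContext executorContext = shipToExecutor(driverContext);

        // The "executor" changes its own copy only.
        executorContext.flag = "YES";

        System.out.println("executor copy:   " + executorContext.flag);
        System.out.println("driver original: " + driverContext.flag);
    }
}
```

The copy ends up with YES while the original still holds NO, which would match the symptom of a value that never seems to get assigned. In Spark itself, accumulators or the result of an action are the usual ways to bring information back to the driver.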

 

See the documentation for the differences between them when they are used across the various job types:

https://help.talend.com/reader/KxVIhxtXBBFymmkkWJ~O4Q/y0Us7J_ukdgxhe9Jx_o_NQ?section=sect-components...

 

Hope that helps.

Five Stars

Re: Using Custom Code in tJavarow

Hi jpmauss,

 

Could you please provide an example, such as a word count, showing how to write custom Spark code in tJava/tJavaRow? I tried to do this by following the description in the component but was not successful, and I could not find any example in the knowledge base.

Any help will be much appreciated.

 

Best Regards,

Ojasvi Gambhir

Six Stars

Re: Using Custom Code in tJavarow

Hi all,

 

If you find any material on writing custom Java code in tJava for the Big Data version, please let me know.

 

Thanks

Five Stars

Re: Using Custom Code in tJavarow

I can look into sharing some examples; however, you'd need to be familiar with the Spark Java API, which is different from the plain Java used in standard jobs. With tJavaRow you'd also need to be familiar with Spark SQL and the DataFrames API.

 

See this link for an introduction to programming in Spark, and click the Java tab to see how to work with the data using RDDs:

https://spark.apache.org/docs/1.6.2/programming-guide.html
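One difference from plain Java that can explain a "silent" System.out.println is that Spark transformations such as map are lazy: the function runs only once an action is triggered, and then on the executors, so its output goes to the executor logs. The sketch below uses java.util.stream purely as an analogy for that laziness. It is not Spark code, the class and method names are hypothetical, and whether this is the cause in a given Talend job is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LazyPipelineDemo {

    // Records every row the mapping function actually visits.
    static final List<String> visited = new ArrayList<>();

    // Declares a mapping step with a side effect, but does not run it yet.
    static Stream<String> buildPipeline(List<String> rows) {
        return rows.stream().map(row -> {
            visited.add(row);
            return row.toUpperCase();
        });
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("a", "b", "c");

        Stream<String> pipeline = buildPipeline(rows);
        System.out.println("visited before terminal operation: " + visited.size());

        // Only the terminal operation makes the mapping function execute.
        List<String> result = pipeline.collect(Collectors.toList());
        System.out.println("visited after terminal operation: " + visited.size());
        System.out.println(result);
    }
}
```

The first count is 0 and the second is 3: declaring the map step alone does nothing. In Spark the equivalent trigger is an action such as count() or collect() on the RDD.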

 

See this link for an introduction to Spark SQL and the DataFrames API:

https://spark.apache.org/docs/1.6.2/sql-programming-guide.html

 

When working in Talend, the tInput(whatever) component creates an RDD that is then used in the tJava. See the 'Code' tab in Studio for how it initializes the Spark context and loads the data into an RDD.

Employee

Re: Using Custom Code in tJavarow

Hi,

Here is some example code for tJava in a Spark job. The code sample shipped in the component is wrong (Talend 6.4.1).

 

In the Basic settings:

 

outputrdd_tJava_1 = rdd_tJava_1.map(new mapInToOut(job));

In the Advanced settings, in the Java class field:

	// Maps each input record to an output record by copying fields position by position.
	public static class mapInToOut
			implements
			org.apache.spark.api.java.function.Function<inputStruct, RecordOut_tJava_1> {

		private ContextProperties context = null;
		// Field list of the input schema, resolved lazily on the first call.
		private java.util.List<org.apache.avro.Schema.Field> fieldsList;

		public mapInToOut(JobConf job) {
			this.context = new ContextProperties(job);
		}

		@Override
		public RecordOut_tJava_1 call(inputStruct origStruct) {
			if (fieldsList == null) {
				this.fieldsList = (new inputStruct()).getSchema().getFields();
			}

			RecordOut_tJava_1 value = new RecordOut_tJava_1();

			// Copy every input field into the same position of the output record.
			for (org.apache.avro.Schema.Field field : fieldsList) {
				value.put(field.pos(), origStruct.get(field.pos()));
			}

			return value;
		}
	}
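To show where per-row custom logic could go in a class shaped like mapInToOut, here is a self-contained sketch that runs without Talend or Spark. DemoRecord and RowFunction are hypothetical stand-ins for the generated record structs and for Spark's function interface, and UppercaseFirstField is an invented example transformation; none of these names come from Talend.

```java
import java.io.Serializable;
import java.util.Arrays;

class PerRowLogicDemo {

    // Hypothetical stand-in for a generated record struct (fields by position).
    static class DemoRecord implements Serializable {
        private static final long serialVersionUID = 1L;
        private final Object[] values;

        DemoRecord(int fieldCount) { this.values = new Object[fieldCount]; }
        Object get(int pos) { return values[pos]; }
        void put(int pos, Object value) { values[pos] = value; }
        int size() { return values.length; }

        @Override
        public String toString() { return Arrays.toString(values); }
    }

    // Hypothetical stand-in for a serializable one-argument function interface.
    interface RowFunction<I, O> extends Serializable {
        O call(I input);
    }

    // Copies every field by position, then applies custom logic to field 0.
    static class UppercaseFirstField implements RowFunction<DemoRecord, DemoRecord> {
        private static final long serialVersionUID = 1L;

        @Override
        public DemoRecord call(DemoRecord in) {
            DemoRecord out = new DemoRecord(in.size());
            for (int pos = 0; pos < in.size(); pos++) {
                out.put(pos, in.get(pos));
            }
            // Custom per-row logic goes here, after the plain copy.
            if (in.size() > 0 && out.get(0) instanceof String) {
                out.put(0, ((String) out.get(0)).toUpperCase());
            }
            return out;
        }
    }

    public static void main(String[] args) {
        DemoRecord in = new DemoRecord(2);
        in.put(0, "alice");
        in.put(1, 42);
        System.out.println(new UppercaseFirstField().call(in));
    }
}
```

The input record is left untouched and a new output record is returned, which mirrors the copy-then-return pattern of the call method above.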

 

Five Stars

Re: Using Custom Code in tJavarow

Thanks for the post. Any suggestions for working with multiple datasets within one tJava, meaning two different tables from two different tInput components? Today I route one input to a dummy tJava and set the execution order so that it gets loaded first; then I can simply reference that RDD in my second tJava.