One Star

Is there any possibility to process bunch of records with tJavaFlex

Hi,
I have a file of 10k records. In Talend Data Integration I defined a tJavaFlex component, with a flat file as source via tFileInputRegEx and a flat file as target via tFileOutputDelimited. The external jar is loaded with tLibraryLoad.
I prepared the tJavaFlex as follows:
'row1.LogRowIn' in the start code returns 0, but if I use 'row1.LogRowIn;' in the main code it gives a single record, and I can process that record and set the result on the output row.
Below is the code in the main block:
String record = row1.LogRowIn;
String outRecord = externalJarOperation.process(record);
row2.LogRowIn = outRecord;
Here each record is processed with the jar, its output is obtained, and it is written to the output row. The problem I ran into is time complexity: every single record goes into tJavaFlex, interacts with the external jar, gets its output, and is sent on to the output.
***Is there any possibility to process records in batches?
Example: the file has 1000 records; process a batch of 100 records at a time and write them to the output file, repeating until EOF is reached.
Question: Is there any way to make tJavaFlex process a batch of records at a time and write them to the output file, repeating until EOF is reached?
12 REPLIES
Community Manager

Re: Is there any possibility to process bunch of records with tJavaFlex

Hi 
tJavaFlex processes each record one by one, because the data is transferred from the source component one row at a time.
Best regards
Shong
----------------------------------------------------------
Talend | Data Agility for Modern Business
One Star

Re: Is there any possibility to process bunch of records with tJavaFlex

Hi Shong,
Thanks for your reply.
I have done Java transformations in Informatica PowerCenter and IBM DataStage (through rt.jar).
In Informatica I used the Java transformation and did the following:
1) 'On Input Row' block - wrote Java code to collect a chunk of records (say 100000) into an ArrayList (using input ports)
2) 'On End of Line' block - then processed the ArrayList through the external jar
3) The process continues until EOF is reached (I didn't write any Java snippet for this; I guess it is a built-in feature of the Informatica Java transformation)
---------------I want to apply the same approach in Talend DI.
In the start code I tried to collect a chunk of records into an ArrayList (but 'row1.key' returns null there).
Then I used the main code block to collect the chunk of records, and finally in the end code I processed the chunk with the external jar and loaded the result into 'row2.key' through the transformer.
The problem is that it does not keep reading records until EOF is reached.
--------------My only concern is to speed up the Java processing by passing an ArrayList to the external APIs instead of interacting line by line. Is there any chance/method/suggestion to achieve this?
Thanks,
Praneeth  
Community Manager

Re: Is there any possibility to process bunch of records with tJavaFlex

Hi 
You can try to do the same thing on tJavaFlex, like:
in the begin code:
java.util.List<String> list = new java.util.ArrayList<String>();
in the main code: (assuming there is a column called ID that counts the line number)
list.add(row1.columnName);
if (row1.ID % 100000 == 0) {
    externalJarOperation.process(list); // pass the list to the external function
    list.clear();
}

in the end code: (call the external function to process the last batch of records)
externalJarOperation.process(list);
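The begin/main/end pattern above can be sketched as a standalone Java program. This is a minimal sketch, not the real job: `processBatch` stands in for the external jar call (here it just upper-cases each record), and the per-row loop plays the role of the main code running once per input row.

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkDemo {
    static final int CHUNK_SIZE = 3;

    // Stand-in for the external jar call: here it just upper-cases each record.
    static List<String> processBatch(List<String> batch) {
        List<String> out = new ArrayList<String>();
        for (String s : batch) out.add(s.toUpperCase());
        return out;
    }

    // Mimics the begin/main/end code of tJavaFlex: buffer records, flush per chunk.
    static List<String> run(List<String> rows) {
        List<String> buffer = new ArrayList<String>();   // begin code
        List<String> output = new ArrayList<String>();
        for (String row : rows) {                        // main code runs once per row
            buffer.add(row);
            if (buffer.size() == CHUNK_SIZE) {
                output.addAll(processBatch(buffer));
                buffer.clear();
            }
        }
        if (!buffer.isEmpty()) {                         // end code: remaining rows
            output.addAll(processBatch(buffer));
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> rows = new ArrayList<String>();
        for (int i = 1; i <= 7; i++) rows.add("rec" + i);
        System.out.println(run(rows)); // all 7 records come back, in order
    }
}
```

Note that every record ends up in exactly one batch, including the last partial chunk, which is the property the end code is there to guarantee.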

Best regards
Shong
One Star

Re: Is there any possibility to process bunch of records with tJavaFlex

Hi Shong,
Thanks for reply. 
In the above, you based the processing on an ID column, but in my job the input is read record by record and has no ID.
So how can we manage an ID (row number) if we don't have an ID column? Or is there any way to collect records into an ArrayList without an ID? Or can we manage an ID column separately in some transformation?
Community Manager

Re: Is there any possibility to process bunch of records with tJavaFlex

Before tJavaFlex, you can add an ID number to each record at runtime in tMap and use this ID to count the number of lines. To generate a sequence number for the ID column, you can use the built-in routine Numeric.sequence("s1", 1, 1).
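For illustration, here is a rough stand-in for what the Numeric.sequence routine does. This is an assumption about its behavior, not Talend's actual implementation: one counter per label, starting at the given start value and incremented by the given step on each call.

```java
import java.util.HashMap;
import java.util.Map;

public class SequenceDemo {
    // Rough stand-in for Talend's Numeric.sequence routine: one counter per
    // label, starting at startValue and incremented by step on each call.
    private static final Map<String, Integer> seqs = new HashMap<String, Integer>();

    static synchronized int sequence(String label, int startValue, int step) {
        Integer current = seqs.get(label);
        int next = (current == null) ? startValue : current + step;
        seqs.put(label, next);
        return next;
    }

    public static void main(String[] args) {
        // In tMap, the expression Numeric.sequence("s1", 1, 1) yields 1, 2, 3, ...
        System.out.println(sequence("s1", 1, 1)); // 1
        System.out.println(sequence("s1", 1, 1)); // 2
        System.out.println(sequence("s1", 1, 1)); // 3
    }
}
```

Mapping this onto a tMap expression gives every row a consecutive row number, which is exactly what the chunk check `ID % chunkSize == 0` needs.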
One Star

Re: Is there any possibility to process bunch of records with tJavaFlex

Hi Shong,
Thanks for the great suggestion. I followed the tMap and tJavaFlex (start code, main code, end code) approach you described in your previous replies.
My job flow is: tLibraryLoad-->tFileInput(Rec)-->tMap(ID,Rec)-->tJavaFlex-->tFileOutput(processed data)
This is my code:
#####Start code
List<String> list = new ArrayList<String>();

#####Main Code
list.add(out1.LogRowIn);
if (out1.ID % chunkSize == 0) {
    System.out.println("list is prepared");
    ArrayList<String> AL = ExternalJarOperation(list);
    for (String s : AL) {
        System.out.println("Out record is: " + s);
        row2.LogRowOut = s;
    }
    list.clear();
    System.out.println("******************");
}

#####End Code
Last records processing
-----------------------------------------------------------------
In the main code, row2.LogRowOut = s writes only the last record of each chunk to the output file, and for the rest of the records it writes empty lines.
Please help me to sort out this problem.
Thanks in advance.
Community Manager

Re: Is there any possibility to process bunch of records with tJavaFlex

Hi 
for (String s : AL) {
    System.out.println("Out record is: " + s);
    row2.LogRowOut = s;
}

It outputs only the last value to row2.LogRowOut; that is expected. Remember that the component processes each row one by one: this code collects a bunch of records and calls the external function once the count reaches the chunkSize, but the main code can still emit only one output row per input row.
Is it possible to modify your function ExternalJarOperation so that it processes the records and writes the data to the file inside the function itself?
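As a sketch of that idea, the external operation could take the batch and write the processed records out itself instead of returning them. `processAndWrite` is a hypothetical name, and upper-casing stands in for whatever the real jar does; writing to an Appendable keeps the sketch testable, with a file writer plugged in at the call site.

```java
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

public class BatchFileWriter {
    // Hypothetical reworked external operation: instead of returning the
    // processed list, it writes each processed record straight to the output.
    static void processAndWrite(List<String> batch, Appendable out) throws IOException {
        for (String record : batch) {
            out.append(record.toUpperCase()); // stand-in for the real transformation
            out.append('\n');
        }
    }

    public static void main(String[] args) throws IOException {
        // Append mode, so repeated chunk calls keep extending the same file.
        BufferedWriter writer = new BufferedWriter(new FileWriter("out.txt", true));
        try {
            processAndWrite(Arrays.asList("rec1", "rec2"), writer);
        } finally {
            writer.close();
        }
    }
}
```

With this shape, tJavaFlex only buffers one chunk at a time and never needs to hand the processed output back through the row link.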
One Star

Re: Is there any possibility to process bunch of records with tJavaFlex

Hi Shong,
Thanks for the reply. I want to share my idea in the following image, please have a look.
Is this possible? If it is, which component should I use and what should I change?
One Star

Re: Is there any possibility to process bunch of records with tJavaFlex

Hi shong,
You asked whether it is possible to modify the jar's function to process the records and write the data to a file. Yes, we can modify the jar. What changes should we make to the external jar?
Community Manager

Re: Is there any possibility to process bunch of records with tJavaFlex

I meant that you can process the data and write it out inside the external function, instead of returning the list of data and writing it with tFileOutputDelimited.
If you do want to return the list of data and write it with tFileOutputDelimited, you need to modify the Java code on tJavaFlex like this:
#####Start code
List<String> list = new ArrayList<String>();
List<String> outputList = new ArrayList<String>();

#####Main Code
list.add(out1.LogRowIn);
if (out1.ID % chunkSize == 0) {
    System.out.println("list is prepared");
    ArrayList<String> AL = ExternalJarOperation(list);
    for (String s : AL) {
        outputList.add(s);
    }
    list.clear();
    System.out.println("******************");
}

#####End Code
ArrayList<String> AL = ExternalJarOperation(list);
for (String s : AL) {
    outputList.add(s);
}
globalMap.put("outputList", outputList);
After tJavaFlex, use a tFixedFlowInput to get the output list and generate an input flow, e.g.:
....tJavaFlex--OnComponentOk--tFixedFlowInput--main--tJavaRow--main--tNormalize--tFileOutputDelimited
On tFixedFlowInput: define one column called newColumn with String type, and set its value to:
((java.util.ArrayList)globalMap.get("outputList")).toString()
tJavaRow: remove the surrounding "[" and "]" characters:
output_row.newColumn = input_row.newColumn.replaceAll("\\[|\\]", "");
tNormalize: normalize the input data with separator "," into multiple lines, for example:
1,2,3
becomes
1
2
3
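The tJavaRow and tNormalize steps above can be simulated in plain Java. This sketch assumes the column value is the string produced by `List.toString()` ("[a, b, c]"): strip the brackets, then split on the ", " separator to get one record per line.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NormalizeDemo {
    // Mirrors the tJavaRow + tNormalize steps: take the List.toString() value,
    // strip the surrounding brackets, then split on ", " into one record per line.
    static List<String> normalize(String listToString) {
        String stripped = listToString.replaceAll("^\\[|\\]$", ""); // drop [ and ]
        if (stripped.isEmpty()) return new ArrayList<String>();
        return Arrays.asList(stripped.split(", "));
    }

    public static void main(String[] args) {
        List<String> output = Arrays.asList("1", "2", "3");
        String flat = output.toString();      // "[1, 2, 3]" - the tFixedFlowInput column value
        for (String s : normalize(flat)) {
            System.out.println(s);            // one record per line, as tNormalize produces
        }
    }
}
```

One caveat of this round-trip: any record that itself contains ", " would be split incorrectly, so it is only safe when the record payload cannot contain the separator.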
One Star

Re: Is there any possibility to process bunch of records with tJavaFlex

Hey Shong,
Thanks a lot for the reply, Shong. It really worked for me. I designed a job based on the flow you described.
This flow works for jobs up to a 10000K-record (~250MB) file without any Java heap/RAM issues, with the JVM heap set to min 4GB / max 7GB in the Talend settings.
While running big files (5 million to 1 billion records, ~2GB) the system hung due to huge memory allocation (RAM space issues). I think this happens because all the processed lists are kept in the globalMap.
Next I tried with a 1GB file: all records were processed with the external jar in tJavaFlex, but the system hung while processing records at the tFixedFlowInput component. I think this happens because the whole globalMap list is transported into the tFixedFlowInput column.
Shong, can we mitigate these memory issues for big files? Should I use another component?

 

Community Manager

Re: Is there any possibility to process bunch of records with tJavaFlex

Hi 
About the Out of Memory issue, take a look at this article to see if it is possible to optimize the job design and allocate more memory to the job execution; in addition, try reducing the chunk size.
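Another way to keep memory bounded is to flush each processed chunk straight to the output file instead of accumulating everything in the globalMap. A minimal sketch of that design, with the hypothetical `processBatch` standing in for the external jar call: memory use then scales with the chunk size, not the file size.

```java
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class StreamingChunks {
    // Stand-in for the external jar call.
    static List<String> processBatch(List<String> batch) {
        List<String> out = new ArrayList<String>();
        for (String s : batch) out.add(s.toUpperCase());
        return out;
    }

    // Flush every processed chunk to disk immediately, so memory stays bounded
    // by the chunk size instead of growing with the whole file.
    static void run(Iterable<String> rows, int chunkSize, String outputPath) throws IOException {
        BufferedWriter writer = new BufferedWriter(new FileWriter(outputPath));
        try {
            List<String> buffer = new ArrayList<String>();
            for (String row : rows) {
                buffer.add(row);
                if (buffer.size() == chunkSize) {
                    for (String s : processBatch(buffer)) { writer.write(s); writer.newLine(); }
                    buffer.clear();
                }
            }
            // last partial chunk
            for (String s : processBatch(buffer)) { writer.write(s); writer.newLine(); }
        } finally {
            writer.close();
        }
    }

    public static void main(String[] args) throws IOException {
        List<String> rows = new ArrayList<String>();
        for (int i = 1; i <= 250; i++) rows.add("rec" + i);
        run(rows, 100, "big_out.txt"); // 250 records written in 3 flushes
    }
}
```

In Talend terms this corresponds to doing the writing inside the external function (or inside the tJavaFlex end/main code) rather than routing the full result list through globalMap and tFixedFlowInput.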