Calling a user routine to lookup a file is taking a lot of time

Highlighted
Four Stars

Calling a user routine to lookup a file is taking a lot of time

Hi,
I am bit new to talend, I am trying to implement lookup functionality in talend.
I have created a Java routine which takes file name,searchcolumnindex,retrievecolumnindex,valuetolookup. This function returns the value in retrievecolumnindex. And I am calling this function in tmap to do lookup for multiple columns. This is making job run very slow say, 17sec per single record.
I have noticed this is due to the routine call, I/O on lookup file is done once per each input record.
Any suggestions on how to go about optimizing this function call, like read whole lookup file once for all main records rather than performing I/o for each record ??
Appreciate if you could help..

Accepted Solutions
Fifteen Stars TRF
Fifteen Stars

Re: Calling a user routine to lookup a file is taking a lot of time

Just declare the hashmap as a public static variable in your package routine.

Then, you will have 2 kinds of methods to use this hashmap:

 

  • 1 method to put values when the job starts 
  • 1 method to get values when desired during the job execution

 

Here is a whole example I use to translate values for some table fields avoiding to have many lookups in my jobs:

 

package routines;
import java.util.HashMap; public class myMapping { // HashMap to store source and target values for each public static HashMap<String, String> myHashmap; public static void setValue(String tableName, String fieldName, String sourceValue, String targetValue) { // Initialize HashMap if(myHashmap == null) myHashmap = new HashMap<>(); // Add entry for the key "tableName + "_" + fieldName + "_" + sourceValue" myHashmap.put(tableName + "_" + fieldName + "_" + sourceValue, targetValue); } public static String getTarget(String tableName, String fieldName, String sourceValue) throws Exception { // Get target value for entry "tableName + "_" + fieldName + "_" + sourceValue" return(myHashmap.get(tableName + "_" + fieldName + "_" + sourceValue)); } }

In my case the I have a mapping file where each record is composed of 4 fileds:

  • the object name
  • the field name
  • the source value
  • the target value

For example:

Fruit;Name;Banana;Banane
Fruit;Name;Apple;Pomme
Country;Label;United States;Etats-Unis
Country;Label;Germany;Allemagne

 You just have to adapt this to your own use case.


TRF

All Replies
Employee

Re: Calling a user routine to lookup a file is taking a lot of time

Hi,

 

    The performance roadblock is in user routine. Could you please give more background about what you are trying to achieve with routine. I understood that you are picking the index of the columns but could you please advise why you are using this operation?

 

    Did you try to achieve the business logic of routine through Talend components? Also screenshot of the job flow and code of user routine will be helpful as it will hep to understand the flow better.

 

Warm Regards,

 

Nikhil Thampi


Warm Regards,
Nikhil Thampi
Please appreciate our members by giving Kudos for spending their time for your query. If your query is answered, please mark the topic as resolved :-)
Four Stars

Re: Calling a user routine to lookup a file is taking a lot of time

Hi Nikhil,

 

Thank you for the reply. We have scenario where we have to fetch value from lookup file based on a criteria (No joining key). For Example,

 

My main input flow file lets say 'ABC' have columns U,V,W

I have another file(lookup file) say 'DEF' say columns .X,Y,Z

My logic should check like if(X=="APPLE") then get 'Y' value and populate to "V" else populate 'Null'

 

This is one level of lookup, similar to this we have multi level lookups. To achieve this we have defined below routine

 

public static String lookupFetch(String file_name,int searchColumnIndex,int retrieveColumnIndex, String searchString) throws IOException 
 {
    String resultRow = null;
    BufferedReader br = new BufferedReader(new FileReader(file_name));
    String line;
    while ( (line = br.readLine()) != null ) {
        String[] values = line.split(",");
        if(values[searchColumnIndex].equals(searchString)) {
            resultRow = values[retrieveColumnIndex];
            break;
        }
    }
    br.close();
    return resultRow;
    }

 

we have used this routine in many input columns(20 columns out of 50 input columns) transformations. We have used above function in nested form as well. Now the job is taking on an average of 17secs for each input record.

 

Request your further inputs.

Four Stars

Re: Calling a user routine to lookup a file is taking a lot of time

Just to add more info, we are using 4-5 lookup files and none of them more than size of 3-4 MB.
Employee

Re: Calling a user routine to lookup a file is taking a lot of time

Hi,

 

    The file reading and parsing must be the reasons for longer processing duration. Could you please provide a sample main file, lookup files and expected output and I will try to build some logic around it without calling routine.

 

Warm Regards,

 

Nikhil Thampi


Warm Regards,
Nikhil Thampi
Please appreciate our members by giving Kudos for spending their time for your query. If your query is answered, please mark the topic as resolved :-)
Four Stars

Re: Calling a user routine to lookup a file is taking a lot of time

HI Nikhil,

 

Below is how the sample job is. I have attached the lookup files that i am using in the transformations

 

I/pSmiley SadT_Generate row)

with one column namely 'input_value' with value as "1"

 

Current Transformations in t_map:

input_value : row1.input_value
cl_scm_id: TLookup.lookupFetch("cl_scm.txt",0,1,"CDE")
cl_id: TLookup.lookupFetch("cl_id.txt",3,0,TLookup.lookupFetch("cl_scm.txt",0,1,"ABC"))


o/pSmiley Sadt_log_row)

.-----------+---------+-------.
| tLogRow_1 |
|=----------+---------+------=|
|input_value|cl_scm_id|cl_id |
|=----------+---------+------=|
|1 |245 |1230001|
'-----------+---------+-------'

 

Four Stars

Re: Calling a user routine to lookup a file is taking a lot of time

Hi Nikhil,

Have you got any other solution to avoid routine call ? If so can you please share it will be helpful..
Fifteen Stars TRF
Fifteen Stars

Re: Calling a user routine to lookup a file is taking a lot of time

You should think to load the file once to a hashmap or other in memory structure to keep these data available for the job. These task should be achieve by the tPreJob subjob.
Then, your routine will not have to read the file anymore, just have to search for the required values by key or by scanning an array or something like that. Should be a solution.

TRF
Four Stars

Re: Calling a user routine to lookup a file is taking a lot of time

Hi TRF,

Thank you for the reply, but if I store the file in hashmap using prejob, how can I refer to that hash memory in my routine ?? Can you please throw some more light on it or give me an example.. I am sorry if it is obvious, I am bit new to talend..
Fifteen Stars TRF
Fifteen Stars

Re: Calling a user routine to lookup a file is taking a lot of time

Just declare the hashmap as a public static variable in your package routine.

Then, you will have 2 kinds of methods to use this hashmap:

 

  • 1 method to put values when the job starts 
  • 1 method to get values when desired during the job execution

 

Here is a whole example I use to translate values for some table fields avoiding to have many lookups in my jobs:

 

package routines;
import java.util.HashMap; public class myMapping { // HashMap to store source and target values for each public static HashMap<String, String> myHashmap; public static void setValue(String tableName, String fieldName, String sourceValue, String targetValue) { // Initialize HashMap if(myHashmap == null) myHashmap = new HashMap<>(); // Add entry for the key "tableName + "_" + fieldName + "_" + sourceValue" myHashmap.put(tableName + "_" + fieldName + "_" + sourceValue, targetValue); } public static String getTarget(String tableName, String fieldName, String sourceValue) throws Exception { // Get target value for entry "tableName + "_" + fieldName + "_" + sourceValue" return(myHashmap.get(tableName + "_" + fieldName + "_" + sourceValue)); } }

In my case the I have a mapping file where each record is composed of 4 fileds:

  • the object name
  • the field name
  • the source value
  • the target value

For example:

Fruit;Name;Banana;Banane
Fruit;Name;Apple;Pomme
Country;Label;United States;Etats-Unis
Country;Label;Germany;Allemagne

 You just have to adapt this to your own use case.


TRF

Cloud Free Trial

Try Talend Cloud free for 30 days.

Tutorial

Introduction to Talend Open Studio for Data Integration.

Definitive Guide to Data Integration

Practical steps to developing your data integration strategy.

Definitive Guide to Data Quality

Create systems and workflow to manage clean data ingestion and data transformation.