I am a new Talend user. I have done some searching and could not find an answer to my exact question.
I am designing a parent job, and 2 subjobs. First, let me describe the parent job:
The parent job will read an input file, parse the data, and then query an Oracle database using this input. Then we filter the rows and proceed to subjob A or subjob B depending on the result of the filtering.
Subjob A and Subjob B perform different final actions, but the overall idea is the same:
Query additional data depending upon which subjob, and then perform a final action.
I need both subjob A and subjob B to be reusable such that they could potentially be called from different jobs.
Thus, I have arrived at these choices:
A.) Passing variables via context. I understand this is the suggested way to do it with the free version TOS.
B.) Having the parent jobs generate an output file and then have the subjobs read this output file.
C.) Joblets. I saw mention of "Jobjet" (Joblets, maybe?), which allow the output of a parent to be passed to the child job directly. As I understand it, this requires a licensed version, which I do not have, so I cannot consider this option.
As I understand it, the advantage of option A is that it would be faster, and it is the approach I usually see suggested on the forums.
The advantage of option B is that it allows subjob A and subjob B to be reused without having to pass the contexts each time.
Here are some questions:
1. Is there any way to manage this? If I pass the input as contexts, I cannot reuse subjob A or subjob B without passing them the variables as contexts each time (is this assumption correct?). Additionally, contexts will be very painful, as I will be passing upwards of 200 variables for some subjobs.
2. Is there a way to quickly map contexts using java rows (the only way I know of is context.X = input_row.X, etc)?
3. How bad is the performance hit for creating an output, then having the subjob read that as an input? Using this would allow me to manage the reuse.
I appreciate any input and suggestions.
I'm not entirely sure what exactly you are asking here. I *think* your issue is with passing a recordset instead of individual values to a child job. With context variables, you are normally passing individual values. However, you can pass recordsets if you are willing to use a bit of Java. Context variables can be of type Object (which is actually a class). If you use Object, you can create a collection (ArrayList, HashMap, etc.) of your own class (perhaps created using the routines functionality) and pass that to your child job through a context variable. So, for example, you might have an ArrayList of objects which hold your data. Your objects may be instances of a class you have created called MyClass. You build that collection from your data in your parent job, then set a single Object context variable to your ArrayList. Then, in your child job, you cast the value held by the Object context variable back to the expected class and work with it there.
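The parent side of this could look roughly like the sketch below. It is plain Java so that it runs on its own, and several names are assumptions made for illustration: the fields customerId and amount on MyClass, the Context class (a stand-in for the context object Talend generates for a job), and the buildContext helper. In a real job the loop body would typically live in a tJavaRow or tJavaFlex, and the value would be handed to the child through the tRunJob context parameters. As far as I know, this only works when the child runs in the same JVM as the parent, so treat that as an assumption to verify.

```java
import java.util.ArrayList;
import java.util.List;

public class ParentJobSketch {

    // Stand-in for a class you would define as a routine (fields are hypothetical).
    static class MyClass {
        final String customerId;
        final double amount;

        MyClass(String customerId, double amount) {
            this.customerId = customerId;
            this.amount = amount;
        }
    }

    // Stand-in for the job's generated context; myObjectContext has type Object.
    static class Context {
        Object myObjectContext;
    }

    // Collect every incoming row into one list and store it in a single context variable.
    static Context buildContext(String[][] rows) {
        ArrayList<MyClass> data = new ArrayList<>();
        for (String[] row : rows) {
            data.add(new MyClass(row[0], Double.parseDouble(row[1])));
        }
        Context context = new Context();
        context.myObjectContext = data;
        return context;
    }

    public static void main(String[] args) {
        String[][] rows = { {"C001", "12.50"}, {"C002", "7.25"} };
        Context context = buildContext(rows);
        List<?> stored = (List<?>) context.myObjectContext;
        System.out.println(stored.size() + " rows stored in one context variable");
    }
}
```

The point of the design is that the child job only ever declares one context variable, however many columns each record carries. The child job then reads that variable back with a cast: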
ArrayList<routines.MyClass> tmpDataArray = (ArrayList<routines.MyClass>)context.myObjectContext;
The cast above shows how you would retrieve your ArrayList back from the Object context variable. The next thing you might want to do is turn that into row data. That is easily achieved with a tJavaFlex. The cast would go in the Start Code section, along with the opening of a while or for loop over the tmpDataArray object. The Main Code is where each row is generated and passed to the next component. The End Code simply closes the while or for loop.
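To make the three tJavaFlex sections concrete, here is a minimal sketch in plain Java with comments marking where each section would start. The names Row1 (a stand-in for the output row of the component), Context, the fields on MyClass, and the toRows helper are all assumptions for illustration; in a real tJavaFlex you would only write the statements inside the marked sections, and Talend would supply the row and context objects.

```java
import java.util.ArrayList;
import java.util.List;

public class ChildJobSketch {

    // Stand-in for the routine class shared by parent and child (fields are hypothetical).
    static class MyClass {
        final String customerId;
        final double amount;

        MyClass(String customerId, double amount) {
            this.customerId = customerId;
            this.amount = amount;
        }
    }

    // Stand-in for the child job's generated context.
    static class Context {
        Object myObjectContext;
    }

    // Stand-in for the output row structure of the tJavaFlex.
    static class Row1 {
        String customerId;
        double amount;
    }

    @SuppressWarnings("unchecked")
    static List<Row1> toRows(Context context) {
        List<Row1> emitted = new ArrayList<>();

        // ---- Start Code: cast the context value and open the loop ----
        ArrayList<MyClass> tmpDataArray = (ArrayList<MyClass>) context.myObjectContext;
        for (MyClass item : tmpDataArray) {

            // ---- Main Code: fill one output row per list element ----
            Row1 row1 = new Row1();
            row1.customerId = item.customerId;
            row1.amount = item.amount;
            emitted.add(row1); // in Talend the row would flow to the next component instead

        // ---- End Code: close the loop ----
        }
        return emitted;
    }

    public static void main(String[] args) {
        ArrayList<MyClass> data = new ArrayList<>();
        data.add(new MyClass("C001", 12.50));
        data.add(new MyClass("C002", 7.25));
        Context context = new Context();
        context.myObjectContext = data;

        for (Row1 row : toRows(context)) {
            System.out.println(row.customerId + " " + row.amount);
        }
    }
}
```

The cast is unchecked, so the compiler cannot verify the element type; if the parent ever stores something other than an ArrayList of MyClass, the failure will only show up at run time in the child.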
I use this technique in this tutorial (https://www.rilhia.com/tutorials/talend-connect-example). Search for 1) "Convert input array to a datarow" (tJavaFlex) to see how I do it there.
I hope this helps.