We have following requirement in our job. We have a batch of incoming rows. Suppose the batch size is 6 in which we have data like this: Key col1 col2 col3 A1 a - - A2 - b - A3 - - c A1 – b – A2 a - c A1 - - c I want to merge the data so that if value for particular field is null in an incoming record then it should be defaulted from previous row. So basically I want to merge current record with previous row (having same match key) and merge the values so that values from current record should survive and null fields should be defaulted with values from previous rows. All these rows are sorted on a numeric field which ensures the sequence of rows. For the above incoming data we want following output: A1 a - - A2 - b - A3 - - c A1 a b – A2 a b c A1 a b c Can someone please suggest if this is possible by using some Talend component or suggest any other alternative to achieve this in Talend job.
I am able to achieve this functionality partially using tMemorizeRows. The problem I am facing is tMemorize gives the previous row which does not contain the values from its previous rows. e.g. in the above example there are 3 instances of A1 key. When I process second instance of A1, I get reference to first using tMemorize which is ok. But when I process third row then I should get the second row updated with the values of first row. Every row should preserve its own values and the values defaulted from its previous row and should pass the cumulative values to its next rows.
Hi Chetan This can be done using tAggregateRow and some trickery using the List as the aggregation function. In your case you then want to use tMap to find the last non-null element in the list. There is a ListUtil sample routine showing you how to do this in the attached zip file.
Talend Team, Thanks for the solution and the sample job. Aggregation is not really solving our problem. In our case it is expected that the merge operation should retain the same number of input rows. So if there are 6 rows in the input then after merge we should get 6 rows in the output. The requirement is if value for a column is absent in the current row then it should copy it from previous row of the same “key”. Aggregation is providing distinct rows in the output. Can you please suggest alternative? Meanwhile we have come up with an alternative solution using tJavaFlex component. Basically we are using HashMap to store the previous rows. We check if previous row exists for current row key. If yes, then copy the values from previous row into current row else send the current row to output (as is) and store it in previous rows HashMap for future use. Attaching sample for your reference. Now we have following concerns: Is it advisable to use internal row structure (row1Struct) in our logic? We are not using “Data Auto Propagate” option in tJavaFlex but having input to output mapping in our code. This will need to manually update it if there is any change in the input or output structure. Is there a way to ease it?