I have a requirement where I'm extracting the data from a excel file and loading that to target oracle table. Below is how my data looks like.
If there are multiple records per Prospect_ID and if the APPLICANT_SUBMIT_DATE is null, we need to load to target the record where Decision = Y and if more than one row exists with Decision = Y then take the record with maximum Application_Start_Date.
Hello, have you tried the following?
tExcelInput -> tFilterRow (filter Decision = Y) -> tAggregateSortRow (Group by Prospect_ID, Entry Team) -> tLogRow
You extract all the information from Excel, filter and delete Decision = N is not relevant information).
By grouping for each Prospect ID and Entry Team you will have each row with a different application date and decision = Y which is what you need to send to the output to save it in a CSV / Database System file.
Thank you for the reply.
I do not want to filter all the data with Decision = N. If there is a single row where Decision=N, that row should be loaded to target.
I should filter Decision=N only when there are multiple records per prospect ID per entry_term, and if the APPLICATION_SUBMIT_DATE is null for those records with same prospect_id, then remove the row or rows with Decision = N, and out of the remaining rows with Decision=Y get the row with MAX Application_Start_Date and load that in to target.
Talend named a Leader.
Kickstart your first data integration and ETL projects.
Watch the recorded webinar!
Learn how to do cool things with Context Variables
Find out how to migrate from one database to another using the Dynamic schema
Pick up some tips and tricks with Context Variables