I have a requirement where I'm extracting the data from a excel file and loading that to target oracle table. Below is how my data looks like.
If there are multiple records per Prospect_ID and if the APPLICANT_SUBMIT_DATE is null, we need to load to target the record where Decision = Y and if more than one row exists with Decision = Y then take the record with maximum Application_Start_Date.
Hello, have you tried the following?
tExcelInput -> tFilterRow (filter Decision = Y) -> tAggregateSortRow (Group by Prospect_ID, Entry Team) -> tLogRow
You extract all the information from Excel, filter and delete Decision = N is not relevant information).
By grouping for each Prospect ID and Entry Team you will have each row with a different application date and decision = Y which is what you need to send to the output to save it in a CSV / Database System file.
Thank you for the reply.
I do not want to filter all the data with Decision = N. If there is a single row where Decision=N, that row should be loaded to target.
I should filter Decision=N only when there are multiple records per prospect ID per entry_term, and if the APPLICATION_SUBMIT_DATE is null for those records with same prospect_id, then remove the row or rows with Decision = N, and out of the remaining rows with Decision=Y get the row with MAX Application_Start_Date and load that in to target.
Talend named a Leader.
Kickstart your first data integration and ETL projects.
Watch the recorded webinar!
Part 2 of a series on Context Variables
Learn how to do cool things with Context Variables
Find out how to migrate from one database to another using the Dynamic schema