Hello everyone, I have to build an ETL job and I am stuck on the transformation logic; maybe someone can help me. Simplifying a lot, my file is structured as below:
The logic is that if 2 or more rows are sequential (one row's OUT matches the next row's IN), I have to aggregate them, keeping the minimum IN and the maximum OUT. So the result has to be:
If there were only the first 3 rows, it would be easy with a tAggregate, but I don't know how to tell Talend to aggregate only the sequential ones. (In reality, IN and OUT are dates.)
Basically, think of it as creating partition buckets: each run of sequential rows goes into its own bucket. Once you have the buckets, you take the min of IN and the max of OUT within each bucket and you have the answer.
Here is an example using integers, but the same logic works for dates. I did it with a tJavaFlex and a bit of code, since that is the fastest approach when your data is sorted on IN and looks like the data below.
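To make the bucket idea concrete, here is a minimal standalone Java sketch of that logic. It is not the original tJavaFlex code: the class name `SequentialRangeMerger`, the `merge` method, and the sample rows are all made up for illustration. It assumes the rows are already sorted on IN and that "sequential" means a row's IN equals the OUT of the bucket built so far.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Talend: illustrates the bucket logic only.
public class SequentialRangeMerger {

    // Each row is {in, out}. Assumes rows are sorted on IN.
    // A row joins the current bucket when its IN equals the bucket's OUT;
    // otherwise it starts a new bucket.
    public static List<long[]> merge(List<long[]> rows) {
        List<long[]> buckets = new ArrayList<>();
        long[] current = null;
        for (long[] row : rows) {
            if (current != null && row[0] == current[1]) {
                // Sequential row: keep the bucket's IN (the minimum, since the
                // input is sorted) and extend OUT to the maximum seen so far.
                current[1] = Math.max(current[1], row[1]);
            } else {
                // Gap found: open a new bucket with a copy of this row.
                current = new long[] { row[0], row[1] };
                buckets.add(current);
            }
        }
        return buckets;
    }

    public static void main(String[] args) {
        // Made-up sample data, integers standing in for dates.
        List<long[]> rows = new ArrayList<>();
        rows.add(new long[] { 1, 2 });
        rows.add(new long[] { 2, 3 });
        rows.add(new long[] { 3, 4 });
        rows.add(new long[] { 6, 7 });
        rows.add(new long[] { 7, 8 });
        rows.add(new long[] { 10, 11 });

        for (long[] bucket : merge(rows)) {
            System.out.println(bucket[0] + " -> " + bucket[1]);
        }
    }
}
```

For dates, the same comparison could be done on epoch milliseconds (for example the value returned by `java.util.Date.getTime()`). In a tJavaFlex, the state held in `current` would presumably be declared in the start code and the comparison would run once per row in the main code, but how the merged rows are emitted depends on how the job is laid out.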