Sometimes it is useful to be able to compare a column against the same column in a previous row within a dataset. A simple example of this requirement might be as follows. You have a collection of orders that are sorted by Order Number and Date. An Order Number is usually unique, however there can be amendments made. You are tasked with identifying original Orders and Amended Orders. Original Orders will always be the first Order record created. Amendment Orders are always created after the Original Order. The table below is an example of the dataset....
In order to identify the Amendment Orders from the Original Orders, we need to check the Order Number of the row before. If it is the same, then the current record is an Amendment Order, if it is different it is an Original Order.
We can do this in Talend quite simply using a tMap component and tMap variables. One of the most overlooked features of the tMap component is the fact that the tMap variables can store values between rows and they are processed in order (top down). This allows us to store values between rows and compare them with current rows. The screenshot below shows the layout of the tMap component. I will explain the expressions below the screenshot.
The variable expressions are shown and explained in the table below....
|order_type||Var.last_order!=null && Var.last_order.compareToIgnoreCase(row2.OrderNumber)==0|
? "Amendment Order" : "Original Order"
|This variable expression is tested first. First the "last_order" variable is checked to see if it is not null and to see if it matches the current "OrderNumber". Since the "last_order" variable has not been assigned yet for the first row, it will be null. For subsequent rows it will hold the value of the previous row. If the "OrderNumber" value matches the value assigned to "last_order" in the previous row, it means it is an "Amendment Order". Otherwise it is an "Original Order".|
|last_order||row2.OrderNumber||The "last_order" variable is checked first and assigned second due to the order of processing the variables.|
The result of the tMap processing the data in the table at the top is shown below.....
|[statistics] connecting to socket on port 4007|
Thank You, in addition to answering my basic question of how to refer a previous variable in the next variable, it also educates me on how to compare column values between current and previous rows using "compareToIgnoreCase"
This solution is only possible in Standard jobs.
For Spark job you would need to do something different.
Check out this video:
It explains a use case where the previous record needs to be considered in Spark.
Talend named a Leader.
Kickstart your first data integration and ETL projects.
Look at6 ways to start utilizing Machine Learning with Amazon We Services and Talend
Test drive Talend's enterprise products.