Lookups with multiple column matches and range

One Star

Lookups with multiple column matches and range

Hello,
I am new to Talend and while using this forum has been extremely helpful in trying to resolve issues I have, there are still some issues I hope I will be able to get some help with.
I am using Talend to process a table with a Countries purchases of goods at a certain time and map it to another table that outputs income level of country based on the time ordered. Basically my tables look like this, but at a larger scale (33000 for input and 6000 for the mapping table):
Input
Country    Purchase order date 
   A            01/08/2010
   B            30/04/2013
   C            13/07/2013
   D            15/07/2014
   A            24/10/2011

Lookup table
Country     Start Date      End Date       Income Level
   A          01/07/2010    30/06/2011       high
   A          01/07/2011    30/06/2012       upper middle
   B          01/07/2011    30/06/2012       
   B          01/07/2012    30/06/2013       low
   C          01/07/2013    01/07/2014       lower middle
   D          01/07/2013    01/07/2014       middle
   D          01/07/2014    01/07/2015       upper middle

Output
Country  Purchase order date    Income Level
   A            01/08/2010              high
   B            30/04/2013              low
   C            13/07/2013              lower middle
   D            15/07/2014              upper middle
   A            24/10/2011              upper middle

I have tried using the tMap component and filtering by matching "Country" as the lookup and using the expression 
row.Purchase_Order_Date.after(row2.StartDate)
&& row.Purchase_Order_Date.before(row2.EndDate)
When I do so I get the following error:
Exception in component tMap_2
java.lang.NullPointerException
at java.util.Date.getMillisOf(Unknown Source)
at java.util.Date.after(Unknown Source)
I have looked up this error and read many forum posts as to what this error means. I tried the tIntervalMatch component but either I did it wrong or it is not possible to find an interval with a dependent match on a 3rd column. I found one solution where it was suggested that I filter along the lines of
row2.StartDate != null && row.Purchase_Order_Date.after(row2.StartDate)
&& row2.EndDate != null && row.Purchase_Order_Date.before(row2.EndDate)
 
This allows the program to run but does not give a correct output and either gives a incorrect values for income or reports "null" when there is definitely a value corresponding to it. Also there are no "null" values for any of the dates, however in some cases there is no value for income level because information was not available for that year.
I feel like the problem lies within having to match the "Country" columns and then looking up up "Purchase Order Date" between the columns "Start Date" and "End Date". I once had this problem with an Excel project that I was working on in which I devised a solution to match the start and end points of the range within the larger range and used "VLOOKUP" using the "INDEX" function to define the range to lookup in. Essentially creating a lookup within a lookup. But I am unsure how to do this in Talend, partly due to my skills in Java, but I am hoping someone can offer me some guidance in this matter. Whether using this potential solution or another.
Thank you very much!
Seventeen Stars

Re: Lookups with multiple column matches and range

hi,
I guess that your null pointer exception is due to some whiteSpace in your data and try to join something like "  D  " with "D     " !
Trim your field before the join.
I've done the job in 2 step.
first : tmap to a left outer join (all matches) to get all possibility
second : a filter to keep only lines where purchase order is between the start date & end date;


result
Starting job betweenDate at 11:23 30/03/2015.
connecting to socket on port 3424
connected
.--------------------------------.
|         #1. tLogRow_1          |
+-------------------+------------+
| key               | value      |
+-------------------+------------+
| Country           | A          |
| Purchaseorderdate | 01/08/2010 |
| StartDate         | 01/07/2010 |
| EndDate           | 30/06/2011 |
| IncomeLevel       |      high  |
+-------------------+------------+
.--------------------------------.
|         #2. tLogRow_1          |
+-------------------+------------+
| key               | value      |
+-------------------+------------+
| Country           | B          |
| Purchaseorderdate | 30/04/2013 |
| StartDate         | 01/07/2012 |
| EndDate           | 30/06/2013 |
| IncomeLevel       |    low     |
+-------------------+------------+
.-------------------------------------.
|            #3. tLogRow_1            |
+-------------------+-----------------+
| key               | value           |
+-------------------+-----------------+
| Country           | C               |
| Purchaseorderdate | 13/07/2013      |
| StartDate         | 01/07/2013      |
| EndDate           | 01/07/2014      |
| IncomeLevel       |    lower middle |
+-------------------+-----------------+
.-------------------------------------.
|            #4. tLogRow_1            |
+-------------------+-----------------+
| key               | value           |
+-------------------+-----------------+
| Country           | D               |
| Purchaseorderdate | 15/07/2014      |
| StartDate         | 01/07/2014      |
| EndDate           | 01/07/2015      |
| IncomeLevel       |    upper middle |
+-------------------+-----------------+
.------------------------------------.
|           #5. tLogRow_1            |
+-------------------+----------------+
| key               | value          |
+-------------------+----------------+
| Country           | A              |
| Purchaseorderdate | 24/10/2011     |
| StartDate         | 01/07/2011     |
| EndDate           | 30/06/2012     |
| IncomeLevel       |   upper middle |
+-------------------+----------------+
disconnected
Job betweenDate ended at 11:23 30/03/2015.
hope it helps
regards
laurent
One Star

Re: Lookups with multiple column matches and range

Hi,
Thank you for getting back to me so quickly. By habit I always state to trim all columns and I have checked them too. There seems to be no discrepancy so I do not think that is the problem. I am a little unsure of what you mean though, would you be kind enough to explain your first step? I am just unsure about the workflow and where I should place the filter. Are you suggesting that I use tMap function to map using no filter conditions whatsoever and then apply a tRowFilter to the output before tLogRow as such: (EDIT: I apologize, but it seems I am unable to upload a screenshot. Below is a workplan similiar to what I have in my Job)
InputFile ---------->tMap--------------->tFilterRow------------->tLogRow
                              ^
                              |
                              |
                              |
                              |
                         Lookup_table
If this is the case then how is the filtering working? I think I am just unsure as to where to use filter.
Thanks again for your help!
Usamah
Seventeen Stars

Re: Lookups with multiple column matches and range

here's some more screenshoot


and to have same output, can use a filter column after that :

One Star

Re: Lookups with multiple column matches and range

Thanks for your response! This works perfectly with the example set I gave you. Unfortunately I am having another issue that I did not forsee until I tried this method.
When I get to the point of filtering it out there are some cases where "Purchase Order Date" exists, however there are no values for Start and End date within the dataset. That is when I believe it returns the "Null point Exception" error. I am searching for a solution and should probably make another post dedicated to that, but if you can suggest a way in which I could set a default value for both Start and End Date after they have been mapped I would greatly appreciate it.
Thank you again for your help
Seventeen Stars

Re: Lookups with multiple column matches and range

Purchase Order Date" exists, however there are no values for Start and End date within the dataset

well, ask to business what you have to do with it ! Smiley Wink
escape from the flow ? default value ? retrieve those cases in a "reject flow" do threat them later ?
Employee

Re: Lookups with multiple column matches and range

You can use variables in your tMap to set default dates. Create a variable for StartDate and one for EndDate. In the variables set the expression to something like below.....
row1.Start_Date!=null ? row1.Start_Date : routines.TalendDate.parseDate("yyyy-MM-dd", "1970-01-01")
This will use the date it receives if it is not null, but will replace it with a default of 01/01/1970 if it is not present. 
One Star

Re: Lookups with multiple column matches and range

Thank you all for being so helpful! This worked out great. I hope to be able to return the favor to others on the forum someday.