Data validaton help - 50 columns all use the same lookup?

Four Stars

Data validaton help - 50 columns all use the same lookup?

Hi all,


I'm trying to design a process to validate the data I receive in a file and have run into an issue. Wondering if anyone else has run into something similar, and what the best solution was?


Sample columns:

FirstName   LastName   Age   Title1   Title2   Title3   Title4   Title5  ... Title50


The design of the file is not my choice. My requirement is as follows: "for any value received in columns Title1 through Title50, lookup that value in the dbo.Title table. If the value exists in the table, then the record is valid. If the value does not exist in  the table, the record is invalid."


Obviously, if this were just for one or two columns, I would just use a tmap to do a lookup to a SQL connection, using the inner join functionality to validate whether the value exists. However, to do this for all 50 Titles would be tedious as I think I would have to do 50 separate lookups (one with a join on Title1, the next with a join on Title2, the next with a join on Title3, etc). Since the same rule applies to all 50 columns (i.e. does the value exist in the Title table), I'm hoping there's a better way to only do the lookup once but compare multiple columns to it. 

Sixteen Stars

Re: Data validaton help - 50 columns all use the same lookup?

I have had this very same problem and there is a nice solution....but you will need to know a bit of Java. Essentially the solution I used is to load the lookup values with keys (I used column names and job number number for my "keys") into a database table. I then wrote a Java class to hold those values and used a tJavaFlex to read those values into objects instantiated from my class and hold them in an array. The class I wrote was part of a routine which had methods to search for relevant values (by key and value). Once I had the data loaded into memory (using the JavaFlex mentioend above) I could use the routine methods in my tMap to search in memory for relevant keys/values and return an appropriate value just using the tMap expression.


Now this isn't easy if you do not know Java, but is incredibly powerful if you do. It allows you to have lookups tied to each column without having to have hundreds of lookups connected to your tMap. It is also very easy to maintain.


Introduction to Talend Open Studio for Data Integration.

Definitive Guide to Data Integration

Practical steps to developing your data integration strategy.

Definitive Guide to Data Quality

Create systems and workflow to manage clean data ingestion and data transformation.