We have a situation where we have multiple columns say Employee Id , Employee Name,Department,Vertical,Floor etc.
What we want that that instead of Department,Vertical,Floor etc. we have id's for the same and the details of these fields will in a different table.
Through which component we can assign id's to Department,Vertical,Floor etc.?
1 , Test1 , HR , BFS , 2nd Floor
2 , Test2 , Finance , Healthcare , 3rd Floor
We want like
1 , Test1 , 1 , 1 , 2
2 , Test2 , 2 , 3 , 3
and different tables for the 3 fields .For reference as below :
Department Id Department Name
Here is my proposal for your case:
I replace CSV file by tFixedFlowInput for the test with fixed values.
Here is the configuration for this component:
1st step, I used a tMap to separate the values for fields Department, Vertical and Floor:
After the values has been separated, I get unique values using the tUniqRow and then and a sequence number for each values of each field and the result is stored into 3 separated tables (tHashOutput). Here is the first tMap, other components for this step are quite simples:
Be careful for the 2 others tMap to change the sequence name (for example use "v" for Vertical and "f" for "Floor").
Now we got our 3 separated tables, so we can start with the 2nd subjob to replace names by id for the 3 fields.
For that I use again the same tFixedFlowInput as for 1st subjob (change for a delimited file if you prefer).
The content goes to tMap as the main flow and the 3 tables (tHashInput) are used as lookups.
Here is the tMap with 3 inner join based on the names to get the ids:
You just have the 1st inner join on the capture but the 2 others are the same (except they use Vertical and Floor field instead of Department).
Be careful to declare all ids as integer (as they come from the sequences).
Finally, the result goes to the console using tLogRow:
Hope this helps.