I'm using the excellent tFileExcelSheetInput component but I'm having an issue. It is not with the component itself but with the incoming data.
tFileExcelSheetInput allows me to 'search' for column names across the top row of each sheet, and then return the data under that column heading into my flow (which is great).
My issue is that the incoming spreadsheet can contain duplicate column names. tFileExcelSheetInput appears to be taking the right hand most column with that heading and returning that data. As I said, it isn't really the component I have an issue with it is the duplicate column names.. The duplicated column names have different data below.
So I can have column headings something like:
The column positions can change from sheet to sheet - hence why I am using tFileExcelSheetInput.
As you can see above, the Column name 'Name' is repeated.
tFileExcelSheetInput will return the second 'Name' data (applies to a house name here and is hardly ever populated) rather than the persons 'Name' data.
I am getting around this by reading the top row, pivoting it, adding a numeric counter (to maintain its position) then doing a tUniqRow to identify duplicates, then renaming the duplicate with a 1,2,3etc re-forming the column, order it on the numeric counter to get the column positions correct, and then unpivot back to a spreadsheet, where I then append the data from the original worksheet.
In this example I end up with column names:
This is a very roundabout process.
Does anyone know of a better way of dealing with the duplicate column names?
Talend named a Leader.
Kickstart your first data integration and ETL projects.
Watch the recorded webinar!
Learn how to do cool things with Context Variables
Find out how to migrate from one database to another using the Dynamic schema
Pick up some tips and tricks with Context Variables