In Talend we use a tool called Tunite to union two or more data-sets. The problem with it is the schema of all the data-sets should be same, where the order of the columns and the number of columns should always be same. When handling a larger data-set, ordering the data-sets or creating dummy columns becomes difficult as there are many number of columns and many data-sets.
Is there any way where i can just unite them automatically based on the column names? and if the column is missing in other data-set it should have nulls instead.
If it i a custom component , can anyone guide me towards the existing component or help me build one/
A way I've tackled this in the past is load the unknown schema data-set up into a table of a suitably large enough number of string columns. Then using the header line of the incoming data, dynamically construct some SQL to select back out all columns, replacing them with nulls if they weren't in the source file.
It's not pretty, but it does work.
1/ create table in DB that's the super-set of all columns in all sources (final table).
2/ create table in DB that's got X columns, where X is a number which is the largest number of columns you'd expect in any incoming data-set, each column will be varchar(8000), just call the columns col1, col2, col3.....colX (work table)
3/ load incoming data into DB work table
4/ parse the header row of the incoming data, to ascertain the position of all columns of interest - this will be critical that the same column data is called the same name in each of the incoming data-sets.
5/ construct SQL to INSERT INTO/SELECT from the work table into the final table.
6/ execute the SQL
7/clear down work table.
Thanks for the instruction. I tried do this but I did not succeed. Most likely, I just do not understand how to do it. For me, it's as hard as writing a term paper. I could not write a term paper because I do not know how to do it. I found writing help a term paper on the internet and helped me. Probably I will have to look for a programmer who will help me to do everything according to your instructions, so that the information is displayed correctly through the database.
Introduction to Talend Open Studio for Data Integration.
Practical steps to developing your data integration strategy.
Create systems and workflow to manage clean data ingestion and data transformation.