Handling duplicate column names when using Dynamic Schema

Highlighted
Six Stars

Handling duplicate column names when using Dynamic Schema

Hi,

I have a basic job setup which loads the contents of a CSV file into a database table using the "DYNAMIC" column type.  This works great for quickly getting raw data in a database.

However, the latest file I am testing this on has duplicate column headers.  Is there a way of handling duplicate column names when using Dynamic Schema?  Maybe by adding a number to the end of the duplicate column or something along those lines?

 

This would be a basic example:

image.png

This would be imported as something like below to get away from the duplicate column:

image.png

Many thanks

Steve

Highlighted
Community Manager

Re: Handling duplicate column names when using Dynamic Schema

The column names will need to be different. If you can add something to make the column names unique, that will solve this issue.

Highlighted
Six Stars

Re: Handling duplicate column names when using Dynamic Schema

OK, thanks.  Seems it's not possible then.  I wanted to take away the manual aspect of having to check the column names as the CSV files I'm dealing with can have 100s of columns

Highlighted
Community Manager

Re: Handling duplicate column names when using Dynamic Schema

You could build a pre-processing job to deal with this scenario.

 

Unfortunately this is a problem that is quite unusual. Data with columns using the same name is not something that any integration tool will solve dynamically, natively and intelligently. If I have a dataset that I have not seen before and it has two columns called "name", I would have to spend a fair amount of time looking at the data to try and assess what each column was actually holding. As a human I have a "world knowledge" that might help me in doing this. 

 

A simple process to tokenize the header row, look for duplicates and then reassign duplicates with another name (maybe just adding a number to the existing name) should be relatively easy to achieve.

2019 GARTNER MAGIC QUADRANT FOR DATA INTEGRATION TOOL

Talend named a Leader.

Get your copy

OPEN STUDIO FOR DATA INTEGRATION

Kickstart your first data integration and ETL projects.

Download now

Best Practices for Using Context Variables with Talend – Part 2

Part 2 of a series on Context Variables

Blog

Best Practices for Using Context Variables with Talend – Part 1

Learn how to do cool things with Context Variables

Blog

Best Practices for Using Context Variables with Talend – Part 3

Read about some useful Context Variable ideas

Blog