Update database from CSV files when each table has different keys

Eight Stars

Hi there,

 

I'm trying to update a database with multiple tables (100+) from CSV files (in a directory) using the following job design and components.

[screenshot: input component]

[screenshot: output component]

The issue here is that all my tables (in a DB) have different schemas and different primary keys (most of my PKs are composite keys made from multiple columns, usually 1-5).

For example:

[screenshot: MySQL table primary-key definitions (mysql_pk, mysql_pk1)]

So I'm not sure how I could dynamically set my PK in the schema.

Does anyone have an idea how to make this work?

A screenshot of job design/components would be very much appreciated :)

 

Currently, I only have a dynamic column in my schema since all my keys are different.

[screenshot: tMap with a dynamic schema column]

 

When I tried to run the job, it stated that 'schema must have a key'.

 

Exception in component tDBOutput_2 (test02)
java.lang.RuntimeException: For update, Schema must have a key

 

 

 

Nine Stars

Re: Update database from CSV files when each table has different keys

Hello,

 

I'm not sure you'll be able to achieve this without using some tricks.

 

I think you should first load the CSV data into "loading tables" (tables you'll only truncate and insert into) and then update your final tables from that data.

 

You can either write all the queries manually, or build them on the fly from the MySQL catalog.

 

Regards

Eight Stars

Re: Update database from CSV files when each table has different keys

@lennelei
Thank you for your response.
Can you explain more about the job design/components used? Or maybe share a job design screenshot?
**I'm totally new to Talend, and I'm not good with databases either ....

Thank you so much
Eight Stars

Re: Update database from CSV files when each table has different keys


@crotmn wrote: (original post quoted in full above)
Right now, I'm trying a lengthy method below.

 

Job1: Load the CSVs and insert into a temporary database (dynamic schema)

Job2: Inject the constraints (primary keys) into the temporary tables

Job3: Update the real database from the temporary tables' data
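The SQL that Jobs 2 and 3 would run (e.g. from a tDBRow) can be generated per table. A minimal Python sketch, assuming MySQL syntax; the table and column names (tmp_orders, order_id, ...) are hypothetical examples, not from the actual schema:

```python
def job2_add_pk(table, key_cols):
    """Job2: build the ALTER TABLE that injects the (composite) primary key
    into a temporary table."""
    return f"ALTER TABLE {table} ADD PRIMARY KEY ({', '.join(key_cols)})"

def job3_update(target, source, key_cols, data_cols):
    """Job3: build the UPDATE that copies the temporary table's data
    into the real table, matching rows on every key column."""
    set_clause = ", ".join(f"t.{c} = s.{c}" for c in data_cols)
    on_clause = " AND ".join(f"t.{k} = s.{k}" for k in key_cols)
    return (f"UPDATE {target} t JOIN {source} s "
            f"ON {on_clause} SET {set_clause}")

# Example with a 2-column composite key:
print(job2_add_pk("tmp_orders", ["order_id", "line_no"]))
print(job3_update("orders", "tmp_orders",
                  ["order_id", "line_no"], ["qty", "price"]))
```

The ON clause is built from all key columns, so composite keys of 1-5 columns are handled the same way.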

 

If there's a precise and better way to do this, please let me know.

 

Thank you

Nine Stars

Re: Update database from CSV files when each table has different keys

Hi,

 

First, load all the data into empty tables used only as intermediate storage, either with Talend (easy with a dynamic schema) or directly from the database:

Talend BulkExec components, or MSSQL BULK INSERT, Oracle SQL*Loader, etc.

Then, update the final tables from the intermediate tables; two solutions:

 

1) Create as many update_[tableName].sql files as needed; in each file, manually enter the required update query, such as:

/*update_tableTarget1.sql*/
UPDATE tableTarget1 t, tableSource1 s SET t.data1 = s.data1, t.data2 = s.data2 WHERE t.id = s.id

Then, within your Talend job, loop over the SQL files and use a t<DataBase>Row to execute each update query (you can even parallelize the Iterate link after the tFileList):

tFileList (*.sql) -iterate-> tFileInputFullRow -row-> tFlowToIterate -iterate-> t<DataBase>Row

The tFileInput* reads the SQL file and the tFlowToIterate puts the query into globalMap, but you can use other methods as well.

 

Or

2) Build the queries dynamically from the database catalog.

You can get the list of tables to update (this heavily depends on your database):

select table_name, ... from <system_tables> where ...

You then loop over the tables, retrieve the list of columns, identify which ones are keys, and so on; build all the queries dynamically and use a t<DataBase>Row to run all the updates.
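To illustrate option 2, here is a minimal Python sketch, assuming MySQL: in a real job the key columns would come from the catalog (e.g. information_schema.key_column_usage with constraint_name = 'PRIMARY'), but here that metadata is assumed to be pre-fetched; the table names and the stg_ staging prefix are made up for the example:

```python
# Hypothetical pre-fetched catalog metadata: {table: (key_cols, data_cols)}.
# In practice this dict would be filled by querying
# information_schema.key_column_usage and information_schema.columns.
catalog = {
    "customers": (["customer_id"], ["name", "email"]),
    "order_lines": (["order_id", "line_no"], ["qty", "price"]),
}

def build_update(table, key_cols, data_cols, staging_prefix="stg_"):
    """Build the UPDATE from the staging copy of `table` into the real one."""
    on_clause = " AND ".join(f"t.{k} = s.{k}" for k in key_cols)
    set_clause = ", ".join(f"t.{c} = s.{c}" for c in data_cols)
    return (f"UPDATE {table} t JOIN {staging_prefix}{table} s "
            f"ON {on_clause} SET {set_clause}")

# One query per table, ready to be executed in a loop.
queries = [build_update(t, keys, cols) for t, (keys, cols) in catalog.items()]
for q in queries:
    print(q)
```

Adding a new table then only requires inserting its metadata (or letting the catalog query pick it up), with no hand-written SQL file.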

 

The first method is easier but harder to maintain. The second method requires more work up front, but afterwards you'll be able to add tables or change formats much faster.
