How to process changing data structure?

Overview

This article introduces the dynamic schema feature. Dynamic schema allows you to design Jobs with an unknown column structure (unknown name and number of columns). If necessary, dynamic columns can be mapped directly to the target using Pass-through mode.

 

The main application of this functionality is a replication scenario or a simple one-to-one mapping of many columns. This feature makes designing these types of Jobs easy. For example, if you need to migrate a whole database with hundreds of tables, you can do so without explicitly including the table structure, using a single Job.

 

This article demonstrates the dynamic schema feature through two examples:

  • Read a table with dynamic schema
  • Read a file with dynamic schema

These examples are available in the attached Zip file.

 

Environment

This procedure is compatible with all versions of Talend subscription products.

 

List of components that support the dynamic schema feature

Not every component supports the dynamic schema feature. Refer to the article Which components provide the Dynamic Schema feature? to know which components provide the dynamic schema feature.

 

Procedure

Read a table with a dynamic schema

This example will extract data from a table called person. The table has many columns and contains the following example data. The objective is to write this data to a file.

id name gender email department entry_date
1 Jeffe Boy jeffe@company.com R&D 2012-05-15
2 Shong Boy shong@company.com Community 2011-08-07
3 Johnson Girl johnson@company.com HR 2014-04-16
4 Joyce Girl joyce@company.com Test 2012-09-10
5 Truman Boy truman@company.com Marketing 2013-02-27

 

In Talend Open Studio for Data Integration, the community version, you have to define each column in the schema to map the real table structure. If the table structure is unknown, it is impossible to read the table.

 

In Talend subscription products, there is a Dynamic type in the Type list, as seen in the following figure:

dynamic_schema.png

 

Using this type, you will need to define only one column with the Dynamic type, instead of defining all columns.

 

Example Job

Create an example Job called ReadTableDataWithDynamicSchema. The Job contains two components:

tMysqlInput_1: read person data from the database.

tFileOutputDelimited_1: write the data to a text file with a header.

 

The detailed Job settings can be seen in the following figures:

job_design_example1.png

 

tMysqlInput_1 settings:

tmysqlInput_1_example1.png

 

tMysqlInput_1 schema:

tmysqlInput_1_schema_example1.png

 

tFileOutputDelimited_1 settings:

tFileOutputDelimited_Settings_example1.png

 

Execute the Job

Execute the Job and open the generated file (in this case, it is D:/file/out.txt). Verify that all data is written to the file with the correct column names. The data shows as follows:

id;name;gender;email;department;entry_date
1;Jeffe;Boy;jeffe@company.com;R&D;15-05-2012
2;shong;box;shong@company.com;Community;07-08-2011
3;Johnson;Girl;johnson@company.com;HR;16-04-2014
4;Joyce;Girl;joyce@company.com;Test;10-09-2012
5;Truman;Boy;Truman@company.com;Marketing;27-02-2013

 

Read a file with dynamic schema

Here is another example to explain how to read data from a file with the dynamic schema feature. In this example, you will read the file that was generated by the above scenario.

 

Note: When reading data from a file with a dynamic schema, the first line of the file must be the header that will be read as the column names at run time. In this example, the first line of the file is the header.

id;name;gender;email;department;entry_date

 

Example Job

Create an example Job called ReadFileDataWithDynamicSchema. The Job contains two components:

tFileInputDelimited_1: read data from the text file.

tMysqlOutput_1: create a table in database and insert the data.

job_design_example2.png

 

tFileInputDelimited_1 settings:

tFileInputDelimited_settings.png

 

tMysqlOutput_1 settings:

tmysqlOutput_settings.png

 

Execute the Job

Execute the Job. Verify that a table called person1 was created in the database and the data was inserted.

 result.png

Version history
Revision #:
5 of 5
Last update:
‎06-22-2017 12:04 PM
Updated by:
 
Labels (1)
Tags (1)
Comments
sanjana

The above feature if we have to implement for multiple tables and if i have to launch the same job instances fr multiple tables ,will it work?

 

Tfiledelimitedinput-->tflowtoiterate-->iterate --Trunjob (which have the job which uses the DB connection and loading to file).Will this flow run one by one or Trunjob instance willbe fired in parellel..

 

Th sis just not to run the table load one by one ,i just want to kick off the jobs in parallel ..as i will have 100 tables to be loaded.

agershenson

Hi sanjana,

Our Community Discussion forums are monitored more closely than the Shared Knowledge areas, so you might want to ask your question on the appropriate forum (perhaps https://community.talend.com/t5/Design-and-Development/bd-p/integrating?)

If you need technical support, please log in to our Support portal at https://www.talend.com/services/technical-support/.

Hope that helps,

Alyce