In this tutorial, discover how metadata can save a lot of development time, and learn how to create and use it.
This tutorial uses Talend Open Studio Data Integration version 6.
Talend Open Studio allows you to create and run Java ETL programs or Jobs using pre-defined components.
Each component can be configured either as a “Built-in” or as a “Repository” component.
For “Built-in” components, information such as how to read the file and what it contains,
For “Repository” components, information
1. Create a metadata definition for a delimited file
The file is displayed in the File Viewer section of the wizard.
In the wizard window that appears, you can define settings such as how the file should be read, the number of rows, if any, that should be skipped when reading the file, and the maximum number of rows to process.
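To make these settings concrete, here is a minimal Python sketch of what they control when a delimited file is read. It is an illustration only, not Talend's generated Java code; the function `read_delimited`, its parameter names, and the sample data are assumptions made for this example.

```python
import csv
import io
from itertools import islice

def read_delimited(text, field_separator=";", header_rows=0, limit=None):
    """Parse delimited text, skipping header rows and capping the row count.

    The parameter names are hypothetical; they mirror the wizard settings
    described above (how to split fields, rows to skip, maximum rows).
    """
    reader = csv.reader(io.StringIO(text), delimiter=field_separator)
    data_rows = islice(reader, header_rows, None)  # skip the header rows
    if limit is not None:
        data_rows = islice(data_rows, limit)       # stop after `limit` rows
    return list(data_rows)

# Hypothetical sample content standing in for the tutorial's sample file.
sample = "id;title;year\n1;Alien;1979\n2;Heat;1995\n3;Fargo;1996\n"

rows = read_delimited(sample, field_separator=";", header_rows=1, limit=2)
print(rows)
```

With one header row skipped and a limit of two, only the first two data rows are returned.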
Note that when you do so, the Header checkbox is automatically checked with the value 1.
If the first line of the sample file includes column names, they will be displayed. If not, the columns will appear as Column 0, Column 1, and so on and will have to be renamed manually.
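The naming rule described above can be sketched in a few lines of Python. The helper `column_names` is hypothetical and only mirrors the behavior stated in this tutorial: use the first line when it is a header, otherwise fall back to positional names.

```python
def column_names(first_row, first_row_is_header):
    """Return header names, or positional placeholders when there is no header."""
    if first_row_is_header:
        return list(first_row)
    # No header line: generate Column 0, Column 1, ... to be renamed manually.
    return [f"Column {i}" for i in range(len(first_row))]

with_header = column_names(["id", "title", "year"], first_row_is_header=True)
without_header = column_names(["1", "Alien", "1979"], first_row_is_header=False)
print(with_header)
print(without_header)
```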
When guessing the schema, Talend reads only the first fifty lines of the sample file and, based on the data in these rows, defines the column types and lengths. You should validate the information displayed and correct it if necessary.
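The following Python sketch shows why a guessed schema should be checked. It is a simplified stand-in for this kind of inference, not Talend's actual algorithm: the type names, the functions `guess_type` and `guess_schema`, and the sample rows are all assumptions. Only the sample size of fifty comes from the tutorial.

```python
from itertools import islice

SAMPLE_SIZE = 50  # the tutorial states that only the first fifty lines are read

def guess_type(values):
    """Return a simple type name that fits every non-empty sampled value."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "String"
    for name, caster in (("Integer", int), ("Float", float)):
        try:
            for v in non_empty:
                caster(v)
            return name
        except ValueError:
            continue  # at least one value does not fit; try the next type
    return "String"

def guess_schema(names, rows):
    """Infer a type and a maximum length per column from the sampled rows."""
    sample = list(islice(rows, SAMPLE_SIZE))
    schema = []
    for i, name in enumerate(names):
        values = [row[i] for row in sample if i < len(row)]
        schema.append({
            "name": name,
            "type": guess_type(values),
            "length": max((len(v) for v in values), default=0),
        })
    return schema

# Hypothetical data: row 56 holds a value that the sample never sees.
rows = [[str(i), "title"] for i in range(60)]
rows[55][0] = "N/A"

schema = guess_schema(["id", "title"], rows)
print(schema)
```

Here the `id` column is guessed as an integer of length 2, although a later row contains `N/A`. A guess based on a sample can be wrong for the rest of the file, which is the reason to review it.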
Under Metadata in the Project Repository, the movies 0.1 entry is displayed with the file properties. Under the entry movies 0.1, the schema of the metadata file, moviesSchema, is displayed.
If you need to modify the property type or the schema, right-click the entry in the Project Repository and select Edit File Delimited or Edit Schema.
2. Use the metadata to configure a component
Note: By default, the component is configured with “Built-in” parameters.
Note that the parameters set in the metadata are displayed. Also note that all the fields are greyed out, indicating that they belong to the metadata and not to the component.
To change the schema, click […] next to Edit schema and choose an option:
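As a conceptual model only, the difference between the two property types can be pictured as sharing one definition versus holding a private copy. The Python sketch below is not how Talend stores or propagates metadata; the class names, field names, and the file name `movies.csv` are hypothetical.

```python
from dataclasses import dataclass, replace

@dataclass
class DelimitedFileMetadata:
    path: str
    field_separator: str
    header_rows: int

@dataclass
class InputComponent:
    name: str
    settings: DelimitedFileMetadata
    property_type: str  # "Built-in" or "Repository"

# One shared definition, comparable to a metadata entry in the repository.
movies = DelimitedFileMetadata("movies.csv", ";", 1)

# "Repository" components point at the shared definition.
repo_a = InputComponent("input_1", movies, "Repository")
repo_b = InputComponent("input_2", movies, "Repository")

# A "Built-in" component keeps its own independent copy of the settings.
built_in = InputComponent("input_3", replace(movies), "Built-in")

# Editing the shared definition once is visible to both repository components.
movies.field_separator = ","

print(repo_a.settings.field_separator, repo_b.settings.field_separator)
print(built_in.settings.field_separator)
```

In this model, a change made once in the shared definition reaches every component that references it, while the built-in copy has to be edited separately.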
3. Use the metadata to configure a second component
Talend allows you to create metadata based on several parameters such as databases, SAP connections, and several file types.
Note: To illustrate this, MySQL Workbench 6.3 CE along with a test dataset called talend_dq is used. You can either try it with a similar configuration or with your own databases.
4. Create a database connection and define it as metadata
The connected database is displayed.
The database and all tables and details are displayed.
All table schemas have been imported as metadata and can be used.
The tables and the views appear under the mysql 0.1 connection in the Project Repository. To view the fields in a table, click the table.
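Retrieving table schemas from a connection amounts to querying the database catalog. The sketch below illustrates the idea in Python, using an in-memory SQLite database as a stand-in for the MySQL instance so that it runs without a server. The columns of `tdq_values`, the view `labels`, and the function `retrieve_schemas` are assumptions made for this example.

```python
import sqlite3

# In-memory SQLite stands in for the MySQL database used in the tutorial.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tdq_values (id INTEGER PRIMARY KEY, label TEXT)")  # hypothetical columns
conn.execute("CREATE VIEW labels AS SELECT label FROM tdq_values")            # hypothetical view

def retrieve_schemas(connection):
    """Map each table or view name to its list of (column name, declared type)."""
    schemas = {}
    objects = connection.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type IN ('table', 'view') ORDER BY name"
    ).fetchall()
    for (name,) in objects:
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
        columns = connection.execute(f'PRAGMA table_info("{name}")').fetchall()
        schemas[name] = [(col[1], col[2]) for col in columns]
    return schemas

schemas = retrieve_schemas(conn)
for name, columns in schemas.items():
    print(name, columns)
```

On MySQL, the equivalent information would come from its own catalog rather than from `sqlite_master` and `PRAGMA table_info`.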
5. Read a database table using the metadata
A tMysqlInput component is created with the repository information. It uses the mysql 0.1 connection, and for the Schema it uses the repository information from the metadata table tdq_values.
In addition, Talend generates the SQL query and sends it to the table tdq_values.
The data from the table tdq_values is displayed.
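The step above, where a query is derived from the table's schema and then run, can be sketched in Python as follows. This is an illustration under stated assumptions: SQLite replaces MySQL so that the example runs standalone, the columns and rows of `tdq_values` are invented, `build_select` is a hypothetical helper, and the exact query text that Talend generates may differ.

```python
import sqlite3

def build_select(table, columns):
    """Build a SELECT statement that lists every schema column explicitly."""
    column_list = ", ".join(f'"{table}"."{c}"' for c in columns)
    return f'SELECT {column_list} FROM "{table}"'

# In-memory SQLite stands in for the MySQL connection; the data is invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tdq_values (id INTEGER, label TEXT)")
conn.executemany("INSERT INTO tdq_values VALUES (?, ?)", [(1, "a"), (2, "b")])

# The column list would come from the table schema stored as metadata.
query = build_select("tdq_values", ["id", "label"])
rows = conn.execute(query).fetchall()

print(query)
print(rows)
```

Listing the columns explicitly, rather than using `SELECT *`, keeps the query aligned with the schema that the component expects.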