Talend Open Studio Tutorials


Tutorial 5: Filtering Data by Using the tMap Component



In this tutorial, discover the tMap component and its interface, and learn how to use it to filter columns from a schema.

This tutorial uses Talend Open Studio for Data Integration version 6.


1.  Create a new Job, add the movies metadata as an input source, and add a tMap component

a.     Create a new Standard Job named tMapFilter.

b.     Add the movies metadata file as an input delimited component.

c.     Add a tMap component that can modify the schema and filter columns.

d.     Create a flow of data from the movies component to the tMap_1 component by linking the two components.


2.  Configure the tMap_1 component to filter columns


a.     Double-click the tMap_1 component.

The tMap_1 wizard window has four main sections.

  • Left Section displays the incoming data flows. Note that there can be multiple inputs into the tMap component.
  • Middle Section displays the mapping links between the input and output data flows. Here you can also create variables that are computed from input values and then used to produce output.
  • Right Section displays the output data flows.
  • Bottom Section is the Schema editor, which can be used to modify the schema of an input or output flow. To edit a schema, select the input or output flow whose schema you want to change (the selected flow is highlighted in yellow) and edit it in the Schema editor.

b.     To create a new output component, in the output section of the tMap_1 wizard, click the [+] button, type the name of the output as filteredOutput, and click OK. An empty output is created.

c.     To add columns to the output, in the Schema editor of the output, click the [+] icon.

d.     Define a column for movie ID (Column: movieID, Type: Integer, and Length: 4). Note: The output column name need not be the same as the input column name. To change the column name, edit the entry in the Schema editor.

e.     To send the data from the movieID column of the input file to the output column, click movieID, hold, and drag to the Expression column of filteredOutput.

A yellow arrow appears, indicating the flow of data.

f.      To add the title and releaseYear columns to the output component and link them, select and drag the columns from the input component to the output component.

g.     To change the order of the columns in the output component, click the [↑] or [↓] icons. The column order and the corresponding links will be updated.
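
Conceptually, the mapping configured in step 2 copies only the selected input columns into the filteredOutput flow and leaves the rest behind. The Java sketch below illustrates that idea under stated assumptions; it is not the code that Talend generates for the tMap component. The class names, the extra url input column, and the sample values are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Conceptual sketch only; this is not the code Talend generates for tMap.
class TMapFilterSketch {

    // Incoming row. The url column is hypothetical: it stands in for any
    // input column that is not mapped to the output.
    static class MovieRow {
        final Integer movieID;
        final String title;
        final Integer releaseYear;
        final String url;

        MovieRow(Integer movieID, String title, Integer releaseYear, String url) {
            this.movieID = movieID;
            this.title = title;
            this.releaseYear = releaseYear;
            this.url = url;
        }
    }

    // Row of the filteredOutput flow: only the three mapped columns.
    static class FilteredOutputRow {
        final Integer movieID;
        final String title;
        final Integer releaseYear;

        FilteredOutputRow(Integer movieID, String title, Integer releaseYear) {
            this.movieID = movieID;
            this.title = title;
            this.releaseYear = releaseYear;
        }
    }

    // Each constructor argument plays the role of one mapping link
    // drawn in the editor; unmapped columns are simply not copied.
    static FilteredOutputRow map(MovieRow in) {
        return new FilteredOutputRow(in.movieID, in.title, in.releaseYear);
    }

    // Applies the mapping to every incoming row, one row at a time.
    static List<FilteredOutputRow> mapAll(List<MovieRow> rows) {
        List<FilteredOutputRow> out = new ArrayList<>();
        for (MovieRow row : rows) {
            out.add(map(row));
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up sample rows, not taken from the movies file.
        List<MovieRow> input = Arrays.asList(
            new MovieRow(1, "Example Movie A", 1995, "http://example.invalid/a"),
            new MovieRow(2, "Example Movie B", 2001, "http://example.invalid/b"));
        for (FilteredOutputRow row : mapAll(input)) {
            System.out.println(row.movieID + " " + row.title + " " + row.releaseYear);
        }
    }
}
```

The output row type has no url field at all, which mirrors how a column that is not added to the output schema never reaches the downstream component.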


3.  Use the configured tMap_1 component

a.     To display the output processed by the tMap_1 component, add a tLogRow component in the Job Designer and link the filteredOutput output of the tMap_1 component to the tLogRow_1 component.

b.     To run the Job, in the Run view, click Run.

Only the filtered movie data (movieID, releaseYear, and title) is displayed.
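
As a rough picture of what the logging step does, the Java sketch below prints one line per filtered row. It is a simplified illustration, not the tLogRow implementation: the formatRow helper, the "|" separator, and the sample values are assumptions made for this example.

```java
// Conceptual sketch only; this is not how tLogRow is implemented.
class LogRowSketch {

    // Joins the three output columns into one line.
    // The "|" separator is an assumption made for this illustration.
    static String formatRow(Integer movieID, String title, Integer releaseYear) {
        return movieID + "|" + title + "|" + releaseYear;
    }

    public static void main(String[] args) {
        // Made-up sample values, not taken from the movies file.
        System.out.println(formatRow(1, "Example Movie A", 1995));
        System.out.println(formatRow(2, "Example Movie B", 2001));
    }
}
```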