In this tutorial, learn how to read data from a simple delimited file.
This tutorial uses Talend Open Studio for Data Integration version 6.
1. Create a New Job
- Ensure that the Integration perspective is selected.
- In the Project Repository, right-click Job Designs and click Create Standard Job in the menu.
- In the Name field of the New Job wizard, fill in the name of the Job as readCSVFile.
- It is good practice to add a purpose and a description to a Job. Then, click Finish to create your Job.
The Job Designer opens an empty Job.
2. Add a tFileInputDelimited component
3. Configure the tFileInputDelimited_1 component
- In the Job Designer, click the tFileInputDelimited_1 component.
- To define the Basic settings for the component, in the Component view, click the Component tab.
- Property Type defines how you will read the data source.
- File Name/Stream shows the complete path of the file to read. You can either type the path manually or use the ellipsis button [...] to provide the file path.
- Row Separator and Field Separator define the characters that separate rows and fields in the file.
- Header and Footer indicate the number of rows to ignore at the beginning and at the end of the file.
- Limit shows the maximum number of lines to read in the file.
- Schema defines the data structure of the file.
- To specify the path and name of the file to be read, click [...] next to the File Name field, select the file from the local disk, and click Open.
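The Basic settings described above can be pictured with a short Python sketch. It is only an illustration of what the field separator, Header, Footer, and Limit settings mean, not the code that Talend generates. The function name read_delimited, its parameters, and the sample data are hypothetical, and the exact way the component combines Footer and Limit may differ from this simplification.

```python
import csv
import io

def read_delimited(stream, field_separator=";", header=0, footer=0, limit=None):
    """Read rows from a delimited text stream, skipping header and footer rows."""
    rows = list(csv.reader(stream, delimiter=field_separator))
    # Header: ignore this many rows at the beginning of the file.
    rows = rows[header:]
    # Footer: ignore this many rows at the end of the file.
    if footer > 0:
        rows = rows[:-footer]
    # Limit: keep at most this many rows (None means no limit).
    if limit is not None:
        rows = rows[:limit]
    return rows

# Hypothetical sample data; the tutorial does not show the file contents.
sample = io.StringIO("movieID;title\n1;Alien\n2;Brazil\n3;Casablanca\nend of file\n")
rows = read_delimited(sample, field_separator=";", header=1, footer=1, limit=2)
print(rows)
```

With one header row and one footer row skipped and a limit of 2, only the first two data rows remain.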
4. Define the schema for the tFileInputDelimited_1 component
- To define the schema for the tFileInputDelimited_1 component, click [...] next to the Edit schema field.
The Schema of the tFileInputDelimited_1 wizard opens.
- The [+] button adds a column to the schema.
- The [x] button removes the selected columns from the schema.
- The [↑] and [↓] buttons move the selected columns up or down in the schema.
- In the Schema wizard, click the [+] icon to add a column.
- In the Column column, enter the field name as movieID.
- To designate this field as the key, select the Key check box.
- In the Type column, click Integer.
- Ensure that the Nullable column is unchecked, so that any null value for this column is rejected.
- In the Length column, enter 4.
- Repeat the preceding steps, from adding a column through setting its length, for each field in the CSV file.
- To close the Schema wizard, click OK.
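As a rough illustration of what the schema expresses, the following Python sketch models the movieID column defined above and uses it to convert a raw text row into typed values. The Column class and the convert_row function are hypothetical names chosen for this example, and the sketch makes a simplifying assumption: Key and Length are stored as metadata only and are not enforced.

```python
from dataclasses import dataclass

@dataclass
class Column:
    name: str
    key: bool
    py_type: type   # Python type standing in for the Type column
    nullable: bool
    length: int     # kept as metadata; not enforced in this sketch

# The movieID column as defined in the steps above.
schema = [Column(name="movieID", key=True, py_type=int, nullable=False, length=4)]

def convert_row(raw_row, schema):
    """Convert raw string fields to typed values, rejecting disallowed nulls."""
    record = {}
    for column, raw in zip(schema, raw_row):
        if raw == "":
            # An empty field is treated as null in this sketch.
            if not column.nullable:
                raise ValueError(f"{column.name} must not be null")
            record[column.name] = None
        else:
            record[column.name] = column.py_type(raw)
    return record

print(convert_row(["42"], schema))
```

Because Nullable is unchecked for movieID, a row with an empty movieID field raises an error instead of being accepted.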
5. Add the logging component and propagate the data
- Add a tLogRow component to the Job. The tLogRow component will display in the console all the rows of data it receives.
- To propagate data from the tFileInputDelimited_1 component to the tLogRow_1 component, in the Job Designer, right-click tFileInputDelimited_1, hold, and drag to tLogRow_1.
Alternative method: To link the components, you can also right-click the source component, click Row > Main, and then click the target component.
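Conceptually, the Main row link passes each row produced by the input component to the logging component. The Python sketch below imitates that flow with two hypothetical functions, file_input and log_row, which stand in for the two components. The sample rows and the "|" separator used when printing are assumptions made for the example.

```python
import io
import sys

def file_input(lines, field_separator=";"):
    """Stand-in for the input component: yield one list of fields per line."""
    for line in lines:
        yield line.rstrip("\n").split(field_separator)

def log_row(rows, out=sys.stdout):
    """Stand-in for the logging component: print each row it receives."""
    count = 0
    for row in rows:
        print("|".join(row), file=out)
        count += 1
    return count

# Hypothetical rows; the "link" is simply passing one component's output to the next.
source = io.StringIO("1;Alien\n2;Brazil\n")
log_row(file_input(source))
```

Rows are handed over one at a time, so the logging step can start printing before the whole file has been read.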
6. Run the Job
- In the Run view for the Job readCSVFile, click Run.
The file was read by the tFileInputDelimited component, and its content was displayed on the console by the tLogRow component.