This article shows you how to build a Talend Job to load data from flat files to corresponding target tables in Oracle.
Sources for the project are available in the attached Zip files.
Create a mapper table in the Oracle DB that maps each filename to its corresponding target table name, or use the mapper.sql file located in the SourceFiles.zip file attached to this article.
Create target tables in the Oracle DB. For more information on creating tables, see the Oracle Creating Tables documentation page.
The solution is built from two Jobs, a parent Job and a child Job:
The Dynamic_Data_Ingestion.zip file, attached to this article, contains both the parent and child Jobs. Download the file, then import it into Talend Studio by performing the following steps:
Double-click the LoadDynamicSchemaParent Job to open it in the Designer.
Open the Contexts view and set the value of the directory context variable to the directory where the source files are stored.
Database: select Oracle from the drop-down menu
Property Type: select Built-In from the drop-down menu
Connection Type: select Oracle SID from the drop-down menu
DB Version: select Oracle 12 from the drop-down menu
Host: enter the hostname or IP address of the Oracle DB server
Port: enter the Oracle DB server's port number
Database: enter the name of the database
Username: enter the Oracle DB username
Password: enter the Oracle DB password
Configure the File List (tFileList) component to point to the directory where the source files are stored, and list the files to be processed. In this case, the parent Job is only loading CSV files, so set the Filemask as “*.csv”.
Configure the Iterating Flow for Each File (tIterateToFlow) component schema to receive the filename from the File List (tFileList) component, then pass it to the Table Name Lookup (tMap) component.
Configure the Mapper Table (tDBInput) component to fetch the filename and corresponding target table name from the mapper table in the Oracle DB. Edit the Schema and Query as shown below:
Configure the tMap component to look up row1.filename (filename value coming from the tIterateToFlow component) with row2.filename (filename value coming from tDBInput component). Set the Join Model as Inner Join. Pull row1.filename and row2.tablename to the output of the tMap component.
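The tMap lookup described above behaves like an inner join on the filename key: files with no entry in the mapper table are dropped from the flow. A minimal Java sketch of the same logic, using hypothetical sample data outside Talend:

```java
import java.util.HashMap;
import java.util.Map;

public class FilenameLookup {
    // Inner join semantics: a null result means "no match, skip this file"
    public static String lookupTable(String filename, Map<String, String> mapper) {
        return mapper.get(filename);
    }

    public static void main(String[] args) {
        // Hypothetical mapper rows (filename -> target table name)
        Map<String, String> mapper = new HashMap<>();
        mapper.put("customers.csv", "CUSTOMERS");
        mapper.put("orders.csv", "ORDERS");

        System.out.println(lookupTable("customers.csv", mapper)); // CUSTOMERS
        System.out.println(lookupTable("unknown.csv", mapper));   // null
    }
}
```

In the Job itself, this join is configured visually in the tMap editor rather than written as code.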
Review the Assign and Print Values (tJavaRow) component and notice that the context variables, tablename and filename, are assigned with the values from the tMap component.
Configure the Run Ingestion for selected File (tRunJob) component to call the child Job, LoadDynamicSchemaChild. Select the Transmit whole context check box to transfer the context values from the parent Job to the child Job.
Close the Oracle connection by connecting the tPostJob component to the Close Oracle Connection (tDBClose) component using an OnComponentOk link, then select the tDBConnection - Oracle Connection from the Component List drop-down menu.
The USER_TAB_COLUMNS view contains column metadata for all tables owned by the current user in Oracle. The following query extracts the column_id, column_name, and data_type metadata for the target table.
Configure the Storing Metadata in Maps (tJavaFlex) component with two hash maps, named columnNames and columnDatatype. The maps hold the metadata information of the column name and corresponding data type. These maps are assigned to global variables so they can be used in other components to build insert statements.
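Outside of Talend, the two maps can be sketched as plain Java HashMaps keyed by column position. The column names and data types below are hypothetical sample values standing in for the USER_TAB_COLUMNS result set:

```java
import java.util.HashMap;
import java.util.Map;

public class MetadataMaps {
    // column_id -> column_name, and column_id -> data_type
    public static final Map<Integer, String> columnNames = new HashMap<>();
    public static final Map<Integer, String> columnDatatype = new HashMap<>();

    public static void load(int columnId, String columnName, String dataType) {
        // In the Job, tJavaFlex fills the maps row by row from tDBInput
        columnNames.put(columnId, columnName);
        columnDatatype.put(columnId, dataType);
    }

    public static void main(String[] args) {
        // Hypothetical metadata for a CUSTOMERS target table
        load(1, "CUSTOMER_ID", "NUMBER");
        load(2, "CUSTOMER_NAME", "VARCHAR2");
        System.out.println(columnNames.get(2));    // CUSTOMER_NAME
        System.out.println(columnDatatype.get(1)); // NUMBER
    }
}
```

In the Job, the equivalent maps are placed in globalMap so downstream components can retrieve them when building insert statements.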
Configure the Input File (tFileInputDelimited) component to read the file you're loading. Note that the Field Separator is deliberately set to a character that does not match the separator used in the file, so each entire row is read as a single column.
The Java code builds the insert statements by mapping the data values fetched by the Input File (tFileInputDelimited) component to the corresponding metadata fetched by the MetadataExtraction (tDBInput) component. The sqlQuery variable holds the generated insert statement for each row passed by the tFileInputDelimited component.
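The row-to-INSERT mapping can be sketched in plain Java. The table name, separator, column names, and type handling below are illustrative assumptions; in the Job these come from the context variables and the metadata maps:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class InsertBuilder {
    // Build one INSERT statement from a raw single-column row,
    // using column metadata (name -> data type) in column order.
    public static String buildInsert(String tableName, String rawRow,
                                     String separator,
                                     Map<String, String> columnTypes) {
        // Split the single-column row into individual field values
        String[] values = rawRow.split(separator, -1);
        StringBuilder cols = new StringBuilder();
        StringBuilder vals = new StringBuilder();
        int i = 0;
        for (Map.Entry<String, String> col : columnTypes.entrySet()) {
            if (i > 0) { cols.append(", "); vals.append(", "); }
            cols.append(col.getKey());
            // Quote character-like types; leave numeric values unquoted
            String type = col.getValue();
            if (type.startsWith("VARCHAR") || type.startsWith("CHAR")) {
                vals.append("'").append(values[i].replace("'", "''")).append("'");
            } else {
                vals.append(values[i]);
            }
            i++;
        }
        return "INSERT INTO " + tableName + " (" + cols + ") VALUES (" + vals + ")";
    }

    public static void main(String[] args) {
        // Hypothetical metadata for a two-column CUSTOMERS table
        Map<String, String> types = new LinkedHashMap<>();
        types.put("CUSTOMER_ID", "NUMBER");
        types.put("CUSTOMER_NAME", "VARCHAR2");
        System.out.println(buildInsert("CUSTOMERS", "1;Alice", ";", types));
        // INSERT INTO CUSTOMERS (CUSTOMER_ID, CUSTOMER_NAME) VALUES (1, 'Alice')
    }
}
```

In production code, a PreparedStatement with bind parameters would be the safer choice; string concatenation is shown here only because it mirrors how the tJavaRow component assembles the sqlQuery variable.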
The query built in the tJavaRow component is passed to the Execute Insert Query (tDBRow) component, which executes each insert statement in turn.
Query the Oracle tables and confirm that the data from the corresponding files has been loaded.
Using this Talend Job, you detected the schemas of the files and target tables on the fly, and used the mapping rules defined in the database to load data dynamically into Oracle tables.