Dynamic ingestion of data to an Oracle database

Overview

This article shows you how to build a Talend Job to load data from flat files to corresponding target tables in Oracle.

 

Sources for the project are available in the attached Zip files.

 

Environment

  • Talend Studio 7.0.1, though you can use this Job on any Talend 7.x environment
  • Oracle Database 11.02.04 (RDS is hosted in AWS. This Job runs on any version of Oracle DB.)

 

Prerequisites

  1. Create a mapper table in the Oracle DB that specifies the mapping information of the filename and the corresponding target table name; or use the mapper.sql file, located in the SourceFiles.zip file that is attached to this article.

    1.png

     

  2. Create target tables in the Oracle DB. For more information on creating tables, see the Oracle Creating Tables documentation page.

  3. Ensure source files are available in a directory.

 

Designing the Job

The Job is built using two Jobs:

  • Parent Job: orchestrates the file processing and target table mapping
  • Child Job: receives the filename and target table name from the parent Job and ingests the data accordingly

The Dynamic_Data_Ingestion.zip file, attached to this article, contains both the parent and child Jobs. Download the file, then import it into Talend Studio by performing the following steps:

  1. In the Repository view, right-click Job Designs, then select Import items.
  2. Select the Select archive file option, click Browse and select the Dynamic_Data_Ingestion.zip file.
  3. Select the LoadDynamicSchemaChild and LoadDynamicSchemaParent Jobs. Click Finish.

    2.png

     

Parent Job

  1. Double-click the LoadDynamicSchemaParent Job to open it in the Designer.

    3.png

     

  2. Open the Contexts view and set the value of the directory context variable to the directory where the source files are stored.

    4.png

     

  3. Double-click the Oracle Connection (tDBConnection) component and configure the database connection parameters.

    • Database: select Oracle from the drop-down menu

    • Property Type: select Built-In from the drop-down menu

    • Connection Type: select Oracle SID from the drop-down menu

    • DB Version: select Oracle 12-7 from the drop-down menu

    • Host: enter the Host and IP address of the Oracle DB server

    • Port: enter the Oracle DB server's port number

    • Database: enter the name of the database

    • Username: enter the Oracle DB username

    • Password: enter the Oracle DB password

    5.png

     

  4. Configure the File List (tFileList) component to point to the directory where the source files are stored, and list the files to be processed. In this case, the parent Job is only loading CSV files, so set the Filemask as “*.csv”.

    6.png

     

  5. Configure the Iterating Flow for Each File (tIterateToFlow) component schema to receive the filename from the File List (tFileList) component, then pass it to the Table Name Lookup (tMap) component.

    7.png

     

  6. Configure the Mapper Table (tDBInput) component to fetch the filename and corresponding target table name from the mapper table in the Oracle DB. Edit the Schema and Query as shown below:

    8.png

     

  7. Configure the tMap component to look up row1.filename (filename value coming from the tIterateToFlow component) with row2.filename (filename value coming from tDBInput component). Set the Join Model as Inner Join. Pull row1.filename and row2.tablename to the output of the tMap component.

    9.png

     

  8. Review the Assign and Print Values (tJavaRow) component and notice that the context variables, tablename and filename, are assigned with the values from the tMap component.

    10.png

     

  9. Configure the Run Ingestion for selected File (tRunJob) component to call the child Job, LoadDynamicSchemaChild. Select the Transmit whole context check box to transfer the context values from the parent Job to the child Job.

    11.png

     

  10. Close the Oracle connection by connecting the tPostJob component to the Close Oracle Connection (tDBClose) component using an OnComponentOk link, then select the tDBConnection - Oracle Connection from the Component List drop-down menu.

    step10.jpg

     

Child Job

  1. Double-click the LoadDynamicSchemaChild Job to open it in the designer.

    13.png

     

  2. Double-click the Oracle Connection (tDBConnection) component and configure the database connection parameters as shown in Step 3 of the parent Job.

    5.png

     

    The USER_TAB_COLUMNS view contains the metadata of all the user-created tables in Oracle. The following query is written to extract the metadata column_id, column_name, and data_type of the target table.

    14.png

     

  3. Configure the Storing Metadata in Maps (tJavaFlex) component with two hash maps, named columnNames and columnDatatype. The maps hold the metadata information of the column name and corresponding data type. These maps are assigned to global variables so they can be used in other components to build insert statements.

    15.png

     

  4. Configure the Input File (tFileInputDelimited) component to read the file you're loading. Note that the Field Separator is not set the same as the separator in the file. Thus, the entire row comes in as a single column.

    16.png

     

    The Java code builds the insert statements by mapping the data value fetched by the Input File (tFileInputDelimited) component to the corresponding metadata fetched by the MetadataExtraction (tDBInput) component. The sqlQuery variable contains the build insert statement for each row passed by tFileInputDelimited component.

    17.png

     

    The query built in the tJavaRow component is passed to the Execute Insert Query (tDBRow) component and is iterated for each insert statement.

    18.png

     

  5. In the post Job, all the inserts are committed, and the Oracle connection is closed.

 

Running the Job

  1. Empty the target tables created in the Oracle DB. For more information on emptying tables, see the Oracle Truncate Table documentation page.
  2. The corresponding source files, in the SourceFiles.zip file attached to this article, are extracted and placed in the directory where the context variable is set.

    sourcezip.PNG

     

  3. In Studio, switch to the parent Job, then on the Run tab, click Run. For logging purposes, the insert statement generated by each table is printed on the console.

    30.png

     

  4. Query the Oracle tables and notice that they loaded the data from the corresponding files.

    31.png

    32.png

    33.png

     

Conclusion

Using this Talend Job, you detected the schema of files and target tables on the fly and used the mapping rule defined in the database to load data dynamically to Oracle tables.

Version history
Revision #:
21 of 21
Last update:
‎04-16-2019 12:52 AM
Updated by:
 
Comments
Five Stars

@pmanjunath  Its clearly articulated with every details needed ... Thanks Manjunath

Employee

Thank you @srikantkadapala