Talend Metadata Manager (TMM) - Stitching several files to a Talend Job

This article describes how to stitch several files located on the same drive to a Talend Job in TMM 6.4.

 

Start with a very simple Job that reads from an input file and writes to an output file.

The input file is a CSV file with full pathname: C:\tmp\tmm-inout\file-in.csv.

The output file is also a CSV file with full pathname: C:\tmp\tmm-inout\file-out.csv.

 

TalendJob.PNG: The Talend Job

JobMap.PNG: The tMap in the Job

Use the Talend DI bridge to harvest the Job into TMM.

 

JobInTMM.PNG: The Talend Job harvested in TMM

Add the harvested Job model to a configuration and open the connection editor.

 

ConnectionEditor_00.PNG: Connection editor

Notice that, although the Job has both an input file and an output file, the editor shows only one connection. This is expected, not an issue: like ETL/DI tools, TMM factorizes the data connectors as much as possible to minimize the stitching work, so two files on the same drive need only one connection.

 

Note: If the files are located on different drives, the connection editor shows several connections, and you need a separate model for each.

 

TwoDrives.PNG: The files are located on different drives

 

TD-connection.PNG: The connections can't be factorized
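The single-connection and multi-connection cases can be pictured with a small grouping sketch. This is only an illustration of the idea of factorizing by drive; it is not TMM's actual algorithm, and the function name `group_by_drive` and the `D:` path are hypothetical.

```python
from collections import defaultdict
from pathlib import PureWindowsPath


def group_by_drive(paths):
    """Group Windows file paths by drive letter (illustrative only)."""
    groups = defaultdict(list)
    for raw in paths:
        path = PureWindowsPath(raw)
        groups[path.drive].append(path.name)
    return dict(groups)


# Paths from the article: both files live on the C: drive.
same_drive = group_by_drive([
    r"C:\tmp\tmm-inout\file-in.csv",
    r"C:\tmp\tmm-inout\file-out.csv",
])

# Hypothetical variant: the output file sits on a D: drive instead.
two_drives = group_by_drive([
    r"C:\tmp\tmm-inout\file-in.csv",
    r"D:\tmp\tmm-inout\file-out.csv",
])

# One group means one connection to stitch; two groups mean two.
print(len(same_drive), len(two_drives))
```

Under this simplified view, each group corresponds to one connection in the editor and therefore to one file model to harvest and stitch.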

 

Back to the single-drive case: the next step is to harvest the files into TMM. In TMM 6.4, the only option is the initial beta file data catalog bridge, known as FlatExcelFile. This bridge is to be deprecated or removed in TMM 7.0, the next major TMM release, and replaced by the new official file system data catalog bridges.

 

FilesSettings.PNG: Settings for the Flat Files model

FilesInTMM.PNG: The files harvested in TMM

 

Drop this model into the configuration created earlier, and go back to the connection editor. Select the store, then the store schema.

 

ConnectionEditor_01.png: Selecting the "Files" store in the editor

 

ConnectionEditor_02.png: Selecting the schema in the editor

 

The configuration shows no warning for the connection on the Talend Job, and the stitching reporter confirms that the connection is fully stitched.

 

StitchingReporter.PNG: Stitching reporter

From the ExplorerUI, you get the expected data impact...

 

DataImpact-key.PNG: Data impact for "key" field

...and data lineage.

 

DataLineage-valeur.PNG: Data lineage for "valeur" field

 

The same situation can arise with a database server used by a DI/ETL Job, where many schemas serve as input, output, or both. The second example below briefly illustrates this with a PL/SQL script instead of a Talend Job.

 

The PL/SQL script is harvested into TMM:

 

insert into tmmuser.month_sales(bname, avgprice, quantity) 
select bname, avg(sprice), sum(quantity) from tmmuser.movement group by bname;

 

plsql.PNG: PL/SQL harvested in TMM
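To make the data flow of the script concrete, here is a minimal sketch that runs the same statement against an in-memory SQLite database. The column types, the sample rows, and the helper name `run_month_sales` are assumptions made for illustration; the original targets an Oracle schema, and SQLite is only a stand-in. The `COLUMN_LINEAGE` mapping is read directly off the statement and shows the source-to-destination relationships that stitching is expected to expose.

```python
import sqlite3

# Destination column -> source column(s), read directly from the statement.
COLUMN_LINEAGE = {
    "month_sales.bname": ["movement.bname"],
    "month_sales.avgprice": ["movement.sprice"],
    "month_sales.quantity": ["movement.quantity"],
}


def run_month_sales(rows):
    """Load rows into movement, run the script, return month_sales."""
    conn = sqlite3.connect(":memory:")
    # Attach a second in-memory database so the tmmuser prefix resolves.
    conn.execute("ATTACH DATABASE ':memory:' AS tmmuser")
    # Column types are assumed; the article does not give the DDL.
    conn.execute(
        "CREATE TABLE tmmuser.movement "
        "(bname TEXT, sprice REAL, quantity INTEGER)"
    )
    conn.execute(
        "CREATE TABLE tmmuser.month_sales "
        "(bname TEXT, avgprice REAL, quantity INTEGER)"
    )
    conn.executemany("INSERT INTO tmmuser.movement VALUES (?, ?, ?)", rows)
    # The statement from the harvested script.
    conn.execute(
        "insert into tmmuser.month_sales(bname, avgprice, quantity) "
        "select bname, avg(sprice), sum(quantity) "
        "from tmmuser.movement group by bname"
    )
    result = conn.execute(
        "SELECT bname, avgprice, quantity "
        "FROM tmmuser.month_sales ORDER BY bname"
    ).fetchall()
    conn.close()
    return result


# Made-up sample data, not taken from the article.
sample_rows = [("book-a", 10.0, 2), ("book-a", 20.0, 3), ("book-b", 5.0, 1)]
month_sales = run_month_sales(sample_rows)
print(month_sales)
```

Both tables live in the same schema, so, as in the file example, a single connection covers the source and the destination.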

The database is harvested too.

DB.png: Source and destination tables in the database model

Stitch the script model and the database model together in a configuration.

 

stitch-db-store.png: Selecting the DB store

 

stitch-db-store-schema.png: Selecting the DB store schema

DB-architecture-diagram.PNG: Architecture diagram

The data lineage is resolved as expected.

 

DB-data-lineage.PNG: Data lineage from destination table in the ExplorerUI

Version history: revision 7 of 7, last updated 09-28-2018 11:57 PM