This article describes how to stitch several files located on the same drive to a Talend Job in Talend Metadata Manager (TMM) 6.4.
You start with a very simple Job that reads from an input file, then writes to an output file.
The input file is a CSV file with full pathname: C:\tmp\tmm-inout\file-in.csv.
The output file is also a CSV file with full pathname: C:\tmp\tmm-inout\file-out.csv.
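The Job's behavior can be sketched outside Talend as a simple read-then-write pass-through. The sketch below is an illustration only: the column names and sample rows are assumptions, and a temporary directory stands in for C:\tmp\tmm-inout so the snippet runs anywhere.

```python
import csv
import os
import tempfile

# Hypothetical stand-in for the article's single-drive folder
# (C:\tmp\tmm-inout); a temp directory keeps the sketch portable.
workdir = tempfile.mkdtemp()
file_in = os.path.join(workdir, "file-in.csv")
file_out = os.path.join(workdir, "file-out.csv")

# Sample input rows; the column names are assumptions for illustration.
rows = [["id", "name"], ["1", "alpha"], ["2", "beta"]]
with open(file_in, "w", newline="") as f:
    csv.writer(f).writerows(rows)

# The Job simply reads the input file and writes the output file.
with open(file_in, newline="") as src, open(file_out, "w", newline="") as dst:
    csv.writer(dst).writerows(csv.reader(src))

with open(file_out, newline="") as f:
    print(list(csv.reader(f)) == rows)  # True: output mirrors input
```

Because both files sit on the same drive, TMM will later expose them through a single data connector, as shown in the next steps.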
Use the Talend DI bridge to harvest the Job into TMM.
Put it into a configuration and open the connection editor.
Notice that, although the Job has both an input and an output file, there is only one connection in the editor. This is not an issue: like most ETL/DI tools, TMM factorizes data connectors as much as possible to minimize the stitching work, so only one connection is needed.
Note: Of course, if the files were located on different drives, the connection editor would show several connections and you would need a separate model for each.
Back to the single-drive case, the next step is to harvest the files into TMM. In TMM 6.4, you can only use the initial (beta) file data catalog bridge, known as FlatExcelFile. This bridge will be deprecated and removed in TMM 7.0, the next major release, and replaced by the new official file system data catalog bridges.
Drop this model into the configuration created earlier, and go back to the connection editor. Select the store, then the store schema.
The configuration shows no warning for the connection on the Talend Job, and the stitching reporter confirms that everything is stitched correctly.
From the ExplorerUI, you get the expected data impact...
...and data lineage.
The same situation can arise with a database server used by a DI/ETL Job: a database with many schemas, each used sometimes as input, sometimes as output, or both. The second example below briefly illustrates this with a PL/SQL script instead of a Talend Job.
The PL/SQL script is harvested into TMM:
insert into tmmuser.month_sales (bname, avgprice, quantity)
select bname, avg(sprice), sum(quantity)
from tmmuser.movement
group by bname;
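To see what lineage this statement implies, the aggregation can be reproduced with an in-memory SQLite sketch. The table schemas and sample data here are assumptions for illustration only, and the tmmuser schema prefix is dropped because SQLite has no schemas.

```python
import sqlite3

# Minimal sketch reproducing the PL/SQL aggregation with SQLite;
# schemas and sample data are illustrative assumptions.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE movement (bname TEXT, sprice REAL, quantity INTEGER);
    CREATE TABLE month_sales (bname TEXT, avgprice REAL, quantity INTEGER);
    INSERT INTO movement VALUES ('b1', 10.0, 2), ('b1', 20.0, 3), ('b2', 5.0, 1);
""")

# Same statement as in the harvested script: month_sales is fed from
# movement, so lineage flows movement -> month_sales.
con.execute("""
    INSERT INTO month_sales (bname, avgprice, quantity)
    SELECT bname, AVG(sprice), SUM(quantity)
    FROM movement
    GROUP BY bname
""")

for row in con.execute("SELECT * FROM month_sales ORDER BY bname"):
    print(row)  # ('b1', 15.0, 5) then ('b2', 5.0, 1)
```

Once the script and the database model are both harvested, TMM traces exactly this flow: movement appears as the input of month_sales in the lineage graph.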
The database is harvested too.
Stitch all this in a configuration.
Data lineage is resolved as expected.