Five Stars

Filename validation in talend



I need to validate the names of the files being processed. For example, the requirement is that file names follow these formats:

Main file - YYYYMMDD_HHMMSS_Field1_Field2.txt
Meta file - YYYYMMDD_HHMMSS_Field1_Field2_META.txt

Based on the file category, I want two flows: one that processes the main file and one that processes the meta file, each loading into a separate table.

I am new to Talend, so a detailed explanation would be much appreciated.



Pravin Sanadi

Ten Stars

Re: Filename validation in talend

Do you need to process only the files that match an already known pattern? Or do you need to check that the filename contains nothing other than the pattern?


example of filename pattern:






But this pattern means you must know the file name down to the second.


Using this pattern in a chain:

tFileList ->(Iterate) -> (Your Steps)

processes all files from today


and so on.

Five Stars

Re: Filename validation in talend



Thanks for the quick response. I need to validate all the files that have the specified format. How do I route them into two flows, one for the main file and one for the meta file, so that each is loaded into its own table?




Eleven Stars TRF

Re: Filename validation in talend


I suggest two options.

1st one: place a tFileList to get all the .txt files, and connect the tFileList to a tJava with an Iterate link.

This tJava does nothing itself; it only opens 2 branches depending on the current filename.

Here each branch ends in a tJava that just prints the filename, preceded by "META > " if the filename contains the string "_META.txt" (your use case).

Here is what the job looks like:


Look at the 2 "if" after tJava_4. The 1st one (on top) is for filenames with "_META.txt":


and the 2nd for other cases:


(see the exclamation point at the beginning for negation).
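Since the condition text itself is not reproduced above, here is a minimal stand-alone sketch of what the two "if" conditions could look like. It assumes the component is named tFileList_1, so the current file name is read from globalMap under the key "tFileList_1_CURRENT_FILE"; adjust the key to your own component name. The HashMap and the class name are only stand-ins so the example runs outside the Studio.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-alone sketch of the two Run If conditions described above.
// In a real job Talend provides globalMap; a plain HashMap stands in for it here.
class RunIfConditionsSketch {

    // Assumption: the tFileList component is named tFileList_1.
    static final String CURRENT_FILE_KEY = "tFileList_1_CURRENT_FILE";

    // First "if": true when the current file name contains "_META.txt".
    static boolean isMeta(Map<String, Object> globalMap) {
        return ((String) globalMap.get(CURRENT_FILE_KEY)).contains("_META.txt");
    }

    // Second "if": the same test, negated with a leading "!".
    static boolean isMain(Map<String, Object> globalMap) {
        return !((String) globalMap.get(CURRENT_FILE_KEY)).contains("_META.txt");
    }

    public static void main(String[] args) {
        Map<String, Object> globalMap = new HashMap<>();
        String[] names = {"20240131_235959_A_B.txt", "20240131_235959_A_B_META.txt"};
        for (String name : names) {
            // Simulate what tFileList does on each iteration.
            globalMap.put(CURRENT_FILE_KEY, name);
            System.out.println((isMeta(globalMap) ? "META > " : "") + name);
        }
    }
}
```

In the job itself, only the boolean expression inside each method body would go into the condition field of the corresponding "if" link.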


You just have to play with the filemask and the "Order by" option of the tFileList to control which files are picked up and in what order they are processed.
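If the names also have to be validated against the full format from the question (YYYYMMDD_HHMMSS_Field1_Field2 with an optional _META suffix), a simple contains() test is not enough. The following is a rough sketch of a stricter check, not something built into Talend. The class name FileNameValidator is made up for illustration, and it assumes Field1 and Field2 contain only letters and digits, which the question does not actually specify.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: classifies a file name as MAIN, META or INVALID.
class FileNameValidator {

    enum Category { MAIN, META, INVALID }

    // Assumption: Field1 and Field2 are non-empty and made of letters and digits only.
    private static final Pattern NAME = Pattern.compile(
            "(\\d{8}_\\d{6})_[A-Za-z0-9]+_[A-Za-z0-9]+(_META)?\\.txt");

    // STRICT rejects impossible dates such as 20240230 instead of adjusting them.
    private static final DateTimeFormatter STAMP = DateTimeFormatter
            .ofPattern("uuuuMMdd_HHmmss")
            .withResolverStyle(ResolverStyle.STRICT);

    static Category classify(String fileName) {
        Matcher m = NAME.matcher(fileName);
        if (!m.matches()) {
            return Category.INVALID;
        }
        try {
            // Check that the leading digits form a real date and time.
            LocalDateTime.parse(m.group(1), STAMP);
        } catch (DateTimeParseException e) {
            return Category.INVALID;
        }
        // Group 2 is only present when the name ends with "_META.txt".
        return m.group(2) != null ? Category.META : Category.MAIN;
    }
}
```

One possible way to use it would be to put the class in a user routine and compare the result of classify(...) in each "if" condition, with a third branch for INVALID names if those need to be logged or rejected.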


The 2nd approach is to process one group of files first (the "_META.txt" files, for example), then the second group (the non-META files). In this case you need 2 separate tFileList components, each with the corresponding filemask.

The 1st one includes all txt files with "_META.txt" in the name, and the 2nd one excludes these files (see the Exclude Filemask option in the Advanced settings).
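To make the include/exclude logic of the two tFileList components concrete, here is a small plain-Java illustration of how the two groups are split. It is only an analogy: in the job the masks are typed into the component settings, not coded, and the class name FilemaskSketch and the sample names are invented for the example.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Illustration only: mimics the two filemask settings with java.nio glob matchers.
class FilemaskSketch {

    private static final PathMatcher ALL_TXT =
            FileSystems.getDefault().getPathMatcher("glob:*.txt");
    private static final PathMatcher META =
            FileSystems.getDefault().getPathMatcher("glob:*_META.txt");

    // 1st tFileList: filemask "*_META.txt".
    static List<String> metaGroup(List<String> names) {
        List<String> result = new ArrayList<>();
        for (String name : names) {
            if (META.matches(Paths.get(name))) {
                result.add(name);
            }
        }
        return result;
    }

    // 2nd tFileList: filemask "*.txt" with "*_META.txt" as the exclude mask.
    static List<String> mainGroup(List<String> names) {
        List<String> result = new ArrayList<>();
        for (String name : names) {
            if (ALL_TXT.matches(Paths.get(name)) && !META.matches(Paths.get(name))) {
                result.add(name);
            }
        }
        return result;
    }
}
```

Every .txt file ends up in exactly one of the two groups, which is why the two subjobs never process the same file twice.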

Here is what the job looks like:


As you can see, I have 2 separate subjobs linked with an OnSubjobOk trigger for orchestration.

The 1st tFileList is here (it targets "*_META.txt" files only):


The 2nd tFileList is here (it targets "*.txt" files except "*_META.txt"):


And the Advanced settings for this component (for the "*_META.txt" exclusion):


You got it?