[resolved] tFilterRow Advanced Mode Regex match fails?

Hi All,
I have a simple job that iterates through a directory of files. I want to filter the files whose names match a list of regular expressions, and report on both the files that match and those that don't (the matching files will continue on for further processing).
I have:
tFileList -> tFileProperties -> tFilterRow, with two outputs:
- filter -> tLogRow
- reject -> tLogRow
In the configuration of the tFilterRow, I have specified "advanced mode" and used the statement:
input_row.basename.matches("^\\d+")
as my initial test is simply to identify files beginning with any sequence of digits.
Currently, all rows are routed to the reject tLogRow.
I have tested the same regex statement with tExtractRegexFields and it worked.
Does tFilterRow NOT support regex in this way?
Thanks
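A possible explanation, offered as a hedged note rather than a confirmed diagnosis: in advanced mode the condition is ordinary Java, and Java's String.matches() returns true only when the regular expression matches the entire string. "^\\d+" therefore accepts a name such as "12345" but rejects "12345_sometitle.jpg", which would send every real file name to the reject flow. The sketch below compares the original expression with two alternatives: appending ".*" to the pattern, or searching with Matcher.find(). The class and method names are illustrative only, not Talend APIs.

```java
import java.util.regex.Pattern;

// Illustrative demo class (not part of Talend): compares whole-string
// matching with prefix searching on a file basename.
final class FilterRowRegexDemo {

    // Compiled once; "^" anchors the search to the start of the name.
    private static final Pattern LEADING_DIGITS = Pattern.compile("^\\d+");

    // Same as the original condition: String.matches() must match the WHOLE name.
    static boolean originalCondition(String basename) {
        return basename.matches("^\\d+");
    }

    // Whole-string match that allows anything after the leading digits.
    static boolean wholeStringCondition(String basename) {
        return basename.matches("\\d+.*");
    }

    // find() only needs the pattern to occur somewhere (here: at the start).
    static boolean findCondition(String basename) {
        return LEADING_DIGITS.matcher(basename).find();
    }

    public static void main(String[] args) {
        for (String name : new String[] {"12345_sometitle.jpg", "12345", "fail_example.jpg"}) {
            System.out.printf("%-20s matches(^\\d+)=%b  matches(\\d+.*)=%b  find(^\\d+)=%b%n",
                    name, originalCondition(name), wholeStringCondition(name), findCondition(name));
        }
    }
}
```

Assuming basename is the column produced by tFileProperties, as in the question, the corresponding advanced-mode condition would be input_row.basename.matches("\\d+.*").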

Re: [resolved] tFilterRow Advanced Mode Regex match fails?

Hi,
Currently, tFilterRow doesn't support regex.
Could you please elaborate on your case with an example of input and expected output values, so that we can see whether there is an alternative solution?
Best regards
Sabrina
--
Don't forget to give kudos when a reply is helpful and click Accept the solution when you think you're good with it.

Re: [resolved] tFilterRow Advanced Mode Regex match fails?

Thanks Sabrina,

I note that the documentation suggests that regex is supported; you might want to have that updated ;)
https://help.talend.com/search/all?query=tFilterRow&content-lang=en
"In the text field, type in the regular expression as required."

Anyway, my use case is essentially an ETL migration from a filesystem and database through to a new target system. The filesystem contains binary files, the database contains metadata. A primary key (id) number links the file to the database record.
I want to iterate through the file-system to identify and join the files to the metadata, then transform them into a new format for import into a new system.
I want to report on which files and folders were migrated and which failed, for example because they could not be identified.
I have a file system containing about 30 TB of various image files and folders.
Examples would be:
12345_sometitle.jpg
46602_shot.psd
Latest_3498912.gif
Some_3452_file.bmp
fail_example.jpg
As I iterate through the file system, I want to evaluate both files and folders against a regular expression designed to pick up an ID number that may or may not be in the file/folder name. There may be multiple regexes to support different criteria.
If one of the criteria matches, I want to extract the ID and then continue processing; I will join to another data source to get more metadata.
Ultimately I will write an XML file next to the binary file in a new directory, where it will be loaded into the new system.
I have been playing around with Talend for a few days to evaluate whether it will be a useful tool or not... I am new to Talend, but have a Java background.
Thanks,
Rob

Re: [resolved] tFilterRow Advanced Mode Regex match fails?

Hi Rob,
Based on the description above, I understand that the main challenge is extracting the ID, which may or may not be present in the file system name, and that you have firm business rules defining how to get the ID. Once the ID is extracted, the rest of the process should be simple for you, i.e. an inner join in tMap to look up the file ID in the database.
To extract the ID from the file system, I would recommend using tJavaRow with multiple if clauses that apply your regular expressions in Java. As you have a Java background, this will not be difficult for you.
Once you have extracted the ID from the file system, you can use tMap to join with the database and continue with your further processing.
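Following the tJavaRow suggestion above, here is a minimal sketch of trying several regular expressions in order and keeping the first captured ID. The three patterns are hypothetical rules inferred from Rob's sample names, not his real business rules, and FileIdExtractor/extractId are illustrative names. Inside a job, such a helper could be kept in a custom routine and called from tJavaRow, e.g. output_row.id = FileIdExtractor.extractId(input_row.basename); where the id output column is also an assumption.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: tries several ID patterns in order and returns the first
// captured ID, or null when no rule applies (a candidate for the "failed" report).
final class FileIdExtractor {

    // Example rules only, inferred from the sample names in this thread.
    private static final List<Pattern> ID_PATTERNS = Arrays.asList(
            Pattern.compile("^(\\d+)_"),          // leading ID: 12345_sometitle.jpg
            Pattern.compile("_(\\d+)\\.[^.]+$"),  // ID before extension: Latest_3498912.gif
            Pattern.compile("_(\\d+)_"));         // embedded ID: Some_3452_file.bmp

    static String extractId(String basename) {
        for (Pattern p : ID_PATTERNS) {
            Matcher m = p.matcher(basename);
            if (m.find()) {
                return m.group(1); // capture group 1 holds the digits
            }
        }
        return null; // no rule matched
    }

    public static void main(String[] args) {
        String[] names = {"12345_sometitle.jpg", "46602_shot.psd", "Latest_3498912.gif",
                "Some_3452_file.bmp", "fail_example.jpg"};
        for (String name : names) {
            System.out.println(name + " -> " + extractId(name));
        }
    }
}
```

Keeping the rules in an ordered list makes it easy to add or reorder criteria; a null result marks a file that no rule could identify, which can be routed to the "failed" report before the tMap join.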
Please let me know whether this helps and whether I have understood your requirement correctly.
Thanks
Vaibhav

Re: [resolved] tFilterRow Advanced Mode Regex match fails?

Hello,
I need your help, please.
I have a text file containing 250-character lines, like bank statements. I need to read the file block by block.
For example:

083005600V300026EUR2 0026000722405270614 270614VIREMENT SEPA RECU YCI5 0671 05067120
0530056 00026EUR2 0026000722405270614 NPYXXXXX
0530056 00026EUR2 0026000722405270614 LCC EUR DU 25/06/201
0530056 00026EUR2 0026000722405270614 LC24 ORE309W7001
0530056 00026EUR2 0026000722405270614 RCN550138032301

083005600HY00026EUR2 00260007224B1270614 270614PRLV DDDDDDDD XXXXXXX 06004160
0530056 00026EUR2 00260007224B1270614 NPYDD
0530056 00026EUR2 00260007224B1270614 NBEEDF
0530056 00026EUR2 00260007224B1270614 IJJJJJJJJJJJJJJJJJJJJJJ SESS
0530056 00026EUR2 00260007224B1270614 RUH YYYYYYYYYYYYYY RSSS

I would like to read the file in parts: for example, for each line starting with "08", I take the following lines that start with "05" until I reach the next "08" line, and so on.
Do you have any idea, please?
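Outside of any specific Talend component, reading the file block by block as described above amounts to starting a new group at each "08" line and appending the following "05" lines to it. A minimal plain-Java sketch, assuming the records are already available as a list of strings; StatementBlocks and groupBlocks are hypothetical names, and the sample lines in main are abbreviated stand-ins for the real 250-character records.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: groups statement records into blocks, each block being
// one "08" header line followed by the "05" lines up to the next "08" line.
final class StatementBlocks {

    static List<List<String>> groupBlocks(List<String> lines) {
        List<List<String>> blocks = new ArrayList<>();
        List<String> current = null;
        for (String line : lines) {
            if (line.startsWith("08")) {
                current = new ArrayList<>(); // a header line opens a new block
                current.add(line);
                blocks.add(current);
            } else if (line.startsWith("05") && current != null) {
                current.add(line);           // detail line belongs to the open block
            }
            // Blank lines, other record types and "05" lines before any "08" are skipped.
        }
        return blocks;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "083005600V3 ... VIREMENT SEPA RECU",
                "0530056 ... NPYXXXXX",
                "0530056 ... LCC EUR DU 25/06/201",
                "",
                "083005600HY ... PRLV DDDDDDDD",
                "0530056 ... NPYDD");
        List<List<String>> blocks = groupBlocks(lines);
        for (int i = 0; i < blocks.size(); i++) {
            System.out.println("Block " + (i + 1) + ": " + blocks.get(i).size() + " line(s)");
        }
    }
}
```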

Re: [resolved] tFilterRow Advanced Mode Regex match fails?

Hi Souha,
This is an international forum and English is the language we use. Posting in English will allow you to get more visibility and more help. Thanks for your understanding!
Best regards
Sabrina

Re: [resolved] tFilterRow Advanced Mode Regex match fails?

Hi everyone,
I need your help, please.
I have a text file (a positional, fixed-width file) where each line has 250 characters, like this:
083005600V300026EUR2 0026000eeeeee270614 270614VIREMENT eeeeeeee YCI5 0671 05067120
0530056 00026EUR2 0026000eeeeee270614 NPYXXXXX
0530056 00026EUR2 0026000eeeeee270614 eeeeeeeeeeeeeeeeeeeeeee
0530056 00026EUR2 0026000eeeeee270614 eeeeeeeeeeeeeeeeeeee
0530056 00026EUR2 0026000eeeeee270614 eeeeeeeeeeeeeeeeeeeeeee
083005600V300026EUR2 0026000eeeeee270614 270614VIREMENT eeeeeeee YCI5 0671 05067120
I would like to read my file like this:
If the line starts with "08", I have to check the next line: if it starts with "05", a message box showing "NPYXXXXX" should appear;
otherwise, a message box showing "VIREMENT" should appear.

Thanks for your help.

Re: [resolved] tFilterRow Advanced Mode Regex match fails?

Hi Souha,
Read your input file, then:
- Use tJavaRow.
- Use the string handling function LEFT to get the first 2 characters of the first column into a variable.
- Compare this variable with "08" and "05" using an if/else clause.
- Execute whatever code you need.
I would not recommend using a message box; use System.out.println() instead. Otherwise you will get hundreds of message boxes on screen and will not be able to tell what is happening.
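The steps above can be sketched in plain Java as follows: take the first two characters of the line, branch on them with if/else, and print to the console. RecordTypeCheck, recordType and describe are hypothetical names, and the "header"/"detail" labels are illustrative; in a tJavaRow the line would come from an input_row column whose name depends on your schema.

```java
// Hypothetical plain-Java version of the tJavaRow steps: read the first two
// characters of the line and branch on them with if/else.
final class RecordTypeCheck {

    // Mirrors a LEFT(line, 2) call, guarding against lines shorter than 2 characters.
    static String recordType(String line) {
        return line.length() >= 2 ? line.substring(0, 2) : line;
    }

    static String describe(String line) {
        String type = recordType(line);
        if ("08".equals(type)) {
            return "header";
        } else if ("05".equals(type)) {
            return "detail";
        } else {
            return "other";
        }
    }

    public static void main(String[] args) {
        String[] sample = {"083005600V3 ... VIREMENT", "0530056 ... NPYXXXXX", ""};
        for (String line : sample) {
            // Console output instead of message boxes, as suggested above.
            System.out.println(describe(line) + ": " + line);
        }
    }
}
```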
Thanks
Vaibhav

Re: [resolved] tFilterRow Advanced Mode Regex match fails?

Thanks for your quick reply, sanvaibhav.
How can I compare two lines at the same time? Do I have to write Java code?

Re: [resolved] tFilterRow Advanced Mode Regex match fails?

Hi Souha,
If you want to compare two lines at a time, you could consider the tMemorizeRows component. But I don't think you need to do that.
Check the use case in the Talend documentation or refer to some blogs:
https://help.talend.com/search/all?query=tMemorizeRows&content-lang=en
http://www.talendbyexample.com/talend-tmemorizerows-component-reference.html
Thanks
Vaibhav

Re: [resolved] tFilterRow Advanced Mode Regex match fails?

Hi sanvaibhav,
I am using the open source version of Talend, so I can't use tMemorizeRows.
Do you have another solution, please? My issue is how to read a line and the next line at the same time.
I'm a beginner in Talend.
Thanks.

Re: [resolved] tFilterRow Advanced Mode Regex match fails?

Hi Souha,
The tMemorizeRows component is available in Talend Open Studio for Data Integration; it is not limited to the paid/enterprise editions.
Which version of Talend are you using?
Another way would be to save the value of the relevant column to a context variable and then compare that value with the current row. After the comparison, save the current value to the context variable again.
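The context-variable idea above can also handle the "next line" requirement by turning it around: instead of looking ahead from an "08" line, decide about it when the row after it arrives. A minimal plain-Java sketch under that assumption; a field stands in for the context variable (in a job this might be a context variable or a globalMap entry), NextLineChecker and its methods are hypothetical names, and the two output strings are simply the messages from Souha's post.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the "remember the previous row" idea: the decision
// for an "08" line is deferred until the following line arrives.
final class NextLineChecker {

    // Plays the role of the context variable: was the previous row an "08" line?
    private boolean previousWas08 = false;

    /** Processes one row; returns the message for the preceding "08" line, or null. */
    String onRow(String line) {
        String message = null;
        if (previousWas08) {
            message = line.startsWith("05") ? "NPYXXXXX" : "VIREMENT";
        }
        previousWas08 = line.startsWith("08"); // store the current value for the next row
        return message;
    }

    /** Call once after the last row: a trailing "08" line has no "05" line after it. */
    String finish() {
        String message = previousWas08 ? "VIREMENT" : null;
        previousWas08 = false;
        return message;
    }

    static List<String> messagesFor(List<String> lines) {
        NextLineChecker checker = new NextLineChecker();
        List<String> messages = new ArrayList<>();
        for (String line : lines) {
            String m = checker.onRow(line);
            if (m != null) {
                messages.add(m);
            }
        }
        String last = checker.finish();
        if (last != null) {
            messages.add(last);
        }
        return messages;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "083005600V3 ... VIREMENT", "0530056 ... NPYXXXXX",
                "0530056 ... eeee", "083005600V3 ... VIREMENT");
        // Console output rather than message boxes.
        for (String m : messagesFor(lines)) {
            System.out.println(m);
        }
    }
}
```

Because each row only needs the stored value from the row before it, this works with a single pass and no look-ahead; the finish() call covers an "08" line that is the last line of the file.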
Vaibhav

Re: [resolved] tFilterRow Advanced Mode Regex match fails?

Hi sanvaibhav,
I upgraded to Talend 5.4 and finally have tMemorizeRows, but I have one more question: tMemorizeRows lets me memorize the previous lines, but I would like to memorize the line after the current one.
Is that feasible?
Thanks.

Re: [resolved] tFilterRow Advanced Mode Regex match fails?

Hmm... that's tricky, but it can be done.
Vaibhav