tExtractRegex usage and escaping for talend or java

One Star

I have a column in my data that I am trying to break into 4 columns on a | delimiter. I ended up using tExtractRegexFields and finally got a pattern to work as groups in an online regex tester, but the Talend regex won't escape the pipe ( | ), and I end up getting odd results after the tExtractRegex and tConvert steps (split into strings, then try to cast).
Here is my pattern: ^([0-9\.]*) \| ([0-9\.]*) \| ([0-9\.]*) \| (.*) \| (.*)$
Here is the terrible sample data column: 1 | 6.39 | 9.76 | FL500S | FILTER ASY - OIL


In the debug console, the tLog after the regex has its | separator replaced with [] so I can see whether the pipes were removed and new columns were made.
The tLog row before the regex uses :: instead of |.

Repair_Order SoLine SoPartLine Qty Cost List Part Part_Description
Repair_Order SoLine SoPartLine Qty_Cost_List_Part_Desc
6262880::3::1::1 | 1736.33 | 2315.11 | 7L3Z7000ABRM | AUTOMATIC T
6262880 [] 3 [] 1 [] 1 []  []  []  [] 
6262880 [] 3 [] 1 []  [] 1736.33 | 2315.11 | 7L3Z7000ABRM | AUTOMATIC []  []  [] 
6262880::3::2::1 | 600.00 | 600.00 | 7L3Z7000ABRM-C | 7L3Z 7000 A
6262880 [] 3 [] 2 [] 1 []  []  []  [] 
6262880 [] 3 [] 2 []  [] 600.00 | 600.00 | 7L3Z7000ABRM-C | 7L3Z 7000 []  []  [] 

One Star

Re: tExtractRegex usage and escaping for talend or java

So the fix is a double backslash, to escape the escape character in Java, I guess. I was getting 2 or 3 rows per record because the grouping wasn't working: a pipe that is not escaped correctly means "either/or" in regex, which made the issue harder for me to catch while debugging.
The pattern "^([0-9]*) \\| (.*) \\| (.*) \\| (.*) \\| (.*)$", used to parse "6 | 3.75 | 7.86 | XO5W20BFS | MOTORCRAFT SAE 5W-20",
gave me the results I wanted:

6298055::3::2::6 | 3.75 | 7.86 | XO5W20BFS | MOTORCRAFT SAE 5W-20

6298055 [] 3 [] 2 [] 6 [] 3.75 [] 7.86 [] XO5W20BFS [] MOTORCRAFT SAE 5W-20
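
For anyone who wants to check the escaping outside Talend, here is a minimal plain-Java sketch of the same idea. It is not what tExtractRegexFields does internally; the class and method names (PipeRegexDemo, extract) are made up for illustration. It uses the working pattern and sample value from this post, and also compiles an unescaped version to show that the bare pipe is treated as alternation.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PipeRegexDemo {
    // In a Java string literal, "\\|" becomes the regex \| (a literal pipe).
    static final Pattern ESCAPED =
        Pattern.compile("^([0-9]*) \\| (.*) \\| (.*) \\| (.*) \\| (.*)$");

    // Without the backslashes, each | splits the pattern into alternatives.
    static final Pattern UNESCAPED =
        Pattern.compile("^([0-9]*) | (.*) | (.*) | (.*) | (.*)$");

    // Returns the five captured fields, or null when the line does not match.
    static String[] extract(String line) {
        Matcher m = ESCAPED.matcher(line);
        if (!m.matches()) {
            return null;
        }
        String[] fields = new String[m.groupCount()];
        for (int i = 1; i <= m.groupCount(); i++) {
            fields[i - 1] = m.group(i);
        }
        return fields;
    }

    public static void main(String[] args) {
        String sample = "6 | 3.75 | 7.86 | XO5W20BFS | MOTORCRAFT SAE 5W-20";
        // Escaped pattern: one match with five groups.
        System.out.println(String.join(" [] ", extract(sample)));
        // Unescaped pattern: no single alternative covers the whole line.
        System.out.println(UNESCAPED.matcher(sample).matches());
    }
}
```

In a Talend component the regex field is a Java string expression, so the same doubled backslash applies there.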
Ten Stars

Re: tExtractRegex usage and escaping for talend or java

Glad to hear you fixed your issue.  I think you can also split a field on a delimiter character using the tExtractDelimitedFields component.
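
As a rough plain-Java analogue of splitting on a delimiter (only a sketch, not how tExtractDelimitedFields is implemented; PipeSplitDemo and splitOnPipe are hypothetical names), the same escaping concern shows up with String.split, because its argument is also a regex. Pattern.quote sidesteps the backslashes entirely:

```java
import java.util.regex.Pattern;

public class PipeSplitDemo {
    // Splits a value on literal pipes and trims the spaces around each field.
    static String[] splitOnPipe(String value) {
        // Pattern.quote("|") makes the pipe literal; -1 keeps trailing empty fields.
        String[] parts = value.split(Pattern.quote("|"), -1);
        for (int i = 0; i < parts.length; i++) {
            parts[i] = parts[i].trim();
        }
        return parts;
    }

    public static void main(String[] args) {
        String sample = "1 | 6.39 | 9.76 | FL500S | FILTER ASY - OIL";
        for (String field : splitOnPipe(sample)) {
            System.out.println(field);
        }
    }
}
```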
