Problem with "ellipsis" character and tReplace component

I think this might be a bug or limitation. Suppose you have a text file containing the following line:

LF05.03|91|For nooks…||

The delimiter is the pipe "|". Note the Windows-1252 ellipsis character "…".

When you use tReplace with the pattern "^(LF05.+?)\\|$" and the substitution "$1", it should strip off the trailing pipe character, but it does not when the ellipsis character is present in the line. It does work if you take the ellipsis out.

I'm doing some cleanup on a big file, and tReplace fails whenever specific characters such as the ellipsis are present.

Accepted Solutions

Re: Problem with "ellipsis" character and tReplace component

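This modified expression will work:

^(LF05\\.[^\\n]*)\\|$

Note that the problem is that the regex dot "." does not match the ellipsis character here. A likely cause: the ellipsis is stored as byte 0x85 in Windows-1252, and if the file is decoded as ISO-8859-1 that byte becomes U+0085 (NEL), which Java regex treats as a line terminator. By default "." does not match line terminators, while the negated class [^\\n] does match U+0085. This is a problem to be aware of if your input file uses Windows-1252 encoding.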
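
For anyone who wants to reproduce this outside Talend, here is a minimal Java sketch. It rests on an assumption the thread does not confirm: that the file's 0x85 byte was decoded as ISO-8859-1, turning the ellipsis into U+0085 (NEL), which Java's regex engine counts as a line terminator. The class and method names (EllipsisRegexDemo, stripTrailingPipe, and so on) are made up for illustration, and String.replaceAll only stands in for whatever tReplace does internally.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EllipsisRegexDemo {
    // The two patterns from this thread, written as Java string literals.
    static final String ORIGINAL = "^(LF05.+?)\\|$";
    static final String MODIFIED = "^(LF05\\.[^\\n]*)\\|$";

    // Raw bytes of the sample line, with the ellipsis stored as the single byte 0x85.
    static byte[] sampleBytes() {
        byte[] head = "LF05.03|91|For nooks".getBytes(StandardCharsets.US_ASCII);
        byte[] tail = "||".getBytes(StandardCharsets.US_ASCII);
        byte[] raw = new byte[head.length + 1 + tail.length];
        System.arraycopy(head, 0, raw, 0, head.length);
        raw[head.length] = (byte) 0x85;
        System.arraycopy(tail, 0, raw, head.length + 1, tail.length);
        return raw;
    }

    // Assumed failure mode: decoded as ISO-8859-1, 0x85 becomes U+0085 (NEL).
    static String decodedAsLatin1() {
        return new String(sampleBytes(), StandardCharsets.ISO_8859_1);
    }

    // Decoded as windows-1252, 0x85 becomes U+2026 (horizontal ellipsis).
    static String decodedAsCp1252() {
        return new String(sampleBytes(), Charset.forName("windows-1252"));
    }

    // Stand-in for the replace step: substitute the whole match with group 1.
    static String stripTrailingPipe(String line, String pattern) {
        return line.replaceAll(pattern, "$1");
    }

    public static void main(String[] args) {
        String latin1 = decodedAsLatin1();
        String cp1252 = decodedAsCp1252();
        System.out.println("ISO-8859-1 decode, original pattern, changed: "
                + !stripTrailingPipe(latin1, ORIGINAL).equals(latin1));
        System.out.println("ISO-8859-1 decode, modified pattern, changed: "
                + !stripTrailingPipe(latin1, MODIFIED).equals(latin1));
        System.out.println("windows-1252 decode, original pattern, changed: "
                + !stripTrailingPipe(cp1252, ORIGINAL).equals(cp1252));
    }
}
```

Under that assumption, the original pattern leaves the ISO-8859-1 string untouched, while the modified pattern strips the final pipe. The original pattern also works once the same bytes are decoded as windows-1252, so if this is what is happening in your job, reading the file with the correct encoding may be an alternative to changing the regex.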

