Six Stars

Removing question marks "?" in Talend

I have several rows that consist entirely of question marks. Some sample data is pasted below:

 

id	text
1328qdfjhase	This is a text
1038qdfjhase	???? ??  ????
1114qdfjhase	This is also text
1455qdfjhase	Another text
1376qdfjhase	Extra text

I want to get rid of the second row, as it contains only question marks and is of no use to me. I tried using the EREPLACE function in a tMap to replace the question marks with an empty string:

StringHandling.EREPLACE(out3.text,"?","")

Next, I plan to filter out the rows that are blank. However, I am getting the following error in the tMap component:

 

Exception in component tMap_1
java.util.regex.PatternSyntaxException: Dangling meta character '?' near index 0
?
^
	at java.util.regex.Pattern.error(Pattern.java:1955)
	at java.util.regex.Pattern.sequence(Pattern.java:2123)
	at java.util.regex.Pattern.expr(Pattern.java:1996)
	at java.util.regex.Pattern.compile(Pattern.java:1696)
	at java.util.regex.Pattern.<init>(Pattern.java:1351)
	at java.util.regex.Pattern.compile(Pattern.java:1028)
	at java.lang.String.replaceAll(String.java:2223)
	at routines.StringHandling.CHANGE(StringHandling.java:96)
	at routines.StringHandling.EREPLACE(StringHandling.java:189)
	at local_project.clean_crmjl2_0_1.Clean_CRMJL2.tFileInputExcel_1Process(Clean_CRMJL2.java:4743)
	at local_project.clean_crmjl2_0_1.Clean_CRMJL2.runJobInTOS(Clean_CRMJL2.java:7478)
	at local_project.clean_crmjl2_0_1.Clean_CRMJL2.main(Clean_CRMJL2.java:7335)

Can anyone help?

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Six Stars

Re: Removing question marks "?" in Talend

Sorry, I was on vacation. I don't know why, but this worked in the tMap expression builder instead. I think the issue was something else, though I'm not sure what. I am now taking the input from Excel files instead of CSV. Could it have been an encoding problem?

StringHandling.EREPLACE(out3.text,"?","")

 

9 REPLIES
Ten Stars

Re: Removing question marks "?" in Talend

As the error message suggests, a question mark is a metacharacter in regex pattern strings. You get around this by escaping it with a backslash. Because the Java string literal is itself interpreted before being used as a pattern, the backslash has to be doubled, so you have to type "\\?"
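To make the escaping point concrete, here is a minimal plain-Java sketch that runs outside Talend. The class and method names are hypothetical and only for illustration. It assumes nothing beyond what the stack trace shows, namely that the Talend routine ends up in `String.replaceAll`, which treats its first argument as a regex.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical helper class, not part of Talend.
final class QuestionMarkEscaping {

    // Escaped regex: "\\?" in Java source is the two-character pattern \?
    static String stripEscaped(String s) {
        return s.replaceAll("\\?", "");
    }

    // Pattern.quote builds a literal pattern, so no manual escaping is needed.
    static String stripQuoted(String s) {
        return s.replaceAll(Pattern.quote("?"), "");
    }

    // String.replace takes a literal, not a regex, so "?" is safe as-is.
    static String stripLiteral(String s) {
        return s.replace("?", "");
    }

    // Returns true if the given string is not a valid regex.
    static boolean failsToCompile(String regex) {
        try {
            Pattern.compile(regex);
            return false;
        } catch (PatternSyntaxException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("bare ? fails to compile: " + failsToCompile("?"));
        System.out.println("escaped ? fails to compile: " + failsToCompile("\\?"));
        System.out.println("[" + stripEscaped("???? ??  ????") + "]");
    }
}
```

Note that removing the question marks from the sample row still leaves the blanks that sat between them, so a trim is needed before testing for an empty value.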

Six Stars

Re: Removing question marks "?" in Talend

I tried that and it's not removing the question-mark rows for me.

Six Stars

Re: Removing question marks "?" in Talend

Have you tried something like row5.newColumn.replaceAll("\\?", "") ?

Ten Stars

Re: Removing question marks "?" in Talend

So, the string replacement will only make that value blank. It won't remove the entire row from the data flow. For that you'll need to filter using a tFilterRow component or a tMap. If you trim() the text after replacing all of the question marks, you can set up an output filter like:

!rowX.text.isEmpty()

to only pass through records that aren't empty (assuming you don't have other empty values you want to preserve).
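As a rough illustration of the replace, trim, and filter steps described above, the following plain-Java sketch applies them to the sample rows from the question. The class and method names are hypothetical; in a real job the same boolean expression would go into the tMap output filter rather than a loop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the tMap output filter, not Talend code.
final class QuestionMarkRowFilter {

    // Keep a row only if something remains after removing every '?'
    // and trimming the surrounding whitespace.
    static boolean keep(String text) {
        return !text.replaceAll("\\?", "").trim().isEmpty();
    }

    // Applies keep() to the text column (index 1) of each {id, text} row.
    static List<String[]> filter(List<String[]> rows) {
        List<String[]> kept = new ArrayList<>();
        for (String[] row : rows) {
            if (keep(row[1])) {
                kept.add(row);
            }
        }
        return kept;
    }

    // The sample data from the original question.
    static List<String[]> sampleRows() {
        return Arrays.asList(
            new String[] {"1328qdfjhase", "This is a text"},
            new String[] {"1038qdfjhase", "???? ??  ????"},
            new String[] {"1114qdfjhase", "This is also text"},
            new String[] {"1455qdfjhase", "Another text"},
            new String[] {"1376qdfjhase", "Extra text"});
    }

    public static void main(String[] args) {
        for (String[] row : filter(sampleRows())) {
            System.out.println(row[0] + "|" + row[1]);
        }
    }
}
```

This version assumes the text column is never null; a null value would throw a NullPointerException at `replaceAll`.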

Six Stars

Re: Removing question marks "?" in Talend


douglaszickuhr wrote:

Have you tried something like row5.newColumn.replaceAll("\\?", "") ?


I am getting a new error, as follows:

Exception in component tMap_1
java.lang.NullPointerException
	at local_project.clean_crmjl2_0_1.Clean_CRMJL2.tFileInputExcel_1Process(Clean_CRMJL2.java:4743)
	at local_project.clean_crmjl2_0_1.Clean_CRMJL2.runJobInTOS(Clean_CRMJL2.java:7477)
	at local_project.clean_crmjl2_0_1.Clean_CRMJL2.main(Clean_CRMJL2.java:7334)
Six Stars

Re: Removing question marks "?" in Talend

It seems that the value is null. Are you sure that column contains a value?

Are your components connected correctly?

Please paste a screenshot of your job here.
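If the NullPointerException does come from a null text value, a null check in front of the replace would avoid it. The sketch below is plain Java with hypothetical names, and it assumes that rows with a null text should be dropped along with the question-mark rows; flip the null branch if they should be kept.

```java
// Hypothetical null-safe variant of the filter, not Talend code.
final class NullSafeQuestionMarkFilter {

    static boolean keep(String text) {
        if (text == null) {
            // Assumption: a null text is as useless as an all-'?' text.
            return false;
        }
        return !text.replaceAll("\\?", "").trim().isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(keep(null));
        System.out.println(keep("???? ??  ????"));
        System.out.println(keep("This is a text"));
    }
}
```

In a tMap filter the same idea could be written as a single expression, for example `rowX.text != null && !rowX.text.replaceAll("\\?", "").trim().isEmpty()`, where `rowX` is a placeholder for the actual input row name.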

Twelve Stars TRF

Re: Removing question marks "?" in Talend

tMap is all you need:

Capture.PNG

Here is the expression I used to filter output rows:

!(StringHandling.BTRIM(row109.text.replaceAll("\\?*", ""))).equals("")

StringHandling.BTRIM is there to remove any extra blanks left in the text.

And the result (note the last line, which contains "?" but also other characters, so the line is kept in the result):

Starting job test at 22:00 23/06/2017.

[statistics] connecting to socket on port 3599
[statistics] connected
1328qdfjhase|This is a text
1114qdfjhase|This is also text
1455qdfjhase|Another text
1376qdfjhase|Extra text
999999999999|An extra ??? ?? ???? text to keep
[statistics] disconnected
Job test ended at 22:00 23/06/2017. [exit code=0]
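A side note on the pattern used in the filter expression above: "\\?*" also matches the empty string, but replacing those empty matches with "" changes nothing, so it should give the same result as "\\?". The plain-Java sketch below checks this on the two interesting rows. The class and method names are hypothetical, and String.trim() stands in for StringHandling.BTRIM, which is a Talend routine and not available outside a Talend project; the two are assumed to agree for a value that is entirely blanks.

```java
// Hypothetical check, with String.trim() standing in for StringHandling.BTRIM.
final class StarQuantifierCheck {

    // The filter expression from the post, using the "\\?*" pattern.
    static boolean keepWithStar(String text) {
        return !(text.replaceAll("\\?*", "").trim()).equals("");
    }

    // The same filter with the plain "\\?" pattern.
    static boolean keepWithoutStar(String text) {
        return !(text.replaceAll("\\?", "").trim()).equals("");
    }

    public static void main(String[] args) {
        String[] samples = {
            "???? ??  ????",
            "An extra ??? ?? ???? text to keep"
        };
        for (String s : samples) {
            System.out.println(s + " -> star: " + keepWithStar(s)
                + ", no star: " + keepWithoutStar(s));
        }
    }
}
```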

Hope this helps.


TRF
Twelve Stars TRF

Re: Removing question marks "?" in Talend

@Enthusiast, does this help or not?
Please let us know, and mark the case as solved if it does.

TRF