Five Stars

Knowing what's replaced in a tReplaceList

 Hello,

In a tReplaceList, given these IN rows:

id|address

1| Fifth Avenue AAA Square

2| Times Square BBB

 

and these LU (lookup) rows:

Avenue|AVE

Bridge|BDG

Square|SQ

 

we can replace every word in a column (address) that matches a LU label with its replacement, giving:

1| Fifth AVE AAA SQ

2| Times SQ BBB

 

What I'm trying to do is put the first replacement in a 3rd column, like this:

1| Fifth AVE AAA SQ|AVE

2| Times SQ BBB|SQ

 

Is there any way to achieve this?

Thanks

1 ACCEPTED SOLUTION

Five Stars

Re: Knowing what's replaced in a tReplaceList

 Hi,

 

I too had left it as a last-resort option. In some situations Java components are still needed, and no combination of standard Talend components can replace them. So in my case I did:

LU ---> tJavaFlex

[Start code: HashMap<String,String> abrev = new HashMap<String,String>();]

[Main code: abrev.put(row6.label, row6.acronyme);]

[End code: globalMap.put("abrev", abrev);]

 

and :

 

IN ---> tJavaFlex ---> OUT

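// Main code: read back the map stored by the first tJavaFlex
HashMap<String,String> abrev = (HashMap<String,String>) globalMap.get("abrev");
String adr = row7.adr;
String sub = null;
int pos = Integer.MAX_VALUE;

for (java.util.Map.Entry<String, String> entry : abrev.entrySet()) {
    String regex = "(?i)\\b" + java.util.regex.Pattern.quote(entry.getKey()) + "\\b";
    // look for the label in the original address, using the same case-insensitive, whole-word match
    java.util.regex.Matcher m = java.util.regex.Pattern.compile(regex).matcher(row7.adr);

    // if the label occurs && its occurrence is before the earliest one found so far
    if (m.find() && m.start() < pos) {
        sub = entry.getValue();
        pos = m.start();
    }
    adr = adr.replaceAll(regex, java.util.regex.Matcher.quoteReplacement(entry.getValue()));
}

row8.adr = adr;
row8.typeA = sub;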

 

Two tJavaFlex components were necessary because a tJavaFlex is limited in the number of inputs it can take.
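
For anyone who wants to try the logic outside a Talend job, here is a minimal standalone sketch of the same idea, run on the sample data from the question. The class and method names (FirstReplacementSketch, replaceAndTrack) are made up for illustration and are not Talend APIs. It uses a LinkedHashMap on the assumption that the lookup order should be kept, since a plain HashMap does not guarantee iteration order.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstReplacementSketch {

    /** Returns { rewritten address, replacement of the left-most matched label (or null) }. */
    static String[] replaceAndTrack(String address, Map<String, String> lookup) {
        String rewritten = address;
        String first = null;
        int firstPos = Integer.MAX_VALUE;

        for (Map.Entry<String, String> e : lookup.entrySet()) {
            // Whole-word, case-insensitive match; quote() keeps the label literal.
            Pattern p = Pattern.compile("\\b" + Pattern.quote(e.getKey()) + "\\b",
                                        Pattern.CASE_INSENSITIVE);

            // Positions are measured in the ORIGINAL address so they stay comparable.
            Matcher m = p.matcher(address);
            if (m.find() && m.start() < firstPos) {
                firstPos = m.start();
                first = e.getValue();
            }
            rewritten = p.matcher(rewritten)
                         .replaceAll(Matcher.quoteReplacement(e.getValue()));
        }
        return new String[] { rewritten, first };
    }

    public static void main(String[] args) {
        Map<String, String> lookup = new LinkedHashMap<>();
        lookup.put("Avenue", "AVE");
        lookup.put("Bridge", "BDG");
        lookup.put("Square", "SQ");

        for (String a : new String[] { "Fifth Avenue AAA Square", "Times Square BBB" }) {
            String[] r = replaceAndTrack(a, lookup);
            System.out.println(r[0] + "|" + r[1]);
        }
        // prints:
        // Fifth AVE AAA SQ|AVE
        // Times SQ BBB|SQ
    }
}
```

Searching the original address (rather than the partly rewritten one) keeps the positions meaningful even after earlier labels have been shortened.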

 

Regards,

 

Med

 

6 REPLIES
Six Stars

Re: Knowing what's replaced in a tReplaceList

Hi there,

 

In your example, the first row has two substitutions, but your added column only includes one. Is this the correct behaviour, or should it contain "AVE SQ"?

 

What would happen if there were multiple substitutions within the source text (e.g. "Avenue Avenue")? Unlikely to happen in real world data, but if it's possible, then your solution would need to take this into account.

 

The general approach I'd take would be to do the substitutions individually, keeping a copy of the original value, and then just comparing the two values to see if they're different afterwards, meaning that one or more substitutions did occur. You'd then append the replace value to a String building up a list of all the substitutions which were done.
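
As a rough sketch of that compare-before-and-after idea in plain Java (the class and method names here are hypothetical, and this is only one way to do it): each lookup entry is applied on its own, and its replacement value is recorded only if the text actually changed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SubstitutionLogSketch {

    /** Returns { rewritten text, space-separated list of the replacements that were applied }. */
    static String[] replaceAndList(String text, Map<String, String> lookup) {
        String current = text;
        List<String> applied = new ArrayList<>();

        for (Map.Entry<String, String> e : lookup.entrySet()) {
            String before = current;  // copy of the value before this substitution
            current = Pattern.compile("\\b" + Pattern.quote(e.getKey()) + "\\b")
                             .matcher(current)
                             .replaceAll(Matcher.quoteReplacement(e.getValue()));

            // A difference means this entry matched at least once.
            if (!current.equals(before)) {
                applied.add(e.getValue());
            }
        }
        return new String[] { current, String.join(" ", applied) };
    }
}
```

With the lookup above, "Fifth Avenue AAA Square" would give "Fifth AVE AAA SQ" and the list "AVE SQ", while "Avenue Avenue" would give "AVE AVE" with "AVE" listed once. Note that the list follows the order of the lookup entries, not the position of the words in the text.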

 

Depending on the structure, and what else your job is doing, this could get a little messy, so it may well be neater to use one of the Java components, but it's certainly something that could be done.

 

Regards,

 

 

Chris

Five Stars

Re: Knowing what's replaced in a tReplaceList

The 3rd column should contain AVE: the replacement of the first word that matches (sequentially) one of the labels in LU.csv, the first word being the main one.
My last hope is to use tJavaFlex as you said: iterate over IN and, inside that loop, iterate over LU (complexity isn't an issue).
NB: beforehand, the process sorts LU by number of words, so that LARGE ALLEY is matched before ALLEY.
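
A minimal sketch of that sorting step, assuming the lookup is held in a Map. The names LookupOrderSketch, wordCount and longestFirst are invented for this example. The result is a LinkedHashMap because it iterates in insertion order, whereas a plain HashMap would not keep the sorted order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class LookupOrderSketch {

    /** Number of whitespace-separated words in a label. */
    static int wordCount(String label) {
        return label.trim().split("\\s+").length;
    }

    /** Copies the lookup into a map that iterates labels with the most words first. */
    static Map<String, String> longestFirst(Map<String, String> lookup) {
        List<Map.Entry<String, String>> entries = new ArrayList<>(lookup.entrySet());

        // List.sort is stable, so labels with the same word count keep their relative order.
        entries.sort(Comparator
                .comparingInt((Map.Entry<String, String> e) -> wordCount(e.getKey()))
                .reversed());

        Map<String, String> ordered = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : entries) {
            ordered.put(e.getKey(), e.getValue());
        }
        return ordered;
    }
}
```

Iterating over the returned map then tries LARGE ALLEY before ALLEY, so the longer label gets replaced before the shorter one can match part of it.
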
Six Stars

Re: Knowing what's replaced in a tReplaceList

It's rare I'd advocate "hiding" something as significant in terms of the overall job in a single Java component, but in this case, it probably makes more sense. You can easily do the sorting, replacing, and building of your 3rd column in Java, whereas an implementation using standard Talend components would be decidedly complex.

 

 

Are you happy enough with how you'd go about implementing this in a tJavaFlex?

 

Regards,

 

 

Chris

Six Stars

Re: Knowing what's replaced in a tReplaceList

Great. Glad you got this working.

Moderator

Re: Knowing what's replaced in a tReplaceList

Hello,

Can you please mark your solution as accepted, which allows others to see what has worked?

Best regards

Sabrina

 
