From Thursday, July, 9, 3:00 PM Pacific,
our Community site will be in
read-only mode
through Sunday, July 12th.
Thank you for your patience.

(tAggregate,tDenormalise,tPivotToColumns) which is best at handling blank fields

Highlighted
Seven Stars

(tAggregate,tDenormalise,tPivotToColumns) which is best at handling blank fields

I would like to clean up some duplicate rows so that I combine all available data for duplicate rows into 1 but also not replicate data that already exists.

I'm not sure which component (tAggregate,tDenormalise,tPivotToColumns) would suit best as they all seem to involve combining all values found or selecting the first or last value. Its not safe to assume that all new data only appears in the 1st occurrence  

 

Is there a way to only execute the combine task when "new" information is detected without alot of custom logic testing string lengths and null values?

 

Sample data

ID|Name|Country|Town

1|Tom| Spain|

1|Tom|null|Madrid |

2| |Germany|Berlin

2|John| Germany|

3|Paul| |Glasgow

3|Paul|Scotland| 

 

 

and end up with

ID|Name|Country|Town

1|Tom|Spain|Madrid |

2|John| Germany|Berlin

3|Paul|Scotland|Glasgow

 

not

ID|Name|Country|Town

1|Tom;Tom|Spain|Madrid |

2|;John| Germany;Germany|Berlin

3|Paul;Paul|;Scotland|Glasgow;Glasgow


Accepted Solutions
Highlighted
Employee

Re: (tAggregate,tDenormalise,tPivotToColumns) which is best at handling blank fields

Hi,

 

    I hope you are looking for this solution.

image.png

 

tMap will remove any hardcoded null from the input data.

image.png

 

For example, 

Relational.ISNULL(row1.name) || row1.name.equalsIgnoreCase("null")?null:row1.name 

aggregation can be done like below.

image.png

 

Hope your issue is resolved Smiley Happy

 

Warm Regards,
Nikhil Thampi

Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved :-)

View solution in original post


All Replies
Highlighted
Seven Stars

Re: (tAggregate,tDenormalise,tPivotToColumns) which is best at handling blank fields

Did you try tAggregate with MAX function insted of first or last?

DataTeam.pl

Highlighted
Eleven Stars

Re: (tAggregate,tDenormalise,tPivotToColumns) which is best at handling blank fields

Is it possible that 2nd occurrence will have different info then the 1st

 

2| |Germany|Berlin

2|John| France|

2|John| France|Paris

 

 

Regards
Abhishek KUMAR
Highlighted
Employee

Re: (tAggregate,tDenormalise,tPivotToColumns) which is best at handling blank fields

Hi,

 

    I hope you are looking for this solution.

image.png

 

tMap will remove any hardcoded null from the input data.

image.png

 

For example, 

Relational.ISNULL(row1.name) || row1.name.equalsIgnoreCase("null")?null:row1.name 

aggregation can be done like below.

image.png

 

Hope your issue is resolved Smiley Happy

 

Warm Regards,
Nikhil Thampi

Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved :-)

View solution in original post

Highlighted
Seven Stars

Re: (tAggregate,tDenormalise,tPivotToColumns) which is best at handling blank fields

Hi @nikhilthampi 

Can you explain what the MAX function does with text data?

Highlighted
Seven Stars

Re: (tAggregate,tDenormalise,tPivotToColumns) which is best at handling blank fields

Max in text case means the last one in alphabetical order

 

For example when you have 3 values: 'P', 'B', 'L', null, 'H'

max => 'P'

min => 'B'

 

Null is omitted in functions max and min

DataTeam.pl

Highlighted
Employee

Re: (tAggregate,tDenormalise,tPivotToColumns) which is best at handling blank fields

Thanks @DataTeam  for pitching in.

 

@mobmsc2 , Could you please mark the topic as solution provided since we have answered your query?

 

Warm Regards,
Nikhil Thampi

Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved :-)

Highlighted
Seven Stars

Re: (tAggregate,tDenormalise,tPivotToColumns) which is best at handling blank fields

So your process
Step 1 if row value is null OR equals the string null in whatever case possible RETURN null or the value
Step 2 Take the max value excluding nulls from what ever is found when joining by ID.

I had a look at the tAggregate docs and it doesn't explain what the functions count, min, max, avg, sum, first, last, list, list(objects), count(distinct), standard deviation do or what is acceptable input data types. I thought inputs had to be numeric
Highlighted
Employee

Re: (tAggregate,tDenormalise,tPivotToColumns) which is best at handling blank fields

Hi,

 

   If it is a String, min and max functions work in alphabetical order. Other functions are not needed in your use case.

 

   Are you facing any issue in current solution? I hope it is covering your use case.

 

Warm Regards,
Nikhil Thampi

Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved :-)

2019 GARTNER MAGIC QUADRANT FOR DATA INTEGRATION TOOL

Talend named a Leader.

Get your copy

OPEN STUDIO FOR DATA INTEGRATION

Kickstart your first data integration and ETL projects.

Download now

Best Practices for Using Context Variables with Talend – Part 1

Learn how to do cool things with Context Variables

Blog

Migrate Data from one Database to another with one Job using the Dynamic Schema

Find out how to migrate from one database to another using the Dynamic schema

Blog

Best Practices for Using Context Variables with Talend – Part 4

Pick up some tips and tricks with Context Variables

Blog