In Data Preparation, I would like to create a column containing the number of occurrences of each value in another column. This would let me filter out duplicate values, for example.
I couldn't find a way to do that.
I am actually not sure why you want to create a new column. As you describe it, the profiling panel of Data Prep seems to be the perfect candidate for such a use case. You can very easily see the number of occurrences of each value (so you can spot duplicates and unique values at a glance), and then you can filter directly by selecting them.
See the attached screenshot where you can see 2 duplicate values and 1 unique value directly in the profiling bar chart.
You're right, the profiling panel lets you visualize the duplicates and filter them, but nothing lets you filter all duplicated values at once via an action.
For example: I have the following data:
I wish I could apply an action on col_1 to calculate the number of occurrences of values as follows:
Then, I could apply an action on col_2 to filter the values > 1. So, automatically, I could quickly treat ALL duplicated values.
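To illustrate the logic I have in mind, here is a minimal Python sketch done outside Data Preparation. It is not an existing Data Prep action; the sample values and the helper names `add_occurrence_column` and `keep_duplicates` are made up for the example.

```python
from collections import Counter


def add_occurrence_column(rows, source="col_1", target="col_2"):
    """Return new rows with `target` set to how often each `source` value occurs."""
    counts = Counter(row[source] for row in rows)
    return [{**row, target: counts[row[source]]} for row in rows]


def keep_duplicates(rows, count_col="col_2"):
    """Keep only the rows whose occurrence count is greater than 1."""
    return [row for row in rows if row[count_col] > 1]


# Hypothetical sample data, not the data from my dataset.
rows = [{"col_1": value} for value in ["A", "B", "A", "C", "B"]]

with_counts = add_occurrence_column(rows)
duplicates = keep_duplicates(with_counts)

for row in duplicates:
    print(row)
```

The first step corresponds to the action on col_1 that fills col_2, and the second to the filter on col_2 > 1.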
I hope my explanations are clear ;-)