I'm new to Talend.
Here is my requirement:
we need to discard rows of a table when the 25% of their columns is null.
col1 | col2 | col3 | col4
row1 null | 123 | null | null
row2 456 | 123 | 789 | 000
row1 will be discarded
row2 is good and pass the validation
The steps would be:
1) calculate the number of columns of a table
2) for every row, to check which column is null and count the number of null columns
3) if that number exceed the limit, then discard the row
Any suggestion on how to achieve it?
So far, I know how to get the first step, executing a SQL query.
For the second step I know the tMap component can be used to check rows, but then how to compare the numbers of columns dinamically? Maybe using a tJava component and writing code?
Solved! Go to Solution.
You have specified that you can get the first step using an SQL query. Get this value from DB and store it in a context variable of Talend. Since you are processing this exercise for only one table, another option is to load the context variable value directly.
For second and third step, you can either do in tmap or tjavarow. If you are using tjavarow, reset the value of a temporary variable as 0. Whenever you are getting a null for any column (using if condition within tjavarow), increment the count of this variable.
After verifying all the columns, if you are crossing the threshold of 25% (say 3 columns out of 12 are null), then add a flag value as Y (you will have to add an additional column in output flow) else it should be N.
Connect the tjavarow output to a tfilterrow and check the value of this variable (whether Y or N). All the records with flag value as Y should go to reject flow (since it has crossed the threshold). All records with N can go to good record flow.
You can also do the entire steps in tmap also. I would also recommend you to go through a 4 part Blog series specifying Talend Design Patterns and Best Practices. It will help you to make your Talend foundation strong. The link for the first part is as shown below.
If the answer has helped you, could you please mark the topic as resolved? Kudos are also welcome.
Watch the recorded webinar!
Find out about Talend Open Studio for Data Quality
Learn how to enable Data Governance
Take a peek at the definitive guide to Government Data Quality
Look at6 ways to start utilizing Machine Learning with Amazon We Services and Talend