discard rows with a number of null columns

Highlighted
Six Stars

discard rows with a number of null columns

Hi all,

 

I'm new to Talend. 

 

Here is my requirement: 

we need to discard rows of a table when the 25% of their columns is null.

 

E.g.:

         -------------------------------

            col1 | col2 | col3 | col4

         -------------------------------

row1    null  | 123  | null  | null

         -------------------------------

row2    456  | 123  | 789  | 000

 

 

row1 will be discarded 

row2 is good and pass the validation

 

The steps would be:

1) calculate the number of columns of a table

2) for every row, to check which column is null and count the number of null columns

3) if that number exceed the limit, then discard the row

 

Any suggestion on how to achieve it?

So far, I know how to get the first step, executing a SQL query.

 

For the second step I know the tMap component can be used to check rows, but then how to compare the numbers of columns dinamically? Maybe using a tJava component and writing code?

 

Thanks,

Nick

Tags (1)

Accepted Solutions
Six Stars

Re: discard rows with a number of null columns

hi Nikhil, thanks for your suggestions! I didn't implement your solution even if it makes sense to me. Anyway, with the support of Talend we were able to figure that out. In the example above I talked of a single table, but in the future we want something dynamic that can be applied to many tables. I've attached an image which contains the solution (don't know how to upload it in line). Hope it can be helpful to someone else. Nick

All Replies
Employee

Re: discard rows with a number of null columns

Hi Nick,

 

     You have specified that you can get the first step using an SQL query. Get this value from DB and store it in a context variable of Talend. Since you are processing this exercise for only one table, another option is to load the context variable value directly.

 

     For second and third step, you can either do in tmap or tjavarow. If you are using tjavarow, reset the value of a temporary variable as 0. Whenever you are getting a null for any column (using if condition within tjavarow), increment the count of this variable.

     After verifying all the columns, if you are crossing the threshold of 25% (say 3 columns out of 12 are null), then add a flag value as Y (you will have to add an additional column in output flow) else it should be N.

 

     Connect the tjavarow output to a tfilterrow and check the value of this variable (whether Y or N). All the records with flag value as Y should go to reject flow (since it has crossed the threshold). All records with N can go to good record flow.

 

      You can also do the entire steps in tmap also. I would also recommend you to go through a 4 part Blog series specifying Talend Design Patterns and Best Practices. It will help you to make your Talend foundation strong. The link for the first part is as shown below.

 

https://www.talend.com/blog/2015/12/07/talend-job-design-patterns-and-best-practices/

 

     If the answer has helped you, could you please mark the topic as resolved? Kudos are also welcome.

 

Warm Regards,

 

Nikhil Thampi

Six Stars

Re: discard rows with a number of null columns

hi Nikhil, thanks for your suggestions! I didn't implement your solution even if it makes sense to me. Anyway, with the support of Talend we were able to figure that out. In the example above I talked of a single table, but in the future we want something dynamic that can be applied to many tables. I've attached an image which contains the solution (don't know how to upload it in line). Hope it can be helpful to someone else. Nick
Employee

Re: discard rows with a number of null columns

Hi,

 

     Happy to know that you cracked the issue !

 

     Enjoy programming in Talend :-)

 

Warm Regards,

 

Nikhil Thampi