
Replacing Null with 0's on 100's of Columns

Hello,

 

I am currently creating an aggregate table in Talend Real-Time Big Data Platform. I am turning a transactions table that has many millions of rows into an aggregated version (per ID) that is very wide (approximately 1300 columns). My mapping is below:

tRedshiftInput --> tMap --> tAggregateRow --> tSortRow --> tPivotToColumnsDelimited

 

It works exactly as it should. I then take this CSV and use the COPY command to copy the data back into Redshift in the required format.

 

However, some IDs do not have values for their respective columns, so a NULL is inserted. To use this table in R, though, I would like to replace all NULLs with 0s.

 

I know this is possible to do using row1.column1 == null ? 0 : row1.column1 in tMap. But it is not practical to do this for 1300 columns; it'll take me a lifetime and I am not getting any younger!

 

I have experimented with the DEFAULT value in the schemas, but a 0 is never inserted. I have also created the Redshift table with 0 as the default value; this did not work either. I also tried tReplace, but this requires every field to be explicitly stated, which would also take far too long.

 

Is there an easier way to convert all NULLs to 0s?

 

Bonus question: How can I get Talend to create the Redshift table for me but not insert any rows? At the moment the only way I know is to filter the mapping down to 1 row, insert it into Redshift using tRedshiftOutput ("create table if not exists"), then deactivate that component, truncate the table, and put back the tPivotToColumnsDelimited component.

 

Any help would be much appreciated.

 

Thanks

 

pp84


Re: Replacing Null with 0's on 100's of Columns

Any help...???

Re: Replacing Null with 0's on 100's of Columns

Hi,
I assume the field separator for the CSV file produced by tPivotToColumnsDelimited is ";".
So, you can read the content of this file using tFileInputFullRow, then in a tJavaRow do something like this:
output_row.line = (";" + input_row.line + ";").replaceAll(";;", ";0;").replaceAll(";;", ";0;");
The line is padded with a separator at each end so that empty fields at the edges are caught too, and the replacement is applied twice because a single pass skips consecutive empty fields (";;;" only becomes ";0;0;" on the second pass).
Now you just have to remove the first and last characters from output_row.line (the padding) and push the result to a new CSV file.
Should work (sorry, I can't test it from my mobile phone).
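For reference, the whole transformation can be sketched as plain Java outside of Talend (class and method names here are illustrative, and a ";" separator is assumed):

```java
public class NullToZero {

    // Pad the row with separators so empty fields at the edges match too,
    // run the replacement twice to catch consecutive empty fields
    // (";;;" only becomes ";0;0;" after a second pass), then strip the padding.
    static String fillEmptyFields(String line) {
        String padded = ";" + line + ";";
        padded = padded.replaceAll(";;", ";0;");
        padded = padded.replaceAll(";;", ";0;");
        // Remove the first and last characters (the padding separators).
        return padded.substring(1, padded.length() - 1);
    }

    public static void main(String[] args) {
        System.out.println(fillEmptyFields("1;;3;;;6")); // 1;0;3;0;0;6
        System.out.println(fillEmptyFields(";2;"));      // 0;2;0
    }
}
```

The body of fillEmptyFields is exactly what goes into the tJavaRow, with input_row.line in place of the parameter and output_row.line receiving the result.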

TRF