Four Stars

Load data from a gzipped file to a PostgreSQL table

Good morning.

I am moving tables from Redshift to PostgreSQL and rewriting many of the Talend jobs.  Many of the data files that will be uploaded to various PostgreSQL tables weekly and monthly are very large (millions of rows).  In my current Talend jobs that load Redshift tables, I use a tJava component to convert the CSV files to gzipped files, and then unzip them in a COPY statement in tRedshiftRow.  Since I am no longer putting these zipped files in S3 (we are not using S3 or anything like it for the PostgreSQL tables), how can I accomplish the upload?  Can I accomplish the same thing in tPostgresqlRow?
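For context, a rough sketch of one way to stream a gzipped CSV straight into a PostgreSQL table from Java, using the JDBC driver's COPY API (org.postgresql.copy.CopyManager), so nothing has to be staged in S3 or unzipped to disk first. The connection URL, schema, table name, and file path below are placeholders, not anything from the original job; in a Talend job the connection would typically come from a tPostgresqlConnection and be fetched from globalMap rather than opened with DriverManager.

```java
import java.io.FileInputStream;
import java.io.InputStream;
import java.sql.Connection;
import java.sql.DriverManager;
import java.util.zip.GZIPInputStream;

import org.postgresql.PGConnection;
import org.postgresql.copy.CopyManager;

public class GzipCopyLoad {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details; in a Talend job this would usually be
        // the connection opened by tPostgresqlConnection, taken from globalMap.
        Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/mydb", "user", "password");

        // Hypothetical gzipped CSV produced earlier in the job.
        String gzPath = "/data/staging/weekly_extract.csv.gz";

        // The PostgreSQL JDBC driver exposes COPY through CopyManager.
        CopyManager copyApi = ((PGConnection) conn).getCopyAPI();

        // GZIPInputStream decompresses on the fly; COPY ... FROM STDIN reads the
        // decompressed bytes directly, so the .gz is never unzipped to disk.
        try (InputStream in = new GZIPInputStream(new FileInputStream(gzPath))) {
            long rows = copyApi.copyIn(
                    "COPY my_schema.my_table FROM STDIN WITH (FORMAT csv, HEADER true)",
                    in);
            System.out.println("Loaded rows: " + rows);
        }

        conn.close();
    }
}
```

Note that COPY ... FROM STDIN streams from the client side, so the file does not need to sit on the database server the way a plain COPY FROM '/path' would require.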

 

Thank you.

  • Big Data
4 REPLIES
Employee

Re: Load data from a gzipped file to a PostgreSQL table

Is your PostgreSQL DB in AWS, or is it on-premises?

 

I think you can compress the data file while uploading/downloading, but you should consider using tPostgresqlOutputBulkExec to load millions of records.

Four Stars

Re: Load data from a gzipped file to a PostgreSQL table

My apologies for the delayed response, iburtally.  I have not had a chance to try this component in the Talend job yet, although I have looked at it and read up on it in the component guide.  I plan to do so this week, as I have many Talend jobs that will need to get around the zip issue.  Thank you.

Four Stars

Re: Load data from a gzipped file to a PostgreSQL table

Good morning, iburtally!

I finally have had a chance to work with this component.  The first Talend job I am looking at uses tFTPGet to put a file into a folder to be unzipped.  The tFTPGet component uses ((String)globalMap.get("MostRecent")) as the filemask.  This iterates to tFileUnarchive, which uses ((String)globalMap.get("tFileList_1_CURRENT_FILEPATH")) as the name of the archive file.  We do this because the first part of the job extracts the most recent file from the FTP folder and just carries the file path throughout, so the final unzipped file is never given a static name.

What component do I need to use between tFileUnarchive and tPostgresqlOutputBulkExec, or is there code I can put in the File Name field of tPostgresqlOutputBulkExec that will simply pick up whatever file is in the folder (there will always be only one file there)?  This job originally downloaded, unarchived, then re-zipped the files and uploaded the zipped files to Redshift.  You suggested not zipping and instead using the tPostgresqlOutputBulkExec component.  Have I got that right?  Thank you very much for your help.
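One possibility, sketched below under assumptions: a small tJava step between tFileUnarchive and tPostgresqlOutputBulkExec that lists the folder tFileUnarchive extracts into, takes the single file it finds, and publishes its path for the bulk component's File Name field. The folder path and the globalMap key "unzippedFile" are made-up names for illustration; a tFileList pointed at that folder (reusing its CURRENT_FILEPATH variable, as elsewhere in the job) would be another way to get the same path.

```java
import java.io.File;

public class FindStagedFile {
    public static void main(String[] args) {
        // Hypothetical folder that tFileUnarchive extracts into.
        File stagingDir = new File("/data/staging/unzipped");

        // There is only ever one file in this folder, so take the first entry.
        File[] files = stagingDir.listFiles();
        if (files == null || files.length == 0) {
            throw new IllegalStateException("No file found in " + stagingDir);
        }
        String unzippedPath = files[0].getAbsolutePath();

        // In a tJava this path would be published for downstream components, e.g.
        //   globalMap.put("unzippedFile", unzippedPath);
        // and the File Name field of tPostgresqlOutputBulkExec could then read
        //   ((String) globalMap.get("unzippedFile"))
        System.out.println("Staged file: " + unzippedPath);
    }
}
```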

 

Four Stars

Re: Load data from a gzipped file to a PostgreSQL table

Sorry, I meant to attach the screenshot of the Talend job.