I am moving tables from Redshift to PostgreSQL and rewriting many of the Talend jobs. Many of the data files that will be uploaded to various PostgreSQL tables weekly and monthly are very big (millions of rows). In my current Talend jobs that upload to Redshift tables, I was able to use a tJava component to convert the CSV files to gzipped files and then unzip them in a COPY statement in tRedshiftRow. Since I am no longer putting these zipped files in S3 (we are not using S3 or anything like it for the PostgreSQL tables), how can I accomplish the upload? Can I accomplish the same thing in tPostgresqlRow?
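For reference, the gzip step done in the tJava component is presumably equivalent to something like this sketch (the file paths and the temp-file setup in main are made up for illustration; the real job's code may differ):

```java
import java.io.*;
import java.util.zip.GZIPOutputStream;

public class GzipCsv {
    // Compress a CSV file to <path>.gz, roughly what the tJava step
    // in the Redshift jobs would do before the S3 put.
    static String gzip(String csvPath) throws IOException {
        String gzPath = csvPath + ".gz";
        try (InputStream in = new FileInputStream(csvPath);
             OutputStream out = new GZIPOutputStream(new FileOutputStream(gzPath))) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) > 0) out.write(buf, 0, n);
        }
        return gzPath;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical input file, created here just so the sketch runs standalone.
        File csv = File.createTempFile("rows", ".csv");
        try (Writer w = new FileWriter(csv)) { w.write("id,name\n1,alpha\n2,beta\n"); }
        System.out.println(new File(gzip(csv.getPath())).exists()); // prints true
    }
}
```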
Is your PostgreSQL DB in AWS? Or is it on-premise?
I think you can compress the data file while uploading/downloading, but you should consider using tPostgresqlOutputBulkExec to load millions of records.
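If loading through tPostgresqlRow without S3 is still of interest: PostgreSQL's COPY can read from STDIN, and the PostgreSQL JDBC driver exposes this client-side via org.postgresql.copy.CopyManager, so the file can be streamed from the job server (even still gzipped, through a GZIPInputStream) without ever landing on the database host. A minimal sketch of building such a statement; the table and column names are hypothetical, and only the string-building part is shown since the actual copyIn call needs a live connection:

```java
import java.util.List;

public class CopyCommand {
    // Build a client-side COPY ... FROM STDIN statement, the kind of SQL
    // you would hand to org.postgresql.copy.CopyManager.copyIn(sql, inputStream).
    static String buildCopySql(String table, List<String> columns) {
        return "COPY " + table + " (" + String.join(", ", columns)
             + ") FROM STDIN WITH (FORMAT csv, HEADER true)";
    }

    public static void main(String[] args) {
        // Hypothetical target table and columns.
        System.out.println(buildCopySql("public.weekly_sales", List.of("id", "amount")));
        // COPY public.weekly_sales (id, amount) FROM STDIN WITH (FORMAT csv, HEADER true)
    }
}
```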
My apologies for being so delayed in responding, iburtally. I have not had a chance to try this component in the Talend job, although I have looked at it and read up on it in the component guide. I plan to do so this week, as I have many Talend jobs that will need to get around the zip issue. Thank you.
Good morning, iburtally!
I finally have had a chance to work with this component. The first Talend job I am looking at uses tFTPGet to put a file into a folder to be unzipped. The tFTPGet component uses ((String)globalMap.get("MostRecent")) as the filemask. This iterates to tFileUnarchive, which uses ((String)globalMap.get("tFileList_1_CURRENT_FILEPATH")) as the name of the archive file. We do this because the first part of the job extracts the most recent file from the FTP folder and just carries the filepath name throughout, so the final unzipped file is never given a static name. What component do I need to use between tFileUnarchive and tPostgresqlOutputBulkExec, or is there code I can use in the filename field of tPostgresqlOutputBulkExec that will just pick up whatever file is in the folder (there will always be only one file there)? This job originally downloaded, unarchived, then re-zipped and uploaded the zipped files to Redshift. You suggested not zipping and instead using the tPostgresqlOutputBulkExec component. Have I got that right? Thank you very much for your help.
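One possible approach to the "pick up whatever file is there" part: since the folder is guaranteed to hold exactly one file after tFileUnarchive, a small tJava routine can resolve that file's path and the result can be stored in globalMap for the tPostgresqlOutputBulkExec filename field. A sketch, with the folder path as a placeholder (the directory name and the one-file assumption are from the post above, not verified against the actual job):

```java
import java.io.File;

public class SingleFile {
    // Return the path of the only file in a folder, e.g. the unarchive
    // target directory; fails loudly if the one-file assumption is violated.
    static String onlyFileIn(String dir) {
        File[] files = new File(dir).listFiles(File::isFile);
        if (files == null || files.length != 1) {
            throw new IllegalStateException("expected exactly one file in " + dir);
        }
        return files[0].getAbsolutePath();
    }
}
```

In a tJava component this could run as `globalMap.put("unzippedFile", SingleFile.onlyFileIn("/path/to/unzip/folder"));`, and the filename field would then read ((String)globalMap.get("unzippedFile")). Alternatively, iterating a tFileList over that folder would expose the same path as ((String)globalMap.get("tFileList_1_CURRENT_FILEPATH")) with no custom code.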