Postgresql efficient customized components (bulks)

One Star

Postgresql efficient customized components (bulks)

Hello,
It seems that the PostgreSQL component tPostgresqlBulkExec (insert or update) does not return the number of lines inserted/updated.
The Oracle one returns this important information ((Integer)globalMap.get("tOracleOutputBulk_1_NB_LINE")).
Another problem I encounter is about the format:
For a CSV file, if the date is null (so the line in the CSV reads "firstfield";"";"second"), the COPY action fails because of a bad date format. The "firstfield";;"second" format would be better.
It works for null strings, but not for dates. So the only way I found was to export as text and remove all separators from the text.
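For illustration only, a minimal sketch of one possible direction on PostgreSQL 9.4 or later (table and column names are made up, and this is raw JDBC, not the Talend component): COPY's FORCE_NULL option makes a quoted empty string ("") load as NULL for the listed columns, so the date column no longer rejects it.

    // Hypothetical sketch: stream the ';'-separated CSV with the JDBC driver's
    // CopyManager. FORCE_NULL (PostgreSQL 9.4+) turns quoted empty strings into
    // NULL, so "firstfield";"";"second" no longer breaks the date column.
    import java.io.FileInputStream;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import org.postgresql.PGConnection;
    import org.postgresql.copy.CopyManager;

    public class CopyNullDates {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:postgresql://localhost:5432/mydb", "user", "pass");
                 FileInputStream csv = new FileInputStream("/tmp/data.csv")) {
                CopyManager cm = conn.unwrap(PGConnection.class).getCopyAPI();
                long rows = cm.copyIn(
                    "COPY my_table (first_field, my_date, second_field) FROM STDIN "
                  + "WITH (FORMAT csv, DELIMITER ';', QUOTE '\"', FORCE_NULL (my_date))",
                    csv);
                System.out.println(rows + " rows copied");
            }
        }
    }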
What can I do? (I need this component working now. Can I modify the component myself?)
Thanks,
One Star

Re: Postgresql efficient customized components (bulks)

Well, I have modified the official component tPostgresqlBulkExec
If someone is interested, you can find it on my GitHub account (user "parisni").
Here is what is new:
- not compatible with PostgreSQL versions prior to 9
- it now returns the number of lines INSERTED (for a bulk insert) and the number of lines UPDATED (for a bulk update)
- in this version you do not need to push a file to the remote server: it uses an input stream on the client computer (COPY FROM STDIN), so I think it is faster and better (a sketch follows below)
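A minimal sketch of what the bulk-update path could look like with the JDBC driver's CopyManager (table and column names are made up): the local file is streamed over the connection into a staging table, and the UPDATE's return value is the number of lines updated.

    // Hypothetical sketch of the bulk-update path (made-up table/column names):
    // COPY FROM STDIN streams the local file into a staging table over the JDBC
    // connection, then one UPDATE ... FROM returns the number of lines updated.
    import java.io.FileInputStream;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;
    import org.postgresql.PGConnection;
    import org.postgresql.copy.CopyManager;

    public class BulkUpdateCount {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:postgresql://localhost:5432/mydb", "user", "pass");
                 FileInputStream data = new FileInputStream("/tmp/data.txt")) {
                try (Statement st = conn.createStatement()) {
                    st.execute("CREATE TEMP TABLE staging (LIKE target INCLUDING DEFAULTS)");
                }
                // nothing is pushed onto the database server: the file is streamed from the client
                CopyManager cm = conn.unwrap(PGConnection.class).getCopyAPI();
                long copied = cm.copyIn("COPY staging FROM STDIN", data);
                int updated;
                try (Statement st = conn.createStatement()) {
                    // executeUpdate() returns the row count: the NB_LINE_UPDATED figure
                    updated = st.executeUpdate(
                        "UPDATE target t SET value = s.value FROM staging s WHERE t.id = s.id");
                }
                System.out.println(copied + " rows copied, " + updated + " rows updated");
            }
        }
    }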
Note that I have not fixed the problem with null dates exported in a CSV file. The only way that works is a TXT file (tFileDelimited without the CSV options). So be careful to remove the chosen separator (\t in my case) from the string values before generating the TXT file on the local computer.
If anyone is interested, I may add a bulk DELETE option. Contact me on GitHub.
One Star

Re: Postgresql efficient customized components (bulks)

I have modified the tPostgresqlOutputBulkExec too: 
- note that the official one contains errors
- I have added: NB_LINE_INSERTED, NB_LINE_UPDATED, NB_LINE_DELETED, QUERY and NB_LINE_MODIFIED (= inserted OR updated OR deleted)
The bulk delete works this way (see the sketch after this list):
1) create a temp table
2) bulk insert the rows into it
3) DELETE from the target table with a join on the key condition
- I have modified the way COPY is used (it now goes through COPY FROM STDIN), so it can be used like a classical tPostgresqlOutput without using tFtp to push the CSV to the remote server, and it is faster for this reason
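A minimal sketch of that bulk DELETE idea with the JDBC driver's CopyManager (table and column names are made up):

    // Hypothetical sketch of the bulk DELETE steps above (made-up names):
    // 1) temp table, 2) COPY the keys into it, 3) DELETE with a join on the key.
    import java.io.FileInputStream;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;
    import org.postgresql.PGConnection;

    public class BulkDeleteCount {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:postgresql://localhost:5432/mydb", "user", "pass");
                 FileInputStream keys = new FileInputStream("/tmp/keys.txt")) {
                try (Statement st = conn.createStatement()) {
                    st.execute("CREATE TEMP TABLE to_delete (id bigint)");   // 1)
                }
                conn.unwrap(PGConnection.class).getCopyAPI()
                    .copyIn("COPY to_delete FROM STDIN", keys);              // 2)
                try (Statement st = conn.createStatement()) {
                    int deleted = st.executeUpdate(                          // 3)
                        "DELETE FROM target t USING to_delete d WHERE t.id = d.id");
                    System.out.println(deleted + " rows deleted");           // NB_LINE_DELETED
                }
            }
        }
    }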
You can get it from my GitHub account (user "parisni", "talend" repository).
One Star

Re: Postgresql efficient customized components (bulks)

Thanks a lot, your custom module was useful to me.
One Star

Re: Postgresql efficient customized components (bulks)

You are welcome,
I'm still using them and updating the source code as well.
Don't hesitate to star the GitHub repository for more visibility.
One Star

Re: Postgresql efficient customized components (bulks)

I was looking at your version of tPostgresqlBulkExec and I find that it makes much more sense than what is shipped with the default installation of TOS. I was just wondering whether one could make it even more efficient by:
a. reading the input stream directly from memory (bypassing the need to write to a file and then open the stream from that file)
b. executing the COPY FROM STDIN asynchronously, while the data is still being read from the source (a rough sketch of both ideas follows below)
Any thoughts?
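For illustration, a rough sketch of what (a) and (b) could look like together (all names are made up): rows are written to an in-memory pipe while COPY FROM STDIN consumes the other end on a second thread, so no intermediate file is needed and the copy overlaps the read.

    // Hypothetical sketch of (a) + (b), all names made up: a producer writes rows
    // into an in-memory pipe while COPY FROM STDIN reads the other end on a
    // second thread, so no temporary file is written and the copy overlaps the read.
    import java.io.PipedInputStream;
    import java.io.PipedOutputStream;
    import java.nio.charset.StandardCharsets;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;
    import org.postgresql.PGConnection;
    import org.postgresql.copy.CopyManager;

    public class StreamingCopy {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:postgresql://localhost:5432/mydb", "user", "pass")) {
                CopyManager cm = conn.unwrap(PGConnection.class).getCopyAPI();
                PipedOutputStream producerSide = new PipedOutputStream();
                PipedInputStream copySide = new PipedInputStream(producerSide, 1 << 16);

                ExecutorService pool = Executors.newSingleThreadExecutor();
                // (b) the COPY runs asynchronously on its own thread
                Future<Long> copied = pool.submit(() -> cm.copyIn("COPY target FROM STDIN", copySide));

                // (a) rows come straight from memory, no intermediate file on disk
                for (int i = 0; i < 1000; i++) {
                    producerSide.write((i + "\tvalue_" + i + "\n").getBytes(StandardCharsets.UTF_8));
                }
                producerSide.close(); // end of the COPY stream

                System.out.println(copied.get() + " rows copied");
                pool.shutdown();
            }
        }
    }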
One Star

Re: Postgresql efficient customized components (bulks)

Hi all,
There is some news here:
- https://github.com/parisni/talend/tree/master/tPostgresqlOutputBulkAPHP : I have added the binary format. This works at least two times faster than the CSV format. Note that "APHP" means it has been customized a little for my needs, but this component works in the general case too.
- https://github.com/parisni/talend/blob/master/_tableActionForBulk.javajet : I have added a "create an unlogged table" option to the table action. Be careful: unlogged tables are compatible with 9.4 and later only. Unlogged tables suit temporary tables well and load very quickly.
- https://github.com/parisni/talend/tree/master/tPostgresqlSCDELT : I have added the ability to drop the source table. This works well with "create unlogged table". An example use case is an "upsert": i) create an unlogged table, ii) bulk load it, iii) test what is new/updatable thanks to SCDELT, iv) do the merge (see the sketch below).
This approach is very efficient and works from a few rows up to very large data sets (SCDELT type 1).
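A minimal sketch of that upsert recipe (table and column names are made up, and steps iii and iv are reduced to plain UPDATE/INSERT statements instead of the SCDELT component):

    // Hypothetical sketch of the upsert recipe (made-up names; steps iii/iv are
    // plain SQL here instead of the SCDELT component): i) unlogged staging table,
    // ii) COPY into it, iii) update existing rows, iv) insert new rows, then drop it.
    import java.io.FileInputStream;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;
    import org.postgresql.PGConnection;

    public class UnloggedUpsert {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:postgresql://localhost:5432/mydb", "user", "pass");
                 FileInputStream data = new FileInputStream("/tmp/data.txt");
                 Statement st = conn.createStatement()) {
                st.execute("CREATE UNLOGGED TABLE staging (LIKE target)");     // i) no WAL, loads fast
                conn.unwrap(PGConnection.class).getCopyAPI()
                    .copyIn("COPY staging FROM STDIN", data);                  // ii) bulk load
                int updated = st.executeUpdate(                                // iii) existing rows
                    "UPDATE target t SET value = s.value FROM staging s WHERE t.id = s.id");
                int inserted = st.executeUpdate(                               // iv) new rows
                    "INSERT INTO target SELECT s.* FROM staging s "
                  + "LEFT JOIN target t ON t.id = s.id WHERE t.id IS NULL");
                st.execute("DROP TABLE staging");                              // drop the source table
                System.out.println(updated + " updated, " + inserted + " inserted");
            }
        }
    }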
@Francesco:
a. Reading the input stream directly from memory (bypassing the need to write to a file and then open the stream from that file)
=> I will take a look at that. Thanks!
b. Executing the copy from stdin asynchronously, while the data is being read from the source
=> I don't understand.