Five Stars

tJDBCInput issue with large dataset

Hi All,

I am quite new to Talend and have just started using it. I have to copy data from several sources; one of them is a Postgres DB with SSL enabled. Initially the ready-made DB connection for Postgres didn't work with SSL, but I came across a suggestion to use a generic JDBC connection with a few changes in the JDBC URL, and it worked. I am able to connect to the Postgres DB and retrieve the schema. There is one table with 5 million records and 35 columns. The requirement is to drop the table on each refresh and recreate it.
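For reference, the "few changes in the JDBC URL" were roughly along the lines of the sketch below, using the standard Postgres JDBC SSL parameters (the hostname, port, and database name here are placeholders, not my actual setup):

```java
public class PgSslUrl {
    // Build a Postgres JDBC URL with SSL enabled.
    // Host, port, and database are hypothetical placeholders.
    public static String buildUrl(String host, int port, String db) {
        return "jdbc:postgresql://" + host + ":" + port + "/" + db
                + "?ssl=true&sslmode=require";
    }

    public static void main(String[] args) {
        System.out.println(buildUrl("pg.example.com", 5432, "mydb"));
    }
}
```

The same URL goes into the generic JDBC connection's "JDBC URL" field in the Studio.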

 

I am getting the error mentioned below:

[statistics] connecting to socket on port 3819
[statistics] connected
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1969)
at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:255)
at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:570)

 

I am using the setup mentioned below:

  1. Created a DB connection using a generic JDBC connection to connect to the Postgres DB
  2. Retrieved the schema
  3. Created a job: tJDBCInput -> tOracleOutput

For tJDBCInput, below are the settings:

 

Somewhere it was mentioned that MySQL and Postgres copy the entire input dataset locally before copying it to the destination, and it was advised that "Enable stream" is the MySQL equivalent of Postgres's "Enable Cursor". Hence I used Enable Cursor.
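For anyone wondering what "Enable Cursor" does under the hood: the Postgres JDBC driver only streams results through a cursor when autocommit is off and a non-zero fetch size is set; otherwise it buffers the whole result set in memory, which is exactly what produces the heap error. A minimal plain-JDBC sketch of the cursor behaviour (connection details are hypothetical placeholders):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class CursorFetch {
    // Rows fetched per round trip; any non-zero value enables the cursor.
    public static final int FETCH_SIZE = 10_000;

    public static void main(String[] args) throws SQLException {
        // Hypothetical connection details -- substitute your own.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://pg.example.com:5432/mydb?ssl=true",
                "user", "pass")) {
            // Cursor-based fetch only works inside a transaction,
            // so autocommit must be disabled.
            conn.setAutoCommit(false);
            try (Statement st = conn.createStatement()) {
                // Without this, the driver loads the entire result set
                // into the heap before returning.
                st.setFetchSize(FETCH_SIZE);
                try (ResultSet rs = st.executeQuery("SELECT * FROM big_table")) {
                    while (rs.next()) {
                        // process each row; only FETCH_SIZE rows are
                        // held in memory at a time
                    }
                }
            }
        }
    }
}
```

Talend's "Enable Cursor" checkbox should be generating the equivalent of the setFetchSize/autocommit-off combination above.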

 

(Screenshot Capture1.JPG: tJDBCInput settings)

 

Run settings for the job:

(Screenshot Capture2.JPG: Run settings)

 

This runs for a while without copying anything to the destination and then gives the Java heap space error. I am not sure what else I can do to copy the data and make this work; please note that with all these settings, if I limit the rows to 3 million, everything works.

 

A few points to help you help me :-)

  1. Xms/Xmx 4048 is the max I can allocate; I don't have any more memory.
  2. Execution time is not a big concern as long as the job is robust and executes periodically.
  3. The entire table with all columns has to be copied; there is no option of removing any columns or rows.
  4. It has to be done as one job and not in batches; the table's data is such that it is almost impossible to uniquely identify each row using one column.

Any help/suggestions would be greatly appreciated. Thanks in advance.

 

2 REPLIES
Moderator

Re: tJDBCInput issue with large dataset

Hello,

Have you tried to allocate more memory to your Studio to see if it works? Please look at this article: TalendHelpCenter: Allocating more memory to Talend Studio.

 

Best regards

Sabrina

 

--
Don't forget to give kudos when a reply is helpful and click Accept the solution when you think you're good with it.
Five Stars

Re: tJDBCInput issue with large dataset

Hi Sabrina,

 

Thanks for the help. I tried the suggestion mentioned in the post and set the .ini files as recommended:

 

-vmargs
-Xms1024m
-Xmx4096m
-Dfile.encoding=UTF-8
-Dosgi.requiredJavaVersion=1.8
-XX:+UseG1GC
-XX:+UseStringDeduplication

 

But I am still getting the error message:


Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

 

All the settings are the same as mentioned in the original post. Somewhere I read that the data is first stored in local memory and then copied to the destination; is there a way not to store the data and just stream it from source to destination?