One Star

Tips for optimizing load of 4GB file in MS SQL Server DB

Hi,
as said in the subject I have a 4 GB file (50 columns, 6 M lines) to load into an MS SQL table
due to security reason (server is reached through a vpn) I cannot use bulk or bcp
So I have decided to use a

Can you give me tips to optimize loading process ?
increase batch size ? Can you give me typical values ?
Enable Parallel execution ?

Regards

TIS PE 4.1.3 on Linux, 48 GB Ram, 16 cpu
2 REPLIES
Employee

Re: Tips for optimizing load of 4GB file in MS SQL Server DB

If it is possible you should use a bulk load to do this. If not, you can execute batches from a normal output in parallel and just increase the number of threads to 16 or some such number. Batch size is based on you network and DB. I know for Oracle we found the preferred batch size was 10,000 on normal tables.
One Star

Re: Tips for optimizing load of 4GB file in MS SQL Server DB

jandry,
thanks for your answer,
I've tryied with 8 threads and batch size of 10,000 and got the following error :
### Job STARTED at 2011/09/23 09:09:18 (jobId=20110923_090916_HdPqx, jobExecutionId=20110923090918_CuZha) ###
Exception in thread "Thread-3"
Exception in thread "Thread-2"
java.lang.OutOfMemoryError: Java heap space
at net.sourceforge.jtds.jdbc.ParamInfo.clone(ParamInfo.java:315)
at net.sourceforge.jtds.jdbc.JtdsPreparedStatement.addBatch(JtdsPreparedStatement.java:532)
at crm.chargement_abo_0_1.chargement_ABO$1tAsyncIn_tMSSqlOutput_1_ParallelThread.run(chargement_ABO.java:5733)
Exception in thread "Thread-4" Exception in thread "Thread-5" java.lang.OutOfMemoryError: Java heap space
at net.sourceforge.jtds.jdbc.ParamInfo.clone(ParamInfo.java:315)
at net.sourceforge.jtds.jdbc.JtdsPreparedStatement.addBatch(JtdsPreparedStatement.java:532)
at crm.chargement_abo_0_1.chargement_ABO$1tAsyncIn_tMSSqlOutput_1_ParallelThread.run(chargement_ABO.java:5733)
Exception in thread "Thread-1" java.lang.OutOfMemoryError: Java heap space
at net.sourceforge.jtds.jdbc.ParamInfo.clone(ParamInfo.java:315)
at net.sourceforge.jtds.jdbc.JtdsPreparedStatement.addBatch(JtdsPreparedStatement.java:532)
at crm.chargement_abo_0_1.chargement_ABO$1tAsyncIn_tMSSqlOutput_1_ParallelThread.run(chargement_ABO.java:5733)
Exception in thread "main" java.lang.Error: java.lang.Error: java.lang.Error: java.lang.Error: java.lang.OutOfMemoryError: Java heap space
at crm.chargement_abo_0_1.chargement_ABO.tChronometerStart_1Process(chargement_ABO.java:540)
at crm.chargement_abo_0_1.chargement_ABO.runJobInTOS(chargement_ABO.java:8378)
at crm.chargement_abo_0_1.chargement_ABO.main(chargement_ABO.java:8257)
Caused by: java.lang.Error: java.lang.Error: java.lang.Error: java.lang.OutOfMemoryError: Java heap space
at crm.chargement_abo_0_1.chargement_ABO.tMSSqlConnection_2Process(chargement_ABO.java:672)
at crm.chargement_abo_0_1.chargement_ABO.tChronometerStart_1Process(chargement_ABO.java:519)
... 2 more
Caused by: java.lang.Error: java.lang.Error: java.lang.OutOfMemoryError: Java heap space
at crm.chargement_abo_0_1.chargement_ABO.tMSSqlRow_1Process(chargement_ABO.java:798)
at crm.chargement_abo_0_1.chargement_ABO.tMSSqlConnection_2Process(chargement_ABO.java:651)
... 3 more
Caused by: java.lang.Error: java.lang.OutOfMemoryError: Java heap space
at crm.chargement_abo_0_1.chargement_ABO.tFileList_1Process(chargement_ABO.java:4211)
at crm.chargement_abo_0_1.chargement_ABO.tMSSqlRow_1Process(chargement_ABO.java:777)
... 4 more
Caused by: java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space
### Job ENDED BY USER at 2011/09/23 09:15:01 (jobId=20110923_090916_HdPqx, jobExecutionId=20110923090918_CuZha) ###

Then with 4 threads :
Exception in thread "Thread-6" Exception in thread "Thread-1" Exception in thread "Thread-3" java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space
at net.sourceforge.jtds.jdbc.ParamInfo.clone(ParamInfo.java:315)
at net.sourceforge.jtds.jdbc.JtdsPreparedStatement.addBatch(JtdsPreparedStatement.java:532)
at crm.chargement_abo_0_1.chargement_ABO$1tAsyncIn_tMSSqlOutput_1_ParallelThread.run(chargement_ABO.java:5733)
Exception in thread "Thread-4" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "main" java.lang.Error: java.lang.Error: java.lang.Error: java.lang.Error: java.lang.OutOfMemoryError: GC overhead limit exceeded
at crm.chargement_abo_0_1.chargement_ABO.tChronometerStart_1Process(chargement_ABO.java:540)
at crm.chargement_abo_0_1.chargement_ABO.runJobInTOS(chargement_ABO.java:8378)
at crm.chargement_abo_0_1.chargement_ABO.main(chargement_ABO.java:8257)
Caused by: java.lang.Error: java.lang.Error: java.lang.Error: java.lang.OutOfMemoryError: GC overhead limit exceeded
at crm.chargement_abo_0_1.chargement_ABO.tMSSqlConnection_2Process(chargement_ABO.java:672)
at crm.chargement_abo_0_1.chargement_ABO.tChronometerStart_1Process(chargement_ABO.java:519)
... 2 more
Caused by: java.lang.Error: java.lang.Error: java.lang.OutOfMemoryError: GC overhead limit exceeded
at crm.chargement_abo_0_1.chargement_ABO.tMSSqlRow_1Process(chargement_ABO.java:798)
at crm.chargement_abo_0_1.chargement_ABO.tMSSqlConnection_2Process(chargement_ABO.java:651)
... 3 more
Caused by: java.lang.Error: java.lang.OutOfMemoryError: GC overhead limit exceeded
at crm.chargement_abo_0_1.chargement_ABO.tFileList_1Process(chargement_ABO.java:4211)
at crm.chargement_abo_0_1.chargement_ABO.tMSSqlRow_1Process(chargement_ABO.java:777)
... 4 more
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded

It finally worked with 3 threads.
According to the configuration of the server (48 GB RAM, 16 cpu) I was hopping more threads. Do I need to increase so memory settings ?
Regards