How to optimize tSortRow with 10M rows?

Six Stars

Hello,

Do you have any tips for optimizing a job that processes 10 million rows through a tSortRow before inserting them into the database?

Performance is good at the beginning (about 6,600 rows/s), but throughput drops as the number of processed rows grows. At around 600,000 rows the job fails with OutOfMemoryError: GC overhead limit exceeded. I could increase the JVM memory for the job, but I don't think that is the best fix.

Thanks.


Accepted Solutions
Ten Stars

Re: How to optimize tSortRow with 10M rows?

In your tSQLinput query, add: order by <column>. That way the database does the sorting and the rows arrive already ordered.

Alternatively, this sounds like a one-time load into this table, say from multiple source files? If so, first split the data into smaller fragments, written out by some logic based on the data you want to sort by, such as one file per week of the year. Then process these smaller files one at a time, sorting each before writing it to the db.
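To make the fragment idea concrete, here is a minimal sketch in Python rather than a Talend job. The function name `sort_in_fragments` and its parameters are made up for illustration, and it assumes the bucket you choose orders the same way as the sort key (for example year-month buckets for an ISO date column), so that sorting each fragment and emitting the fragments in bucket order gives a fully sorted stream. Only one fragment is held in memory at a time.

```python
import csv
import os


def sort_in_fragments(rows, bucket_of, sort_key, workdir):
    """Yield rows in sorted order, holding only one fragment in memory.

    Assumption: bucket_of is consistent with sort_key, i.e. every row in a
    lower bucket sorts before every row in a higher bucket.
    Rows are lists of strings (they are round-tripped through CSV files).
    """
    handles = {}
    try:
        # Pass 1: stream every row into the file of its bucket.
        for row in rows:
            bucket = bucket_of(row)
            if bucket not in handles:
                path = os.path.join(workdir, f"bucket_{bucket}.csv")
                f = open(path, "w", newline="")
                handles[bucket] = (f, csv.writer(f))
            handles[bucket][1].writerow(row)
    finally:
        for f, _ in handles.values():
            f.close()

    # Pass 2: load, sort and emit one fragment at a time, in bucket order.
    for bucket in sorted(handles):
        with open(handles[bucket][0].name, newline="") as f:
            fragment = list(csv.reader(f))
        fragment.sort(key=sort_key)
        yield from fragment
```

With a date in the first column, `bucket_of=lambda r: r[0][:7]` and `sort_key=lambda r: r[0]` would give one fragment per month. The fragments need to be small enough to fit in memory, so pick a finer bucket if one of them grows too large.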

Alternatively, write to a tmp table, then run a SQL statement: insert into finaltable select ... from tmptable order by <your columns>.
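As a rough illustration of the tmp table approach, the sketch below uses Python's built-in sqlite3 module as a stand-in for the real database. The function name `load_sorted` and the columns `id` and `label` are invented for the example (only `tmptable` and `finaltable` come from the suggestion above), and the exact SQL may need adjusting for your database.

```python
import sqlite3


def load_sorted(conn, rows):
    """Bulk-load unsorted rows into a staging table, then let the database
    sort them while copying into the final table."""
    cur = conn.cursor()
    # Hypothetical schema: replace with your real columns.
    cur.execute("CREATE TABLE IF NOT EXISTS tmptable (id INTEGER, label TEXT)")
    cur.execute("CREATE TABLE IF NOT EXISTS finaltable (id INTEGER, label TEXT)")

    # Step 1: load the rows as they arrive, with no sorting on the client.
    cur.executemany("INSERT INTO tmptable (id, label) VALUES (?, ?)", rows)

    # Step 2: the database performs the sort during the copy.
    cur.execute(
        "INSERT INTO finaltable (id, label) "
        "SELECT id, label FROM tmptable ORDER BY id"
    )

    # Step 3: the staging table is no longer needed.
    cur.execute("DROP TABLE tmptable")
    conn.commit()
```

The job itself then never holds the 10 million rows in memory, because the sort happens inside the database. Keep in mind that most databases only guarantee row order when the reading query has its own order by.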


All Replies
Six Stars

Re: How to optimize tSortRow with 10M rows?

Thanks, I will try this.