How to optimize tSortRow with 10M rows?


Hello,

Do you have any tips for optimizing a job that sorts 10 million rows with tSortRow before inserting them into the database?

Performance is good at the beginning (~6,600 rows/s), but it drops steadily as the number of processed rows grows. At around 600,000 rows, the job fails with OutOfMemoryError: GC overhead limit exceeded. I could increase the JVM memory for the job, but I don't think that's the best solution.

Thanks.


Accepted Solutions

Re: How to optimize tSortRow with 10M rows?

In your tSQLinput query, add ORDER BY <column> so that the database sorts the rows instead of tSortRow.

Alternatively, if this is a one-time load into the table (say, from multiple source files), first split the data into smaller fragments based on some logic, such as one file per week of the year or per range of the value you want to sort by. Then process these smaller files one at a time, sorting each before writing it to the database.
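The fragment-then-sort idea can be sketched in plain Java roughly as follows. This is a minimal sketch, not a Talend component: it assumes a hypothetical row format `sortKey,payload` with a non-negative integer key and no embedded newlines, groups rows into bucket files by key range (`bucketWidth`, e.g. 7 if the key were a day number and you wanted one file per week), and hands sorted rows to a `Consumer` that stands in for the database output. The class and method names are illustrative only. Because every key in one bucket is smaller than every key in the next, sorting each bucket separately and emitting the buckets in order yields a fully sorted stream, while only one bucket has to fit in memory at a time.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;
import java.util.function.Consumer;

public class BucketSortSketch {

    // Hypothetical row format: "sortKey,payload" with a non-negative integer sortKey.
    static long sortKey(String row) {
        return Long.parseLong(row.substring(0, row.indexOf(',')));
    }

    static Path bucketFile(Path tmpDir, long bucket) {
        return tmpDir.resolve("bucket-" + bucket + ".csv");
    }

    // Pass 1 routes each row to a bucket file by key range, so every key in bucket b
    // is smaller than every key in bucket b + 1. Pass 2 loads one bucket at a time,
    // sorts it in memory, and hands its rows to the sink in ascending bucket order.
    static void sortViaBuckets(Iterable<String> rows, long bucketWidth, Path tmpDir,
                               Consumer<String> sink) throws IOException {
        TreeMap<Long, BufferedWriter> writers = new TreeMap<>(); // ordered by bucket id
        try {
            for (String row : rows) {
                long bucket = sortKey(row) / bucketWidth;
                BufferedWriter w = writers.get(bucket);
                if (w == null) {
                    w = Files.newBufferedWriter(bucketFile(tmpDir, bucket), StandardCharsets.UTF_8);
                    writers.put(bucket, w);
                }
                w.write(row);
                w.newLine();
            }
        } finally {
            for (BufferedWriter w : writers.values()) {
                w.close();
            }
        }
        for (long bucket : writers.keySet()) {
            Path file = bucketFile(tmpDir, bucket);
            // Only this one bucket is held in memory while it is sorted.
            List<String> chunk = new ArrayList<>(Files.readAllLines(file, StandardCharsets.UTF_8));
            chunk.sort(Comparator.comparingLong(BucketSortSketch::sortKey));
            chunk.forEach(sink); // in a real job, this would feed the database writer
            Files.delete(file);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempDirectory("buckets");
        List<String> rows = List.of("17,c", "3,a", "42,e", "9,b", "30,d");
        List<String> out = new ArrayList<>();
        sortViaBuckets(rows, 10, tmp, out::add);
        System.out.println(out); // [3,a, 9,b, 17,c, 30,d, 42,e]
        Files.delete(tmp);
    }
}
```

This keeps one open file per bucket during the first pass, which is fine for a few dozen buckets (such as the weeks of a year) but would run into file-handle limits with thousands of buckets. Each bucket must also be small enough to sort in memory, so the bucket width has to be chosen with the data distribution in mind.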

Alternatively, write the rows to a temporary table, then run a SQL statement such as: INSERT INTO finaltable SELECT ... FROM tmptable ORDER BY <your columns>.


All Replies

Re: How to optimize tSortRow with 10M rows?

Thanks, I will try this.