TMap and Lookups 20+M records

Six Stars


Hello,

 

I am converting Informatica mappings to Talend.

Here are the table stats:

1. Lookup-1: 28M (PostgreSQL Input with SQL join & Filter)  - Cursor Size - 1M

  • Store on Disk
  • Load Once
  • First match

2. Lookup-2: 35M (PostgreSQL Input with SQL Filter) - Cursor Size - 1M

  • Store on Disk
  • Load Once
  • First match

Lookups in Parallel
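The lookup settings above ("Load once" + "First match") can be pictured as loading the lookup table into a map one time and keeping only the first value seen per key. A minimal sketch with made-up data (tMap's real, disk-backed implementation differs):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FirstMatchLookup {
    public static void main(String[] args) {
        // Hypothetical lookup rows with duplicate keys, as a real lookup table may have.
        String[][] rows = { {"A", "first"}, {"B", "only"}, {"A", "second"} };

        // "Load once" + "First match": build the map one time,
        // keeping only the first value per key.
        Map<String, String> lookup = new LinkedHashMap<>();
        for (String[] row : rows) {
            lookup.putIfAbsent(row[0], row[1]);   // later duplicates are ignored
        }

        System.out.println(lookup.get("A"));  // first
        System.out.println(lookup.get("B"));  // only
    }
}
```

With "Store on Disk", Talend spills this structure to temp files instead of keeping it all on the heap, which is why the lookups themselves load fine.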

 

3. Main Table: 27M (PostgreSQL Joins with multiple tables and Date Filters)

 

The maximum memory I allotted to the job was 8 GB.
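A heap error despite an 8 GB setting is worth double-checking: the -Xmx value set in the Run tab's JVM arguments must actually reach the job's JVM. A small sketch (plain Java, not a Talend API) that prints the heap ceiling the JVM really got:

```java
public class HeapCheck {
    public static void main(String[] args) {
        // Maximum heap the JVM will actually use; reflects -Xmx (e.g. -Xmx8g).
        long maxBytes = Runtime.getRuntime().maxMemory();
        System.out.printf("Max heap: %.1f GB%n",
                maxBytes / (1024.0 * 1024 * 1024));
    }
}
```

If this prints far less than 8 GB when dropped into a tJava at job start, the setting is not being picked up.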

 

The lookups load fine, but once the job reaches the Main Table read it slows down, and after about an hour the whole process fails with a Java heap memory error.

I'm not sure what else to look at to make this work. And this is just one small mapping; going forward there are much more complex mappings with huge data volumes.

 

Does parallelization help?

Does multi-threaded execution help? If yes, what buffer size should I set?

Or would custom batch processing, handling 5M records at a time, help?
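On the batching idea: one way to sketch "process every 5M records" is to split the main flow into fixed-size chunks so only one chunk is resident at a time. A scaled-down illustration of the chunking logic with in-memory data (real code would page the PostgreSQL query instead, e.g. by key range, rather than materialize everything first):

```java
import java.util.ArrayList;
import java.util.List;

public class BatchSketch {
    // Split a record list into fixed-size batches; each batch can be processed
    // and released before the next one is loaded, capping peak memory.
    static List<List<Integer>> toBatches(List<Integer> records, int batchSize) {
        List<List<Integer>> batches = new ArrayList<>();
        for (int start = 0; start < records.size(); start += batchSize) {
            int end = Math.min(start + batchSize, records.size());
            batches.add(new ArrayList<>(records.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> records = new ArrayList<>();
        for (int i = 0; i < 12; i++) records.add(i);

        // Batch size 5 stands in for the 5M suggested in the question.
        List<List<Integer>> batches = toBatches(records, 5);
        System.out.println(batches.size() + " batches");  // 3 batches
    }
}
```

In Talend this would typically be driven by a tLoop/tFlowToIterate around the subjob, with the batch bounds passed into the input query via context variables.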

 

Please advise. Thanks.

 

 

Twelve Stars

Re: TMap and Lookups 20+M records

hi,
A Java heap memory error means the job hit the memory limit allowed for the Java process.
So, depending on your job, you may:
- increase the -Xmx parameter
- split the process
- use reload at each row on the tMap
- …
Good luck
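The "reload at each row" suggestion trades memory for queries: instead of loading all 35M lookup rows once, the lookup is re-queried per main row, usually with the join key pushed into the WHERE clause via tMap's globalMap variables. A hedged sketch contrasting the idea, with a map standing in for the database:

```java
import java.util.HashMap;
import java.util.Map;

public class ReloadSketch {
    static int queryCount = 0;

    // Stands in for a per-row lookup query: SELECT value FROM lookup WHERE key = ?
    static String queryByKey(Map<String, String> db, String key) {
        queryCount++;
        return db.get(key);
    }

    public static void main(String[] args) {
        Map<String, String> lookupDb = new HashMap<>();
        lookupDb.put("K1", "v1");
        lookupDb.put("K2", "v2");

        String[] mainRows = { "K1", "K2", "K1" };

        // "Reload at each row": one query per main row, near-zero lookup memory,
        // but query count grows with the main flow (27M rows here).
        for (String key : mainRows) {
            String value = queryByKey(lookupDb, key);
            System.out.println(key + " -> " + value);
        }
        System.out.println(queryCount + " lookup queries");  // 3 lookup queries
    }
}
```

With a 27M-row main flow this means 27M small indexed queries per lookup, so it only pays off when the per-key query is cheap and the full lookup cannot fit in memory or on disk.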

Francois Denis

Tag as "solved" for others! Kudos to thanks!

Six Stars

Re: TMap and Lookups 20+M records

Thanks @fdenis

 

If I use reload at each row, as far as I know that increases the overall execution time, since the lookup is queried once per main-flow row.

 

Twelve Stars

Re: TMap and Lookups 20+M records

It depends on how your job is built and how your lookup is filtered.

Francois Denis

