I have a large database sql server and I would like to process the lines in slices.
For example for every 100,000 lines, I apply a process using a python script that I call with the component t_system.
How can I do it?
same as you will do in any language (for example SQL)
1. detect your id column (or any column which you could use for partitioning ), count number of rows
2. run you script with parameters - id > n * 100000 and id < (n+1)* 100000, where n from 0 to (number of rows / 100000)
3 increment n, n = n+1
4 do this until number of rows > n * 100000
Thank you for your reply.
If I understand correctly, I should translate this code into SQL and write it in the tmssql_input component then I use t_system to apply my script?
there are many ways todo this
as variant - Input component with select count(*) from table_name
store this value to global variable and create loop with Talend components
But first how can i get the id row because i don't have an auto increment id that i can use it for the condition (id>n*100000).
Join us live for a sneak peek!
Accelerate your data lake projects with an agile approach
Create systems and workflow to manage clean data ingestion and data transformation.
Introduction to Talend Open Studio for Data Integration.