Hi, i have a problem with an extraction and load of data in an oracle db, i have millions of records that need to be migrated, but the strategy that i need to follow is about of data set's based on a certain number of persons, in other words i need to extract and load only the transactions of my set of persons.
Actually i'm doing the extraction and load with and tMap, extracting the set of persons with an tOracleInput that's acting like an lookup, and matching with the other tOracleInput that extracts all the transactional data, it works, but what is the problem? the problem is to extract the data of that set of persons i need to fetch all the transactions in the database (and its a lot of data) so i need to some component or idea to fetch only the info for the people that i need without fetch all the transactions in the database.
In the image below you will see how i actually do the extraction, i only need to match ID of the person with the ID of the transaction.
This is the lookup that i use to match transaction with the people.
Thanks for your help.
One way could be to turn it. Use the user input as main input and select for the users the transactions.
Or simply do a join directly in the database to get the desired transactions for the users.
Exactly, the persons and the transactions are in different tables (and enviroments), and the set of persons is about 1 million, but the transactional data is about 90 million of records, this is why i have this problem, because the job need to fetch the 90 million records for the set of persons. I would like to have some component to only extract the data that i need (in other case i try to put the id of the persons in the query and let the database do the job, but now i have a sample of 1 million of persons, so this strategy it's not availble any more)
What kind of component could that problem solve? You have to limit the amount of transactions in the query! This cannot be solved by a component.
You can separate the list of users in smaller groups and use the small list in the where clause of the transaction query. This way you have to rerun the job until you have finished all related users. This would also reduce the amount of data to process - actually a good design.
I understand that, i need to improve the query, the problem is that is a large amount of persons (1 million) and cannot put then in "groups" in the where clause of the query, so there is some way to dynamicaly put some of the persons rigth in the where clause of the query, based of some number of clients from a different DB? to improve the job and don't fetch all the transactions.
Thanks for you help
Watch the recorded webinar!
Accelerate your data lake projects with an agile approach
Create systems and workflow to manage clean data ingestion and data transformation.
Introduction to Talend Open Studio for Data Integration.