Hi, I have a scenario where I tested two cases for performance:

Case 1: PostgresDB ----> tMap (filtering; rows that fail the condition are rejected to a text file) ----> txt
Case 2: PostgresDB (filter handled in the SQL query) ----> txt

Environment: the source DB is on a different server. The data is around 2 GB. We are using a 16 GB machine running Red Hat Linux, of which 6 GB was free.

Results:
Case 2: it took only 3-4 minutes to load the data.
Case 1: it took more than 30 minutes.

I have the following questions, kindly help me.

a) While filtering and rejecting records in Talend (Case 1), the entire RAM was occupied and swap memory was used, which made the job dead slow. If the data size is huge, this directly affects performance. For instance, if I need to process 512 GB of data, does my RAM have to be larger than that? How can people afford a 1 TB machine in that case? Is it the same with other ETL tools, or am I missing something? Kindly clarify.

b) The DB filter was much faster than Talend. Do you think the right approach is to push all functionality into the DB?

Thanks
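To make the two cases concrete, here is a minimal Python sketch of the difference. It uses SQLite as a stand-in for PostgreSQL, and the table name, columns, and filter condition are all made up for illustration; the real job reads from Postgres through Talend components.

```python
import sqlite3

# Stand-in source table (SQLite instead of PostgreSQL, for a runnable sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(i, i * 10.0) for i in range(1, 101)])

# Case 1: pull EVERY row to the client, then filter (and reject) in the
# ETL layer, the way a tMap filter with a reject output works.
accepted, rejected = [], []
for row in conn.execute("SELECT id, amount FROM orders"):
    if row[1] > 500:          # illustrative filter condition
        accepted.append(row)
    else:
        rejected.append(row)  # rows that would go to the reject file

# Case 2: push the same filter into the SQL query; only matching rows
# ever leave the database.
pushed_down = conn.execute(
    "SELECT id, amount FROM orders WHERE amount > 500").fetchall()

print(len(accepted), len(rejected))  # same accepted rows either way
```

Both cases produce the same accepted rows; the difference is how much data crosses the network and sits in the ETL process's memory.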
tMap is fast when it does in-memory calculation, the same as any in-memory database (leaving compression aside): if you want to work with 1 TB of data in memory, you need at least 2 TB of RAM. A database will often be faster because it can use indexes for JOINs (provided you don't write a bad query), and PostgreSQL, like many others, is designed to work with data many times bigger than memory.

Note: all of the above applies when you do JOIN lookups or aggregations in tMap, that is, when you are not working with a single row from the flow at a time. If you just filter, we would need to check what you are trying to achieve; it may be possible to do it another way.
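The row-at-a-time versus in-memory distinction can be sketched in Python (this is an illustration, not Talend code; all names are invented):

```python
def rows(n):
    """Simulate a source stream of n rows; the stream is never fully
    materialized in memory."""
    for i in range(n):
        yield {"id": i, "country": "DE" if i % 2 == 0 else "FR"}

def filtered(stream):
    """Row-at-a-time filter: each row is inspected and passed on or
    dropped, so memory use is constant regardless of row count."""
    for row in stream:
        if row["country"] == "DE":
            yield row

# Lookup join: the lookup table must be held in RAM (tMap loads lookups
# into memory by default), so memory grows with the lookup's size,
# not with the size of the main flow.
lookup = {i: f"name-{i}" for i in range(0, 10, 2)}  # whole table in memory

joined = [dict(r, name=lookup[r["id"]])
          for r in filtered(rows(10)) if r["id"] in lookup]
print(len(joined))
```

This is why a pure filter over a huge flow can run in a small heap, while a tMap lookup against a huge table cannot, unless you spill the lookup to disk or push the join down to the database.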
Thanks, Vapukov. How can I handle data larger than RAM in Talend ETL? Is there any other way without pushing the work down to the DB (ELT)?
Let's return to your original post. You didn't provide full information, so let me put the same question back to you in "human-readable" form.

You need to relocate from one house to another, you have a huge amount of old stuff, and you want your new house to be clean. You have a choice:

1. Carry all the stuff (say 1000 items) out to the street, sort it there, and take 10 items with you, or
2. Make a list of the 10 items, take just those, get in the car, and drive to the new home.

Which way is faster? Now take the same case, but where you want to keep 50% of the items and must compare them: the situation could be quite different.

It is the same with your question. The speed of the whole Job always depends on what you are really trying to do. How many records (in %) are rejected by the filter? Are there any aggregations? External lookups?

Short answer: Talend can work with big data sizes. Which way is faster and better always depends on how well you define the Job. Talend, Postgres, the OS: they are all just items in your toolbox! Why give up the benefits of all your tools and do every household task with only a hammer?