
Migrating an old SQL Server 2005 DB (1.5 billion rows in the biggest table) - not able to guess schema of DB view

Hi,

I'm in the unfortunate situation of having to migrate a bunch of views from a big MS SQL Server 2005 database to a new Hive cluster. The process must be developed in Talend. I have run into a couple of problems:

1. Talend Studio does not seem able to guess the view schema correctly, which means I have to create the schema by hand. Unfortunately, the views I have to migrate have 200+ columns. Has anyone else experienced this, or does anyone have hints for me? Or is it just because I'm dealing with this lousy old database?

2. For work I'm using a Mac-Pro (2016) running macOS Sierra. I already adjusted the -Xms (2 GB) and -Xmx (4 GB) settings for Talend Studio, but it's still slow as hell. The Studio freezes far too often, closing it normally is not possible (kill -9 does the job, though), and in general it drags down the whole system. Has anyone seen the same behaviour with Talend Studio (6.4.1) and has some tips to improve the overall performance?

Best

Arne

1 REPLY

Re: Migrating an old SQL Server 2005 DB (1.5 billion rows in the biggest table) - not able to guess schema of DB view

For schema guessing, you can paste the table query into the tInput component just as it appears in SQL Server Management Studio. In other words, don't use Guess Query, just Guess Schema. Once you have a guessed schema, open it under Edit Schema and save it as an XML document. You may be able to do some searching and replacing in the XML document and then reload it into the schema. For some reason, Talend turns every date into a Varchar. I think it is a Unicode thing.
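
With 200+ columns, the search-and-replace step could be scripted rather than done by hand. Below is a rough Python sketch, not a tested recipe. It assumes the exported schema is a flat list of <column> elements carrying label, talendType, type and pattern attributes; those attribute names, the sample columns and the fix_date_columns helper are all assumptions for illustration, so compare them against your own export first.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample of an exported schema. The element and attribute names
# are an assumption; verify them against a file exported from your Studio.
SAMPLE = """<schema>
  <column label="order_id" talendType="id_Integer" type="INT" pattern=""/>
  <column label="created_at" talendType="id_String" type="VARCHAR" pattern=""/>
  <column label="customer" talendType="id_String" type="VARCHAR" pattern=""/>
</schema>"""


def fix_date_columns(xml_text, date_columns, pattern='"yyyy-MM-dd HH:mm:ss"'):
    """Switch the named columns from string to date; return (xml, changed labels)."""
    root = ET.fromstring(xml_text)
    changed = []
    for col in root.iter("column"):
        if col.get("label") in date_columns:
            col.set("talendType", "id_Date")   # assumed Talend type id for dates
            col.set("type", "DATETIME")        # assumed DB type for SQL Server
            col.set("pattern", pattern)        # assumed date pattern format
            changed.append(col.get("label"))
    return ET.tostring(root, encoding="unicode"), changed


if __name__ == "__main__":
    fixed_xml, changed = fix_date_columns(SAMPLE, {"created_at"})
    print(changed)
    print(fixed_xml)
```

You would feed it the names of the columns you know are dates (for example from the view definition), write the result back to a file, and import that file under Edit Schema.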

 

These are a couple of gotchas that can affect speed (without seeing your job, this is all I got):

A tMap join whose Lookup Model is set to "Reload at each row". Just use "Load once".

A tMSSqlOutput with Action on data set to "Insert or update".
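
To see why the lookup model matters so much, here is a small Python illustration of the cost difference. It is only a sketch of the general idea, not of how Talend implements tMap internally; CountingLookup, the two join functions and the sample rows are all made up for the example.

```python
class CountingLookup:
    """Hypothetical lookup source that counts how often it is fully read."""

    def __init__(self, rows):
        self.rows = rows
        self.loads = 0

    def __call__(self):
        self.loads += 1
        return list(self.rows)


def join_reload_each_row(main_rows, load_lookup):
    """Mimics 'Reload at each row': the lookup is re-read for every main row."""
    joined = []
    for row in main_rows:
        lookup = {r["id"]: r["name"] for r in load_lookup()}
        joined.append({**row, "name": lookup.get(row["customer_id"])})
    return joined


def join_load_once(main_rows, load_lookup):
    """Mimics 'Load once': the lookup is read a single time, then reused."""
    lookup = {r["id"]: r["name"] for r in load_lookup()}
    return [{**row, "name": lookup.get(row["customer_id"])} for row in main_rows]


if __name__ == "__main__":
    main_rows = [{"order": 1, "customer_id": 10},
                 {"order": 2, "customer_id": 11},
                 {"order": 3, "customer_id": 10}]
    customers = [{"id": 10, "name": "Ada"}, {"id": 11, "name": "Bob"}]

    slow, fast = CountingLookup(customers), CountingLookup(customers)
    assert join_reload_each_row(main_rows, slow) == join_load_once(main_rows, fast)
    print("lookup reads:", slow.loads, "vs", fast.loads)
```

Both variants produce the same joined rows, but the first reads the lookup once per main row and the second reads it once in total. With a main flow of over a billion rows, that difference dominates everything else.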