Design and performance consideration on staging area

Four Stars

Hi all,

I'm researching techniques for preparing a staging area for a data warehouse. I am interested in using Talend Open Studio to load data from a production environment into a staging area.

 

In your opinion, what are the best techniques for this, and how can I improve performance?

 

Thanks

Twelve Stars

Re: Design and performance consideration on staging area

Hi!

 

Nothing here is unsolvable, but could you provide a little more information?

Otherwise you will only get very theoretical recommendations (as theoretical as the question itself).

 

"Answer":

1. A staging database or warehouse is no different from any other database (production, development, etc.).

2. Use streaming (cursors, in other terms) on the database components that support it.

3. Do heavy transformations in the best place for them: on the source, in Talend, or in the staging (target) database.

4. Depending on your database, use the fastest method available for loading data.
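
To make points 2 and 4 a bit more concrete, here is a minimal sketch of the idea outside Talend: read the source through a cursor in fixed-size chunks and write each chunk as one batched insert, so only one chunk is in memory at a time. It uses Python's built-in sqlite3 module with two in-memory databases purely as stand-ins for the real source and staging systems. The function name copy_in_batches, the orders table, and the batch size are assumptions made for illustration; they are not Talend components or settings.

```python
import sqlite3


def copy_in_batches(src, dst, table, batch_size=1000):
    """Copy every row of `table` from src to dst, one batch at a time.

    Hypothetical helper: assumes the table already exists in dst with the
    same column order as in src.
    """
    read_cur = src.cursor()
    read_cur.execute(f'SELECT * FROM "{table}"')

    # One "?" placeholder per source column.
    placeholders = ", ".join("?" * len(read_cur.description))
    insert_sql = f'INSERT INTO "{table}" VALUES ({placeholders})'

    copied = 0
    while True:
        # Fetch at most batch_size rows instead of loading the whole table.
        rows = read_cur.fetchmany(batch_size)
        if not rows:
            break
        # One batched insert per chunk rather than one statement per row.
        dst.executemany(insert_sql, rows)
        copied += len(rows)

    dst.commit()
    return copied


# Stand-in databases (assumption: real jobs would connect to the actual DBMSs).
source = sqlite3.connect(":memory:")
staging = sqlite3.connect(":memory:")
ddl = "CREATE TABLE orders (id INTEGER, amount REAL)"
source.execute(ddl)
staging.execute(ddl)
source.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(i, i * 1.5) for i in range(2500)],
)
source.commit()

copied = copy_in_batches(source, staging, "orders", batch_size=1000)
print(copied)
```

The right batch size, and whether a native bulk loader beats batched inserts, depends on the target database, so treat the numbers here as placeholders to measure against.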


I don't think this helps you very much :)

 

Questions:

0. What are your current (or expected) problems?

1. What is the expected size of the data?

2. What is the architecture? (local cloud, in-house, mixed, overseas clouds)

3. Which databases are the source and the target?

4. What transformations are needed? (lookups, etc.)

.... 

 

-----------
Four Stars

Re: Design and performance consideration on staging area

Thanks for the answer. My question is generic because I'm trying to create a "Talend job prototype" to use in several scenarios. I have different source DBMSs from which I need to load data, and I need to create a different staging area for each.

 

Some more practical questions are:

Is there a Talend component that can handle different source DBMSs?

Is there a way to dynamically map table structures?

Is there a component or technique to speed up the data copy?

 

I would like to understand whether such research makes sense, or whether it is better to use specific techniques for each DBMS.

  

Twelve Stars

Re: Design and performance consideration on staging area

Is there a Talend component that can handle different source DBMSs?

- No. The tJDBC* components are more or less universal, but that does not mean they can work dynamically with different databases.

 

Is there a way to dynamically map table structures?

Yes and no, more "no" than "yes" :)

- The subscription version supports dynamic columns.

- A few components designed by community members can copy tables.

But this does not allow mapping (tMap), which rules out transformations inside Talend.
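
For what "dynamically mapping a table structure" means in practice, here is a rough sketch outside Talend: read the column metadata of a source table at run time and build the staging table from it, with no hard-coded schema. It again uses Python's sqlite3 module and in-memory databases as stand-ins. The helper name clone_table_structure and the customers table are assumptions for illustration. With different source and target DBMSs you would also need a type-mapping step between the two dialects, which this sketch leaves out.

```python
import sqlite3


def clone_table_structure(src, dst, table):
    """Create `table` in dst using the column names and types found in src.

    Hypothetical helper: relies on SQLite's PRAGMA table_info, so it only
    works as written when the source is SQLite. Other databases expose the
    same information through their own catalogs or driver metadata.
    """
    # Each row is (cid, name, type, notnull, dflt_value, pk).
    cols = src.execute(f'PRAGMA table_info("{table}")').fetchall()
    if not cols:
        raise ValueError(f"table {table!r} not found in source")

    col_defs = ", ".join(f'"{name}" {ctype}' for _cid, name, ctype, *_ in cols)
    dst.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({col_defs})')
    dst.commit()
    return [name for _cid, name, *_ in cols]


# Stand-in databases (assumption: not the real source and staging systems).
source = sqlite3.connect(":memory:")
staging = sqlite3.connect(":memory:")
source.execute(
    "CREATE TABLE customers (id INTEGER, name TEXT, signup_date TEXT)"
)

columns = clone_table_structure(source, staging, "customers")
staging_columns = [
    row[1] for row in staging.execute('PRAGMA table_info("customers")')
]
print(columns, staging_columns)
```

This copies structure only. Because nothing here knows what the columns mean, column-level transformations still have to be defined somewhere else, which is the same trade-off as above.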

 

Is there a component or technique to speed up the data copy?

It strictly depends on the target database.

The most popular bulk techniques for the most popular databases are supported: check all the components that contain BulkExec in their name.

 

I like universal solutions as well, but very often designing such a solution takes much more time than designing 10-20 separate jobs.

 

"Some people write Java ... some people write code with Java - and they are different people." The same goes for universal components: if you design any universal component, the community will be thankful to you.

 

-----------