I've been testing around with everything a bit, but haven't been able to find a proper tutorial to create a migration process for a whole database.
I've made a little test job that fetches data from one table in one database and copies it over to the target, it's really simple and straightforward and works perfectly. However I'm on the process to migrate a whole database, and I can't find anything that actually condenses this and explains it properly. Fidgeting around I've come across some concepts like recycling connections, splitting jobs into subjobs and parallellizing stuff, but it's not clear enough and I'm 100% sure I'm doing things wrong.
Are there any full tutorials on how to make something that is rather complex? I mean, not data-warehouse-complex, but something big enough to see if I could potentially use this in my job. At least an example of how to *properly* migrate about 5 tables with joins that would require some cascaded operations.
Creating a tutorial for this would be pretty difficult to accommodate different people's requirements and skill levels. This is a complete project in many cases. We can probably give you some ideas to help you on your way though and maybe even point you at some tutorials to show you ways of achieving smaller tasks within the project.
What database are you moving from and to?
Does the schema of the database you are moving to already exist?
Are you dealing with foreign key constraints?
There will be more questions based on the answers to the above.
How much of a database background do you have? There is a lot to consider regarding how your database will work. For example, if you are moving from a database that use autoincrement keys to Oracle, you will have to deal with that differently to a database using autoincrement keys. If your foreign keys are based on autoincrement keys, you may have to completely regenerate all of your keys, force you data into the system in a strict order or switch off your constraints and switch them back on again. A simple example of the complexities involved, and that is just one.
I don't think you will be able to build many reusable jobs without putting an awful lot of effort and Java coding given you want to potentially do this between different database types. Break this problem down into smaller sub problems. Then come back with specific questions around detailed problems.
I really am not trying to be a pain, but just thinking about what you might have to consider would result in a multi page brain dump. Talend can do this, but there is no easy "catch all" way of doing this I am afraid.
Before you decide on your sequence, can you switch off your constraints on your database? Also, can you reuse the primary keys and foreign keys from your source system? If the answer to both is YES, then you can migrate in any order and take an exact copy. If the answer to the above is not YES, then you need to load the entities in the order in which is dictated by the constraints AND (more importantly) the key values. For example, if you cannot reuse your keys, they will need to be regenerated. Primary keys will also be used a foreign keys in other tables. So you will need to make sure that you load the data in an order which will allow you to re-connect up your data.
As I said before, these are high level concerns which are kind of fundamental to how you approach this.
The answer is yes to both indeed, but that's not what I was asking. To put it straightforward, I need to know which components are meant to perform this task and how to use them properly.
For example, you can place a tMySqlInput component, fill in the information with the connection settings for your MySQL server, select a table name, connect that to a tMap and then get an output, filling down its info as well. On a different approach you can add the database connection metadata once and drag the input and output to the board. Both approaches achieve the same goal, but the latter is the one you should follow if you choose between those two, and I bet there are even better ways to do it.
In short, this should be a total abstraction from whether the databases are built one way or the other, what I want is a tutorial more advanced than fetching one table plus a lookup spreadsheet and map it into a .CSV file. I'm still checking tutorials and getting closer to what I need, although still lacking a full-fledged example.
EDIT: I've been reading this https://www.talendbyexample.com/talend-database-tutorial.html but I can't import the project to see what it does, seems like an incompatibility with the latest version of Talend, even though it states it's compatible with v5.2.2 and upwards.
The first 100 community members completing the Open Studio survey win a $10 gift voucher.
Talend named a Leader.
Kickstart your first data integration and ETL projects.
Watch the recorded webinar!
Pick up some tips and tricks with Context Variables
Learn how media organizations have achieved success with Data Integration
Introduction to Talend Open Studio for Data Integration.