Six Stars

Dynamic schema alternative in Open Studio

Hi

I'm new to Talend, so apologies if this is too much of a rookie question.

I've read a lot of posts saying that Dynamic schema is not available in the Open Studio version of Talend, and the suggested alternatives I've seen all solve the problem of mapping fields that are known beforehand, e.g. the source table has columns 1 to 5 and you rearrange (map) their order to match the target table.

My problem is that I'd like to be able to "call" a predefined schema from the Metadata, or just from a file, at run time, so that I can pass the schema as a parameter and repeat the process, changing the schema based on my input list. It would be especially useful if I could use such a parameter to edit things like the Map editor of the tMap component.

I hope this isn't too far-fetched using context variables or something similar.

Please advise on what my options are without using the Enterprise version.

Thank you

Accepted Solutions
Moderator

Re: Dynamic schema alternative in Open Studio

Hello,


With Talend Open Studio, schemas must be defined at design time, not at run time.

Best regards

Sabrina

6 REPLIES
Eleven Stars

Re: Dynamic schema alternative in Open Studio

This isn't going to be possible as you describe it, but there are potential workarounds, depending on the variety of schemas and how much work you want to put in. However, the first thing I want to point out is that you cannot write one job to deal with any schema. A common request in here is for a "magic job" that will dynamically work out the schema, understand the changing mapping requirements and write to a dynamic table/file. If this sort of job were easy to build, it would largely do Data Integration people out of a job (this won't happen in our lifetimes, since the permutations of scenarios that this magic job would have to handle are practically endless).

 

"Magic Job" rant aside, there are possibilities.

 

1) You could build a parent job which selects a suitable child job to pass the file to, depending on data related to the file. This is probably the easiest approach to this problem, but it will take a reasonable amount of work to build the several child jobs. It also requires that you know the possible schemas at design time.

2) This method is similar to the above, but instead of nesting your child jobs inside the parent (or controlling) job, you configure your framework so that all are separate jobs. The parent job can easily start child jobs in separate JVMs. This makes the solution easy to extend: you do not need to know all of the schemas at initial design time, and if a new one comes along you can simply create a new job for it. So long as the parent job has been built to be as dynamic as possible in how it starts "child jobs", you could incrementally build up quite a solution this way.

3) If you are comfortable using Java, you *could* build a process which is Java heavy and dynamically works out the schema. This would require a fair amount of going under the bonnet and seeing how Talend jobs work. I have done similar things in the past and am pretty sure your requirement could be achieved, but this may be approaching a solution which is far too Java heavy for a lot of people. Basically, you can do whatever you want in Talend with Java. You have access to a world of Java APIs and can implement practically anything using them.
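
As a rough illustration of approaches 1 and 2, the sketch below shows how a controlling process might map a schema identifier to a child job and build the command that would start that job in its own JVM. It assumes each child job has been exported as a standalone build with a `<JobName>_run.sh` launcher that accepts `--context_param name=value` arguments. The job names, schema keys, directory layout and the `inputFile` context variable are all hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ChildJobSelector {

    // Hypothetical registry: schema identifier -> exported child job name.
    // Supporting a new schema means building one new job and adding one entry.
    private static final Map<String, String> JOBS = new LinkedHashMap<>();
    static {
        JOBS.put("customer", "LoadCustomer");
        JOBS.put("orders", "LoadOrders");
    }

    // Look up the child job for a schema, failing loudly for unknown schemas.
    public static String selectJob(String schemaKey) {
        String job = JOBS.get(schemaKey);
        if (job == null) {
            throw new IllegalArgumentException("No child job registered for schema: " + schemaKey);
        }
        return job;
    }

    // Build the command line for the child job's launcher script.
    // Assumed layout: <jobsDir>/<JobName>/<JobName>_run.sh
    public static List<String> buildCommand(String jobsDir, String schemaKey, String inputFile) {
        String job = selectJob(schemaKey);
        List<String> cmd = new ArrayList<>();
        cmd.add("sh");
        cmd.add(jobsDir + "/" + job + "/" + job + "_run.sh");
        cmd.add("--context_param");
        cmd.add("inputFile=" + inputFile); // hypothetical context variable
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = buildCommand("/opt/talend/jobs", "orders", "/data/in/orders.csv");
        System.out.println(String.join(" ", cmd));
        // To actually start the child job in a separate JVM, something like:
        // new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }
}
```

Inside a Talend job, the same idea would more likely be expressed with components such as tRunJob or tSystem than with hand-written Java. The point of the sketch is only that the parent needs a lookup from "which schema is this?" to "which job handles it?".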

 

There are probably other ways that you could approach this, and I would need to know the complete requirement before I would personally decide which one to use. But I should point out why this is not possible without a lot of work. If you take a look at the Talend code (the Code tab in the development workspace), you will see that the schemas are implemented in code (which is compiled). Look for static classes called "row{number}Struct" to get an idea. Since the configuration of components and jobs relies heavily on schemas, they are essentially the backbone of Talend jobs. If you use a repository schema in a series of jobs and modify that repository schema, every job using it will need to be altered to safely incorporate that change. More often than not, this is pretty automatic. The Enterprise Edition has some components that are built to enable more dynamic schema functionality without the need for long-winded workarounds, but you would still be hard pushed to enable your scenario without a fair amount of work.
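
To make the point about compiled schemas concrete, here is a heavily simplified imitation of the kind of class you would find in the Code tab. It is not real Talend output: the actual "row{number}Struct" classes contain a lot more than this, and the column names below are hypothetical. The sketch only shows that each schema column ends up as a typed Java field fixed at compile time, in contrast with the generic map-based record that a hand-written "dynamic" process would have to use instead.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SchemaSketch {

    // Simplified stand-in for a generated struct: one typed field per schema column.
    // Adding or renaming a column means regenerating and recompiling the job.
    public static class row1Struct {
        public Integer id;
        public String name;
    }

    // A schema-agnostic record: column names are data rather than code.
    // Everything downstream (mapping, joins, output) would then have to
    // work with names and Objects instead of typed fields.
    public static Map<String, Object> dynamicRow(String[] columns, Object[] values) {
        if (columns.length != values.length) {
            throw new IllegalArgumentException("Column/value count mismatch");
        }
        Map<String, Object> row = new LinkedHashMap<>();
        for (int i = 0; i < columns.length; i++) {
            row.put(columns[i], values[i]);
        }
        return row;
    }

    public static void main(String[] args) {
        row1Struct fixed = new row1Struct();
        fixed.id = 1;
        fixed.name = "Alice";
        // fixed.email = "..."; // would not compile: the column set is fixed at design time

        Map<String, Object> flexible = dynamicRow(
                new String[] {"id", "name", "email"},
                new Object[] {1, "Alice", "alice@example.com"});

        System.out.println(fixed.id + " " + fixed.name);
        System.out.println(flexible.keySet());
    }
}
```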

Rilhia Solutions
Six Stars

Re: Dynamic schema alternative in Open Studio


Thanks for the response @rhall_2_0.

I was actually busy with your first suggestion when I decided to ask the community.

It looks like I'll have to extend my timelines to focus on going "under the hood".

Would you mind sharing some ideas on what you did to solve the similar problems you mentioned, just to point me in a direction?

To give you an idea of what I'm trying to solve:

I have a tHiveInput and a tSybaseInput going into a tMap for a join, which writes to some output file, say via tFileOutputDelimited.

I want to be able to use a parameter for the schema on the two inputs, and a parameter for the tMap and the output file.

I'd appreciate just a point in the right direction to avoid barking up a lot of wrong trees.

Thanks again

Six Stars

Re: Dynamic schema alternative in Open Studio

Thanks @xdshi

I've seen a few posts where you mention this; however, I was asking for an alternative solution, even if it has to be a more custom-code option.

Eleven Stars

Re: Dynamic schema alternative in Open Studio

As soon as you implement a join or lookup in a tMap, you are going to be writing so much code to try to make it "dynamic" that you may as well not use Talend Open Studio and just write your own application. The complexities of this are significant. The best way to go about doing this with Talend Open Studio would be to create a job for each combination of schemas (main and lookup) you are using and parameterise how these are called by a main job.

 

Out of interest, how many permutations of schemas will you need jobs for? 

 

If you still want to look at a route to doing this using code (I understand if you do, as I always like to try to push the boundaries of what I am told is possible... it's a great way to learn), take a look at the Code tab and see how a Talend job makes use of the globalMap HashMap variable. All components use this, and you can intercept it at runtime and "replace" components you specified at design time. I give a very simple demonstration of this in one of my tutorials (https://www.rilhia.com/quicktips/quick-tip-dynamically-change-db-connection). Once you understand how the back-end code generated by Talend works, it opens quite a few doors.
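
A minimal sketch of the globalMap idea follows. In generated job code, globalMap is a `java.util.Map<String, Object>` shared by the components of the job. Here it is simulated with a plain `HashMap` so the example runs outside Talend. The key `conn_tMysqlConnection_1` and the strings standing in for connection objects are assumptions for illustration only; check the Code tab of your own job for the keys it actually uses.

```java
import java.util.HashMap;
import java.util.Map;

public class GlobalMapSketch {

    // Stand-in for what a connection component does at runtime:
    // it stores the object it created under a well-known key.
    public static void registerConnection(Map<String, Object> globalMap, String key, Object connection) {
        globalMap.put(key, connection);
    }

    // Replace the stored object before downstream components read it.
    // Returns the object that was there before, so the caller can close or log it.
    public static Object swapConnection(Map<String, Object> globalMap, String key, Object replacement) {
        if (!globalMap.containsKey(key)) {
            throw new IllegalStateException("Nothing stored under " + key);
        }
        return globalMap.put(key, replacement);
    }

    public static void main(String[] args) {
        Map<String, Object> globalMap = new HashMap<>(); // simulated; Talend provides the real one
        String key = "conn_tMysqlConnection_1";          // assumed key name

        registerConnection(globalMap, key, "connection-to-dev");
        Object previous = swapConnection(globalMap, key, "connection-to-test");

        // Components that look up the key from here on see the replacement.
        System.out.println(previous + " -> " + globalMap.get(key));
    }
}
```

In a real job, a swap like this would probably sit in a tJava placed between the component that populates the key and the components that read it. That placement is a suggestion rather than something taken from the tutorial.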

 

However, I should point out that you cannot guarantee a Talend job will be future-proof if you use code in this way. I have never had any bad experiences so far, but I am aware that this would not be immune from problems should Talend significantly change their framework.

Rilhia Solutions
Six Stars

Re: Dynamic schema alternative in Open Studio


Thanks for the quick response @rhall_2_0

For a start, I'm looking at about 70 different schemas, and I know this number will grow or shrink as the requirements change, but as you rightly said, I'm also interested in the learning aspect.

I'll take the simpler first and second suggestions for now, just to have a solution in place, but I will continue looking at the Talend code aspect as well.

Thanks a lot