Hey All, We have three context environments: Development, Test, and Production. When we publish a job to the cloud from our Development environment, it does not pick up the correct context to run under. Running the job locally, it uses the default context. After we publish the job to the Development environment in TIC and run it there, it still executes with the default context rather than the Development one. If we change the default context in Studio and republish, the job runs with the correct context, but then we can't use the promotion pipelines: when we promote to Test, the job still runs with the Development contexts. If we only had a few standalone jobs this wouldn't be a big deal, but we have 50+ jobs, many of them in parent/child relationships. We were told by Talend that as long as the context environment name in Studio matches the environment name in TIC, TIC should pick up which environment to run under. Any help on how everyone else handles this would be super appreciated!! Thank you!!
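One way to debug this kind of mix-up is to print the active context from inside the job itself, so you can see exactly what TIC launched it with. The tJava snippet below is just a diagnostic sketch: it leans on contextStr, which (as far as I can tell) is the field Talend's generated job code uses to hold the selected context name, so that's an assumption about the codegen worth verifying, and context.db_host is a hypothetical environment-specific parameter.

// tJava: print which context this execution was actually launched with.
// contextStr is the job-class field that Talend's generated code sets to
// the active context name (assumption: present in 6.x codegen, not a
// documented API).
System.out.println("Running with context: " + contextStr);

// Spot-check one environment-specific value too; db_host here is a
// hypothetical context parameter, substitute one of your own.
System.out.println("db_host = " + context.db_host);

If that prints Default even when TIC reports the job ran under Development, the publish/promotion step is not attaching the environment's context, which at least narrows down where to look.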
Sabrina, I am using Enterprise Edition, version 6.4.1. The job runs successfully, but on larger datasets it slows down dramatically, far worse than linearly. For example, a 5,000 record dataset takes 2 seconds to run, a 25,000 record dataset takes 5 minutes, and a 100,000 record dataset has taken over 3 hours. When I run the input query in PG Admin or DB Visualizer, it takes less than a second. Using different logging options, I can see that queryCircRef is the component taking a long time to run. I have also tried creating a custom function at the database level, but I see the same results: running the custom function directly against the db is quick, but running it through Talend is super slow. Thanks for your help!
I have a tough question about one of my components that is not scaling well. I am running a query using the tPostgresqlInput component. The query, shown below, checks for circular references between employees and their managers in a single table. For testing purposes, I have been using a 25,000 record dataset. When I run this query in PG Admin or DB Visualizer, it takes less than 2 seconds. Likewise, in SAP BODS (our current ETL tool) the query runs in less than 2 seconds. However, when I run it through tPostgresqlInput it takes over 5 minutes.
I have tried a few different things to speed this query up, but to no avail. Inside Talend, I have tried allocating additional JVM memory and running the subjob in parallel. Comparing the query's run time between Talend and the database level shows that there is something hampering the performance when it is run through Talend. Any help understanding why Talend struggles with this query would be greatly appreciated. We need to be able to scale this query to process up to 500,000 records.
Thank you for any help! I truly appreciate it!
Running the query in DB Visualizer: 0.003 seconds
Running the query in Talend: 346 seconds
WITH RECURSIVE circular_managers(unique_id, mgr_unique_id, depth, path, cycle) AS (
    SELECT u.unique_id, u.mgr_unique_id, 1,
           ARRAY[u.unique_id || '']::varchar[], false
    FROM table1 u
    WHERE u.unique_id IS NOT NULL
    UNION ALL
    SELECT u.unique_id, u.mgr_unique_id, cm.depth + 1,
           path || u.unique_id,
           u.unique_id = ANY(path)
    FROM table1 u, circular_managers cm
    WHERE u.unique_id = cm.mgr_unique_id
      AND u.unique_id IS NOT NULL
      AND NOT cycle
)
SELECT depth, array_to_string(path, ' > ') circular_managers
FROM circular_managers
WHERE cycle
  AND array_to_string(path, ' > ') NOT LIKE ' >%'
  AND array_to_string(path, ' > ') NOT LIKE '% > > %'
  AND path[1] = path[array_upper(path, 1)]
GROUP BY 1, 2
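Since the identical SQL is fast from DB Visualizer, one way to narrow this down is to time it through bare pgjdbc outside of Talend, because that is the same driver tPostgresqlInput sits on. Below is a minimal timing harness, just a sketch: the connection URL, user, and password are placeholders, and the full query from above gets pasted into the sql string. One relevant pgjdbc detail: it only streams rows when autocommit is off and a fetch size is set; otherwise it materializes the whole result set in memory, which is roughly what the "Use cursor" box in tPostgresqlInput's Advanced settings controls, if I remember right.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class QueryCircRefTimer {
    public static void main(String[] args) throws Exception {
        // Paste the full recursive query from above here.
        String sql = "WITH RECURSIVE circular_managers(...) ...";

        // Placeholder connection details, substitute your own.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/mydb", "user", "password")) {
            // With autocommit off and a fetch size set, pgjdbc uses a
            // server-side cursor and streams rows instead of buffering
            // the entire result set.
            conn.setAutoCommit(false);
            try (Statement st = conn.createStatement()) {
                st.setFetchSize(1000);
                long start = System.nanoTime();
                int rows = 0;
                try (ResultSet rs = st.executeQuery(sql)) {
                    while (rs.next()) {
                        rows++;
                    }
                }
                System.out.printf("Fetched %d rows in %.1f s%n",
                        rows, (System.nanoTime() - start) / 1e9);
            }
        }
    }
}

If the harness is fast both with and without the fetch-size settings, the driver is fine and the slowdown lives in the job around the component (downstream row-by-row processing, tracing, and so on); if it is slow, the problem reproduces at the JDBC layer and Talend itself is off the hook.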
Hey all, I am fairly new to Talend Integration. This is my first post, looking to gain some insight. We are migrating some of our current jobs from Business Objects. Here is our scenario: we load flat files (tab delimited) from our customers. Each file has a header record containing the field names (first_name, last_name, etc.). We allow our customers to send only the fields that they require, in any order.

I am a bit lost on which components to use to read each file, dynamically figure out its schema, and load the data into a Postgres table. For instance, our first file might contain firstName, lastName, and email. The second file might contain email, uniqueId, lastName, firstName, and loginId. The table we load this into contains the appropriate columns, but each file could contain a different number of columns. We have over 2,000 files and can't build a custom mapping for each client, so I am a bit confused about which components are the ideal way to make this work.

I have tried using tJava, but my Java skills are not where they need to be. I have also tried using tSplitRow to create a data dictionary, and tExtractDynamicFields, but couldn't get either to work. Any help in pointing me in the correct direction would be super appreciated!

Lastly, if I am looking for a course or book to improve my Java skills, especially related to Talend, can anyone make a recommendation? I don't want to become a Java expert, but would like to move beyond my current beginner level.
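Two thoughts that might help. First, since you are on a subscription edition, the Dynamic schema type may already cover this: if I recall correctly, a tFileInputDelimited with a single column of type Dynamic reads the header row and carries whatever columns arrive, and tPostgresqlOutput can then map them to the table by name. Second, the hand-rolled version is smaller than it sounds. Below is a rough standalone Java sketch of the idea, not Talend component code: customer_file.txt, the employees table, and the connection details are all placeholders, and in real use the header fields should be validated against the table's known column names before being built into SQL, since the file contents are untrusted.

import java.io.BufferedReader;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.Collections;

public class DynamicFlatFileLoader {
    public static void main(String[] args) throws Exception {
        try (BufferedReader in = Files.newBufferedReader(Paths.get("customer_file.txt"));
             Connection conn = DriverManager.getConnection(
                     "jdbc:postgresql://localhost:5432/mydb", "user", "password")) {

            // The header row tells us which columns this customer sent
            // and in what order.
            String header = in.readLine();
            if (header == null) {
                return; // empty file, nothing to load
            }
            String[] cols = header.split("\t", -1);

            // Build an INSERT whose column list mirrors the header.
            // Validate cols against the table's real column names first!
            String placeholders = String.join(",", Collections.nCopies(cols.length, "?"));
            String sql = "INSERT INTO employees (" + String.join(",", cols)
                    + ") VALUES (" + placeholders + ")";

            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                String line;
                while ((line = in.readLine()) != null) {
                    String[] vals = line.split("\t", -1);
                    for (int i = 0; i < cols.length; i++) {
                        ps.setString(i + 1, i < vals.length ? vals[i] : null);
                    }
                    ps.addBatch();
                }
                ps.executeBatch();
            }
        }
    }
}

The same shape drops into a tJavaFlex: read the header in the start code, build the statement once, then add a batch row per incoming line. One caveat: binding everything with setString works cleanly for text columns, but for dates and numbers you would either parse explicitly or let the driver send untyped parameters (pgjdbc's stringtype=unspecified URL option).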