Using the tSystem component

Six Stars

I am using the tSystem component to run my Python script in my Talend job. I would like to store the script's output in a context variable that I defined in the job, on every iteration in which tSystem runs. How would I go about doing this?


All Replies
Ten Stars

Re: Using the tSystem component

Set an environment variable in which you store your Python output, then read this variable in Talend and set your context variable from it. In tSystem you can also switch the output from standard output (to console) to a global variable.

What type of output do you plan to store? If it's a lot of data, let Python write its output to a file and read that file in Talend. That is the safer approach.
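A minimal sketch of the file-based approach, in case it helps. The function names and the result file are hypothetical; the idea is only that the script writes its result to a known path and the job reads that path afterwards with a file-input component.

```python
from pathlib import Path


def write_result(result, out_path):
    """Write the script's result to out_path, replacing any previous content."""
    out_path = Path(out_path)
    # Write to a temporary sibling file first and rename it afterwards,
    # so a reader never sees a half-written result file.
    tmp_path = out_path.with_name(out_path.name + ".tmp")
    tmp_path.write_text(result, encoding="utf-8")
    tmp_path.replace(out_path)


def read_result(out_path):
    """Read the result back (shown in Python only to make the round trip visible)."""
    return Path(out_path).read_text(encoding="utf-8")
```

The rename step is a design choice rather than a requirement: it avoids the reader picking up a partially written file if the two sides overlap in time.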

Six Stars

Re: Using the tSystem component

Thank you for your reply. I am planning to store a string as the output of my Python script. The string is short. I tried creating an environment variable in my tSystem component and setting it to my Python output, but I kept getting a null value. I then tried connecting a tJava component to my tSystem component and, in the Java code, setting my context variable to (String)globalMap.get("tSystem_1_OUTPUT").

Just as a quick clarification, my Python script takes each file from the tFileList, parses its filename, and returns the result through tSystem. With the context variable set as above, I do get the desired output, but only after my script has parsed the second file in the file list. That is, there is a lag of one iteration before I get the output I need. Would you know how I can fix this? Is it something to do with the return time of my Python script?
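One possible explanation for the null value, assuming the variable was set from inside the Python script: a child process gets its own copy of the environment, so anything it sets there is not visible to the process that launched it. The sketch below shows this with Python standing in for the parent; the variable name PY_RESULT is made up for illustration.

```python
import os
import subprocess
import sys

VAR_NAME = "PY_RESULT"  # hypothetical variable name, for illustration only

# Child script: sets the variable in its own environment, then exits.
CHILD_CODE = f"import os; os.environ[{VAR_NAME!r}] = 'parsed_name'"


def parent_sees_child_variable():
    """Run the child and report whether its variable shows up in this process."""
    os.environ.pop(VAR_NAME, None)  # start without the variable
    subprocess.run([sys.executable, "-c", CHILD_CODE], check=True)
    return VAR_NAME in os.environ
```

Calling parent_sees_child_variable() returns False, which is why stdout or a file is the usual way to hand a result back to the launcher.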

Ten Stars

Re: Using the tSystem component

To get the output into a global variable, you need to make sure your Python script is writing to standard output.
In Python that is something like sys.stdout.write; in 3.x it may be print() instead. Python can also run in verbose mode...
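A small sketch of what "dumping to standard output" could look like for the filename case. The parsing rule (strip the directory and the extension) is an assumption, since the actual script was not posted.

```python
import os
import sys


def parse_filename(path):
    """Hypothetical parsing step: file name without directory or extension."""
    return os.path.splitext(os.path.basename(path))[0]


def emit_result(path, stream=None):
    """Write the parsed name, and nothing else, to stdout (or the given stream)."""
    stream = sys.stdout if stream is None else stream
    stream.write(parse_filename(path))
    # In Python 3, print(parse_filename(path), end="", file=stream) does the same.
```

In the script itself this would be called once, for example as emit_result(sys.argv[1]). Keeping debug messages off stdout (send them to sys.stderr instead) means the captured value is exactly the parsed string.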

Regarding the lag: tSystem has an onComponent trigger that you can use instead of onSubjob... I used an IF with 1==1 as the test.
That worked for me and I got the output: System.out.println((String)globalMap.get("tSystem_1_OUTPUT"));

I'm not 100% sure how Python hands off to the kernel when it exits; you could try a sleep in Python or in Talend before you iterate to the next script.

From an architectural point of view this is not very efficient. Why not start Python once, with multithreading, have it listen on a port, and place everything on a queue (https://docs.python.org/3/library/queue.html)?
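A rough sketch of that idea, under several assumptions: a one-line-per-connection text protocol, a single worker thread, and the same hypothetical filename parsing as the per-file script. None of these details come from the thread; they are only there to make the example complete.

```python
import os
import queue
import socket
import socketserver
import threading

jobs = queue.Queue()  # holds (path, reply_queue) tuples


def parse_filename(path):
    """Hypothetical parsing step: file name without directory or extension."""
    return os.path.splitext(os.path.basename(path))[0]


def worker():
    """Process queued paths one at a time for the lifetime of the program."""
    while True:
        path, reply = jobs.get()
        reply.put(parse_filename(path))
        jobs.task_done()


class Handler(socketserver.StreamRequestHandler):
    def handle(self):
        # One request per connection: a single line holding a file path.
        path = self.rfile.readline().decode("utf-8").strip()
        reply = queue.Queue(maxsize=1)
        jobs.put((path, reply))
        # Block until the worker has produced the result, then send it back.
        self.wfile.write(reply.get().encode("utf-8") + b"\n")


def start_server(host="127.0.0.1", port=0):
    """Start worker and listener as daemon threads; port 0 lets the OS pick a free port."""
    threading.Thread(target=worker, daemon=True).start()
    server = socketserver.ThreadingTCPServer((host, port), Handler)
    server.daemon_threads = True
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def ask(address, path):
    """Client side: send one path and read back the parsed name."""
    with socket.create_connection(address) as conn:
        conn.sendall(path.encode("utf-8") + b"\n")
        with conn.makefile("r", encoding="utf-8") as reader:
            return reader.readline().strip()
```

With this shape the interpreter starts once, and each file costs one short socket round trip instead of a new process. The caller would need some way to open a socket, which is outside what this sketch covers.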

Six Stars

Re: Using the tSystem component

Thank you for your response once again. My Python script does write to standard output using sys.stdout.write. It has no sleep commands that would cause a lag before the next iteration of the tFileList, yet the lag still exists. Is there a way to suspend the tFileList component's next iteration until my tSystem component has returned its output to the global variable?

Six Stars

Re: Using the tSystem component

Just as a quick note, my script does have loops and runs in O(n) time, with the constant "c" being quite small regardless of the number of files, since the script runs once per file on each iteration. Could the time complexity of the script be the issue, such that only linear-time scripts would produce no lag? Other than that, it is an ordinary Python script with only one separate function in it.

Ten Stars

Re: Using the tSystem component

You could use tSleep in Talend, as I suggested above, but I'd suggest doing it in Python.

In Python it depends on what you're doing, and normally we shouldn't have to care: the program should exit at the last line of code. However, writing to stdout in the last dying nanoseconds of your program's memory is a whole other thing, because Python just tells the kernel it has something to print. Python should wait for it, but I'm not 100% sure whether it waits until the kernel has received and shown the output, or only until it has been received. Add a sleep in Python and see if it works.
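For what it's worth, here is a small check of the Python side of this. When stdout is a pipe, Python buffers writes and flushes them when the interpreter exits normally, so a parent that waits for the process to end should see the complete text. The sketch uses Python's subprocess module as a stand-in parent; it says nothing about how tSystem itself reads the pipe.

```python
import subprocess
import sys

# Child: writes to stdout without a newline or an explicit flush, then exits normally.
CHILD_CODE = "import sys; sys.stdout.write('parsed_name')"


def run_child():
    """Run the child, wait for it to exit, and return everything it wrote to stdout."""
    done = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        stdout=subprocess.PIPE,
        check=True,
    )
    return done.stdout.decode("utf-8")
```

If there is any doubt about buffering in the real script, calling sys.stdout.flush() right after the write hands the text to the pipe immediately instead of at exit.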

Six Stars

Re: Using the tSystem component

Sorry for misunderstanding your previous response. The time.sleep() function in Python did not work; however, the tSleep component did the job. From this, I would assume that the kernel only waits for the receive status and dumps the output immediately, but I could be wrong. The tSleep method works, but I would say it is not efficient, because the minimum time I can sleep with the tSleep component is 1 second. Since I am pushing thousands of files from my local file system to HBase, this could take quite a while, let alone the time needed for the size of each file.

If there is any alternative to the two methods you mentioned, please do advise me. Otherwise, thank you so much for your help!

Ten Stars

Re: Using the tSystem component

No worries. If you are pushing files to Hadoop, I suggest using the web interface, WebHDFS (or something similar), with a Talend component that is able to PUT (and POST). Otherwise use cURL.
I'm switching almost every web interaction to the cURL command line because, from my point of view, the components in Talend are not fully supportive: each seems to focus on one very specific task, and they all seem to struggle with SSL, SSH, certificates, cookies, sessions, keystores... too much of a hassle.
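As a rough illustration of the WebHDFS route, here is a sketch of the two-step file upload (CREATE) over plain HTTP. The host, port, HDFS path and user name are all placeholders, and it assumes an unsecured cluster (no HTTPS, no Kerberos); treat it as a starting point rather than a tested client.

```python
import http.client
from urllib.parse import quote, urlencode, urlsplit


def create_url(hdfs_path, user=None, overwrite=False):
    """Build the request path for a WebHDFS CREATE operation."""
    params = {"op": "CREATE", "overwrite": "true" if overwrite else "false"}
    if user:
        params["user.name"] = user
    return "/webhdfs/v1" + quote(hdfs_path) + "?" + urlencode(params)


def upload(namenode_host, namenode_port, hdfs_path, data, user=None):
    """Two-step upload: ask the NameNode where to write, then PUT the bytes there."""
    # Step 1: PUT without a body; the NameNode answers with a 307 redirect.
    conn = http.client.HTTPConnection(namenode_host, namenode_port)
    conn.request("PUT", create_url(hdfs_path, user))
    first = conn.getresponse()
    first.read()
    location = first.getheader("Location")
    conn.close()
    if first.status != 307 or not location:
        raise RuntimeError(f"expected a 307 redirect, got {first.status}")

    # Step 2: PUT the file content to the DataNode URL from the Location header.
    target = urlsplit(location)
    conn = http.client.HTTPConnection(target.hostname, target.port)
    conn.request("PUT", target.path + "?" + target.query, body=data)
    second = conn.getresponse()
    second.read()
    conn.close()
    return second.status  # WebHDFS reports 201 Created on success
```

The same two requests can be made with cURL: `-X PUT` against the NameNode first, then `-X PUT -T <local file>` against the redirect location.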