Scanning a Talend zip export

Seven Stars

I made a boo-boo the other day. I promoted some code to production that had tLogRow components active. Not only is this inefficient, but it bloats the log file and could even have written personal data to log files, which would be hard to justify in terms of GDPR regulations.

So, I wrote a Python script to scan a Talend export file for active tLogRow components and print them out. If you are exporting a dozen jobs with 50 components on each, and you use tLogRow for debugging, this could be a useful safety net.

import sys
import zipfile
from xml.dom import minidom
import re

print("=====================")
print("Talend Export Scanner")
print("=====================")

if len(sys.argv) < 2:
    print("Insufficient args")
    sys.exit(1)

zipfilename = sys.argv[1]

with zipfile.ZipFile(zipfilename, 'r') as zf:
    for name in zf.namelist():
        # Only look at job definitions: <project>/process/.../*.item
        if name.endswith('/'): continue
        if not name.endswith('.item'): continue
        if not re.match(r"^\w+/process/", name): continue

        with zf.open(name) as f:
            xdoc = minidom.parse(f)

        # Counted per job so that the "Job:" header is printed only once
        job_problems = 0
        for node in xdoc.getElementsByTagName('node'):
            if node.getAttribute("componentName") != "tLogRow":
                continue
            active = True
            unique_name = ""
            label = ""
            for eP in node.getElementsByTagName('elementParameter'):
                ep_name = eP.getAttribute('name')
                ep_value = eP.getAttribute('value')
                if ep_name == 'ACTIVATE' and ep_value == 'false':
                    active = False
                if ep_name == 'UNIQUE_NAME':
                    unique_name = ep_value
                if ep_name == 'LABEL':
                    label = " (" + ep_value + ")"
            if active:
                if job_problems == 0:
                    print()
                    print("Job: " + name)
                job_problems += 1
                print("Component " + unique_name + label + " is active")

print()

It should be picking up the label, the on-screen name of the component, but it isn't. I'll post a fix when I get time to look at it. Suggestions welcome.

Suggestions for anything more that could be checked are also welcome. I'm considering checking for a couple of connection objects that differ between production and development, and alerting if they are accidentally included, but the check would be hard-coded to the object names we use in our environment. Maybe I could push them into a config file.
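
As a rough sketch of the config-file idea, the names to alert on could live in a small JSON file and be matched against the entry names in the zip. Everything here is an assumption for illustration: the JSON layout, the "forbidden" key, the function names, and the example path pattern (I have not checked where connection objects actually sit inside an export).

```python
import fnmatch
import json
import zipfile


def load_forbidden_patterns(config_path):
    """Read a JSON file shaped like {"forbidden": ["*/metadata/connections/PROD_*"]}.

    The file layout and the "forbidden" key are hypothetical.
    """
    with open(config_path) as fh:
        return json.load(fh).get("forbidden", [])


def find_forbidden_entries(entry_names, patterns):
    """Return the entry names that match any shell-style wildcard pattern."""
    return [name for name in entry_names
            if any(fnmatch.fnmatchcase(name, pat) for pat in patterns)]


def scan_export(zip_path, config_path):
    """List zip entries in the export that match a forbidden pattern."""
    patterns = load_forbidden_patterns(config_path)
    with zipfile.ZipFile(zip_path, "r") as zf:
        return find_forbidden_entries(zf.namelist(), patterns)
```

Calling scan_export("export.zip", "forbidden.json") would return the offending entry names, which the scanner could print alongside the tLogRow warnings. Keeping the patterns outside the script means each environment can maintain its own list.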

Community Manager

Re: Scanning a Talend zip export

Hi PhilHibbs
Thanks for sharing your Python script; it might be helpful for others. Your other thread explains why you need to scan the Talend zip export.
https://community.talend.com/t5/Design-and-Development/Cloning-a-set-of-jobs/m-p/154484#M94435

Regards
Shong
Seven Stars

Re: Scanning a Talend zip export

Yes, that other thread gives another reason to do something similar. I created a different script to do that, basically a cut-down version of this one. If I had the time, I'd make it into a single generic script with command-line switches, a bit like Unix find, with switches like --jobname. I will post back here if I ever do!
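
For what it's worth, a generic version might start from something like the sketch below, using argparse from the standard library. Only --jobname comes from the idea above; the --component switch, the defaults, and the helper names are made up for illustration.

```python
import argparse
import fnmatch


def build_parser():
    """Command-line interface for a hypothetical generic scanner."""
    parser = argparse.ArgumentParser(description="Scan a Talend zip export")
    parser.add_argument("zipfile", help="path to the exported zip")
    parser.add_argument("--jobname", default="*",
                        help="wildcard matched against the job's .item file name")
    parser.add_argument("--component", default="tLogRow",
                        help="componentName to report when active")
    return parser


def job_matches(entry_name, pattern):
    """True if the zip entry is an .item file whose base name matches pattern."""
    base = entry_name.rsplit("/", 1)[-1]
    return base.endswith(".item") and fnmatch.fnmatchcase(base, pattern)
```

The scanner loop would then call build_parser().parse_args() once, skip any entry for which job_matches(name, args.jobname) is false, and compare componentName against args.component instead of the hard-coded "tLogRow".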

Community Manager

Re: Scanning a Talend zip export

Nice work!
