I'm using Big Data Sandbox version 6.3_1.7. I was able to use the Hive components to meet most of my requirements (external tables, partitioned tables, bucketing, etc.). One last requirement is to use a Python script as part of my Hive code. I tried executing the command add FILE /home/Desktop/my.py; with the tHiveRow component, but it did not work.
Can someone help with an example on this please?
Sorry for the delay. We have forwarded your issue to our big data experts and will come back to you as soon as we can.
Python can be used as a UDF from Hive through the HiveQL TRANSFORM statement. For example, the following HiveQL invokes the hiveudf.py file stored in the default Azure Storage account for the cluster.
add file wasb:///hiveudf.py;

SELECT TRANSFORM (clientid, devicemake, devicemodel)
    USING 'D:\Python27\python.exe hiveudf.py'
    AS (clientid string, phoneLable string, phoneHash string)
FROM hivesampletable
ORDER BY clientid
LIMIT 50;
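For reference, here is a minimal sketch of what a script like hiveudf.py could look like. TRANSFORM passes each selected row to the script on standard input as one tab-separated line and reads tab-separated lines back from standard output. The function names (transform_line, main) and the choice of an MD5 hash for the third column are assumptions made for illustration, not the contents of the actual file. The interpreter path in the USING clause above is a Windows path, so on a Linux sandbox it would presumably be something like 'python hiveudf.py' instead.

```python
#!/usr/bin/env python3
# Hypothetical stand-in for hiveudf.py: Hive TRANSFORM streams rows to this
# script on stdin (tab-separated) and reads the result rows from stdout.
import hashlib
import sys


def transform_line(line):
    """Turn one tab-separated input row into one tab-separated output row."""
    # Input columns, in SELECT TRANSFORM order: clientid, devicemake, devicemodel
    clientid, devicemake, devicemodel = line.rstrip("\n").split("\t")
    phone_label = devicemake + " " + devicemodel
    # Assumption: the third output column is an MD5 hex digest of the label
    phone_hash = hashlib.md5(phone_label.encode("utf-8")).hexdigest()
    return "\t".join([clientid, phone_label, phone_hash])


def main(stream=sys.stdin, out=sys.stdout):
    for line in stream:
        if line.strip():  # skip blank lines
            out.write(transform_line(line) + "\n")


if __name__ == "__main__":
    main()
```

The three output fields line up, in order, with the columns named in the AS clause. Before wiring the script into a tHiveRow component, it may be worth piping a few tab-separated sample rows into it on the command line to confirm it runs on its own.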