
Using Python UDF in PIG Components

Hi Everyone,


I'm using Sandbox version 1.7 and trying my hand at Pig UDFs. Writing one in Java was easy using the Pig UDF option provided in the repository.

I then started wondering whether a similar feature is available for Python UDFs, but could not find one. My next thought was to use the pigCode component to register a Python UDF and call it from my Pig job, but it looks like it's not that straightforward.

Can someone help me achieve this use case? Here is what I'm trying to do with the Pig components (Pig code pasted below):

Python Script : 

from pig_util import outputSchema

@outputSchema('word:chararray')
def hi_world():
    return "hello world"
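In case it helps anyone reproduce this, below is a minimal sketch of the same script written so that it can also be run and checked outside Pig. The fallback `outputSchema` decorator is my own assumption, not part of Pig: it only stands in when `pig_util` cannot be imported, and it simply records the schema string on the function.

```python
# Sketch only: the same UDF, made runnable outside Pig for a quick sanity check.
try:
    from pig_util import outputSchema
except ImportError:
    # Hypothetical stand-in used only when pig_util is not on the path.
    # It stores the schema string on the function and changes nothing else.
    def outputSchema(schema):
        def decorator(func):
            func.outputSchema = schema
            return func
        return decorator


@outputSchema('word:chararray')
def hi_world():
    # Takes no arguments and returns a constant chararray value.
    return "hello world"


if __name__ == "__main__":
    # Local check that the function itself works before registering it in Pig.
    print(hi_world())
```

Running the file directly only confirms that the function itself behaves as expected. It says nothing about whether the pigCode component can register it.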


PIG Code:

-- first register it to make it available

REGISTER '' using jython as my_special_udfs;

users = LOAD 'user_data' AS (name: chararray);

hello_users = FOREACH users GENERATE name, my_special_udfs.hi_world();
