Dataprep, Big Data and security



We have been asked some questions about Data Preparation and security.


  1. Does the tool support impersonation on prep activities? For instance, if I have a user that belongs to a group defined in the cluster as CDO_1, and by definition I am a super user in Sentry, can I create and maintain groups within the tool without them being defined in Sentry?
  2. Can the tool handle a user role belonging to more than one group?
  3. Can I prepare a file and write back into HDFS as parquet?
    1. Is there a security framework or policy we need to be in compliance with for the tool to allow for this?

Re: Dataprep, Big Data and security

Hi Adrien,


1. We do support impersonation, in the sense that you can specify the username on both the input and output sides. I didn't fully get the group-related question and neither did @tfion... the answer is most likely "no", but can you provide more details?

2. Same as above: we didn't quite understand the question.

3. It depends on where the source file comes from. You can do HDFS to HDFS (and given your "write back" wording, I assume that is what you're trying to achieve). But you cannot prepare an Excel file that you uploaded to Data Prep (or any "Local file", in Data Prep terminology) and write it to HDFS. There is no specific security framework or policy to comply with.
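
On point 1, here is a rough illustration of what "specifying the username" on an HDFS call can look like. This is not how Data Prep does it internally (that isn't covered here); it is only a minimal sketch using the standard WebHDFS REST parameters `user.name` and `doas`, which apply to clusters without Kerberos and with proxy-user rules configured. The host, port, path and user names are hypothetical placeholders.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host, port, path, op, user, doas=None, **params):
    """Build a WebHDFS REST URL that runs `op` on `path` as `user`.

    `user` is sent as the `user.name` query parameter. If `doas` is
    given, the request asks the cluster to let `user` act on behalf
    of `doas` (this only works if proxy-user rules allow it).
    """
    query = {"op": op, "user.name": user}
    if doas is not None:
        query["doas"] = doas
    query.update(params)
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{urlencode(query)}"


# Hypothetical values, for illustration only.
url = webhdfs_url(
    "namenode.example.com", 9870,
    "/data/out/result.parquet",
    "CREATE",
    user="dataprep",   # service account (assumed name)
    doas="adrien",     # end user being impersonated (assumed name)
    overwrite="true",
)
print(url)
```

Whether the impersonated user is then allowed to read or write the path is decided on the cluster side (HDFS permissions, Sentry), not by the caller.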






Re: Dataprep, Big Data and security

Ok thank you.


I'll reach back out to the prospect to clarify the questions.