Six Stars

Hadoop to Oracle with Dynamic schema

Hi All,


I have the following scenario at hands :

  • I receive a flat file on a daily basis that is imported into HDFS.
  • This file has a variable amount of columns
  • I need to design a job that takes the data from this HDFS file, and populates an Oracle table.
  • This Oracle table will be dropped and recreated at every run due to the variable amount of columns
  • All datatypes can be defined as String (VARCHAR2) in the Oracle table. (As I think there's no way to 'guess' the datatypes at runtime from a CSV.

I've been breaking my head around the design, and have come up with the following design limitations at the moment.


  • Scenario 1
    • tHDFSInput -> tOracleOutput
    • I would be using the a 1 column schema of type 'Dynamic' to achieve this
    • Issue : Type Dynamic is not supported by tHDFSInput
  • Scenario 2
    • Script a Hive Create Table statement based on the header row of the hdfs file
    • Create the external table pointing to the HDFS file
    • tHiveInput -> tOracleOutput
    • Issue : Type Dynamic is not supported by tHiveInput

Is there a trick to be able to use a Dynamic schema in Hive/HDFS components?

Or I there a better way all together to approach this design?


Thanks a lot for your input,



Five Stars

Re: Hadoop to Oracle with Dynamic schema

In Scenario 2 , Build the select query dynamically using Hive Metastore for that table. 


Re: Hadoop to Oracle with Dynamic schema


Here is a new feature jira issue about "Support Hive Metastore HA".

Best regards


Don't forget to give kudos when a reply is helpful and click Accept the solution when you think you're good with it.