I have an insurance database in Oracle, and for some reports the queries take a very long time, so I want to move that data to Hive using the Talend Big Data platform. From Hive I will use a reporting tool like Tableau to generate the reports. How do I move data from Oracle to Hive using Talend? I am new to Talend.
Hello,

There are three solutions to reach your goal:

1 -
* Read your data from Oracle using a tOracleInput and output the data to a delimited file.
* Use the tHiveCreateTable to create a table within Hive if the table doesn't exist yet.
* Use the tHiveLoad to load the local delimited file into your Hive table.

2 -
* Use the tHiveCreateTable to create a table within Hive if the table doesn't exist yet.
* Use the tSqoopImport to take advantage of MapReduce to load your table into an HDFS folder. Of course, the table created above should be created on top of this HDFS folder.

3 - (only in Talend Platform with Big Data)
* Create a MapReduce job using a tJDBCInput, a tMap to do the mapping, and a tHDFSOutput to import your Oracle data into HDFS. Of course, a Hive table must be created on top of the HDFS folder.

Does this answer help you?

Cheers,
Rémy.
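For solutions 2 and 3, the "table on top of the HDFS folder" is usually an external Hive table pointing at the folder the import writes into. A minimal HQL sketch (table name, columns, and the HDFS path are placeholders, assuming a comma-delimited import):

```sql
-- Hypothetical external table over the HDFS folder that
-- tSqoopImport or tHDFSOutput populates; names and path are
-- illustrative only.
CREATE EXTERNAL TABLE IF NOT EXISTS policies (
  policy_id   INT,
  holder_name STRING,
  premium     DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/talend/policies';
```

Because the table is EXTERNAL, dropping it later leaves the imported files in HDFS untouched.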
Hi rdubois,

I think the first step would help me. Should I try this in Talend Big Data or Data Integration studio?

I have now tried: tOracleConnection_1 > tOracleInput_1 > tFileOutputMSDelimited_1. The output file is saved as .csv and the data from Oracle is present in it. Thank you. Should I change the field separator ';' and row separator '\n' to something else for Hive? My Hadoop and Hive are in a Cloudera distribution running in VMware.

Thank you very much for the reply. How do I proceed further? Do I connect tFileOutputMSDelimited_1 to tHiveLoad? Neither tHiveCreateTable nor tHiveLoad shows up in my Talend DI or BD studio. I only have tHiveClose, tHiveConnection, and tHiveRow in Talend DI, plus tHiveInput and tHiveRow in Talend BD. Please send me a video or a step-by-step procedure if there is one.

Thanks again.
Hi rdubois,

Can you please tell me on which machine I should install Talend: the one where Oracle is present, or the one where Hive is present? Also, as you suggested storing the data in a delimited file, MS Excel may collapse since the data is huge.

I read your post here: http://www.talendforge.org/forum/viewtopic.php?id=31745. Can you please tell me the steps to use Sqoop import, and where to run it (on the machine where Hive is installed)?

I am not able to see tHiveCreateTable in my Talend BD or DI studio. Please help me with this.
Hello,

I'm sorry to have talked about tHiveCreateTable and tHiveLoad, which are two components coming with the future 5.4.0. Nevertheless, you can create your Hive table and load the data using the tHiveRow (one for the table creation and another one for the loading), but you will have to write the Hive queries by hand.

"Should i change the Field separator ; and Row separator \n to other for Hive ?"
If your local file is ';'-separated, then you should use this same character as the field separator in Hive. The syntax in HQL for this requirement is: CREATE TABLE name (col STRING, col2 INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY ';'

"Can please tell me in which machine i should install talend"
It doesn't matter. Talend can be installed on the Oracle machine, on the Hive machine, or on a third machine.

Regarding Sqoop, you can have a look at this movie: http://www.youtube.com/watch?v=NzeRMqPZ0yU&list=PL0aRSCaII9DeHyYj0JyvcLEbvLetxmEqP&index=2 --> This is a 5.2 movie which is not totally up to date for 5.3, but it can give you an overview of how the export component works. For more information about the import component, please click on the tSqoopImport in your designer and press F1.

HTH,
Rémy.
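Concretely, with two tHiveRow components the hand-written queries could look like the sketch below (table and column names are illustrative, and the file path is a placeholder; the separator matches the ';'-delimited file produced by the Talend job):

```sql
-- First tHiveRow: create the target table (hypothetical schema).
-- Note that DELIMITED is required before FIELDS TERMINATED BY.
CREATE TABLE IF NOT EXISTS policies (
  policy_id   INT,
  holder_name STRING,
  premium     DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ';';

-- Second tHiveRow: load the local delimited file into the table.
LOAD DATA LOCAL INPATH '/tmp/policies.csv'
OVERWRITE INTO TABLE policies;
```

LOAD DATA LOCAL reads the file from the machine the Hive client runs on, so the delimited file must be reachable from there.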
Thanks rdubois for the reply. But my requirement has changed now. To generate a single report I have to import data, i.e. columns from multiple tables (not all tables; I have tried Apache Sqoop, which does not serve my purpose). I have to map different columns from multiple tables. I have seen tMap in Talend, but I'm not sure whether Talend can do this job of mapping columns from multiple tables of different sizes. Then I need to move that mapped data into a file, and that file into Hive, or, if possible, move it directly to Hive (as that is still in beta stage; hoping to see it in Talend 5.4). Otherwise I can store the data in a delimited file and load that file into Hive. Please let me know how to do this in Talend.
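One common alternative worth noting: since all the source tables live in the same Oracle database, the join can also be done on the Oracle side in a single tOracleInput query, so only the combined rows flow through the job into the delimited file. A sketch (all table and column names below are made up for illustration):

```sql
-- Hypothetical Oracle-side join pulling just the report columns
-- from several tables; the result feeds the delimited file, then Hive.
SELECT p.policy_id,
       c.customer_name,
       cl.claim_amount
FROM   policies  p
JOIN   customers c  ON c.customer_id = p.customer_id
LEFT JOIN claims cl ON cl.policy_id  = p.policy_id;
```

tMap handles the same need inside Talend (one main flow plus lookup flows, with join conditions drawn between columns), which is preferable when the inputs come from different sources; here, pushing the join to Oracle keeps the job simpler.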