How to push data from HDFS to Hive Table


Hi All,

I am not able to push data from HDFS to Hive.

Objective:
Create tables and load data into Hive.

Platform:
CDH 4.4
TOS for Big Data 5.4.0
Ubuntu OS

Architecture:
Both CDH and TOS are on the same machine (CDH single-host implementation).

Components being used:
tHiveConnection
tHiveCreateTable
tHiveLoad

More info:
Screenshot attached.


Re: How to push data from HDFS to Hive Table

Error Log:
Starting job Test at 06:53 14/11/2013.

connecting to socket on port 4047
connected
: org.apache.hadoop.hive.conf.HiveConf - hive-site.xml not found on CLASSPATH
: org.apache.hadoop.conf.Configuration - mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize
: org.apache.hadoop.conf.Configuration - mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
: org.apache.hadoop.conf.Configuration - mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack
: org.apache.hadoop.conf.Configuration - mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node
: org.apache.hadoop.conf.Configuration - mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
: org.apache.hadoop.conf.Configuration - mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
: org.apache.hadoop.conf.Configuration - org.apache.hadoop.hive.conf.LoopingByteArrayInputStream@2b03be0:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
: org.apache.hadoop.conf.Configuration - org.apache.hadoop.hive.conf.LoopingByteArrayInputStream@2b03be0:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
: org.apache.hadoop.conf.Configuration - fs.default.name is deprecated. Instead, use fs.defaultFS
: org.apache.hadoop.hive.metastore.HiveMetaStore - 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
: org.apache.hadoop.hive.metastore.ObjectStore - ObjectStore, initialize called
: DataNucleus.Persistence - Property datanucleus.cache.level2 unknown - will be ignored
: org.apache.hadoop.conf.Configuration - org.apache.hadoop.hive.conf.LoopingByteArrayInputStream@bf47ae8:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
: org.apache.hadoop.conf.Configuration - org.apache.hadoop.hive.conf.LoopingByteArrayInputStream@bf47ae8:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
: org.apache.hadoop.conf.Configuration - fs.default.name is deprecated. Instead, use fs.defaultFS
: org.apache.hadoop.hive.metastore.ObjectStore - Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
: org.apache.hadoop.hive.metastore.ObjectStore - Initialized ObjectStore
Hive history file=/tmp/root/hive_job_log_8aaa235f-3f7d-4acc-a297-a13212b2ef24_1181626818.txt
: hive.ql.exec.HiveHistory - Hive history file=/tmp/root/hive_job_log_8aaa235f-3f7d-4acc-a297-a13212b2ef24_1181626818.txt
: org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
: org.apache.hadoop.conf.Configuration - org.apache.hadoop.hive.conf.LoopingByteArrayInputStream@4eda1515:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
: org.apache.hadoop.conf.Configuration - org.apache.hadoop.hive.conf.LoopingByteArrayInputStream@4eda1515:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
: org.apache.hadoop.conf.Configuration - fs.default.name is deprecated. Instead, use fs.defaultFS
: org.apache.hadoop.hive.service.HiveServer - Putting temp output to file /tmp/root/8aaa235f-3f7d-4acc-a297-a13212b2ef245572030930163956377.pipeout
: org.apache.hadoop.hive.service.HiveServer - Running the query: set hive.fetch.output.serde = org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
: org.apache.hadoop.hive.service.HiveServer - Putting temp output to file /tmp/root/8aaa235f-3f7d-4acc-a297-a13212b2ef245572030930163956377.pipeout
: org.apache.hadoop.hive.service.HiveServer - Running the query: SET mapreduce.framework.name=yarn
: org.apache.hadoop.hive.service.HiveServer - Putting temp output to file /tmp/root/8aaa235f-3f7d-4acc-a297-a13212b2ef245572030930163956377.pipeout
: org.apache.hadoop.hive.service.HiveServer - Running the query: SET yarn.resourcemanager.address=mlbis.local:8032
: org.apache.hadoop.hive.service.HiveServer - Putting temp output to file /tmp/root/8aaa235f-3f7d-4acc-a297-a13212b2ef245572030930163956377.pipeout
: org.apache.hadoop.hive.service.HiveServer - Running the query: CREATE TABLE IF NOT EXISTS tkt_details(DeviceName STRING,TicketCount STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ';' LINES TERMINATED BY '
' STORED AS TEXTFILE
: org.apache.hadoop.hive.ql.Driver -
: org.apache.hadoop.hive.ql.Driver -
: org.apache.hadoop.hive.ql.Driver -
: hive.ql.parse.ParseDriver - Parsing command: CREATE TABLE IF NOT EXISTS tkt_details(DeviceName STRING,TicketCount STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ';' LINES TERMINATED BY '
' STORED AS TEXTFILE
: hive.ql.parse.ParseDriver - Parse Completed
: org.apache.hadoop.hive.ql.parse.SemanticAnalyzer - Starting Semantic Analysis
: org.apache.hadoop.hive.ql.parse.SemanticAnalyzer - Creating table tkt_details position=28
: hive.metastore - Trying to connect to metastore with URI thrift://mlbis.local:9083
: hive.metastore - Waiting 1 seconds before next connection attempt.
OK
: hive.metastore - Connected to metastore.
: org.apache.hadoop.hive.ql.Driver - Semantic Analysis Completed
: org.apache.hadoop.hive.ql.Driver - Returning Hive schema: Schema(fieldSchemas:null, properties:null)
: org.apache.hadoop.hive.ql.Driver -

: org.apache.hadoop.hive.ql.Driver -
: org.apache.hadoop.conf.Configuration - mapred.job.name is deprecated. Instead, use mapreduce.job.name
: org.apache.hadoop.hive.ql.Driver - Starting command: CREATE TABLE IF NOT EXISTS tkt_details(DeviceName STRING,TicketCount STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ';' LINES TERMINATED BY '
' STORED AS TEXTFILE
: org.apache.hadoop.hive.ql.Driver -

: org.apache.hadoop.hive.ql.Driver -

: org.apache.hadoop.hive.ql.Driver - OK
: org.apache.hadoop.hive.ql.Driver -
: org.apache.hadoop.hive.ql.Driver -

: org.apache.hadoop.hive.ql.Driver -

: org.apache.hadoop.hive.service.HiveServer - Returning schema: Schema(fieldSchemas:null, properties:null)
: org.apache.hadoop.hive.service.HiveServer - Running the query: LOAD DATA INPATH '/tmp/logs/mssql.txt' OVERWRITE INTO TABLE tkt_details
: org.apache.hadoop.hive.ql.Driver -
: org.apache.hadoop.hive.ql.Driver -
: org.apache.hadoop.hive.ql.Driver -
: hive.ql.parse.ParseDriver - Parsing command: LOAD DATA INPATH '/tmp/logs/mssql.txt' OVERWRITE INTO TABLE tkt_details
: hive.ql.parse.ParseDriver - Parse Completed
Loading data to table default.tkt_details
: org.apache.hadoop.hive.ql.Driver - Semantic Analysis Completed
: org.apache.hadoop.hive.ql.Driver - Returning Hive schema: Schema(fieldSchemas:null, properties:null)
: org.apache.hadoop.hive.ql.Driver -

: org.apache.hadoop.hive.ql.Driver -
: org.apache.hadoop.hive.ql.Driver - Starting command: LOAD DATA INPATH '/tmp/logs/mssql.txt' OVERWRITE INTO TABLE tkt_details
: org.apache.hadoop.hive.ql.Driver -

: org.apache.hadoop.hive.ql.exec.Task - Loading data to table default.tkt_details from hdfs://mlbis.local:8020/tmp/logs/mssql.txt
Failed with exception Unable to move sourcehdfs://mlbis.local:8020/tmp/logs/mssql.txt to destination /user/hive/warehouse/tkt_details/mssql.txt
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask
Query returned non-zero code: 1, cause: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask
: org.apache.hadoop.hive.ql.exec.Task - Failed with exception Unable to move sourcehdfs://mlbis.local:8020/tmp/logs/mssql.txt to destination /user/hive/warehouse/tkt_details/mssql.txt
org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move sourcehdfs://mlbis.local:8020/tmp/logs/mssql.txt to destination /user/hive/warehouse/tkt_details/mssql.txt
at org.apache.hadoop.hive.ql.metadata.Hive.renameFile(Hive.java:2031)
at org.apache.hadoop.hive.ql.metadata.Hive.replaceFiles(Hive.java:2175)
at org.apache.hadoop.hive.ql.metadata.Table.replaceFiles(Table.java:627)
at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1301)
at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:234)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:138)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:66)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1383)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1169)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:982)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:902)
at org.apache.hadoop.hive.service.HiveServer$HiveServerHandler.execute(HiveServer.java:198)
at org.apache.hadoop.hive.jdbc.HiveStatement.executeQuery(HiveStatement.java:192)
at org.apache.hadoop.hive.jdbc.HiveStatement.execute(HiveStatement.java:132)
at demobigdata.test_0_1.Test.tHiveLoad_1Process(Test.java:724)
at demobigdata.test_0_1.Test.tHiveCreateTable_2Process(Test.java:624)
at demobigdata.test_0_1.Test.tHiveConnection_1Process(Test.java:450)
at demobigdata.test_0_1.Test.runJobInTOS(Test.java:970)
at demobigdata.test_0_1.Test.main(Test.java:835)
Caused by: org.apache.hadoop.security.AccessControlException: Permission denied by sticky bit setting: user=root, inode="/tmp/logs/mssql.txt":admin:supergroup:-rw-r--r--
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkStickyBit(FSPermissionChecker.java:245)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:146)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:4705)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:4687)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkParentAccess(FSNamesystem.java:4655)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renameToInternal(FSNamesystem.java:2695)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renameToInt(FSNamesystem.java:2663)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renameTo(FSNamesystem.java:2642)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rename(NameNodeRpcServer.java:610)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.rename(ClientNamenodeProtocolServerSideTranslatorPB.java:381)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44964)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1751)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1747)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1745)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:90)
at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:57)
at org.apache.hadoop.hdfs.DFSClient.rename(DFSClient.java:1427)
at org.apache.hadoop.hdfs.DistributedFileSystem.rename(DistributedFileSystem.java:339)
at org.apache.hadoop.hive.ql.metadata.Hive.renameFile(Hive.java:2027)
... 18 more
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): Permission denied by sticky bit setting: user=root, inode="/tmp/logs/mssql.txt":admin:supergroup:-rw-r--r--
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkStickyBit(FSPermissionChecker.java:245)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:146)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:4705)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:4687)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkParentAccess(FSNamesystem.java:4655)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renameToInternal(FSNamesystem.java:2695)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renameToInt(FSNamesystem.java:2663)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renameTo(FSNamesystem.java:2642)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rename(NameNodeRpcServer.java:610)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.rename(ClientNamenodeProtocolServerSideTranslatorPB.java:381)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44964)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1751)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1747)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1745)
at org.apache.hadoop.ipc.Client.call(Client.java:1237)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
at $Proxy14.rename(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
at $Proxy14.rename(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.rename(ClientNamenodeProtocolTranslatorPB.java:355)
at org.apache.hadoop.hdfs.DFSClient.rename(DFSClient.java:1425)
... 20 more
: org.apache.hadoop.hive.ql.Driver - FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask
: org.apache.hadoop.hive.ql.Driver -

: org.apache.hadoop.hive.ql.Driver -
: org.apache.hadoop.hive.ql.Driver -

disconnected
Job Test ended at 06:53 14/11/2013.

Re: How to push data from HDFS to Hive Table

You just have a permission issue within HDFS: the job runs as user root, but the sticky bit on the source directory means only the file's owner (admin here) is allowed to move it. Please fix that using hadoop fs -chmod and hadoop fs -chown.
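
For example, here is a minimal sketch assuming the paths from your error log, run as the HDFS superuser (often the hdfs user):

# Option 1: make the job's user (root) the owner of the source file,
# so the sticky-bit check on the parent directory passes.
sudo -u hdfs hadoop fs -chown root /tmp/logs/mssql.txt

# Option 2: clear the sticky bit on the staging directory so any user
# with write access can move files out of it.
sudo -u hdfs hadoop fs -chmod -t /tmp/logs

After either change, re-run the job; the MoveTask should then be able to rename the file into /user/hive/warehouse/tkt_details/.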

Re: How to push data from HDFS to Hive Table

Hi Remy,
Can you please suggest which components I should use to write data directly from HDFS to a Hive table? I tried tHiveLoad, but it uploads data from a local file rather than from an HDFS file. Can you please guide me? Currently I am writing the data to a file on the Linux box and then loading it into Hive tables using tHiveLoad.

Re: How to push data from HDFS to Hive Table

Hi,
If your data is already on HDFS, then I think you can use tHDFSCopy to move it from its current location to the HDFS location of your Hive table.
Does that make sense?
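
For reference, the manual equivalent of that copy, using the paths from your log (a sketch; adjust to your actual locations):

# Move the file straight into the managed table's warehouse directory.
hadoop fs -mv /tmp/logs/mssql.txt /user/hive/warehouse/tkt_details/

For a plain TEXTFILE table like tkt_details, Hive will pick the file up on the next query once it sits in the table directory; no separate LOAD step is needed.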

Re: How to push data from HDFS to Hive Table

Hi Remy,
Makes sense, thanks a lot. Can you do me another favor? Can you suggest a BI engine that would fit well with Talend Open Studio for Big Data? Currently I am trying SpagoBI and find it pretty confusing and complex. Thanks.

Re: How to push data from HDFS to Hive Table

Hi,
I think the problem is in your tHiveLoad component: the "Local" option has been checked. Uncheck it and give the HDFS file path, and it will load data into your Hive table directly.
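
For clarity, these are the two HiveQL statements tHiveLoad can generate depending on that option (a sketch; the local path below is made up for illustration):

-- "Local" checked: reads a file from the local filesystem (not HDFS)
LOAD DATA LOCAL INPATH '/home/user/mssql.txt' OVERWRITE INTO TABLE tkt_details;

-- "Local" unchecked: moves a file that is already on HDFS into the table;
-- this is exactly the statement in your error log.
LOAD DATA INPATH '/tmp/logs/mssql.txt' OVERWRITE INTO TABLE tkt_details;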
